CN110083168A - Altitude-hold control method for a small unmanned helicopter based on reinforcement learning - Google Patents
Altitude-hold control method for a small unmanned helicopter based on reinforcement learning
- Publication number
- CN110083168A (application CN201910369215.3A)
- Authority
- CN
- China
- Prior art keywords
- control
- state
- formula
- helicopter
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/0088—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/04—Control of altitude or depth
- G05D1/042—Control of altitude or depth specially adapted for aircraft
- G05D1/046—Control of altitude or depth specially adapted for aircraft to counteract a perturbation, e.g. gust of wind
Abstract
The present invention relates to an intelligent altitude control method for an indoor small unmanned helicopter, and proposes a continuous control method based on offline reinforcement learning that enables a small unmanned helicopter to hold altitude in an indoor positioning environment in the presence of external disturbances. To this end, the technical solution adopted by the present invention is an altitude-hold control method for a small unmanned helicopter based on reinforcement learning: first, a helicopter model based on a Markov sequence and a reward function are constructed; then the optimized controller parameters are obtained by iterative training with a stochastic approximation method; finally, the trained controller is applied to the real helicopter system to implement control. The present invention is mainly applicable to the design and manufacture of indoor small unmanned aerial vehicles.
Description
Technical field
The present invention relates to an intelligent altitude control method for an indoor small unmanned helicopter, specifically a reinforcement learning control method based on offline flight data.
Background art
When a small unmanned helicopter performs tasks such as hovering, low-speed forward flight and cruising flight, it must be able to stabilize its own altitude; altitude-hold control is also the foundation of stable helicopter flight and of many other complex control tasks. The design of an altitude controller, however, faces several severe problems. First, altitude hold cannot be achieved through pitch-angle control alone: when a constant disturbance torque acts on the vertical channel, the velocity vector gradually deviates from its original direction and induces altitude drift. Second, during the transient in which the pitch angle settles, a nonzero mean change of the flight-path inclination angle also changes the flight altitude. Third, in the altitude-hold state the main rotor itself produces a strong airflow, which causes large drift disturbances to the steady state of the fuselage. Fourth, factors such as ground effect and battery level also cause the helicopter to vary in the vertical direction. Fifth, from the standpoint of control authority, although flight altitude can be regulated through the elevator or through the magnitude of motor thrust, altitude control by means of thrust has large inertia and slow response.
Consequently, observers and controllers designed from a nominal model deviate considerably from the real system in actual control, and the control performance is poor. Characterizing the behavior of the system by directly collecting and analyzing flight data, and designing the controller on that basis, is a comparatively effective idea, and data-driven control is a widely adopted method of this kind. Hou Zhongsheng et al. gave a detailed theoretical explanation and experimental verification of the method (journal: Acta Automatica Sinica; authors: Xu Jianxin, Hou Zhongsheng; publication date: 2009; article title: A survey of data-driven system methods; pages: 668-675). Although this method can stabilize the system, it places no constraint on the speed and precision of error convergence, so the control performance is mediocre. Reinforcement learning policy control, which is likewise based on system data, stabilizes the system in a globally optimal manner through a reward function defined on the observed system, and can therefore achieve more precise control.
In recent years, reinforcement learning control techniques have found certain applications in the field of unmanned aerial vehicle control. Andrew Ng et al. collected offline flight data of a helicopter's attitude and control inputs and designed an offline reinforcement learning control method. They constructed a Markov model of the system with a local linear method and, through gradient-descent iterations, converged to the optimized control parameters of the altitude controller (journal: Communications of the ACM; authors: Coates A, Abbeel P, Ng A Y; publication date: 2009; article title: Apprenticeship learning for helicopter control; pages: 97-105). This method requires no prior model information, which makes it an attractive choice of control method for a complicated small unmanned helicopter system.
Summary of the invention
In order to overcome the deficiencies of the prior art, the present invention aims to propose a continuous control method based on offline reinforcement learning that enables a small unmanned helicopter to hold altitude in an indoor positioning environment in the presence of external disturbances. To this end, the present invention adopts the following technical solution: an altitude-hold control method for a small unmanned helicopter based on reinforcement learning, in which a helicopter model based on a Markov sequence and a reward function are constructed first, the optimized controller parameters are then obtained by iterative training with a stochastic approximation method, and the trained controller is finally applied to the real helicopter system to implement control.
The specific steps are as follows:
Step 1) defines the Markov decision process:
The Markov decision process is expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' denotes the state transition function, meaning that for the state s(t) ∈ S at each moment and action a(t) ∈ A, taking that action in that state yields a probability distribution over the next state s(t+1); γ is the discount factor, ranging between 0 and 1; and R is the reward function obtained by taking the corresponding action. Once the environment model P_sa(·) and R are known for every state, the optimal policy can be obtained by the method of dynamic programming;
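As an illustration only, the six-tuple and the discounted return can be sketched in a few lines of Python; the scalar dynamics, the reward and the two policies below are toy stand-ins chosen for the sketch, not the helicopter model of the invention:

```python
# Toy sketch of the six-tuple (S, J, A, {P_sa}, gamma, R): scalar state,
# deterministic stand-in transition, quadratic reward, discounted total return.

gamma = 0.95  # discount factor, between 0 and 1

def step(s, a):
    """Stand-in transition P_sa: the next state contracts toward zero under the action."""
    return 0.9 * s + 0.1 * a

def reward(s):
    """Quadratic reward: zero at the reference state (here 0), negative elsewhere."""
    return -(s ** 2)

def discounted_return(s0, policy, horizon=50):
    """Objective J: the discounted sum of rewards along one rollout."""
    s, total = s0, 0.0
    for t in range(horizon):
        total += (gamma ** t) * reward(s)
        s = step(s, policy(s))
    return total

# A proportional feedback policy accumulates less penalty than doing nothing
ret_fb = discounted_return(1.0, lambda s: -s)
ret_zero = discounted_return(1.0, lambda s: 0.0)
```

Dynamic programming would search over such policies; the sketch only shows that the return J orders policies, which is all the iteration of step 3) relies on.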
Step 2) data processing and modeling:
In the reinforcement-learning altitude controller, helicopter control is treated as a Markov random sequence. The state of the system is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t) and r̈_z(t) denote in turn the altitude, the vertical velocity and the vertical acceleration, and the subscript z denotes the vertical direction. The offline data are collected by manually emulating an altitude-hold control process: three groups of flight data, each two minutes long at a desired altitude, are recorded, comprising the three state variables and the corresponding vertical servo control input. After collection, the data are processed: each pair of current state s(t) and control input u(t) is stored in the data matrix X ∈ R^(m×(n_s+n_u+1)), where t denotes the current discrete-time instant, m is the number of training samples, n_s is the state dimension, n_u is the control dimension, and the added 1 represents the intercept coefficient of the fitting formula; the system output, i.e. the next state s(t+1), is stored in the matrix Y ∈ R^(m×n_s). The specific expressions are given in formulas (1) and (2):
Locally weighted linear regression (LWLR) is then carried out to map the current state s(t) and control input u(t) to the next state s(t+1). A weight matrix W ∈ R^(m×m) is defined as a diagonal matrix; the most common Gaussian kernel is used here, so W is diagonal with entries given by formula (3):
ω_ii = exp(-(x^(i) - x)(x^(i) - x)^T / (2τ²))   (3)
where x^(i) is the i-th row of the prior-data matrix X, x is the state feature vector [1 s(t)^T u(t)^T] at the current moment, and the parameter τ is the bandwidth of the Gaussian kernel, which determines the range over which the prior data influence the weight ω_ii. After this processing, following the derivation of the least-squares regression-fit error, the data-driven system model is established as in formula (4):
s(t+1) = x θ + v, with θ = (X^T W X)^(-1) X^T W Y   (4)
Here v is the noise compensation term relative to the real system, i.e. it accounts for other factors not considered in the modeling process; v = 0 is taken;
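A minimal sketch of the LWLR model of step 2), under assumptions: the flight data are replaced by synthetic scalar state/control samples, and the function and variable names (lwlr_predict, tau, and so on) are illustrative, not from the patent:

```python
import numpy as np

# Rows of X are [1, s(t), u(t)]; rows of Y are s(t+1). A Gaussian kernel
# (formula (3)) weights each stored sample by its distance to the query point,
# and the locally weighted least-squares fit (formula (4)) predicts the next state.

rng = np.random.default_rng(0)

# Synthetic "offline flight data": the next state follows a linear rule,
# unknown to the model, plus small noise.
m = 200
s = rng.uniform(-1, 1, m)
u = rng.uniform(-1, 1, m)
s_next = 0.8 * s + 0.3 * u + 0.01 * rng.standard_normal(m)

X = np.column_stack([np.ones(m), s, u])   # m x (1 + n_s + n_u)
Y = s_next.reshape(m, 1)                  # m x n_s

def lwlr_predict(x, X, Y, tau=0.5):
    """Predict s(t+1) at query row x = [1, s(t), u(t)] by weighted least squares."""
    d2 = np.sum((X - x) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))    # formula (3): Gaussian kernel weights
    W = np.diag(w)                        # m x m diagonal weight matrix
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
    return float(x @ theta)

pred = lwlr_predict(np.array([1.0, 0.5, 0.2]), X, Y)
true_val = 0.8 * 0.5 + 0.3 * 0.2          # 0.46, the noiseless next state
```

Because the fit is re-weighted around every query point, the same code tracks locally linear but globally nonlinear dynamics, which is why the patent prefers it over a single global regression.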
Step 3) continuous robust control law design:
The reward function at the current moment is determined first. A quadratic form of the state variables is adopted to construct the reward function R(t) as in formula (5), where Q is a positive-definite parameter matrix and the tracking error s̃(t), whose expression is given in formula (6), is
s̃(t) = s(t) - [r_z,ref(t), ṙ_z,ref(t), r̈_z,ref(t)]^T   (6)
in which r_z,ref(t), ṙ_z,ref(t) and r̈_z,ref(t) denote the desired altitude, desired velocity and desired acceleration of the helicopter in the vertical direction;
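The quadratic reward of formulas (5)-(6) can be sketched as follows. The specification speaks both of maximizing the total return and of minimizing the reward function; one consistent reading, assumed here, is a negative quadratic reward, so that maximizing the return minimizes the tracking error. The Q entries and the reference values are illustrative assumptions:

```python
import numpy as np

# Assumed weighting: altitude error penalized most, then velocity, then
# acceleration. Any positive-definite Q fits formulas (5)-(6).
Q = np.diag([10.0, 1.0, 0.1])

def reward(s, s_ref, Q=Q):
    """R(t) = -(s - s_ref)^T Q (s - s_ref): zero at the reference, negative elsewhere."""
    e = np.asarray(s, float) - np.asarray(s_ref, float)
    return float(-e @ Q @ e)

s_ref = np.array([1.2, 0.0, 0.0])        # hover at 1.2 m, zero velocity/acceleration
r_on  = reward([1.2, 0.0, 0.0], s_ref)   # exactly on the reference
r_off = reward([1.0, 0.1, 0.0], s_ref)   # 0.2 m low, with a slight climb rate
```

The sign convention makes the two phrasings in the text equivalent: the largest achievable reward is zero, attained exactly on the reference trajectory.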
After the reward function has been determined, the control iteration algorithm is designed. The control input u(t) is computed from the reference input and the current state, i.e. u(t) = π(s(t), ω), realized as in formula (7):
The ultimate purpose of applying reinforcement learning is to find the optimal control policy π(s(t), ω) such that the total return R_total over a period of time attains its maximum; the control-law weight at that point is the desired optimal weight vector ω_best. The stochastic approximation strategy is used here for the parameter iteration.
The concrete steps of the stochastic approximation strategy are as follows: first initialize the state, the action and the initial control weights; obtain from the LWLR model fitted to the offline data the next state and the reward value at that moment; compute the next action from the next state and the control weights; repeat this cycle, recording the return at each state moment and summing the returns of the sequence. The control weights are then iteratively updated accordingly, and the search continues until the optimal cumulative reward and the corresponding control weights are found; these weights are the result of training.
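The steps above can be sketched as a perturb-and-keep search. Everything concrete here is an assumption for illustration: a scalar linear policy u = -ω s stands in for π(s(t), ω), a known toy model replaces the LWLR model, and the exact update rule (which the patent does not spell out) is taken to be random perturbation with greedy acceptance:

```python
import numpy as np

# Roll the policy through the model, score the trajectory by its total
# (negative quadratic) reward, perturb the weight, keep improvements.

rng = np.random.default_rng(1)

def rollout_return(w, s0=1.0, horizon=40):
    """Total reward of policy u = -w*s on the toy scalar model s' = 0.9 s + 0.2 u."""
    s, total = s0, 0.0
    for _ in range(horizon):
        u = -w * s
        total += -s ** 2          # quadratic reward: penalize altitude error
        s = 0.9 * s + 0.2 * u
    return total

w_best = 0.0                      # initial control weight
r_best = rollout_return(w_best)
for _ in range(300):              # perturb-and-keep stochastic search
    w_try = w_best + 0.1 * rng.standard_normal()
    r_try = rollout_return(w_try)
    if r_try > r_best:            # keep weights that raise the total return
        w_best, r_best = w_try, r_try
```

In the patent's scheme the rollout would query the LWLR model of step 2) instead of the hand-written toy dynamics, and the reward would be formula (5).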
The features and beneficial effects of the present invention are:
1. The present invention uses offline flight data generated by the helicopter system and realizes a reinforcement learning update algorithm for the control-law weights through a mathematical model designed on a Markov sequence. It effectively compensates for the influence of the uncertainty of the true helicopter dynamics and of external disturbances on control performance, while improving control precision compared with data-driven methods.
2. The present invention designs an autonomous flight control system based on a micro indoor unmanned helicopter, which well balances flight control performance, flight functionality and the payload limit of the aircraft; the reinforcement learning altitude control algorithm is applied within this system.
3. The stochastic approximation algorithm adopted in the control-weight iteration prevents the reward function from falling into a locally optimal region during training, as gradient descent does; it can search for the optimal solution over a wider range, while the weights are repeatedly re-examined during optimization to find suboptimal weights better suited to the helicopter system.
Description of the drawings:
Fig. 1 is the logic block diagram of the control-weight update algorithm of the present invention;
Fig. 2 is the experimental platform of the present invention;
Fig. 3 is the altitude error curve of the unmanned helicopter under PID control in the simulation environment;
Fig. 4 is the altitude error curve of the unmanned helicopter under reinforcement learning control in the simulation environment;
Fig. 5 is the altitude curve of the unmanned helicopter in the altitude-hold experiment under reinforcement learning control;
Fig. 6 is the collective pitch control input curve of the unmanned helicopter in the altitude-hold experiment under reinforcement learning control.
Specific embodiments
In existing reinforcement learning control cases, the controlled object is mostly a medium-sized helicopter, and control is mostly performed outdoors. The present invention intends to provide a continuous control method based on offline reinforcement learning that realizes altitude hold for a small unmanned helicopter in an indoor positioning environment in the presence of external disturbances. The technical solution adopted by the present invention is: first construct the helicopter model based on a Markov sequence and the reward function, then obtain the optimized controller parameters by iterative training with a stochastic approximation method, and finally apply the trained controller to the real helicopter system for verification. The method comprises the following steps:
Step 1) defines the Markov decision process:
Markov methods are widely applied in many sequential decision problems. Their main concepts are the decision moment, the system state, the action, the return value and the transition probability. Concretely, choosing an action in a certain system state produces a return, and the transition probability determines the state at the next moment; the decision maker must select an optimal policy in some way. The process can be expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' denotes the state transition function, meaning that for the state s(t) ∈ S at each moment and action a(t) ∈ A, taking that action in that state yields a probability distribution over the next state s(t+1), with t denoting the current moment; γ is the discount factor, ranging between 0 and 1; and R is the reward function obtained by taking the corresponding action. The essence of the Markov decision process is that the transition probability from the current state to the next state and the return value depend only on the current state and the action about to be taken, and are independent of historical states and actions. Therefore, once P_sa(·) and R are known for every state, the optimal policy can be obtained by the method of dynamic programming.
Step 2) flight control hardware design of the micro helicopter:
The design requirements are first analyzed according to the experimental tasks and the environmental conditions, mainly as follows. 1. Since UAV control demands high real-time performance, the main control chip needs a high system clock frequency. 2. Since attitude and position control are required, an attitude sensor with high measurement accuracy must be used. 3. Since the helicopter must obtain its own position information in an indoor environment based on the OptiTrack motion capture system, a zigbee data transmission module must be carried. 4. Since the online attitude information of the helicopter must be collected as offline training data, a data transmission module is required to communicate with the ground station. 5. To save the flight log data, an on-board EEPROM chip is designed for data storage. 6. The payload requirement and endurance of the micro helicopter system are considered in the design: chips of small volume and weight are selected as far as possible and a corresponding flight control board is designed.
Taking the above demands into account, the final main control board measures 35 mm × 35 mm and uses a 100-pin Freescale K60 chip with a 100 MHz clock. A single MPU6500 chip serves as the attitude sensor, connected directly to the main control chip over SPI; its precision and sampling rate meet the actual demand and its volume is small. A zigbee module with an embedded antenna is chosen for communication, greatly reducing the payload. The EEPROM module is an AT24C256 with 256 Kbit of memory. Two XC6206P332 voltage regulator chips generate 3.3 V, powering the receiver and the other sensors separately. In addition, two external UART serial ports are designed, and one external I2C port is reserved for connecting a barometer. Finally, five external 5 V headers are designed for the input control signals of three servos and two motors. The complete machine weighs 94 g, which satisfies the take-off payload requirement of the helicopter.
Step 3) data processing and modeling:
Because of the nonlinear relation between helicopter lift and main-rotor speed, blade vibration, the disturbance of the surrounding airflow generated by the main rotor, and a series of other factors, the altitude channel fluctuates more strongly than the horizontal attitude channels. This leaves controller designs based on an altitude dynamics model with a considerable error relative to reality. On the other hand, although in practical implementations the altitude channel can reach fairly good control precision with a PID controller, the parameter tuning process is complicated and rich experience is needed to decide the parameter ranges and trends, making the development cycle slow. The optimization strategy of offline reinforcement learning can solve the above problems well.
In the reinforcement-learning altitude controller, helicopter control is treated as a Markov random sequence. The state of the system is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t) and r̈_z(t) denote in turn the altitude, the vertical velocity and the vertical acceleration, and the subscript z denotes the vertical direction. The offline data are collected by manually emulating an altitude-hold control process: three groups of flight data, each two minutes long at a desired altitude, are recorded, comprising the three state variables and the corresponding vertical servo control input. After collection, the data are processed: each pair of current state s(t) and control input u(t) is stored in the matrix X ∈ R^(m×(n_s+n_u+1)), where t denotes the current discrete-time instant, m is the number of training samples, n_s is the state dimension, n_u is the control dimension, and the added 1 represents the intercept coefficient of the fitting formula; the system output, i.e. the next state s(t+1), is stored in the matrix Y ∈ R^(m×n_s). The specific expressions are given in formulas (1) and (2):
Locally weighted linear regression (LWLR) is then carried out to map the current state s(t) and control input u(t) to the next state s(t+1). A weight matrix W ∈ R^(m×m) is defined as a diagonal matrix; the most common Gaussian kernel is used here, so W is diagonal with entries given by formula (3):
ω_ii = exp(-(x^(i) - x)(x^(i) - x)^T / (2τ²))   (3)
where x^(i) is the i-th row of the prior-data matrix X, x is the state feature vector [1 s(t)^T u(t)^T] at the current moment, and the parameter τ is the bandwidth of the Gaussian kernel, which determines the range over which the prior data influence the weight ω_ii. After this processing, following the derivation of the least-squares regression-fit error, the data-driven system model is established as in formula (4):
s(t+1) = x θ + v, with θ = (X^T W X)^(-1) X^T W Y   (4)
Here v is the noise compensation term relative to the real system, i.e. it accounts for other factors not considered in the modeling process, such as ambient wind disturbance. In principle a random variable should be fitted for each state channel, with its parameters obtained by maximum likelihood estimation. In actual experiments, however, the system performed no better after the random variable was added, and was even inferior to the system model without the random signal. A possible reason is that the data collection process already contains deviations; adding a random variable would further increase the offset beyond the system's expected value. Therefore v = 0 is taken here.
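The maximum likelihood fit mentioned above can be sketched as follows, under assumptions: a synthetic scalar channel stands in for the flight data, and the Gaussian noise parameters are estimated from the residuals of an ordinary least-squares fit. This only illustrates how the per-channel fit of v would look, since the patent ultimately discards it by setting v = 0:

```python
import numpy as np

# Fit a linear model, then take the MLE of a Gaussian noise term from the
# residuals: the sample mean and the (biased, 1/m) standard deviation.

rng = np.random.default_rng(2)
m = 500
s = rng.uniform(-1, 1, m)
u = rng.uniform(-1, 1, m)
s_next = 0.8 * s + 0.3 * u + 0.05 * rng.standard_normal(m)  # true noise std 0.05

X = np.column_stack([np.ones(m), s, u])
theta, *_ = np.linalg.lstsq(X, s_next, rcond=None)
resid = s_next - X @ theta

mu_mle = resid.mean()                       # MLE of the Gaussian mean
sigma_mle = np.sqrt(np.mean(resid ** 2))    # MLE of the standard deviation
```

Because the fit includes an intercept, the residual mean is zero by construction; only the spread carries information, which is consistent with the observation that adding the fitted noise mostly injects extra offset into the model.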
Step 4) continuous robust control law design:
The reward function at the current moment is determined first. A quadratic form of the state variables is adopted to construct the reward function R(t) as in formula (5), where Q is a positive-definite parameter matrix and the tracking error s̃(t), whose expression is given in formula (6), is
s̃(t) = s(t) - [r_z,ref(t), ṙ_z,ref(t), r̈_z,ref(t)]^T   (6)
in which r_z,ref(t), ṙ_z,ref(t) and r̈_z,ref(t) denote the desired altitude, desired velocity and desired acceleration of the helicopter in the vertical direction.
After the reward function has been determined, the control iteration algorithm is designed. The control input u(t) is computed from the reference input and the current state, i.e. u(t) = π(s(t), ω), realized as in formula (7):
The ultimate purpose of applying reinforcement learning is to find the optimal control policy π(s(t), ω) such that the total return R_total attains its maximum; the control-law weight at that point is the desired optimal weight vector ω_best. Two strategies are generally available for the parameter iteration: stochastic gradient descent and stochastic approximation. The former is faster but is easily trapped in local optima; the latter carries some uncertainty in practical operation but breaks out of local optima more easily and finds weight parameters with better performance, so the second iteration strategy is used here. Concretely, the state, the action and the initial control weights are initialized first; the LWLR model obtained from the offline data yields the next state and the reward value at that moment; the next action is computed from the next state and the control weights; this cycle is repeated, recording the return at each state moment and summing the returns generated by the sequence. The control weights are then iteratively updated accordingly, and the search continues until the optimal cumulative reward and the corresponding control weights are found; these weights are the result of training. The detailed update process is shown in Fig. 1.
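The trade-off between the two iteration strategies can be illustrated on a toy return surface; J below is an invented one-dimensional function with one local and one global peak, not the helicopter's actual return, and the search parameters are assumptions:

```python
import numpy as np

# On a multimodal surface, gradient ascent from a poor starting weight stalls
# at the local peak, while a perturbation-based stochastic search can escape.

def J(w):
    """Toy return: global maximum near w = 3 (J ~ 2), local maximum near w = 0 (J ~ 1)."""
    return 2.0 * np.exp(-(w - 3.0) ** 2) + np.exp(-w ** 2)

def gradient_ascent(w, lr=0.05, steps=500):
    """Follow the numerical gradient; converges to the peak of whichever basin w starts in."""
    for _ in range(steps):
        g = (J(w + 1e-5) - J(w - 1e-5)) / 2e-5
        w += lr * g
    return w

rng = np.random.default_rng(3)

def stochastic_search(w, steps=2000, scale=1.0):
    """Keep random perturbations of the best weight whenever they raise the return."""
    best_w, best_j = w, J(w)
    for _ in range(steps):
        w_try = best_w + scale * rng.standard_normal()
        j_try = J(w_try)
        if j_try > best_j:
            best_w, best_j = w_try, j_try
    return best_w

w_gd = gradient_ascent(-0.5)     # stalls near the local peak at w ~ 0
w_sa = stochastic_search(-0.5)   # wide perturbations can reach the global peak
```

The wide perturbation scale is what buys the escape from the local basin, at the cost of the "uncertainty in practical operation" that the text notes for stochastic approximation.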
The present invention intends to provide a control method based on offline reinforcement learning that realizes stable altitude-hold flight of a small unmanned helicopter under system parameter uncertainty and external disturbances. The invention is described in detail below with reference to an experimental verification example. The technical solution adopted by the present invention is: establish the nonlinear mathematical model based on a Markov sequence from the collected historical flight data of the helicopter, then design the expressions of the reward function and the control input, and update the control-input weights with the stochastic approximation iteration strategy so as to achieve the optimal control effect, following the steps described above.
Step 1) defines Markov (Markov) decision process:
Markov approach extensive application, main concept in many Sequence Decision problems have decision moment, system
State, behavior, return value and transition probability.It is specifically exactly to choose an action under a certain state of system to generate a report
It fulfills, and determines the state at next moment by transition probability.Policymaker needs the plan for selecting to optimize by certain mode
Slightly.The process can be expressed as hexa-atomic group of (S, J, A, { a Psa() }, γ, R) wherein S indicates all possible shape of environment to
State set;Objective function when J is decision;A indicates the set of actions in motion space;Psa: S × A → S' indicates state transfer
Function means for each moment state s (t) ∈ S, acts a (t) ∈ A, take the movement in this state, reaches next
The probability distribution of moment state s (t+1), wherein t represents current time;γ is discount factor, and range is between 0 to 1;R is to adopt
The Reward Program for taking corresponding actions to obtain.It is general to be that current state is shifted to NextState for the essence of Markov decision process in fact
Rate and return value are solely dependent upon current state and i.e. actions to be taken, and unrelated with historic state and movement.Therefore it is obtaining
P under each state of cicadasaUnder the conditions of the environmental model of () and R, the method for capableing of applied dynamic programming acquires optimal policy.
Step 2) micro helicopter flies control hardware design:
Experimental duties and environment condition analysis design requirement are primarily based on, mainly there is the following.1. due to unmanned plane control
System real-time control with higher processed requires, therefore main control chip needs to have higher system dominant frequency.2. due to needing
Carry out posture and position control, it is therefore desirable to using the Position and attitude sensor having compared with high measurement accuracy.3. since helicopter needs
Own location information is obtained under the indoor environment based on optitrack positioning capturing system, it is therefore desirable to carry data transmission
Zigbee in module.4. due to needing to collect the online posture information of helicopter as Offline training data, it is therefore desirable to carry number
Transmission module ground station is transmitted.5. in order to save Air Diary data, design on piece eeprom chip carries out data
Storage.6. considering the weight bearing requirement and cruising ability of the micro helicopter system when design, selection volume and weight as far as possible is small
Chip and design corresponding winged control plate.
Consider the above demand, final master control borad designed size is 35mm × 35mm, using the Freescale K60 of 100 pins
Chip, dominant frequency 100MHz;Using single MPU6500 chip as Position and attitude sensor, directly connected by SPI mode with main control chip
It connects, the precision and frequency of the chip meet actual demand, small in size;The zigbee module of selection embedded antenna is communicated, greatly
Weight bearing is reduced greatly.EEPROM module selects AT24C256, possesses 256K memory;Voltage stabilizing chip uses two XC6206P332 cores
Piece generates 3.3V voltage, powers respectively to receiver and other sensors.In addition to this, the external serial ports of UART there are two design, one
A I2C external tapping is ready for use on connection barometer.5 external 5v contact pins are finally designed to be used for 3 steering engines and 2 motor input controls
Signal processed.The weight of complete machine is 94g, can satisfy the weight bearing requirement that helicopter takes off.
Step 3) data processing and modeling:
Due to the non-linear relation of helicopter lift and main rotor speed, blade vibration and surrounding flow that main rotor generates
The series of factors such as interference, so that altitude channel has bigger fluctuation compared to horizontal attitude channel.This makes dynamic based on height
The controller design of mechanical model and the no small error of physical presence.On the other hand, although it is highly logical in practical implementation
Road can reach preferable control precision using PID controller, but it is necessary to have experiences abundant to change for tune ginseng process complexity
Parameter area and trend cause the development cycle slow.Using the optimisation strategy of offline intensified learning, above-mentioned ask can be preferably solved
Topic.
In the reinforcement-learning altitude controller, the helicopter control process is regarded as a Markov random sequence. The system state is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t) and r̈_z(t) denote in turn the altitude, the vertical velocity and the vertical acceleration, and the variable z stands for the height direction. The offline data are acquired by manually simulating the altitude-hold control process: two minutes of flight data are collected for each of three desired altitudes, comprising the three state variables and the corresponding control input of the vertical-direction servo. After acquisition, the data are processed: each pair of current state s(t) and control input u(t) is stored in the matrix X ∈ R^(m×(1+n_s+n_u)), where t is the current instant of the discrete system, m is the number of training samples, n_s is the dimension of the state, n_u is the dimension of the control input, and the extra 1 represents the intercept coefficient of the fitting formula. The system output data, i.e. the next state s(t+1), are stored in the matrix Y ∈ R^(m×n_s). The specific expressions are formulas (1) and (2):

X = [1 s(1)^T u(1)^T; 1 s(2)^T u(2)^T; …; 1 s(m)^T u(m)^T]  (1)

Y = [s(2)^T; s(3)^T; …; s(m+1)^T]  (2)
Local weighted linear regression is then carried out to map the current state s(t) and control input u(t) to the state s(t+1) of the next instant. A weight matrix W ∈ R^(m×m) is defined; it is a diagonal matrix, and the most common Gaussian kernel is used here, so its diagonal entries are given by formula (3):

ω_ii = exp( −(x − x^(i))^T (x − x^(i)) / (2τ^2) )  (3)
where x^(i) is the i-th row of the prior-data matrix X, x = [1 s(t)^T u(t)^T] is the extended state vector at the current time, and the parameter τ is the scale (bandwidth) of the Gaussian kernel, which determines the range over which the prior data influence the weight ω_ii. After this processing is completed, following the least-squares derivation of the regression fitting error, the data-driven system model of formula (4) is established:

s(t+1)^T = [1 s(t)^T u(t)^T] Θ + v^T,  with Θ = (X^T W X)^(−1) X^T W Y  (4)
Here v is the noise/bias compensation term with respect to the real system, i.e. it accounts for the factors not considered in the modeling process, such as ambient wind disturbance. In principle a random variable should be fitted for each state channel, with its parameters obtained by the method of maximum-likelihood estimation. In actual experiments, however, adding the random variable brought no improvement, and the system even performed worse than the model without the random signal. A possible reason is that the data-collection process itself contains bias, so adding a random variable would further enlarge the offset beyond the system estimate; therefore v = 0 is taken here.
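The data-driven model of formulas (1)-(4) can be sketched in code. The following is a minimal local weighted linear regression (LWLR) sketch, not taken from the patent: the matrix shapes follow the definitions above, the bandwidth value τ is an assumption, and v = 0 as chosen in the text.

```python
import numpy as np

def lwlr_predict(X, Y, x_query, tau=0.5):
    """Locally weighted linear regression, sketching formulas (3)-(4).

    X       : (m, 1+ns+nu) prior-data matrix, rows [1, s(t)^T, u(t)^T]
    Y       : (m, ns) next-state matrix, rows s(t+1)^T
    x_query : (1+ns+nu,) extended state vector at the current time
    tau     : Gaussian-kernel bandwidth (assumed value)
    """
    # Diagonal Gaussian weights, formula (3)
    d2 = np.sum((X - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares: Theta = (X^T W X)^{-1} X^T W Y, formula (4) with v = 0
    Theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ Y)
    return x_query @ Theta  # predicted s(t+1)
```

On exactly linear data the weighted fit recovers the underlying model; on real flight data the kernel localizes the fit around the query point.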
Step 4) design of the continuous robust control law:
First, the reward function at the current time must be determined. A quadratic form of the state variables is adopted to construct the reward function R(t), as in formula (5):

R(t) = −s̃(t)^T Q s̃(t)  (5)

where Q is a positive-definite parameter matrix and the state error s̃(t) is expressed by formula (6):

s̃(t) = [r_z(t) − r_z,ref(t), ṙ_z(t) − ṙ_z,ref(t), r̈_z(t) − r̈_z,ref(t)]^T  (6)

where r_z,ref(t), ṙ_z,ref(t) and r̈_z,ref(t) denote the desired altitude, velocity and acceleration of the helicopter in the vertical direction;
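The quadratic reward of formulas (5)-(6) amounts to a negative weighted squared state error, so the maximum reward of 0 is attained exactly at the reference. A minimal sketch (the function name and any concrete Q values are illustrative assumptions, not from the patent):

```python
import numpy as np

def reward(s, s_ref, Q):
    """Quadratic reward of formulas (5)-(6): R(t) = -(s - s_ref)^T Q (s - s_ref).

    s, s_ref : (3,) states [altitude, vertical velocity, vertical acceleration]
    Q        : (3, 3) positive-definite parameter matrix
    """
    e = s - s_ref             # state error s~(t), formula (6)
    return -float(e @ Q @ e)  # non-positive; zero only at the reference
```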
After the reward function has been determined, the control iteration algorithm is designed. The value of the control input u(t) is obtained from the reference input and the current state, i.e. u(t) = π(s(t), ω), realized as in formula (7):
The final purpose of using reinforcement learning is to find the optimal control strategy π(s(t), ω) such that the total return R_total attains its maximum value, that is, to obtain the desired optimal weight vector ω_best of the control-input expression. There are two general parameter-iteration strategies, stochastic gradient descent and stochastic approximation: the former is faster but is easily trapped in local optima; the latter carries some uncertainty in practical operation but more easily breaks out of local optima to find weight parameters with better performance, so the second parameter-iteration strategy is used here. Specifically, the initial values of the state, the action and the control weights are set first, and the LWLR model obtained from the offline data yields the next state and the reward value at that moment. The action of the next instant is then recovered from the next state and the control law; this process is repeated continuously, the return at each state instant is recorded, and the returns generated along the sequence are summed. The control weights are iteratively updated accordingly, and the search continues until the optimal total return and the corresponding control weights are found; these are taken as the training result. The detailed update process is shown in Fig. 1.
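The weight-search loop described above (roll out the learned model, sum the returns, update the weights, keep the best) can be sketched as follows. This is a simplified random-perturbation stand-in for the stochastic-approximation strategy; the policy form, step sizes and iteration counts are assumptions, not values from the patent.

```python
import numpy as np

def total_return(model_step, policy, w, s0, s_ref, Q, T=50):
    """Roll out the learned model for T steps and sum the quadratic rewards."""
    s, ret = s0.copy(), 0.0
    for _ in range(T):
        u = policy(s, w)           # u(t) = pi(s(t), w)
        s = model_step(s, u)       # next state from the (learned) model
        e = s - s_ref
        ret += -float(e @ Q @ e)   # reward of formula (5)
    return ret

def search_weights(model_step, policy, w0, s0, s_ref, Q,
                   iters=300, sigma=0.1, seed=0):
    """Random-perturbation search over control weights: keep a perturbed
    weight vector only when the total return of its rollout improves."""
    rng = np.random.default_rng(seed)
    w_best = w0.copy()
    r_best = total_return(model_step, policy, w_best, s0, s_ref, Q)
    for _ in range(iters):
        w_try = w_best + sigma * rng.normal(size=w_best.shape)
        r_try = total_return(model_step, policy, w_try, s0, s_ref, Q)
        if r_try > r_best:         # accept only improving perturbations
            w_best, r_best = w_try, r_try
    return w_best, r_best
```

On a toy one-dimensional plant the loop quickly finds a stabilizing positive feedback gain, illustrating the roll-out-and-update cycle of Fig. 1 without claiming the patent's exact iteration rule.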
The following is an introduction of the specific experiment platform and the experimental procedure.
1. Brief introduction of the experiment platform
For the present invention, real-time altitude-hold experimental verification was carried out on the full-degree-of-freedom indoor flight experiment platform for small-scale unmanned helicopters designed by this research group. The experiment platform is shown in Fig. 2; the controlled plant is an Align 150X helicopter. The onboard flight-control board uses a Freescale K60 chip as the main control chip, with a clock frequency of 100 MHz. The attitude sensor is an MPU6500 chip, whose roll- and pitch-angle measurement error range is ±0.2° and whose yaw-angle error range is ±0.5°. In addition, a ZigBee chip with an embedded antenna is used to receive the OptiTrack positioning information sent from the ground, and a data-transmission chip sends the fused pose information back to the ground station.
2. Flight experiment verification
To verify the validity and practicality of the controller of the present invention, the unmanned-helicopter attitude flight experiment platform independently designed by this research group was used to carry out simulation and real-time experimental verification of the altitude-hold control. In simulation, it can be seen from Fig. 3 that overshoot occurs during PID control, with a settling time of about 10 seconds. As seen in Fig. 4, the reinforcement-learning controller, operating in the manner of optimal control, directly reduces the error to a very small range without producing overshoot, and steady state is reached within 5 seconds, so the control is faster. In the real-time experimental verification of the reinforcement-learning control, Fig. 5 shows that, with a desired altitude of 120 cm, the altitude control error stays essentially within 15 cm. Fig. 6 shows that the corresponding control input is stable and does not swing over a large range. In conclusion, the altitude error and the control input remain within a good and reasonable range, which verifies the rationality of the reinforcement-learning-based controller presented here.
Claims (3)
1. An altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning, characterized in that a helicopter model and a reward function based on a Markov sequence are first constructed, the optimized controller parameters are then trained iteratively with the stochastic-approximation method, and finally the trained controller is applied to the real helicopter system to implement the control.
2. The altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning as claimed in claim 1, characterized in that the specific steps are as follows:
Step 1) define the Markov decision process:
The Markov process is expressed as a six-tuple (S, J, A, {P_sa(·)}, γ, R), where S denotes the set of all possible states of the environment; J is the objective function of the decision; A denotes the action set of the action space; P_sa: S × A → S' denotes the state-transition function, meaning that for the state s(t) ∈ S at each instant and an action a(t) ∈ A taken in that state, it gives the probability distribution of reaching the next state s(t+1); γ is the discount factor, with range between 0 and 1; R is the reward function obtained by taking the corresponding action; under the condition that the environment model P_sa(·) and R are known for every state, the optimal strategy is obtained by the method of dynamic programming;
Step 2) data processing and modeling:
In the reinforcement-learning altitude controller, the helicopter control process is regarded as a Markov random sequence; the system state is chosen as s(t) = [r_z(t), ṙ_z(t), r̈_z(t)]^T, where r_z(t), ṙ_z(t) and r̈_z(t) denote in turn the altitude, the vertical velocity and the vertical acceleration, and the variable z stands for the height direction; the offline data are acquired by manually simulating the altitude-hold control process, collecting two minutes of flight data for each of three desired altitudes, comprising the three state variables and the corresponding control input of the vertical-direction servo; after acquisition, the data are processed: each pair of current state s(t) and control input u(t) is stored in the data matrix X ∈ R^(m×(1+n_s+n_u)), where t is the current instant of the discrete system, m is the number of training samples, n_s is the dimension of the state, n_u is the dimension of the control input, and the extra 1 represents the intercept coefficient of the fitting formula; the system output data, i.e. the next state s(t+1), are stored in the matrix Y ∈ R^(m×n_s); the specific expressions are formulas (1) and (2):

X = [1 s(1)^T u(1)^T; 1 s(2)^T u(2)^T; …; 1 s(m)^T u(m)^T]  (1)

Y = [s(2)^T; s(3)^T; …; s(m+1)^T]  (2)

Local weighted linear regression is carried out to map the current state s(t) and control input u(t) to the state s(t+1) of the next instant; a weight matrix W ∈ R^(m×m) is defined as a diagonal matrix, and the most common Gaussian kernel is used here, so the diagonal entries are given by formula (3):

ω_ii = exp( −(x − x^(i))^T (x − x^(i)) / (2τ^2) )  (3)

where x^(i) is the i-th row of the prior-data matrix X, x = [1 s(t)^T u(t)^T] is the extended state vector at the current time, and the parameter τ is the scale of the Gaussian kernel, which determines the range over which the prior data influence the weight ω_ii; after this processing is completed, following the least-squares derivation of the regression fitting error, the data-driven system model of formula (4) is established:

s(t+1)^T = [1 s(t)^T u(t)^T] Θ + v^T,  with Θ = (X^T W X)^(−1) X^T W Y  (4)

where v is the noise/bias compensation term with respect to the real system, i.e. the other factors not considered in the modeling process, and v = 0 is taken;
Step 3) design of the continuous robust control law:
First, the reward function at the current time must be determined; a quadratic form of the state variables is adopted to construct the reward function R(t) as in formula (5):

R(t) = −s̃(t)^T Q s̃(t)  (5)

where Q is a positive-definite parameter matrix and the state error s̃(t) is expressed by formula (6):

s̃(t) = [r_z(t) − r_z,ref(t), ṙ_z(t) − ṙ_z,ref(t), r̈_z(t) − r̈_z,ref(t)]^T  (6)

where r_z,ref(t), ṙ_z,ref(t) and r̈_z,ref(t) denote the desired altitude, velocity and acceleration of the helicopter in the vertical direction;
After the reward function has been determined, the control iteration algorithm is designed; the value of the control input u(t) is obtained from the reference input and the current state, i.e. u(t) = π(s(t), ω), realized as in formula (7); the final purpose of using reinforcement learning is to find the optimal control strategy π(s(t), ω) such that the total return value R_total over a period of time attains its maximum, the control-input weight at that point being the desired optimal weight vector ω_best; the stochastic-approximation strategy is used here as the parameter-iteration strategy.
3. The altitude-hold control method for a small-scale unmanned helicopter based on reinforcement learning as claimed in claim 1, characterized in that the specific steps of the stochastic-approximation strategy are: first initialize the state, the action and the control-weight initial values; obtain from the LWLR model built on the offline data the next state and the reward value at that moment; recover the action of the next instant from the next-instant state and the control law; repeat this process continuously, record the return at each state instant, and sum the returns generated along the sequence; iteratively update the control weights accordingly, and keep searching until the optimal total return and the corresponding control weights are found, which are taken as the training result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910369215.3A CN110083168A (en) | 2019-05-05 | 2019-05-05 | Small-sized depopulated helicopter based on enhancing study determines high control method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110083168A true CN110083168A (en) | 2019-08-02 |
Family
ID=67418677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910369215.3A Pending CN110083168A (en) | 2019-05-05 | 2019-05-05 | Small-sized depopulated helicopter based on enhancing study determines high control method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110083168A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111026147A (en) * | 2019-12-25 | 2020-04-17 | 北京航空航天大学 | Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning |
CN113049202A (en) * | 2021-03-08 | 2021-06-29 | 中国地震局工程力学研究所 | Local weighted regression correction method and system for acceleration integral displacement |
CN113423060A (en) * | 2021-06-22 | 2021-09-21 | 广东工业大学 | Online optimization method for flight route of unmanned aerial communication platform |
CN113892070A (en) * | 2020-04-30 | 2022-01-04 | 乐天集团股份有限公司 | Learning device, information processing device, and control model for completing learning |
CN114967729A (en) * | 2022-03-28 | 2022-08-30 | 广东工业大学 | Multi-rotor unmanned aerial vehicle height control method and system |
CN113892070B (en) * | 2020-04-30 | 2024-04-26 | 乐天集团股份有限公司 | Learning device, information processing device, and control model for completing learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107357166A (en) * | 2017-04-29 | 2017-11-17 | 天津大学 | The model-free adaption robust control method of small-sized depopulated helicopter |
CN109683624A (en) * | 2019-01-31 | 2019-04-26 | 天津大学 | Nonlinear robust control method for small-sized depopulated helicopter gesture stability |
CN109696830A (en) * | 2019-01-31 | 2019-04-30 | 天津大学 | The reinforcement learning adaptive control method of small-sized depopulated helicopter |
Non-Patent Citations (3)
Title |
---|
GAO W et al.: "Sampled-data-based adaptive optimal output feedback control of a 2-degree-of-freedom helicopter", 《IET CONTROL THEORY & APPLICATIONS》 *
SU LIJUN et al.: "Design of a quadrotor altitude controller based on reinforcement learning", 《Measurement & Control Technology》 *
CAI WENLAN: "Research on control methods for small-scale unmanned helicopters based on reinforcement learning", 《China Master's Theses Full-text Database, Engineering Science and Technology II》 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110083168A (en) | Small-sized depopulated helicopter based on enhancing study determines high control method | |
CN110806759B (en) | Aircraft route tracking method based on deep reinforcement learning | |
CN110502033B (en) | Fixed-wing unmanned aerial vehicle cluster control method based on reinforcement learning | |
CN105607473B (en) | The attitude error Fast Convergent self-adaptation control method of small-sized depopulated helicopter | |
CN109625333A (en) | A kind of space non-cooperative target catching method based on depth enhancing study | |
Nie et al. | Three-dimensional path-following control of a robotic airship with reinforcement learning | |
Lu et al. | Real-time simulation system for UAV based on Matlab/Simulink | |
Moshayedi et al. | The quadrotor dynamic modeling and study of meta-heuristic algorithms performance on optimization of PID controller index to control angles and tracking the route | |
dos Santos et al. | Design of attitude and path tracking controllers for quad-rotor robots using reinforcement learning | |
Zhang et al. | Recurrent neural network-based model predictive control for multiple unmanned quadrotor formation flight | |
CN111221346A (en) | Method for optimizing PID (proportion integration differentiation) control four-rotor aircraft flight by crowd search algorithm | |
CN107065897A (en) | Three Degree Of Freedom helicopter explicit model forecast Control Algorithm | |
CN110135076A (en) | A kind of holder mechanical structure multiple target integrated optimization method based on ISIGHT associative simulation | |
Salamat et al. | Adaptive nonlinear PID control for a quadrotor UAV using particle swarm optimization | |
Kose et al. | Simultaneous design of morphing hexarotor and autopilot system by using deep neural network and SPSA | |
Grauer | A learn-to-fly approach for adaptively tuning flight control systems | |
Zhou et al. | Nonlinear system identification and trajectory tracking control for a flybarless unmanned helicopter: theory and experiment | |
Ferdaus et al. | Fuzzy clustering based modelling and adaptive controlling of a flapping wing micro air vehicle | |
CN116301007A (en) | Intensive task path planning method for multi-quad-rotor unmanned helicopter based on reinforcement learning | |
Flores et al. | Implementation of a neural network for nonlinearities estimation in a tail-sitter aircraft | |
Zhou et al. | Parameter Optimization on FNN/PID compound controller for a three-axis inertially stabilized platform for aerial remote sensing applications | |
CN115407661A (en) | Multi-unmanned aerial vehicle system nonlinear robust tracking control method based on azimuth measurement information | |
CN114757086A (en) | Multi-rotor unmanned aerial vehicle real-time remaining service life prediction method and system | |
Ahsan et al. | Grey box modeling of lateral-directional dynamics of a uav through system identification | |
Xu et al. | UAV swarm communication aware formation control via deep Q network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190802 |