CN109143855B - Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning - Google Patents

Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning

Info

Publication number
CN109143855B
CN109143855B (application CN201810855339.8A)
Authority
CN
China
Prior art keywords
target
learning
contour
aerial vehicle
unmanned aerial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810855339.8A
Other languages
Chinese (zh)
Other versions
CN109143855A (en)
Inventor
徐梦 (Xu Meng)
史豪斌 (Shi Haobin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Liuyi FeiMeng Information Technology Co.,Ltd.
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201810855339.8A priority Critical patent/CN109143855B/en
Publication of CN109143855A publication Critical patent/CN109143855A/en
Application granted granted Critical
Publication of CN109143855B publication Critical patent/CN109143855B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04: Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • G06V20/13: Satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Astronomy & Astrophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a visual servo control method for a rotor unmanned aerial vehicle based on fuzzy SARSA learning. The rotor unmanned aerial vehicle acquires image information through a camera, extracts the contour features of a target with a target contour extraction algorithm based on the Freeman chain code, and performs contour compensation for the target edge information lost during image acquisition. The servo gain parameter of the visual servo is trained with a reinforcement learning algorithm, so that the rotor unmanned aerial vehicle acquires the ability to adjust the servo gain adaptively, and the learning rate is adjusted in combination with a fuzzy control method. Through reinforcement learning, the rotor unmanned aerial vehicle gains experience by training in different scenes and can change the gain by itself; at the same time, adaptive adjustment of the reinforcement learning rate through fuzzy control yields a faster convergence rate. The target contour extraction algorithm based on the Freeman chain code effectively reduces the error between the extracted central feature point and the actual central feature point, and improves the accuracy of feature extraction.

Description

Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning
Technical Field
The invention relates to the fields of machine learning and robot automatic control, in particular to a visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning.
Background
Artificial intelligence and machine learning technology are developing rapidly today and are applied to many aspects of production and daily life. Rotor unmanned aerial vehicles have traditionally been controlled with classical automatic control methods, such as PID control or visual servo control. However, as the tasks borne by rotor unmanned aerial vehicles become increasingly complex and their operating environments unpredictable, classical control methods can no longer meet their control requirements: the PID control method and the image-based visual servo control method adopted in traditional rotor unmanned aerial vehicle control suffer from low stability and slow convergence in complex scenes, making it difficult for the vehicle to carry out its work tasks efficiently in specific application scenarios. There is therefore a need for an intelligent rotor unmanned aerial vehicle control method that improves visual servoing in combination with machine learning.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a visual servo control method for a rotor unmanned aerial vehicle based on fuzzy SARSA learning. The unmanned aerial vehicle acquires image information through a bottom camera and extracts the contour features of a target through a target contour extraction algorithm based on the Freeman chain code; because edge information of the target is usually lost during image acquisition, a contour compensation step is performed. The servo gain parameter of the visual servo is trained with a reinforcement learning algorithm, so that the rotor unmanned aerial vehicle acquires the ability to adjust the servo gain adaptively, and the learning rate is adjusted in combination with a fuzzy control method. On this basis, the rotor unmanned aerial vehicle gains experience through continuous reinforcement learning training in different scenes, so that it can change the gain automatically; at the same time, adaptive adjustment of the reinforcement learning rate through fuzzy control accelerates the operation of classical reinforcement learning and achieves a faster convergence rate. Using the Freeman chain code based target contour extraction algorithm, with the contour completed by the contour compensation algorithm, effectively reduces the error between the extracted central feature point and the actual central feature point caused by missing edges in classical image feature extraction algorithms, and improves feature extraction accuracy.
The invention solves the technical problem by the following technical scheme: a visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning, characterized by comprising the following steps:
Step 1: edge extraction is performed on the image with the Canny algorithm, and a set of N contour coordinates is obtained through filtering and noise reduction operations; the contour is described with the Freeman chain code, and the contour pixels are recorded as C = {c_i | i = 1, ..., N}. The contour pixels of the target are rotation-normalized to obtain a Freeman chain code, and the Levenshtein distance between it and each graph in a standard contour library is calculated; the Levenshtein distance is the number of operations required to convert shape A into shape B, the permitted operations being insertion, deletion and modification. The method can identify the shape of an object even when the image target has lost a certain amount of edge.
Step 2: after the contour pixels of the image are acquired in step 1, a contour compensation algorithm is used, because the photographed image sometimes exhibits incomplete contours. For the l-th target, one pass of processing of the Freeman chain code yields N_l contour feature points f_j^l, whose feature point set is:

F^l = {f_j^l | j = 1, ..., N_l}.

Rotation normalization of every element of F^l yields N_standard standard contour feature points d_j^l, and the set of standard contour feature points is denoted:

D^l = {d_j^l | j = 1, ..., N_standard}.

Let O^l be the compensated feature point contour set; the conversion relationship between the compensation contour O^l and the standard contour D^l is D^l · R + L = O^l, where R and L are respectively the rotation matrix and the translation matrix. The j-th element of the compensation contour O^l is P_j, and the set contains N_standard elements in total, recorded as O^l = {P_j | j = 1, ..., N_standard}. The central feature point of target l is taken as the feature point used for visual servo control; its coordinates are obtained by averaging the coordinates of the compensation contour feature points:

P_c^l = (1 / N_standard) · Σ_{j=1}^{N_standard} P_j.
Step 3: after the central feature point of the target is obtained in step 2, a visual model of the rotor unmanned aerial vehicle's bottom camera is established, i.e. the conversion relation from three-dimensional space to the image pixel plane.

Step 4: a decoupled visual servo control model of the rotor unmanned aerial vehicle, which contains a visual servo gain value, is constructed from the obtained visual model.

Step 5: a single-step SARSA learning servo gain adjustment model is established, and SARSA learning is used to adjust the visual servo gain value of the rotor unmanned aerial vehicle obtained in step 4.
1) Setting the state space: after the contour of the target is extracted by the image feature extraction algorithm, the target is simplified to its central feature point; the absolute values of the errors between the current feature points and the target feature points are computed and summed, and intervals of this sum are taken as states.

2) Setting the action space: an initial value λ* is selected as the initial value of the servo gain by analyzing the differences between candidate servo gains. The size of the action set is 2·n_a + 1, and the action set forms an arithmetic progression with tolerance d_a, A = {a_i | i = 1, 2, ..., 2·n_a + 1}; the servo gains of the linear velocity and the angular velocity are adjusted separately.

3) Setting the reward function: the reward function is divided into three parts, reaching the expected target, losing the tracked target, and other situations. If the feature error of every dimension satisfies |e_i| < δ, where δ is a threshold, the quad-rotor drone is considered to have reached the target position and the highest reward is given; if feature points are missing after feature extraction of the real-time image photographed by the quad-rotor drone, compared with the feature points of the target image, the drone is considered to have lost the target and the return value is negative; in other situations the reward is given according to how close the quad-rotor drone is to the target.
4) Setting the single-step SARSA learning iterative algorithm: an iterative formula for the servo gain is set, separately in the linear velocity gain space and the angular velocity gain space; the iterative process of the servo gain iterative algorithm is set with reference to Q-learning, uses the set servo gain iterative formula, and completes the iterative updating of the servo gain.
5) Setting learning rules: the maximum time spent in one learning round of single-step SARSA learning is set to 400 time slices; the placement position of the quadrotor in each round is random within the feasible range, such that all targets can be seen after the quadrotor takes off to 1.0 m, and one training session comprises 5000 rounds. If the quadrotor still has not reached the designated position from its initial position after 400 time slices, it is forcibly returned to the starting point for the next round; if the feature points are lost owing to the quadrotor's motion, the current round is ended and the next round is restarted; if the quadrotor stays within 5 pixels of the target position for a certain time during its movement, the target point is considered reached and the round is finished; the servo gain is updated after each round.
Step 6: fuzzy control rules. After the SARSA learning servo gain adjustment model is established in step 5, fuzzy control is used to adaptively adjust the learning rate. The basic rule of the adaptive learning-rate adjustment is: if the gain learned by the agent increases the feature error, the learning rate is reduced; otherwise the learning rate is increased. The learning rate of reinforcement learning is changed through fuzzy control: the rate of change of the feature error is taken as the observed quantity and fuzzified, a fuzzy control rule based on the max-min composition operation is set, the observed quantity is fed into the fuzzy control rule to obtain the fuzzy controlled quantity, and the learning rate is finally obtained by defuzzification.
Advantageous effects
The invention provides a visual servo control method for a rotor unmanned aerial vehicle based on fuzzy SARSA learning. The rotor unmanned aerial vehicle is controlled with an image-based visual servo control method; image-based visual servoing forms a closed-loop feedback adjustment based on the error so as to command reasonable movement of the vehicle. The servo gain is adjusted adaptively with a reinforcement learning method: the rotor unmanned aerial vehicle is trained in different scenes and, after many training sessions, learns the ability to change the gain appropriately in each scene. The learning rate of reinforcement learning is changed through fuzzy control: the rate of change of the feature error is taken as the observed quantity and fuzzified, a fuzzy control rule based on the max-min composition operation is set, the observed quantity is fed into the fuzzy control rule to obtain the fuzzy controlled quantity, and the learning rate is finally obtained by defuzzification. After repeated training, the unmanned aerial vehicle masters the skill of adaptive gain adjustment.
Drawings
The visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning according to the present invention is described in detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flow chart of a vision servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning according to the present invention.
Detailed Description
This embodiment is a visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning.
Aiming at the problem of partial contour loss in traditional visual feature extraction algorithms, this embodiment provides a contour extraction algorithm based on the Freeman chain code: the contour is completed with a contour compensation algorithm, and a weighted average method is then used to calculate the central feature point of the target. Aiming at the underactuated, nonlinear dynamic characteristics of the rotor unmanned aerial vehicle, this embodiment proposes a decoupled visual servo control model for the rotor unmanned aerial vehicle. Aiming at the problem that a fixed visual servo gain value is inefficient and cannot adapt to complex environments, this embodiment provides a method of adjusting the visual servo gain with SARSA learning. Aiming at the problem that a fixed SARSA learning rate leads to low learning efficiency, this embodiment proposes adjusting the learning rate of SARSA learning with a fuzzy control method.
Referring to fig. 1, the visual servo control method for a rotor unmanned aerial vehicle based on fuzzy SARSA learning of this embodiment comprises two aspects, image feature extraction and an intelligent control method for the rotor unmanned aerial vehicle, and includes the following steps:
the method comprises the following steps of performing edge extraction on an image by using a Canny algorithm, and then obtaining a set of N contour coordinates through filtering and noise reduction operations, wherein the set of N contour coordinates is described by using a Firman chain code, and contour pixels using the Firman chain code are marked as C ═ C { (C {)i1., N }; the visual feature extraction algorithm for the unmanned rotorcraft provided by the embodiment needs to establish a graph library, and assumes that the established standard profile library has M graphs corresponding to M actual objects, and the profiles of the M graphs are represented as D ═ DiI 1., N }, first of all, the method is to usePerforming rotation normalization on the Ferman chain code by using first-order difference according to a formula:
Figure GDA0002750850890000051
the method comprises the steps of obtaining a Lelman chain code after carrying out rotation normalization on contour pixels of a target and respectively calculating a Levenshtein distance with a graph in a standard contour library, wherein the Levenshtein distance is calculated in the mode that an operand is needed for converting a shape A into a shape B, the operation can be only insertion, deletion and modification, and the method can be used for identifying the shape of an object under the condition that the image target loses a certain degree of edge.
Step two: because the photographed picture sometimes exhibits incomplete contours, the system uses a contour compensation algorithm. For the l-th target, one pass of processing of the Freeman chain code yields the feature point set

F^l = {f_j^l | j = 1, ..., N_l},

and the set of standard contour feature points obtained after rotation normalization of this set is

D^l = {d_j^l | j = 1, ..., N_standard}.

The contour of the identified object is represented with the Freeman chain code as X = {x_k | k = 1, ..., q}, where q is the number of feature points and x_k is the coordinate of the k-th feature point. The corresponding contour in the standard contour library is X* = {x*_k | k = 1, ..., q}, where x*_k is the coordinate of the k-th feature point. The relationship between X and X* is established as X* · R + L = X.

Let R be the rotation matrix R = [cos β, −sin β; sin β, cos β]. Derivation yields

R = (H*)^+ · H,  L = X − X* · (H*)^+ · H,

where H and H* are matrices formed from the coordinates of X and X* respectively and (·)^+ denotes the pseudo-inverse. Thus, for a standard contour D^l and compensation contour O^l, the relationship is D^l · R + L = O^l, recorded as O^l = {P_j | j = 1, ..., N_standard}. The central feature point of target l is taken as the feature point used for visual servo control; its coordinates are obtained by averaging the coordinates of the compensation contour feature points:

P_c^l = (1 / N_standard) · Σ_{j=1}^{N_standard} P_j.
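The compensation step can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the patent's exact procedure: it recovers R and L from centered coordinates with a pseudo-inverse, one standard least-squares realization of the relation X* · R + L = X, then applies them to the standard contour and averages the result to obtain the central feature point:

```python
import numpy as np

def estimate_transform(X_std, X_obs):
    """Estimate rotation R and translation L such that X_std @ R + L ~= X_obs,
    in the least-squares sense, using centered coordinates and a pseudo-inverse."""
    mean_std, mean_obs = X_std.mean(axis=0), X_obs.mean(axis=0)
    H_std, H_obs = X_std - mean_std, X_obs - mean_obs   # centered point sets
    R = np.linalg.pinv(H_std) @ H_obs                   # R = (H*)^+ H
    L = mean_obs - mean_std @ R                         # translation term
    return R, L

# Hypothetical standard contour (q x 2) and partially observed contour:
D_l = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
X_obs = np.array([[0.1, 0.0], [1.1, 0.1], [1.0, 1.1], [0.0, 1.0]])

R, L = estimate_transform(D_l, X_obs)
O_l = D_l @ R + L                 # compensated contour: D^l . R + L = O^l
P_center = O_l.mean(axis=0)       # central feature point used for servoing
print(P_center)
```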
Next, the conversion relation from three-dimensional space to the image pixel plane is established.

Let P = (X_p, Y_p, Z_p)^T be a point in space and P_i = (x, y, z)^T the projection of the point P on the image plane. According to the pinhole imaging principle:

x = K_s · f · X_p / Z_p,  y = K_s · f · Y_p / Z_p,

where K_s is a constant and f is the focal length.

The image collected by the vision sensor is stored in the computer as a binary function f(u, v), where (u, v) is a coordinate on the image plane and f(u, v) is the pixel value at that point. The relationship of a point (u, v) in the image plane coordinate system to a point (x, y) in the vision sensor plane coordinate system is:

u = x / d_x + u_q,  v = y / d_y + v_q,

where (u_q, v_q) is the coordinate of the origin O_1 of the vision sensor coordinate plane in the image plane, and d_x, d_y are the scaling factors from the image plane to the vision sensor plane.
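A small Python sketch of this two-stage projection follows (a sketch under the standard pinhole-camera assumptions stated above, with K_s taken as 1; the numeric intrinsics are invented for illustration):

```python
import numpy as np

def project_to_pixel(P, f, dx, dy, uq, vq):
    """Project a 3-D point P = (Xp, Yp, Zp) to pixel coordinates (u, v):
    pinhole projection onto the sensor plane (K_s = 1), then scaling and
    offset into the pixel plane."""
    Xp, Yp, Zp = P
    x, y = f * Xp / Zp, f * Yp / Zp      # pinhole imaging
    u, v = x / dx + uq, y / dy + vq      # sensor plane -> pixel plane
    return np.array([u, v])

# Hypothetical intrinsics: 4 mm focal length, 5 um pixels, 320x240 center.
print(project_to_pixel((0.5, -0.2, 2.0), f=0.004, dx=5e-6, dy=5e-6, uq=320, vq=240))
```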
The visual-servo-based control model of the rotor unmanned aerial vehicle is then established. Let e denote the feature error vector, e = f_c − f*, where f_c represents the current coordinates (u_c, v_c) of the feature point in the image plane and f* the coordinates (u*, v*) of the desired feature point. From the dynamic relationship, the rate of change of the feature error with time is a function of the linear and angular velocities:

ė = J · (v, ω)^T,   (5)

where J is the image Jacobian matrix, v = (v_x, v_y, v_z)^T is the linear velocity vector of the unmanned aerial vehicle, f_c = (u_c, v_c)^T is the current position coordinate vector of the feature point, and ω = (ω_x, ω_y, ω_z)^T is the vector of pitch, roll and yaw angular velocities. To make the feature error decrease exponentially and in a decoupled fashion, ė = −λ · e is imposed; substituting into equation (5) yields:

(v, ω)^T = −λ · J^+ · e,   (7)

where J^+ is the pseudo-inverse of the matrix J: if J is a square matrix its pseudo-inverse is its inverse, denoted J^(−1); if J has unequal numbers of rows and columns, J^+ = (J^T J)^(−1) J^T. λ_{v,ω} is the servo gain, with value range (0, 1).
Considering the dynamics of the quad-rotor drone, kinematics gives the relationship between the rate of change of (X_p, Y_p, Z_p) with time and the linear and angular velocities:

Ṗ = −v − ω × P.

Combining this with formula (7) and the projection model, the image Jacobian of a single feature point can be derived; restricted to the quadrotor's controllable degrees of freedom (v_x, v_y, v_z, ω_z), it takes the standard point interaction-matrix form

J_i = [ −f/Z   0    u/Z    v
         0    −f/Z  v/Z   −u ]

for a feature point (u, v) at depth Z.
If the number of feature points is N, the coordinate set of the feature points is {f_i | i = 1, 2, ..., N}, so the feature point error is the stacked vector e = (e_1^T, e_2^T, ..., e_N^T)^T and the image Jacobian matrix is the stack:

J = (J_1^T, J_2^T, ..., J_N^T)^T.

Decoupling is performed with servo gains that are independent for the linear velocity and the angular velocity; equation (7) is converted to:

v = −λ_v · (J^+)_{1:3} · e,  ω_z = −λ_ω · (J^+)_4 · e,

where λ_v and λ_ω are the servo gains for linear velocity and angular velocity respectively, (J^+)_{1:3} is the sub-matrix composed of the first three rows of J^+, and (J^+)_4 is the sub-matrix composed of the fourth row of J^+.
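The decoupled control law can be sketched as follows in Python (a sketch only: the 2×4 point Jacobian uses the standard interaction-matrix form assumed above, and the depth Z, focal length f and feature coordinates are made-up values):

```python
import numpy as np

def point_jacobian(u, v, Z, f):
    """Assumed 4-DOF point interaction matrix (columns: vx, vy, vz, wz);
    the patent's exact matrix is given by its equation image."""
    return np.array([[-f / Z, 0.0, u / Z,  v],
                     [0.0, -f / Z, v / Z, -u]])

def decoupled_control(features, desired, Z, f, lam_v, lam_w):
    """v = -lam_v * (J^+)[0:3] e ;  wz = -lam_w * (J^+)[3] e."""
    e = (features - desired).reshape(-1)               # stacked feature error
    J = np.vstack([point_jacobian(u, v, Z, f) for u, v in features])
    J_pinv = np.linalg.pinv(J)                          # 4 x 2N pseudo-inverse
    v_cmd = -lam_v * J_pinv[0:3] @ e                    # linear velocity command
    wz_cmd = -lam_w * J_pinv[3] @ e                     # yaw-rate command
    return v_cmd, wz_cmd

feats = np.array([[300., 220.], [340., 260.]])          # current feature points
goal  = np.array([[320., 240.], [360., 280.]])          # desired feature points
print(decoupled_control(feats, goal, Z=2.0, f=600.0, lam_v=0.5, lam_w=0.5))
```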
A single-step SARSA learning servo gain adjustment model is then established.

1. Setting the state space:

After the contour of the target is extracted by the image feature extraction algorithm, the target is simplified to its central feature point; the absolute values of the errors between the current feature points and the target feature points are computed and summed, and intervals of this sum are taken as states.
2. Setting the action space:

An initial value λ* is selected as the initial value of the servo gain by analyzing the differences between candidate servo gains. The size of the action set is 2·n_a + 1, and the action set forms an arithmetic progression with tolerance d_a, A = {a_i | i = 1, 2, ..., 2·n_a + 1}; expanded, the action set is

{−n_a·d_a, −(n_a − 1)·d_a, ..., −d_a, 0, d_a, 2·d_a, ..., (n_a − 1)·d_a, n_a·d_a}.

The servo gain adjustment formula is:

λ^{t+1} = λ^t + a^t.   (11)

According to formula (11), the adjustment formulas of the servo gain in the linear velocity and angular velocity directions are:

λ_v^{t+1} = λ_v^t + a_v^t,  λ_ω^{t+1} = λ_ω^t + a_ω^t,   (12)

where λ_v^t is the linear velocity servo gain before adjustment, a_v^t is the selected action and λ_v^{t+1} is the adjusted linear velocity servo gain; likewise λ_ω^t is the yaw-rate servo gain before adjustment and λ_ω^{t+1} is the servo gain after the action is selected.
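A brief sketch of this action space and gain update (assuming, per the reconstruction above, that each action adds an increment to the current gain; the values of n_a and d_a are illustrative, and the clipping reflects the stated (0, 1) servo-gain range):

```python
import numpy as np

n_a, d_a = 3, 0.05
actions = np.arange(-n_a, n_a + 1) * d_a   # {-n_a*d_a, ..., 0, ..., n_a*d_a}

def adjust_gain(lam, a, lo=0.0, hi=1.0):
    """lambda^{t+1} = lambda^t + a^t, clipped to the servo-gain range (0, 1)."""
    return float(np.clip(lam + a, lo + 1e-6, hi - 1e-6))

lam_v = 0.5                                 # initial linear-velocity gain lambda*
lam_v = adjust_gain(lam_v, actions[5])      # apply one selected action
print(actions, lam_v)
```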
3. Setting the reward function:

The reward function is divided into three parts: reaching the expected target, losing the tracked target, and other situations.

If the feature error of every dimension satisfies |e_i| < δ, where δ is a threshold, the quad-rotor drone is considered to have reached the target position and the highest reward is given.

If feature points are missing after feature extraction of the real-time image photographed by the quad-rotor drone, compared with the feature points of the target image, the drone is considered to have lost the target and the return value is negative.

In other situations the reward is given according to how close the quad-rotor drone is to the target.

The reward function is thus described by a piecewise analytic expression (13): the highest reward when every |e_i| < δ, a negative return when the target is lost, and otherwise a closeness term normalized by row and col, where row and col represent the length and width of the image plane respectively.
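A sketch of such a piecewise reward follows; the constants 1.0 and −1.0 and the normalization by row + col are assumptions for illustration, since the exact expression is given by the patent's equation (13):

```python
import numpy as np

def reward(e, target_lost, delta, row, col):
    """Piecewise reward: top reward on arrival, negative on target loss,
    otherwise a closeness term normalized by the image size."""
    if target_lost:
        return -1.0                              # lost the target (assumed value)
    if np.all(np.abs(e) < delta):
        return 1.0                               # reached target (assumed value)
    return 1.0 - np.sum(np.abs(e)) / (len(e) * (row + col))  # closeness term

e = np.array([12.0, -8.0, 3.0, 5.0])             # stacked feature errors (pixels)
print(reward(e, target_lost=False, delta=5.0, row=480, col=640))
```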
4. Setting the single-step SARSA learning iterative algorithm:

The iterative formula for the action value of the servo gain is the single-step SARSA update:

Q(s^t, a^t) ← Q(s^t, a^t) + α · [r + γ · Q(s^{t+1}, a^{t+1}) − Q(s^t, a^t)],   (14)

where α is the learning rate and γ the discount factor; the update is applied separately in the linear velocity gain space (Q_v) and the angular velocity gain space (Q_ω). The iterative process of the algorithm, set with reference to Q-learning, is:

1) Initialize Q_v(s_v^t, a_v^t) and Q_ω(s_ω^t, a_ω^t); for every state s and action a, Q is initialized to 0.

2) In the current states s_v^t and s_ω^t, randomly generate two random numbers, denoted rand_1 and rand_2. If rand_1 < ε, randomly select an action a_v^t; otherwise select the action according to the greedy rule

a^t = argmax_a Q(s^t, a).   (15)

Similarly, if rand_2 < ε, randomly select an action a_ω^t; otherwise select it according to formula (15).

3) After taking actions a_v^t and a_ω^t, obtain the next states s_v^{t+1} and s_ω^{t+1} respectively, and a reward r.

4) Repeat step 2 to obtain a_v^{t+1} and a_ω^{t+1}; update Q_v and Q_ω according to equation (14).

5) Return to step 2 and repeat many times.
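A compact Python sketch of this ε-greedy single-step SARSA loop for one of the two gain spaces (a sketch under the reconstruction above; the environment transition env_step and the state discretization are hypothetical stand-ins for one visual-servoing time slice):

```python
import numpy as np

n_states, n_actions = 20, 7
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))          # 1) initialize Q to 0

def select_action(s):
    """2) epsilon-greedy: random with probability eps, else greedy per eq. (15)."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def env_step(s, a):
    """Hypothetical environment: returns next state and reward; in the patent
    this would be one visual-servoing time slice with the adjusted gain."""
    s_next = (s + a - n_actions // 2) % n_states
    r = -abs(s_next - n_states // 2) / n_states
    return s_next, r

s, a = 0, select_action(0)
for t in range(400):                          # one round: at most 400 time slices
    s_next, r = env_step(s, a)                # 3) act, observe s', r
    a_next = select_action(s_next)            # 4) choose a' the same way
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])  # eq. (14)
    s, a = s_next, a_next                     # 5) continue from (s', a')
```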
5. Setting learning rules:

The maximum time spent in one learning round of single-step SARSA learning is set to 400 time slices; the placement position of the quadrotor in each round is random within the feasible range, such that all targets can be seen after the quadrotor takes off to 1.0 m, and one training session comprises 5000 rounds. If the quadrotor still has not reached the designated position from its initial position after 400 time slices, it is forcibly returned to the starting point for the next round; if the feature points are lost owing to the quadrotor's motion, the current round is ended and the next round is restarted; if the quadrotor stays within 5 pixels of the target position for a certain time during its movement, the target point is considered reached and the round is finished; the servo gain is updated after each round.
Fuzzy control rules: fuzzy control is used to adaptively adjust the learning rate. The basic rule of the adaptive learning-rate adjustment is: if the gain learned by the agent increases the feature error, the learning rate is reduced; otherwise the learning rate is increased. In this embodiment the rate of change of the feature error is taken as the observed quantity and the controlled quantity is the learning rate. The specific fuzzy control steps are as follows:

1. The rate of change Ė of the sum E of the distances between the feature points and their expected positions is taken as the observed quantity, and Ė is fuzzified with the descriptions "decrease rapidly (DR), decrease slowly (DS), roughly unchanged (RU), increase slowly (IS), increase rapidly (IR)". The learning rate output is taken as the controlled quantity of the fuzzy control, i.e. the output quantity, and the learning rate is fuzzified with the descriptions "large (L), largish (LL), medium (M), smallish (LS), small (S)".
2. The membership functions of the input descriptions {DR, DS, RU, IS, IR} are set as μ_k, k = 1, ..., 5, and the membership functions of the output descriptions {L, LL, M, LS, S} are set in turn as ν_k, k = 1, ..., 5.
Taking the fuzzy description DR as an example, an explicit piecewise expression is obtained in the same way for each membership function.
3. n discrete points {x_1, ..., x_n} are selected uniformly from the value range of the input quantity. Each point has 5 membership values with respect to the input fuzzy descriptions, so an input membership discrete matrix U = (u_{ik}) ∈ R^{n×5} can be constructed, calculated as:

u_{ik} = μ_k(x_i), i = 1, ..., n, k = 1, ..., 5.

Similarly, m discrete points {y_1, ..., y_m} are selected uniformly from the output universe of discourse, and an output membership discrete matrix V = (v_{kj}) ∈ R^{5×m} is constructed, calculated as:

v_{kj} = ν_k(y_j), k = 1, ..., 5, j = 1, ..., m.
4. The fuzzy rules are set as: "if DR then L; if DS then LL; if RU then M; if IS then LS; if IR then S". Combining the max-min composition operation, the fuzzy inference engine R = (r_{ij}) ∈ R^{n×m}, i = 1, ..., n, j = 1, ..., m, is derived:

r_{ij} = ∨_{k=1}^{5} (u_{ik} ∧ v_{kj}),

where ∧ denotes taking the minimum and ∨ denotes taking the maximum.
5. For a particular observed quantity Ė*, the 5 membership values μ_k(Ė*) with respect to the input fuzzy descriptions are obtained first; the fuzzified input quantity ρ = (ρ_i) ∈ R^{1×n} is then obtained by weighting these membership values against the input discrete matrix:

ρ_i = ∨_{k=1}^{5} (μ_k(Ė*) ∧ u_{ik}), i = 1, ..., n.

6. The fuzzified input quantity ρ and the inference engine R give the corresponding output fuzzy vector β = (β_j) ∈ R^{1×m} through the max-min composition operation:

β_j = ∨_{i=1}^{n} (ρ_i ∧ r_{ij}), j = 1, ..., m.

7. The final learning-rate control result is calculated from the fuzzy controlled quantity β. Defuzzification is performed with the weighted average method, so the final output, i.e. the learning rate, is:

α_0 = ( Σ_{j=1}^{m} β_j · y_j ) / ( Σ_{j=1}^{m} β_j ).
Thus, for a determined observed quantity Ė, the corresponding learning rate α_0 can be obtained by fuzzy control. Through this variable learning-rate design based on fuzzy control, the learning time can be reduced to a certain extent and the operating efficiency of the algorithm is improved.
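The following Python sketch strings steps 1-7 together (a sketch under the reconstructions above; the triangular membership functions, the discretizations and the rule pairing of DR with L through IR with S are illustrative assumptions):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaked at b on support [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Five input descriptions {DR, DS, RU, IS, IR} over error-change range [-1, 1],
# five output descriptions {L, LL, M, LS, S} over learning-rate range [0, 1].
in_mf  = [lambda x, p=p: tri(x, *p) for p in
          [(-1.5, -1.0, -0.5), (-1.0, -0.5, 0.0), (-0.5, 0.0, 0.5),
           (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)]]
out_mf = [lambda y, p=p: tri(y, *p) for p in
          [(0.5, 1.0, 1.5), (0.25, 0.5, 0.75), (0.15, 0.25, 0.4),
           (0.05, 0.15, 0.25), (-0.1, 0.05, 0.15)]]

n, m = 21, 21
xs = np.linspace(-1.0, 1.0, n)                 # input discretization {x_i}
ys = np.linspace(0.0, 1.0, m)                  # output discretization {y_j}
U = np.array([[mf(x) for mf in in_mf] for x in xs])    # n x 5, u_ik
V = np.array([[mf(y) for y in ys] for mf in out_mf])   # 5 x m, v_kj

# Inference engine via max-min composition: r_ij = max_k min(u_ik, v_kj).
R = np.max(np.minimum(U[:, :, None], V[None, :, :]), axis=1)

def learning_rate(e_dot):
    mu = np.array([mf(e_dot) for mf in in_mf])          # 5 membership values
    rho = np.max(np.minimum(mu[None, :], U), axis=1)    # fuzzified input, 1 x n
    beta = np.max(np.minimum(rho[:, None], R), axis=0)  # output fuzzy vector
    return float(np.sum(beta * ys) / np.sum(beta))      # weighted-average defuzz

print(learning_rate(-0.8))  # error dropping fast -> larger learning rate
```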

Claims (1)

1. A visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning, characterized by comprising the following steps:
step 1, performing edge extraction on the image with the Canny algorithm and obtaining a set of N contour coordinates through filtering and noise reduction operations, the contour being described with the Freeman chain code and the contour pixels recorded as C = {c_i | i = 1, ..., N}; performing rotation normalization on the contour pixels of the target to obtain a Freeman chain code and calculating the Levenshtein distance between it and each graph in a standard contour library, the Levenshtein distance being the number of operations required to convert shape A into shape B, the operations being insertion, deletion and modification, whereby the shape of an object can be identified even when the image target has lost a certain amount of edge;
step 2, after the contour pixels of the image are acquired in step 1, using a contour compensation algorithm because the photographed image sometimes exhibits incomplete contours; for the l-th target, one pass of processing of the Freeman chain code yields N_l contour feature points f_j^l, whose feature point set is F^l = {f_j^l | j = 1, ..., N_l}; rotation normalization of every element of F^l yields N_standard standard contour feature points d_j^l, the set of standard contour feature points being denoted D^l = {d_j^l | j = 1, ..., N_standard}; letting O^l be the compensated feature point contour set, the conversion relationship between the compensation contour O^l and the standard contour D^l is D^l · R + L = O^l, where R and L are respectively the rotation matrix and the translation matrix; the j-th element of the compensation contour O^l is P_j, the set containing N_standard elements in total, recorded as O^l = {P_j | j = 1, ..., N_standard}; taking the central feature point of target l as the feature point used for visual servo control, its coordinates obtained by averaging the coordinates of the compensation contour feature points:

P_c^l = (1 / N_standard) · Σ_{j=1}^{N_standard} P_j;
step 3, after the central feature point of the target is obtained in step 2, establishing a visual model of the rotor unmanned aerial vehicle's bottom camera, i.e. the conversion relation from three-dimensional space to the image pixel plane;

step 4, constructing, from the obtained visual model, a decoupled visual servo control model of the rotor unmanned aerial vehicle that contains a visual servo gain value;

step 5, establishing a single-step SARSA learning servo gain adjustment model, and using SARSA learning to adjust the visual servo gain value of the rotor unmanned aerial vehicle obtained in step 4;
1) setting the state space: after the contour of the target is extracted by the image feature extraction algorithm, the target is simplified to its central feature point; the absolute values of the errors between the current feature points and the target feature points are computed and summed, and intervals of this sum are taken as states;

2) setting the action space: an initial value λ* is selected as the initial value of the servo gain by analyzing the differences between candidate servo gains; the size of the action set is 2·n_a + 1, and the action set forms an arithmetic progression with tolerance d_a, A = {a_i | i = 1, 2, ..., 2·n_a + 1}; the servo gains of the linear velocity and the angular velocity are adjusted separately;

3) setting the reward function: the reward function is divided into three parts, reaching the expected target, losing the tracked target, and other situations; if the feature error of every dimension satisfies |e_i| < δ, where δ is a threshold, the quad-rotor drone is considered to have reached the target position and the highest reward is given; if feature points are missing after feature extraction of the real-time image photographed by the quad-rotor drone, compared with the feature points of the target image, the drone is considered to have lost the target and the return value is negative; in other situations the reward is given according to how close the quad-rotor drone is to the target;
4) setting the single-step SARSA learning iterative algorithm: an iterative formula for the servo gain is set, separately in the linear velocity gain space and the angular velocity gain space; the iterative process of the servo gain iterative algorithm is set with reference to Q-learning, uses the set servo gain iterative formula, and completes the iterative updating of the servo gain;
5) setting learning rules: the maximum time spent in one learning round of single-step SARSA learning is set to 400 time slices; the placement position of the quadrotor in each round is random within the feasible range, such that all targets can be seen after the quadrotor takes off to 1.0 m, and one training session comprises 5000 rounds; if the quadrotor still has not reached the designated position from its initial position after 400 time slices, it is forcibly returned to the starting point for the next round; if the feature points are lost owing to the quadrotor's motion, the current round is ended and the next round is restarted; if the quadrotor stays within 5 pixels of the target position for a certain time during its movement, the target point is considered reached and the round is finished; the servo gain is updated after each round;
step 6, fuzzy control rules: after the SARSA learning servo gain adjustment model is established in step 5, fuzzy control is used to adaptively adjust the learning rate; the basic rule of the adaptive learning-rate adjustment is that if the gain learned by the agent increases the feature error, the learning rate is reduced, otherwise the learning rate is increased; the learning rate of reinforcement learning is changed through fuzzy control: the rate of change of the feature error is taken as the observed quantity and fuzzified, a fuzzy control rule based on the max-min composition operation is set, the observed quantity is fed into the fuzzy control rule to obtain the fuzzy controlled quantity, and the learning rate is finally obtained by defuzzification.
CN201810855339.8A 2018-07-31 2018-07-31 Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning Active CN109143855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810855339.8A CN109143855B (en) 2018-07-31 2018-07-31 Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810855339.8A CN109143855B (en) 2018-07-31 2018-07-31 Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning

Publications (2)

Publication Number Publication Date
CN109143855A CN109143855A (en) 2019-01-04
CN109143855B true CN109143855B (en) 2021-04-02

Family

ID=64798489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810855339.8A Active CN109143855B (en) 2018-07-31 2018-07-31 Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning

Country Status (1)

Country Link
CN (1) CN109143855B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109696830B (en) * 2019-01-31 2021-12-03 天津大学 Reinforced learning self-adaptive control method of small unmanned helicopter
CN111026146B (en) * 2019-12-24 2021-04-06 西北工业大学 Attitude control method for composite wing vertical take-off and landing unmanned aerial vehicle
CN112612212B (en) * 2020-12-30 2021-11-23 上海大学 Heterogeneous multi-unmanned system formation and cooperative target driving-away method
CN114609976A (en) * 2022-04-12 2022-06-10 天津航天机电设备研究所 Non-calibration visual servo control method based on homography and Q learning
CN114859971A (en) * 2022-05-07 2022-08-05 北京卓翼智能科技有限公司 Intelligent unmanned aerial vehicle for monitoring wind turbine
CN116700348B (en) * 2023-07-12 2024-03-19 湖南文理学院 Visual servo control method and system for four-rotor aircraft with limited vision

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3494772B2 (en) * 1995-09-08 2004-02-09 カヤバ工業株式会社 Fuzzy control device
TWI334066B (en) * 2007-03-05 2010-12-01 Univ Nat Taiwan Science Tech Method of fuzzy logic control with combined sliding mode concept for ideal dynamic responses
CN101588480B (en) * 2009-05-27 2010-09-08 北京航空航天大学 Multi-agent visual servo-coordination control method
CN102135761B (en) * 2011-01-10 2013-08-21 穆科明 Fuzzy self-adaptive control system for parameters of visual sensor
CN105353772B (en) * 2015-11-16 2018-11-09 中国航天时代电子公司 A kind of Visual servoing control method in UAV Maneuver target locating
CN107894709A (en) * 2017-04-24 2018-04-10 长春工业大学 Controlled based on Adaptive critic network redundancy Robot Visual Servoing

Also Published As

Publication number Publication date
CN109143855A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN109143855B (en) Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning
US9019278B2 (en) Systems and methods for animating non-humanoid characters with human motion data
Shi et al. Decoupled visual servoing with fuzzy Q-learning
CN111240356B (en) Unmanned aerial vehicle cluster convergence method based on deep reinforcement learning
JP2021518622A (en) Self-location estimation, mapping, and network training
Zhao et al. A brain-inspired decision making model based on top-down biasing of prefrontal cortex to basal ganglia and its application in autonomous UAV explorations
CN111260026B (en) Navigation migration method based on meta reinforcement learning
Passalis et al. Deep reinforcement learning for controlling frontal person close-up shooting
Hoang et al. Vision-based target tracking and autonomous landing of a quadrotor on a ground vehicle
CN113281999A (en) Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning
Srivastava et al. Least square policy iteration for ibvs based dynamic target tracking
Kim et al. Learning and generalization of dynamic movement primitives by hierarchical deep reinforcement learning from demonstration
CN116501168A (en) Unmanned aerial vehicle gesture control method and control system based on chaotic sparrow search and fuzzy PID parameter optimization
Amirkhani et al. Fuzzy cognitive map for visual servoing of flying robot
Jones et al. Using neural networks to learn hand-eye co-ordination
Xing et al. Contrastive learning for enhancing robust scene transfer in vision-based agile flight
Becerra Fuzzy visual control for memory-based navigation using the trifocal tensor
CN109542094B (en) Mobile robot vision stabilization control without desired images
Lopez-Franco et al. Neural control for a differential drive wheeled mobile robot integrating stereo vision feedback
CN112051733B (en) Rigid mechanical arm composite learning control method based on image classification
CN110703792A (en) Underwater robot attitude control method based on reinforcement learning
Kumar et al. Benchmarking Deep Reinforcement Learning Algorithms for Vision-based Robotics
Chen Adaptive Shape-Servoing for Vision-based Robotic Manipulation with Model Estimation and Performance Regulation
Maeda et al. View-based programming with reinforcement learning for robotic manipulation
Chen et al. Vision-assisted Arm Motion Planning for Freeform 3D Printing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210926

Address after: 710072 No. 4689, block B, Feixiang Zhongchuang space, West University of technology, 4 / F, innovation building, No. 127, Youyi West Road, Beilin District, Xi'an City, Shaanxi Province

Patentee after: Xi'an Liuyi FeiMeng Information Technology Co.,Ltd.

Address before: 710072 No. 127 Youyi West Road, Shaanxi, Xi'an

Patentee before: Northwestern Polytechnical University

TR01 Transfer of patent right