CN109143855B - Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning - Google Patents

Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning

Info

Publication number
CN109143855B
CN109143855B (application CN201810855339.8A)
Authority
CN
China
Prior art keywords
target
learning
contour
aerial vehicle
unmanned aerial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810855339.8A
Other languages
Chinese (zh)
Other versions
CN109143855A (en)
Inventor
徐梦 (Xu Meng)
史豪斌 (Shi Haobin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Liuyi FeiMeng Information Technology Co.,Ltd.
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN201810855339.8A priority Critical patent/CN109143855B/en
Publication of CN109143855A publication Critical patent/CN109143855A/en
Application granted granted Critical
Publication of CN109143855B publication Critical patent/CN109143855B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04: Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • G06V20/13: Satellite images

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Astronomy & Astrophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a visual servo control method for a rotor unmanned aerial vehicle based on fuzzy SARSA learning. The rotor unmanned aerial vehicle acquires image information through a camera, extracts the contour features of a target with a target contour extraction algorithm based on the Freeman chain code, and performs contour compensation for the target edge information lost during image acquisition. The servo gain parameter of the visual servo is trained with a reinforcement learning algorithm, so that the rotor unmanned aerial vehicle acquires the ability to adjust the servo gain adaptively, and the learning rate is adjusted in combination with a fuzzy control method. Through reinforcement learning, the rotor unmanned aerial vehicle gains experience by training in different scenes and can change the gain by itself; at the same time, adaptive adjustment of the reinforcement learning rate through fuzzy control yields a faster convergence rate. The target contour extraction algorithm based on the Freeman chain code effectively reduces the error between the extracted central feature point and the actual central feature point, and improves the accuracy of feature extraction.

Description

Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning
Technical Field
The invention relates to the fields of machine learning and robot automatic control, in particular to a visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning.
Background
Artificial intelligence and machine learning technology are developing rapidly today and are applied to many aspects of production and daily life. Rotor unmanned aerial vehicles have traditionally been controlled with classical automatic control methods, such as PID control or visual servo control. However, as the tasks borne by rotor unmanned aerial vehicles become increasingly complex and their operating environments unpredictable, classical control methods can no longer meet their control requirements: the PID control method and the image-based visual servo control method adopted in traditional rotor unmanned aerial vehicle control suffer from low stability and slow convergence in complex scenes, making it difficult for the vehicle to carry out its work tasks efficiently in specific application scenarios. There is therefore a need for an intelligent rotor unmanned aerial vehicle control method that improves visual servoing in combination with machine learning.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a visual servo control method for a rotor unmanned aerial vehicle based on fuzzy SARSA learning. The unmanned aerial vehicle acquires image information through a bottom camera and extracts the contour features of a target through a target contour extraction algorithm based on the Freeman chain code; because edge information of the target is usually lost during image acquisition, a contour compensation step is performed. The servo gain parameter of the visual servo is trained with a reinforcement learning algorithm, so that the rotor unmanned aerial vehicle acquires the ability to adjust the servo gain adaptively, and the learning rate is adjusted in combination with a fuzzy control method. On this basis, the rotor unmanned aerial vehicle gains experience through continuous reinforcement learning training in different scenes, so that it can change the gain automatically; at the same time, adaptive adjustment of the reinforcement learning rate through fuzzy control accelerates the operation of classical reinforcement learning and achieves a faster convergence rate. Using the Freeman chain code based target contour extraction algorithm, with the contour completed by the contour compensation algorithm, effectively reduces the error between the extracted central feature point and the actual central feature point caused by missing edges in classical image feature extraction algorithms, and improves feature extraction accuracy.
The invention solves the technical problem by the following technical scheme: a visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning, characterized by comprising the following steps:
Step 1: edge extraction is performed on the image with the Canny algorithm, and a set of N contour coordinates is obtained through filtering and noise reduction operations; the contour is described with the Freeman chain code, and the contour pixels are recorded as C = {c_i | i = 1, ..., N}. The contour pixels of the target are rotation-normalized to obtain a Freeman chain code, and the Levenshtein distance between it and each graph in a standard contour library is calculated; the Levenshtein distance is the number of operations required to convert shape A into shape B, the permitted operations being insertion, deletion and modification. The method can identify the shape of an object even when the image target has lost a certain amount of edge.
Step 2: after the contour pixels of the image are acquired in step 1, a contour compensation algorithm is used, because the photographed image sometimes exhibits incomplete contours. For the l-th target, one pass of processing of the Freeman chain code yields N_l contour feature points f_j^l, whose feature point set is:

F^l = {f_j^l | j = 1, ..., N_l}.

Rotation normalization of every element of F^l yields N_standard standard contour feature points d_j^l, and the set of standard contour feature points is denoted:

D^l = {d_j^l | j = 1, ..., N_standard}.

Let O^l be the compensated feature point contour set; the conversion relationship between the compensation contour O^l and the standard contour D^l is D^l · R + L = O^l, where R and L are respectively the rotation matrix and the translation matrix. The j-th element of the compensation contour O^l is P_j, and the set contains N_standard elements in total, recorded as O^l = {P_j | j = 1, ..., N_standard}. The central feature point of target l is taken as the feature point used for visual servo control; its coordinates are obtained by averaging the coordinates of the compensation contour feature points:

P_c^l = (1 / N_standard) · Σ_{j=1}^{N_standard} P_j.
Step 3: after the central feature point of the target is obtained in step 2, a visual model of the rotor unmanned aerial vehicle's bottom camera is established, i.e. the conversion relation from three-dimensional space to the image pixel plane.

Step 4: a decoupled visual servo control model of the rotor unmanned aerial vehicle, which contains a visual servo gain value, is constructed from the obtained visual model.

Step 5: a single-step SARSA learning servo gain adjustment model is established, and SARSA learning is used to adjust the visual servo gain value of the rotor unmanned aerial vehicle obtained in step 4.
1) Setting the state space: after the contour of the target is extracted by the image feature extraction algorithm, the target is simplified to its central feature point; the absolute values of the errors between the current feature points and the target feature points are computed and summed, and intervals of this sum are taken as states.

2) Setting the action space: an initial value λ* is selected as the initial value of the servo gain by analyzing the differences between candidate servo gains. The size of the action set is 2·n_a + 1, and the action set forms an arithmetic progression with tolerance d_a, A = {a_i | i = 1, 2, ..., 2·n_a + 1}; the servo gains of the linear velocity and the angular velocity are adjusted separately.

3) Setting the reward function: the reward function is divided into three parts, reaching the expected target, losing the tracked target, and other situations. If the feature error of every dimension satisfies |e_i| < δ, where δ is a threshold, the quad-rotor drone is considered to have reached the target position and the highest reward is given; if feature points are missing after feature extraction of the real-time image photographed by the quad-rotor drone, compared with the feature points of the target image, the drone is considered to have lost the target and the return value is negative; in other situations the reward is given according to how close the quad-rotor drone is to the target.
4) Setting the single-step SARSA learning iterative algorithm: an iterative formula for the servo gain is set, separately in the linear velocity gain space and the angular velocity gain space; the iterative process of the servo gain iterative algorithm is set with reference to Q-learning, uses the set servo gain iterative formula, and completes the iterative updating of the servo gain.
5) Setting learning rules: the maximum time spent in one learning round of single-step SARSA learning is set to 400 time slices; the placement position of the quadrotor in each round is random within the feasible range, such that all targets can be seen after the quadrotor takes off to 1.0 m, and one training session comprises 5000 rounds. If the quadrotor still has not reached the designated position from its initial position after 400 time slices, it is forcibly returned to the starting point for the next round; if the feature points are lost owing to the quadrotor's motion, the current round is ended and the next round is restarted; if the quadrotor stays within 5 pixels of the target position for a certain time during its movement, the target point is considered reached and the round is finished; the servo gain is updated after each round.
Step 6: fuzzy control rules. After the SARSA learning servo gain adjustment model is established in step 5, fuzzy control is used to adaptively adjust the learning rate. The basic rule of the adaptive learning-rate adjustment is: if the gain learned by the agent increases the feature error, the learning rate is reduced; otherwise the learning rate is increased. The learning rate of reinforcement learning is changed through fuzzy control: the rate of change of the feature error is taken as the observed quantity and fuzzified, a fuzzy control rule based on the max-min composition operation is set, the observed quantity is fed into the fuzzy control rule to obtain the fuzzy controlled quantity, and the learning rate is finally obtained by defuzzification.
Advantageous effects
The invention provides a visual servo control method for a rotor unmanned aerial vehicle based on fuzzy SARSA learning. The rotor unmanned aerial vehicle is controlled with an image-based visual servo control method; image-based visual servoing forms a closed-loop feedback adjustment based on the error so as to command reasonable movement of the vehicle. The servo gain is adjusted adaptively with a reinforcement learning method: the rotor unmanned aerial vehicle is trained in different scenes and, after many training sessions, learns the ability to change the gain appropriately in each scene. The learning rate of reinforcement learning is changed through fuzzy control: the rate of change of the feature error is taken as the observed quantity and fuzzified, a fuzzy control rule based on the max-min composition operation is set, the observed quantity is fed into the fuzzy control rule to obtain the fuzzy controlled quantity, and the learning rate is finally obtained by defuzzification. After repeated training, the unmanned aerial vehicle masters the skill of adaptive gain adjustment.
Drawings
The visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning according to the present invention is described in detail below with reference to the accompanying drawings and embodiments.
Fig. 1 is a flow chart of a vision servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning according to the present invention.
Detailed Description
This embodiment is a visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning.
Aiming at the problem of partial contour loss in traditional visual feature extraction algorithms, this embodiment provides a contour extraction algorithm based on the Freeman chain code: the contour is completed with a contour compensation algorithm, and a weighted average method is then used to calculate the central feature point of the target. Aiming at the underactuated, nonlinear dynamic characteristics of the rotor unmanned aerial vehicle, this embodiment proposes a decoupled visual servo control model for the rotor unmanned aerial vehicle. Aiming at the problem that a fixed visual servo gain value is inefficient and cannot adapt to complex environments, this embodiment provides a method of adjusting the visual servo gain with SARSA learning. Aiming at the problem that a fixed SARSA learning rate leads to low learning efficiency, this embodiment proposes adjusting the learning rate of SARSA learning with a fuzzy control method.
Referring to fig. 1, the visual servo control method for a rotor unmanned aerial vehicle based on fuzzy SARSA learning of this embodiment comprises two aspects, image feature extraction and an intelligent control method for the rotor unmanned aerial vehicle, and includes the following steps:
the method comprises the following steps of performing edge extraction on an image by using a Canny algorithm, and then obtaining a set of N contour coordinates through filtering and noise reduction operations, wherein the set of N contour coordinates is described by using a Firman chain code, and contour pixels using the Firman chain code are marked as C ═ C { (C {)i1., N }; the visual feature extraction algorithm for the unmanned rotorcraft provided by the embodiment needs to establish a graph library, and assumes that the established standard profile library has M graphs corresponding to M actual objects, and the profiles of the M graphs are represented as D ═ DiI 1., N }, first of all, the method is to usePerforming rotation normalization on the Ferman chain code by using first-order difference according to a formula:
Figure GDA0002750850890000051
the method comprises the steps of obtaining a Lelman chain code after carrying out rotation normalization on contour pixels of a target and respectively calculating a Levenshtein distance with a graph in a standard contour library, wherein the Levenshtein distance is calculated in the mode that an operand is needed for converting a shape A into a shape B, the operation can be only insertion, deletion and modification, and the method can be used for identifying the shape of an object under the condition that the image target loses a certain degree of edge.
Step two: because the photographed picture sometimes exhibits incomplete contours, the system uses a contour compensation algorithm. For the l-th target, one pass of processing of the Freeman chain code yields the feature point set

F^l = {f_j^l | j = 1, ..., N_l},

and the set of standard contour feature points obtained after rotation normalization of this set is

D^l = {d_j^l | j = 1, ..., N_standard}.

The contour of the identified object is represented with the Freeman chain code as X = {x_k | k = 1, ..., q}, where q is the number of feature points and x_k is the coordinate of the k-th feature point. The corresponding contour in the standard contour library is X* = {x*_k | k = 1, ..., q}, where x*_k is the coordinate of the k-th feature point. The relationship between X and X* is established as X* · R + L = X.

Let R be the rotation matrix R = [cos β, −sin β; sin β, cos β]. Derivation yields

R = (H*)^+ · H,  L = X − X* · (H*)^+ · H,

where H and H* are matrices formed from the coordinates of X and X* respectively and (·)^+ denotes the pseudo-inverse. Thus, for a standard contour D^l and compensation contour O^l, the relationship is D^l · R + L = O^l, recorded as O^l = {P_j | j = 1, ..., N_standard}. The central feature point of target l is taken as the feature point used for visual servo control; its coordinates are obtained by averaging the coordinates of the compensation contour feature points:

P_c^l = (1 / N_standard) · Σ_{j=1}^{N_standard} P_j.
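The compensation step can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the patent's exact procedure: it recovers R and L from centered coordinates with a pseudo-inverse, one standard least-squares realization of the relation X* · R + L = X, then applies them to the standard contour and averages the result to obtain the central feature point:

```python
import numpy as np

def estimate_transform(X_std, X_obs):
    """Estimate rotation R and translation L such that X_std @ R + L ~= X_obs,
    in the least-squares sense, using centered coordinates and a pseudo-inverse."""
    mean_std, mean_obs = X_std.mean(axis=0), X_obs.mean(axis=0)
    H_std, H_obs = X_std - mean_std, X_obs - mean_obs   # centered point sets
    R = np.linalg.pinv(H_std) @ H_obs                   # R = (H*)^+ H
    L = mean_obs - mean_std @ R                         # translation term
    return R, L

# Hypothetical standard contour (q x 2) and partially observed contour:
D_l = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
X_obs = np.array([[0.1, 0.0], [1.1, 0.1], [1.0, 1.1], [0.0, 1.0]])

R, L = estimate_transform(D_l, X_obs)
O_l = D_l @ R + L                 # compensated contour: D^l . R + L = O^l
P_center = O_l.mean(axis=0)       # central feature point used for servoing
print(P_center)
```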
Next, the conversion relation from three-dimensional space to the image pixel plane is established.

Let P = (X_p, Y_p, Z_p)^T be a point in space and P_i = (x, y, z)^T the projection of the point P on the image plane. According to the pinhole imaging principle:

x = K_s · f · X_p / Z_p,  y = K_s · f · Y_p / Z_p,

where K_s is a constant and f is the focal length.

The image collected by the vision sensor is stored in the computer as a binary function f(u, v), where (u, v) is a coordinate on the image plane and f(u, v) is the pixel value at that point. The relationship of a point (u, v) in the image plane coordinate system to a point (x, y) in the vision sensor plane coordinate system is:

u = x / d_x + u_q,  v = y / d_y + v_q,

where (u_q, v_q) is the coordinate of the origin O_1 of the vision sensor coordinate plane in the image plane, and d_x, d_y are the scaling factors from the image plane to the vision sensor plane.
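A small Python sketch of this two-stage projection follows (a sketch under the standard pinhole-camera assumptions stated above, with K_s taken as 1; the numeric intrinsics are invented for illustration):

```python
import numpy as np

def project_to_pixel(P, f, dx, dy, uq, vq):
    """Project a 3-D point P = (Xp, Yp, Zp) to pixel coordinates (u, v):
    pinhole projection onto the sensor plane (K_s = 1), then scaling and
    offset into the pixel plane."""
    Xp, Yp, Zp = P
    x, y = f * Xp / Zp, f * Yp / Zp      # pinhole imaging
    u, v = x / dx + uq, y / dy + vq      # sensor plane -> pixel plane
    return np.array([u, v])

# Hypothetical intrinsics: 4 mm focal length, 5 um pixels, 320x240 center.
print(project_to_pixel((0.5, -0.2, 2.0), f=0.004, dx=5e-6, dy=5e-6, uq=320, vq=240))
```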
The visual-servo-based control model of the rotor unmanned aerial vehicle is then established. Let e denote the feature error vector, e = f_c − f*, where f_c represents the current coordinates (u_c, v_c) of the feature point in the image plane and f* the coordinates (u*, v*) of the desired feature point. From the dynamic relationship, the rate of change of the feature error with time is a function of the linear and angular velocities:

ė = J · (v, ω)^T,   (5)

where J is the image Jacobian matrix, v = (v_x, v_y, v_z)^T is the linear velocity vector of the unmanned aerial vehicle, f_c = (u_c, v_c)^T is the current position coordinate vector of the feature point, and ω = (ω_x, ω_y, ω_z)^T is the vector of pitch, roll and yaw angular velocities. To make the feature error decrease exponentially and in a decoupled fashion, ė = −λ · e is imposed; substituting into equation (5) yields:

(v, ω)^T = −λ · J^+ · e,   (7)

where J^+ is the pseudo-inverse of the matrix J: if J is a square matrix its pseudo-inverse is its inverse, denoted J^(−1); if J has unequal numbers of rows and columns, J^+ = (J^T J)^(−1) J^T. λ_{v,ω} is the servo gain, with value range (0, 1).
Considering the dynamics of the quad-rotor drone, kinematics gives the relationship between the rate of change of (X_p, Y_p, Z_p) with time and the linear and angular velocities:

Ṗ = −v − ω × P.

Combining this with formula (7) and the projection model, the image Jacobian of a single feature point can be derived; restricted to the quadrotor's controllable degrees of freedom (v_x, v_y, v_z, ω_z), it takes the standard point interaction-matrix form

J_i = [ −f/Z   0    u/Z    v
         0    −f/Z  v/Z   −u ]

for a feature point (u, v) at depth Z.
If the number of feature points is N, the coordinate set of the feature points is {f_i | i = 1, 2, ..., N}, so the feature point error is the stacked vector e = (e_1^T, e_2^T, ..., e_N^T)^T and the image Jacobian matrix is the stack:

J = (J_1^T, J_2^T, ..., J_N^T)^T.

Decoupling is performed with servo gains that are independent for the linear velocity and the angular velocity; equation (7) is converted to:

v = −λ_v · (J^+)_{1:3} · e,  ω_z = −λ_ω · (J^+)_4 · e,

where λ_v and λ_ω are the servo gains for linear velocity and angular velocity respectively, (J^+)_{1:3} is the sub-matrix composed of the first three rows of J^+, and (J^+)_4 is the sub-matrix composed of the fourth row of J^+.
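The decoupled control law can be sketched as follows in Python (a sketch only: the 2×4 point Jacobian uses the standard interaction-matrix form assumed above, and the depth Z, focal length f and feature coordinates are made-up values):

```python
import numpy as np

def point_jacobian(u, v, Z, f):
    """Assumed 4-DOF point interaction matrix (columns: vx, vy, vz, wz);
    the patent's exact matrix is given by its equation image."""
    return np.array([[-f / Z, 0.0, u / Z,  v],
                     [0.0, -f / Z, v / Z, -u]])

def decoupled_control(features, desired, Z, f, lam_v, lam_w):
    """v = -lam_v * (J^+)[0:3] e ;  wz = -lam_w * (J^+)[3] e."""
    e = (features - desired).reshape(-1)               # stacked feature error
    J = np.vstack([point_jacobian(u, v, Z, f) for u, v in features])
    J_pinv = np.linalg.pinv(J)                          # 4 x 2N pseudo-inverse
    v_cmd = -lam_v * J_pinv[0:3] @ e                    # linear velocity command
    wz_cmd = -lam_w * J_pinv[3] @ e                     # yaw-rate command
    return v_cmd, wz_cmd

feats = np.array([[300., 220.], [340., 260.]])          # current feature points
goal  = np.array([[320., 240.], [360., 280.]])          # desired feature points
print(decoupled_control(feats, goal, Z=2.0, f=600.0, lam_v=0.5, lam_w=0.5))
```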
A single-step SARSA learning servo gain adjustment model is then established.

1. Setting the state space:

After the contour of the target is extracted by the image feature extraction algorithm, the target is simplified to its central feature point; the absolute values of the errors between the current feature points and the target feature points are computed and summed, and intervals of this sum are taken as states.
2. Setting the action space:

An initial value λ* is selected as the initial value of the servo gain by analyzing the differences between candidate servo gains. The size of the action set is 2·n_a + 1, and the action set forms an arithmetic progression with tolerance d_a, A = {a_i | i = 1, 2, ..., 2·n_a + 1}; expanded, the action set is

{−n_a·d_a, −(n_a − 1)·d_a, ..., −d_a, 0, d_a, 2·d_a, ..., (n_a − 1)·d_a, n_a·d_a}.

The servo gain adjustment formula is:

λ^{t+1} = λ^t + a^t.   (11)

According to formula (11), the adjustment formulas of the servo gain in the linear velocity and angular velocity directions are:

λ_v^{t+1} = λ_v^t + a_v^t,  λ_ω^{t+1} = λ_ω^t + a_ω^t,   (12)

where λ_v^t is the linear velocity servo gain before adjustment, a_v^t is the selected action and λ_v^{t+1} is the adjusted linear velocity servo gain; likewise λ_ω^t is the yaw-rate servo gain before adjustment and λ_ω^{t+1} is the servo gain after the action is selected.
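A brief sketch of this action space and gain update (assuming, per the reconstruction above, that each action adds an increment to the current gain; the values of n_a and d_a are illustrative, and the clipping reflects the stated (0, 1) servo-gain range):

```python
import numpy as np

n_a, d_a = 3, 0.05
actions = np.arange(-n_a, n_a + 1) * d_a   # {-n_a*d_a, ..., 0, ..., n_a*d_a}

def adjust_gain(lam, a, lo=0.0, hi=1.0):
    """lambda^{t+1} = lambda^t + a^t, clipped to the servo-gain range (0, 1)."""
    return float(np.clip(lam + a, lo + 1e-6, hi - 1e-6))

lam_v = 0.5                                 # initial linear-velocity gain lambda*
lam_v = adjust_gain(lam_v, actions[5])      # apply one selected action
print(actions, lam_v)
```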
3. Setting the reward function:

The reward function is divided into three parts: reaching the expected target, losing the tracked target, and other situations.

If the feature error of every dimension satisfies |e_i| < δ, where δ is a threshold, the quad-rotor drone is considered to have reached the target position and the highest reward is given.

If feature points are missing after feature extraction of the real-time image photographed by the quad-rotor drone, compared with the feature points of the target image, the drone is considered to have lost the target and the return value is negative.

In other situations the reward is given according to how close the quad-rotor drone is to the target.

The reward function is thus described by a piecewise analytic expression (13): the highest reward when every |e_i| < δ, a negative return when the target is lost, and otherwise a closeness term normalized by row and col, where row and col represent the length and width of the image plane respectively.
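A sketch of such a piecewise reward follows; the constants 1.0 and −1.0 and the normalization by row + col are assumptions for illustration, since the exact expression is given by the patent's equation (13):

```python
import numpy as np

def reward(e, target_lost, delta, row, col):
    """Piecewise reward: top reward on arrival, negative on target loss,
    otherwise a closeness term normalized by the image size."""
    if target_lost:
        return -1.0                              # lost the target (assumed value)
    if np.all(np.abs(e) < delta):
        return 1.0                               # reached target (assumed value)
    return 1.0 - np.sum(np.abs(e)) / (len(e) * (row + col))  # closeness term

e = np.array([12.0, -8.0, 3.0, 5.0])             # stacked feature errors (pixels)
print(reward(e, target_lost=False, delta=5.0, row=480, col=640))
```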
4. Setting the single-step SARSA learning iterative algorithm:

The iterative formula for the action value of the servo gain is the single-step SARSA update:

Q(s^t, a^t) ← Q(s^t, a^t) + α · [r + γ · Q(s^{t+1}, a^{t+1}) − Q(s^t, a^t)],   (14)

where α is the learning rate and γ the discount factor; the update is applied separately in the linear velocity gain space (Q_v) and the angular velocity gain space (Q_ω). The iterative process of the algorithm, set with reference to Q-learning, is:

1) Initialize Q_v(s_v^t, a_v^t) and Q_ω(s_ω^t, a_ω^t); for every state s and action a, Q is initialized to 0.

2) In the current states s_v^t and s_ω^t, randomly generate two random numbers, denoted rand_1 and rand_2. If rand_1 < ε, randomly select an action a_v^t; otherwise select the action according to the greedy rule

a^t = argmax_a Q(s^t, a).   (15)

Similarly, if rand_2 < ε, randomly select an action a_ω^t; otherwise select it according to formula (15).

3) After taking actions a_v^t and a_ω^t, obtain the next states s_v^{t+1} and s_ω^{t+1} respectively, and a reward r.

4) Repeat step 2 to obtain a_v^{t+1} and a_ω^{t+1}; update Q_v and Q_ω according to equation (14).

5) Return to step 2 and repeat many times.
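A compact Python sketch of this ε-greedy single-step SARSA loop for one of the two gain spaces (a sketch under the reconstruction above; the environment transition env_step and the state discretization are hypothetical stand-ins for one visual-servoing time slice):

```python
import numpy as np

n_states, n_actions = 20, 7
alpha, gamma, eps = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))          # 1) initialize Q to 0

def select_action(s):
    """2) epsilon-greedy: random with probability eps, else greedy per eq. (15)."""
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def env_step(s, a):
    """Hypothetical environment: returns next state and reward; in the patent
    this would be one visual-servoing time slice with the adjusted gain."""
    s_next = (s + a - n_actions // 2) % n_states
    r = -abs(s_next - n_states // 2) / n_states
    return s_next, r

s, a = 0, select_action(0)
for t in range(400):                          # one round: at most 400 time slices
    s_next, r = env_step(s, a)                # 3) act, observe s', r
    a_next = select_action(s_next)            # 4) choose a' the same way
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])  # eq. (14)
    s, a = s_next, a_next                     # 5) continue from (s', a')
```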
5. Setting learning rules:

The maximum time spent in one learning round of single-step SARSA learning is set to 400 time slices; the placement position of the quadrotor in each round is random within the feasible range, such that all targets can be seen after the quadrotor takes off to 1.0 m, and one training session comprises 5000 rounds. If the quadrotor still has not reached the designated position from its initial position after 400 time slices, it is forcibly returned to the starting point for the next round; if the feature points are lost owing to the quadrotor's motion, the current round is ended and the next round is restarted; if the quadrotor stays within 5 pixels of the target position for a certain time during its movement, the target point is considered reached and the round is finished; the servo gain is updated after each round.
Fuzzy control rules: fuzzy control is used to adaptively adjust the learning rate. The basic rule of the adaptive learning-rate adjustment is: if the gain learned by the agent increases the feature error, the learning rate is reduced; otherwise the learning rate is increased. In this embodiment the rate of change of the feature error is taken as the observed quantity and the controlled quantity is the learning rate. The specific fuzzy control steps are as follows:

1. The rate of change Ė of the sum E of the distances between the feature points and their expected positions is taken as the observed quantity, and Ė is fuzzified with the descriptions "decrease rapidly (DR), decrease slowly (DS), roughly unchanged (RU), increase slowly (IS), increase rapidly (IR)". The learning rate output is taken as the controlled quantity of the fuzzy control, i.e. the output quantity, and the learning rate is fuzzified with the descriptions "large (L), largish (LL), medium (M), smallish (LS), small (S)".
2. The membership functions of the input descriptions {DR, DS, RU, IS, IR} are set as μ_k, k = 1, ..., 5, and the membership functions of the output descriptions {L, LL, M, LS, S} are set in turn as ν_k, k = 1, ..., 5.
Taking the fuzzy description DR as an example, an explicit piecewise expression is obtained in the same way for each membership function.
3. n discrete points {x_1, ..., x_n} are selected uniformly from the value range of the input quantity. Each point has 5 membership values with respect to the input fuzzy descriptions, so an input membership discrete matrix U = (u_{ik}) ∈ R^{n×5} can be constructed, calculated as:

u_{ik} = μ_k(x_i), i = 1, ..., n, k = 1, ..., 5.

Similarly, m discrete points {y_1, ..., y_m} are selected uniformly from the output universe of discourse, and an output membership discrete matrix V = (v_{kj}) ∈ R^{5×m} is constructed, calculated as:

v_{kj} = ν_k(y_j), k = 1, ..., 5, j = 1, ..., m.
4. The fuzzy rules are set as: "if DR then L; if DS then LL; if RU then M; if IS then LS; if IR then S". Combining the max-min composition operation, the fuzzy inference engine R = (r_{ij}) ∈ R^{n×m}, i = 1, ..., n, j = 1, ..., m, is derived:

r_{ij} = ∨_{k=1}^{5} (u_{ik} ∧ v_{kj}),

where ∧ denotes taking the minimum and ∨ denotes taking the maximum.
5. For a particular observed quantity Ė*, the 5 membership values μ_k(Ė*) with respect to the input fuzzy descriptions are obtained first; the fuzzified input quantity ρ = (ρ_i) ∈ R^{1×n} is then obtained by weighting these membership values against the input discrete matrix:

ρ_i = ∨_{k=1}^{5} (μ_k(Ė*) ∧ u_{ik}), i = 1, ..., n.

6. The fuzzified input quantity ρ and the inference engine R give the corresponding output fuzzy vector β = (β_j) ∈ R^{1×m} through the max-min composition operation:

β_j = ∨_{i=1}^{n} (ρ_i ∧ r_{ij}), j = 1, ..., m.

7. The final learning-rate control result is calculated from the fuzzy controlled quantity β. Defuzzification is performed with the weighted average method, so the final output, i.e. the learning rate, is:

α_0 = ( Σ_{j=1}^{m} β_j · y_j ) / ( Σ_{j=1}^{m} β_j ).
Thus, for a determined observed quantity Ė, the corresponding learning rate α_0 can be obtained by fuzzy control. Through this variable learning-rate design based on fuzzy control, the learning time can be reduced to a certain extent and the operating efficiency of the algorithm is improved.
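The following Python sketch strings steps 1-7 together (a sketch under the reconstructions above; the triangular membership functions, the discretizations and the rule pairing of DR with L through IR with S are illustrative assumptions):

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaked at b on support [a, c]."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Five input descriptions {DR, DS, RU, IS, IR} over error-change range [-1, 1],
# five output descriptions {L, LL, M, LS, S} over learning-rate range [0, 1].
in_mf  = [lambda x, p=p: tri(x, *p) for p in
          [(-1.5, -1.0, -0.5), (-1.0, -0.5, 0.0), (-0.5, 0.0, 0.5),
           (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)]]
out_mf = [lambda y, p=p: tri(y, *p) for p in
          [(0.5, 1.0, 1.5), (0.25, 0.5, 0.75), (0.15, 0.25, 0.4),
           (0.05, 0.15, 0.25), (-0.1, 0.05, 0.15)]]

n, m = 21, 21
xs = np.linspace(-1.0, 1.0, n)                 # input discretization {x_i}
ys = np.linspace(0.0, 1.0, m)                  # output discretization {y_j}
U = np.array([[mf(x) for mf in in_mf] for x in xs])    # n x 5, u_ik
V = np.array([[mf(y) for y in ys] for mf in out_mf])   # 5 x m, v_kj

# Inference engine via max-min composition: r_ij = max_k min(u_ik, v_kj).
R = np.max(np.minimum(U[:, :, None], V[None, :, :]), axis=1)

def learning_rate(e_dot):
    mu = np.array([mf(e_dot) for mf in in_mf])          # 5 membership values
    rho = np.max(np.minimum(mu[None, :], U), axis=1)    # fuzzified input, 1 x n
    beta = np.max(np.minimum(rho[:, None], R), axis=0)  # output fuzzy vector
    return float(np.sum(beta * ys) / np.sum(beta))      # weighted-average defuzz

print(learning_rate(-0.8))  # error dropping fast -> larger learning rate
```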

Claims (1)

1. A visual servo control method of a rotor unmanned aerial vehicle based on fuzzy SARSA learning, characterized by comprising the following steps:
step 1, performing edge extraction on the image with the Canny algorithm and obtaining a set of N contour coordinates through filtering and noise reduction operations, the contour being described with the Freeman chain code and the contour pixels recorded as C = {c_i | i = 1, ..., N}; performing rotation normalization on the contour pixels of the target to obtain a Freeman chain code and calculating the Levenshtein distance between it and each graph in a standard contour library, the Levenshtein distance being the number of operations required to convert shape A into shape B, the operations being insertion, deletion and modification, whereby the shape of an object can be identified even when the image target has lost a certain amount of edge;
step 2, after the contour pixels of the image are acquired in step 1, using a contour compensation algorithm because the photographed image sometimes exhibits incomplete contours; for the l-th target, one pass of processing of the Freeman chain code yields N_l contour feature points f_j^l, whose feature point set is F^l = {f_j^l | j = 1, ..., N_l}; rotation normalization of every element of F^l yields N_standard standard contour feature points d_j^l, the set of standard contour feature points being denoted D^l = {d_j^l | j = 1, ..., N_standard}; letting O^l be the compensated feature point contour set, the conversion relationship between the compensation contour O^l and the standard contour D^l is D^l · R + L = O^l, where R and L are respectively the rotation matrix and the translation matrix; the j-th element of the compensation contour O^l is P_j, the set containing N_standard elements in total, recorded as O^l = {P_j | j = 1, ..., N_standard}; taking the central feature point of target l as the feature point used for visual servo control, its coordinates obtained by averaging the coordinates of the compensation contour feature points:

P_c^l = (1 / N_standard) · Σ_{j=1}^{N_standard} P_j;
step 3, after the central feature point of the target is obtained in step 2, establishing a visual model of the rotor unmanned aerial vehicle's bottom camera, i.e. the conversion relation from three-dimensional space to the image pixel plane;

step 4, constructing, from the obtained visual model, a decoupled visual servo control model of the rotor unmanned aerial vehicle that contains a visual servo gain value;

step 5, establishing a single-step SARSA learning servo gain adjustment model, and using SARSA learning to adjust the visual servo gain value of the rotor unmanned aerial vehicle obtained in step 4;
1) setting the state space: after the contour of the target is extracted by the image feature extraction algorithm, the target is simplified to its central feature point; the absolute values of the errors between the current feature points and the target feature points are computed and summed, and intervals of this sum are taken as states;

2) setting the action space: an initial value λ* is selected as the initial value of the servo gain by analyzing the differences between candidate servo gains; the size of the action set is 2·n_a + 1, and the action set forms an arithmetic progression with tolerance d_a, A = {a_i | i = 1, 2, ..., 2·n_a + 1}; the servo gains of the linear velocity and the angular velocity are adjusted separately;

3) setting the reward function: the reward function is divided into three parts, reaching the expected target, losing the tracked target, and other situations; if the feature error of every dimension satisfies |e_i| < δ, where δ is a threshold, the quad-rotor drone is considered to have reached the target position and the highest reward is given; if feature points are missing after feature extraction of the real-time image photographed by the quad-rotor drone, compared with the feature points of the target image, the drone is considered to have lost the target and the return value is negative; in other situations the reward is given according to how close the quad-rotor drone is to the target;
4) setting the single-step SARSA learning iterative algorithm: an iterative formula for the servo gain is set, separately in the linear velocity gain space and the angular velocity gain space; the iterative process of the servo gain iterative algorithm is set with reference to Q-learning, uses the set servo gain iterative formula, and completes the iterative updating of the servo gain;
5) setting learning rules: the maximum time spent in one learning round of single-step SARSA learning is set to 400 time slices; the placement position of the quadrotor in each round is random within the feasible range, such that all targets can be seen after the quadrotor takes off to 1.0 m, and one training session comprises 5000 rounds; if the quadrotor still has not reached the designated position from its initial position after 400 time slices, it is forcibly returned to the starting point for the next round; if the feature points are lost owing to the quadrotor's motion, the current round is ended and the next round is restarted; if the quadrotor stays within 5 pixels of the target position for a certain time during its movement, the target point is considered reached and the round is finished; the servo gain is updated after each round;
step 6, fuzzy control rules: after the SARSA learning servo gain adjustment model is established in step 5, fuzzy control is used to adaptively adjust the learning rate; the basic rule of the adaptive learning-rate adjustment is that if the gain learned by the agent increases the feature error, the learning rate is reduced, otherwise the learning rate is increased; the learning rate of reinforcement learning is changed through fuzzy control: the rate of change of the feature error is taken as the observed quantity and fuzzified, a fuzzy control rule based on the max-min composition operation is set, the observed quantity is fed into the fuzzy control rule to obtain the fuzzy controlled quantity, and the learning rate is finally obtained by defuzzification.
CN201810855339.8A 2018-07-31 2018-07-31 Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning Active CN109143855B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810855339.8A CN109143855B (en) 2018-07-31 2018-07-31 Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810855339.8A CN109143855B (en) 2018-07-31 2018-07-31 Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning

Publications (2)

Publication Number Publication Date
CN109143855A CN109143855A (en) 2019-01-04
CN109143855B true CN109143855B (en) 2021-04-02

Family

ID=64798489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810855339.8A Active CN109143855B (en) 2018-07-31 2018-07-31 Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning

Country Status (1)

Country Link
CN (1) CN109143855B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109696830B (en) * 2019-01-31 2021-12-03 天津大学 Reinforced learning self-adaptive control method of small unmanned helicopter
CN111026146B (en) * 2019-12-24 2021-04-06 西北工业大学 Attitude control method for composite wing vertical take-off and landing unmanned aerial vehicle
CN112612212B (en) * 2020-12-30 2021-11-23 上海大学 Heterogeneous multi-unmanned system formation and cooperative target driving-away method
CN114609976A (en) * 2022-04-12 2022-06-10 天津航天机电设备研究所 Non-calibration visual servo control method based on homography and Q learning
CN114859971A (en) * 2022-05-07 2022-08-05 北京卓翼智能科技有限公司 Intelligent unmanned aerial vehicle for monitoring wind turbine
CN116700348B (en) * 2023-07-12 2024-03-19 湖南文理学院 Visual servo control method and system for four-rotor aircraft with limited vision

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3494772B2 (en) * 1995-09-08 2004-02-09 カヤバ工業株式会社 Fuzzy control device
TWI334066B (en) * 2007-03-05 2010-12-01 Univ Nat Taiwan Science Tech Method of fuzzy logic control with combined sliding mode concept for ideal dynamic responses
CN101588480B (en) * 2009-05-27 2010-09-08 北京航空航天大学 Multi-agent visual servo-coordination control method
CN102135761B (en) * 2011-01-10 2013-08-21 穆科明 Fuzzy self-adaptive control system for parameters of visual sensor
CN105353772B (en) * 2015-11-16 2018-11-09 中国航天时代电子公司 A kind of Visual servoing control method in UAV Maneuver target locating
CN107894709A (en) * 2017-04-24 2018-04-10 长春工业大学 Controlled based on Adaptive critic network redundancy Robot Visual Servoing

Also Published As

Publication number Publication date
CN109143855A (en) 2019-01-04

Similar Documents

Publication Publication Date Title
CN109143855B (en) Visual servo control method of unmanned gyroplane based on fuzzy SARSA learning
US9019278B2 (en) Systems and methods for animating non-humanoid characters with human motion data
Shi et al. Decoupled visual servoing with fuzzy Q-learning
CN111240356B (en) Unmanned aerial vehicle cluster convergence method based on deep reinforcement learning
JP2021518622A (en) Self-location estimation, mapping, and network training
Zhao et al. A brain-inspired decision making model based on top-down biasing of prefrontal cortex to basal ganglia and its application in autonomous UAV explorations
CN111260026B (en) Navigation migration method based on meta reinforcement learning
Passalis et al. Deep reinforcement learning for controlling frontal person close-up shooting
Hoang et al. Vision-based target tracking and autonomous landing of a quadrotor on a ground vehicle
CN113281999A (en) Unmanned aerial vehicle autonomous flight training method based on reinforcement learning and transfer learning
Srivastava et al. Least square policy iteration for ibvs based dynamic target tracking
Kim et al. Learning and generalization of dynamic movement primitives by hierarchical deep reinforcement learning from demonstration
CN116501168A (en) Unmanned aerial vehicle gesture control method and control system based on chaotic sparrow search and fuzzy PID parameter optimization
Amirkhani et al. Fuzzy cognitive map for visual servoing of flying robot
Jones et al. Using neural networks to learn hand-eye co-ordination
Xing et al. Contrastive learning for enhancing robust scene transfer in vision-based agile flight
Becerra Fuzzy visual control for memory-based navigation using the trifocal tensor
CN109542094B (en) Mobile robot vision stabilization control without desired images
Lopez-Franco et al. Neural control for a differential drive wheeled mobile robot integrating stereo vision feedback
CN112051733B (en) Rigid mechanical arm composite learning control method based on image classification
CN110703792A (en) Underwater robot attitude control method based on reinforcement learning
Kumar et al. Benchmarking Deep Reinforcement Learning Algorithms for Vision-based Robotics
Chen Adaptive Shape-Servoing for Vision-based Robotic Manipulation with Model Estimation and Performance Regulation
Maeda et al. View-based programming with reinforcement learning for robotic manipulation
Chen et al. Vision-assisted Arm Motion Planning for Freeform 3D Printing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20210926

Address after: 710072 No. 4689, block B, Feixiang Zhongchuang space, West University of technology, 4 / F, innovation building, No. 127, Youyi West Road, Beilin District, Xi'an City, Shaanxi Province

Patentee after: Xi'an Liuyi FeiMeng Information Technology Co.,Ltd.

Address before: 710072 No. 127 Youyi West Road, Shaanxi, Xi'an

Patentee before: Northwestern Polytechnical University

TR01 Transfer of patent right