CN115562330A - Unmanned aerial vehicle control method for restraining wind disturbance of similar field - Google Patents


Info

Publication number: CN115562330A (application CN202211381428.6A; granted as CN115562330B)
Authority: CN (China)
Prior art keywords: network, unmanned aerial vehicle, wind, disturbance
Other languages: Chinese (zh)
Inventors: 李湛, 宋罘林, 于兴虎, 郑晓龙, 高会军
Original and current assignee: Harbin Institute of Technology
Application filed by: Harbin Institute of Technology
Legal status: Granted; Active

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08: Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808: Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
    • G05D1/0816: Control of attitude specially adapted for aircraft to ensure stability
    • G05D1/0825: Control of attitude specially adapted for aircraft to ensure stability using mathematical models
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft
    • G05D1/106: Change initiated in response to external conditions, e.g. avoidance of elevated terrain or of no-fly zones
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E: REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E10/00: Energy generation through renewable energy sources
    • Y02E10/70: Wind energy
    • Y02E10/72: Wind turbines with rotation axis in wind direction

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Algebra (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Feedback Control In General (AREA)

Abstract

An unmanned aerial vehicle control method for suppressing field-like wind disturbance, belonging to the technical field of unmanned aerial vehicle disturbance rejection. The method comprises the following steps. S1: acquire a wind-source image through a camera carried on the autonomous unmanned aerial vehicle, track the target disturbance source in the image with a tracking network to obtain its position, and obtain an action compensation amount from that position with a compensation network; the tracking network comprises a feature extractor and a convolutional layer, which produce a disturbance-source feature map from the wind-source image; the compensation network is implemented with a deep reinforcement learning algorithm. S2: add the compensation amount to the control quantity output by the unmanned aerial vehicle controller and use the sum as the input of the controlled autonomous unmanned aerial vehicle. S3: update the network parameters of the field-like wind-disturbance compensation network. S4: repeat S1 to S3 until the unmanned aerial vehicle flies out of the wind-field area. The method solves the problem that an autonomous unmanned aerial vehicle flying in a crowded urban environment is prone to crashing when disturbed by artificial field-like wind.

Description

Unmanned aerial vehicle control method for restraining wind disturbance of similar field
Technical Field
The invention relates to an unmanned aerial vehicle control method for inhibiting wind disturbance of a similar field, and belongs to the technical field of unmanned aerial vehicle disturbance rejection.
Background
In recent years, with the rapid development of urban logistics, unmanned aerial vehicles that can rapidly shuttle between the crowded buildings of modern cities have become an important research and development target for many organizations. Facing the narrow passable space between buildings, most researchers focus on obstacle avoidance and path planning for unmanned aerial vehicles. However, crowded urban space and numerous obstacles are not the only difficulties an urban autonomous drone must face. Artificial wind sources such as ventilation fans and air-conditioner outdoor units are ubiquitous in modern cities. The wind disturbance they generate differs greatly from natural wind: it has obvious directivity and a strong correlation with the spatial position relative to the source, and is therefore referred to here as field-like wind disturbance. When an autonomous drone passes through their air outlets it is struck by an impact-like wind-field disturbance, which can be devastating between crowded buildings and is very likely to cause the drone to collide with a building or another obstacle and crash.
According to the existing literature, most researchers studying drone disturbance-rejection algorithms use feedback compensation methods, which allow a drone to suppress completely unknown disturbances. For example, the ADRC and LADRC algorithms are widely used in drone anti-disturbance control. In addition, adaptive neural-network back-stepping controllers and various command filters are used to estimate unknown disturbances online and have become common methods in the field of drone disturbance-rejection control. However, because most of these methods target completely unknown disturbances, only feedback compensation can be used.
Model-free reinforcement learning offers a new way to approximate unknown models through interaction with the environment. It has received increasing attention from researchers in recent years and has been applied to several robotics problems, including quadrotor control. An autonomous drone can suppress field-like wind disturbance well with a model-free reinforcement learning algorithm, but the approach suffers from low data efficiency, and convergence becomes difficult as the abstraction level of the neural network deepens.
Disclosure of Invention
Aiming at the problem of how to inhibit the influence of artificial wind interference on the unmanned aerial vehicle, the invention provides an unmanned aerial vehicle control method for inhibiting wind interference of a similar field.
The invention discloses an unmanned aerial vehicle control method for inhibiting wind disturbance of a similar field, which comprises the following steps:
s1, acquiring a wind source image through a camera carried on an autonomous unmanned aerial vehicle, tracking a target interference source in the image by using a tracking network to acquire the position of the target interference source, and acquiring an action compensation amount u' according to the position of the target interference source by using a compensation network;
the tracking network comprises a feature extractor and a convolutional layer;
the wind source images are sequentially input into the feature extractor and the convolution layer to obtain a disturbance source feature map;
the compensation network is realized by adopting a deep reinforcement learning network, the input of the compensation network is a disturbance source characteristic diagram, the output is an action a output by a behavior network, the action a represents the normalization result of the control compensation quantity of the unmanned aerial vehicle in the x, y and z directions, and the normalized action a is mapped into an action compensation quantity u';
s2, adding the control compensation amount u' and the control amount output by the unmanned aerial vehicle controller to obtain u which is used as the input of the controlled autonomous unmanned aerial vehicle;
s3, updating network parameters of the deep reinforcement learning network;
and S4, repeating the steps from S1 to S3 until the unmanned aerial vehicle flies away from the wind field area.
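The S1 to S4 loop can be sketched with stand-in functions for the tracking network, the compensation network, and the base controller. All function names and numeric choices here are hypothetical illustrations, not taken from the patent:

```python
import numpy as np

def track_source(image):
    """S1a: stand-in tracker. Returns a disturbance-source feature map."""
    return image.mean(axis=0)  # placeholder feature extraction

def compensation_network(feature_map):
    """S1b: stand-in behavior network. Outputs a normalized action in [-1, 1]^3."""
    return np.tanh(feature_map[:3])

def map_action(a, u_max=2.0):
    """Map the normalized action a to the actual compensation u' (assumed linear)."""
    return u_max * a

def base_controller(state, target):
    """S2: stand-in feedback controller (proportional, for illustration only)."""
    return 1.5 * (target - state[:3])

# one S1-S4 iteration
image = np.zeros((480, 640)); image[200:280, 300:340] = 1.0  # fake wind-source image
state = np.zeros(6); target = np.array([1.0, 0.0, 1.0])

s = track_source(image)                        # S1: feature map from tracking network
u_comp = map_action(compensation_network(s))   # S1: action compensation u'
u = base_controller(state, target) + u_comp    # S2: u = controller output + u'
# S3 (network parameter update) and S4 (repeat until out of wind field) omitted here
print(u.shape)
```

The loop exits, and the plain feedback controller resumes, once the drone leaves the wind-field area.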
Preferably, in S1, the industrial fan is identified by the camera carried on the autonomous unmanned aerial vehicle, a set of image samples labeled with bounding boxes of the industrial fan constitutes the training set, and a feature extractor and a model predictor are trained on this set; after training, the feature extractor serves as the feature extractor of the tracking network, and a convolutional layer extracted from the trained model predictor serves as the convolutional layer of the tracking network.
Preferably, the feature extractor is a ResNet-50 backbone network.
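The shape flow of the tracking network (backbone features followed by a convolutional layer producing the disturbance-source feature map) can be sketched with a naive 2-D convolution. The "backbone features" here are a fake single-channel array standing in for real ResNet-50 features, which would have many channels:

```python
import numpy as np

def conv2d(x, k):
    """Naive valid-mode 2-D convolution (stand-in for the tracking conv layer)."""
    H, W = x.shape; h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i+h, j:j+w] * k)
    return out

# fake backbone feature map for a downsampled camera frame
features = np.random.default_rng(0).random((30, 40))
kernel = np.ones((3, 3)) / 9.0            # conv layer weights (averaging filter)
feature_map = conv2d(features, kernel)    # disturbance-source feature map
print(feature_map.shape)  # (28, 38)
```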
Preferably, the compensation network comprises a behavior network, a target behavior network, an evaluation network and a target evaluation network, all of which are fully connected layers;

the action value output by the behavior network is a_t = \mu(s_t; W_a), where s_t is the disturbance-source feature map at time t and W_a denotes the weights of the behavior network;

the action value output by the target behavior network is a'_{t+1} = \mu'(s_{t+1}; W_a'), where s_{t+1} is the disturbance-source feature map at time t+1, after the action at time t has been executed, and W_a' denotes the weights of the target behavior network;

the action-state value output by the evaluation network is Q(s_t, a_t; W_c), where W_c denotes the weights of the evaluation network;

the action-state value output by the target evaluation network is Q'(s_{t+1}, a'_{t+1}; W_c'), where W_c' denotes the weights of the target evaluation network.
Preferably, in S3, the gradient of the behavior network is:

\nabla_{W_a} J(W_a) = (1/M) \sum_{i=1}^{M} \nabla_a Q(s_i, a_i; W_c) \nabla_{W_a} \mu(s_i; W_a)

where M is the minibatch size, \nabla_{W_a} denotes the gradient with respect to W_a, \nabla_a denotes the gradient with respect to a, and J(W_a) is the objective function of the behavior network. The weights of the behavior network are updated:

m^t = \beta_1 m^{t-1} + (1 - \beta_1) \nabla_{W_a} J(W_a)
v^t = \beta_2 v^{t-1} + (1 - \beta_2) (\nabla_{W_a} J(W_a))^2
\hat{m}^t = m^t / (1 - \beta_1^t)
\hat{v}^t = v^t / (1 - \beta_2^t)
W_a \leftarrow W_a + \eta_a \hat{m}^t / (\sqrt{\hat{v}^t} + \xi)

where m and v are intermediate variables whose superscript denotes time, \beta_1 and \beta_2 are hyperparameters, \eta_a is the learning rate, and \xi is an infinitesimal quantity;

the weights of the target behavior network are updated as follows:

W_a' \leftarrow \tau W_a + (1 - \tau) W_a'

where \tau is the soft update rate.
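The weight update in S3 has the shape of an Adam optimizer step (first- and second-moment estimates with bias correction), applied in the ascent direction since the behavior network maximizes J. A numpy sketch with hypothetical hyperparameter values:

```python
import numpy as np

def adam_ascent_step(W, grad, m, v, t, beta1=0.9, beta2=0.999, eta=1e-3, xi=1e-8):
    """One Adam step in the +gradient direction (gradient ascent on J(W_a))."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2        # second-moment estimate
    m_hat = m / (1 - beta1**t)                   # bias corrections
    v_hat = v / (1 - beta2**t)
    W = W + eta * m_hat / (np.sqrt(v_hat) + xi)  # ascent update
    return W, m, v

W = np.zeros(3); m = np.zeros(3); v = np.zeros(3)
grad = np.array([0.5, -0.2, 0.1])                # constant gradient, for illustration
for t in range(1, 101):
    W, m, v = adam_ascent_step(W, grad, m, v, t)
print(W)  # each component moves ~eta per step in the sign of its gradient
```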
Preferably, in S3, the gradient of the evaluation network is \nabla_{W_c} L(W_c), where \nabla_{W_c} denotes the gradient with respect to W_c and L(W_c) is the objective function of the evaluation network:

L(W_c) = (1/M) \sum_{i=1}^{M} (y_i - Q(s_i, a_i; W_c))^2,  y_i = r_i + \gamma Q'(s_{i+1}, a'_{i+1}; W_c')

where r_i is the reward, \gamma is the decay coefficient, and M is the minibatch size. The weights of the evaluation network are updated:

m^t = \beta_1 m^{t-1} + (1 - \beta_1) \nabla_{W_c} L(W_c)
v^t = \beta_2 v^{t-1} + (1 - \beta_2) (\nabla_{W_c} L(W_c))^2
\hat{m}^t = m^t / (1 - \beta_1^t)
\hat{v}^t = v^t / (1 - \beta_2^t)
W_c \leftarrow W_c - \eta_c \hat{m}^t / (\sqrt{\hat{v}^t} + \xi)

where m and v are intermediate variables whose superscript denotes time, \beta_1 and \beta_2 are hyperparameters, \eta_c is the learning rate, and \xi is an infinitesimal quantity;

the weights of the target evaluation network are updated as follows:

W_c' \leftarrow \tau W_c + (1 - \tau) W_c'

where \tau is the soft update rate.
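The evaluation network performs a temporal-difference regression: Q is fitted toward a target of the form y_i = r_i + γQ', a standard actor-critic construction consistent with the symbols given here. A numpy sketch over one minibatch, with made-up numbers:

```python
import numpy as np

gamma = 0.99                               # decay coefficient
r = np.array([1.0, 0.5, -0.2])             # rewards r_i for a minibatch (M = 3)
q_target_next = np.array([2.0, 1.0, 0.0])  # Q'(s_{i+1}, a'_{i+1}; W_c')
q = np.array([2.5, 1.2, -0.1])             # Q(s_i, a_i; W_c)

y = r + gamma * q_target_next              # TD targets y_i
L = np.mean((y - q) ** 2)                  # objective L(W_c), minimized over W_c
print(y, L)
```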
Preferably, the reward r_i is:

r_i = -(e_1^2 + e_3^2 + e_5^2) / C

where e_1, e_3, e_5 are the x, y and z axis errors between the target trajectory point and the current position, and C is a constant.
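One plausible form of the reward (the exact expression is an image in the source; scaling by the constant C is an assumption) is a negative quadratic in the axis errors, so that perfect tracking earns the highest score:

```python
def reward(e1, e3, e5, C=10.0):
    """Negatively correlated with tracking error; C keeps the magnitude reasonable."""
    return -(e1**2 + e3**2 + e5**2) / C

print(reward(0.0, 0.0, 0.0))   # 0.0: perfect tracking gets the highest reward
print(reward(1.0, 2.0, 2.0))   # -0.9: larger error, lower reward
```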
Preferably, u = [u_x, u_y, u_z]^T, with

u_x = m(\ddot{x}_d + k_1 \dot{e}_1 + e_1 - k_2 e_2)
u_y = m(\ddot{y}_d + k_3 \dot{e}_3 + e_3 - k_4 e_4)
u_z = m(\ddot{z}_d + k_5 \dot{e}_5 + e_5 - k_6 e_6 + g)

where m is the mass of the drone, the target trajectory is r_d = [x_d, y_d, z_d]^T, and k_1, k_2, k_3, k_4, k_5, k_6 are controller parameters;

the error variables are

e_1 = x_d - x_1,  e_2 = x_2 - (\dot{x}_d + k_1 e_1)
e_3 = y_d - x_3,  e_4 = x_4 - (\dot{y}_d + k_3 e_3)
e_5 = z_d - x_5,  e_6 = x_6 - (\dot{z}_d + k_5 e_5)

x_1, x_3, x_5 are the position components of the drone along the x, y and z axes in the inertial coordinate system, and x_2, x_4, x_6 are the corresponding velocity components;

g is the acceleration of gravity, F is the resultant force of the four motors, and d_x, d_y, d_z are the components of the disturbance on the drone in three directions; \phi, \theta, \psi are the attitude Euler angles of the drone.
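A single-axis sanity check of the back-stepping design. Using the standard error variables e_1 = x_d - x_1 and e_2 = x_2 - (\dot{x}_d + k_1 e_1) (a reconstruction; the patent's equations are images), the law u_x = m(\ddot{x}_d + k_1\dot{e}_1 + e_1 - k_2 e_2) gives \dot{V} = -k_1 e_1^2 - k_2 e_2^2 for V = e_1^2/2 + e_2^2/2, so the position should converge to the target:

```python
import numpy as np

m_uav, k1, k2 = 1.0, 2.0, 2.0
x_d, xd_dot, xd_ddot = 1.0, 0.0, 0.0   # constant target position on one axis
x1, x2 = 0.0, 0.0                      # position and velocity
dt = 0.01

for _ in range(1000):                  # 10 s of Euler integration
    e1 = x_d - x1
    e2 = x2 - (xd_dot + k1 * e1)       # e2 = x2 - alpha1
    e1_dot = xd_dot - x2
    u = m_uav * (xd_ddot + k1 * e1_dot + e1 - k2 * e2)  # back-stepping law
    x1 += dt * x2                      # double-integrator axis: x1' = x2
    x2 += dt * u / m_uav               # x2' = u/m (no disturbance here)

print(x1)  # converges to the target position 1.0
```

The closed-loop poles for this axis are the roots of s^2 + (k_1 + k_2)s + (1 + k_1 k_2) = 0, which are strictly in the left half-plane for positive gains.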
The beneficial effects of the invention are as follows: it solves the problem that a drone flying in a crowded urban environment is prone to crashing when disturbed by artificial field-like wind. The position of the target disturbance source is obtained through a tracking network, action compensation is generated by a reinforcement-learning compensation network, and the interference of artificial wind on the drone is suppressed. With this feedforward compensation method the drone makes a compensating action before the disturbance arrives, giving faster response and higher compensation accuracy than ordinary feedback compensation.
Drawings
FIG. 1 is a schematic diagram of the principles of the present invention;
FIG. 2 is a schematic diagram of an unmanned aerial vehicle suppressing interference of an industrial fan;
FIG. 3 is a schematic diagram of a neural network model;
FIG. 4 is a comparison of the track following effect using the feed forward compensation of the present invention and using PID.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
According to the unmanned aerial vehicle control method for restraining the wind disturbance of the similar field, the wind source is identified through the camera carried by the autonomous unmanned aerial vehicle, and the target disturbance source is tracked by adopting the tracking network. And inputting the output of the trained tracking network into a wind disturbance compensation network, and adding the compensation action generated by the compensation network and the output of the unmanned aerial vehicle feedback controller to be used as the input of the controlled autonomous unmanned aerial vehicle.
Then, updating network parameters of the deep reinforcement learning network;
The compensation and update processes are repeated until the unmanned aerial vehicle flies out of the wind-field area, at which point the normal controller without feedforward compensation is restored.
The structure of the neural network used for tracking in this embodiment is shown in the left half of fig. 3. The industrial fan is identified through the camera carried on the autonomous unmanned aerial vehicle; a set of image samples labeled with bounding boxes of the industrial fan forms the training set, on which a feature extractor and a model predictor are trained. The feature extractor uses a ResNet-50 backbone and outputs its feature maps to the model predictor. After training, the feature extractor is used as the feature extractor of the tracking network, and a convolutional layer extracted from the trained model predictor is used as the convolutional layer of the tracking network and applied to the features extracted from test frames to compute a target confidence score. Disturbance-source localization is achieved by combining this with the overlap-maximization architecture introduced in the ATOM algorithm. The final effect of the tracking network is that a bounding box continuously marks the position of the industrial fan. The trained tracking network is then connected to the field-like wind-disturbance compensation network. The final output of the target tracking task is a bounding box, while the output of the compensation task is a control compensation generated from the tracked position of the disturbance source; although the outputs of the two tasks differ, the knowledge needed to process the two-dimensional image signal is similar. This embodiment divides the weights of the tracking neural network into two parts: the front part is the convolutional layers that extract disturbance-source features, and the rear part is the fully connected layers that generate the bounding box.
For both target tracking and disturbance compensation the characteristics of the target are unchanged, so the convolutional layers can be frozen; the output dimension of the fully connected layer is replaced by the dimension of the compensation action, and the new fully connected layer is transferred and updated to generate the compensation action. As shown in the right half of fig. 3, W is the neural-network weight transferred from the network trained for tracking, divided into two parts: Conv is the convolutional layer that extracts image features, and FC is the fully connected layer that generates the compensation action. The compensation network is implemented with a deep reinforcement learning algorithm, and this embodiment designs the state space and action space as follows. The state space is the feature map of the artificial wind source obtained by the convolutional layer. The action space is the control compensation of the drone in the x, y and z directions. In the right half of fig. 3, W_a is the weight of the behavior network, W_a' the weight of the target behavior network, W_c the weight of the evaluation network and W_c' the weight of the target evaluation network; a is the action output by the behavior network, a' is the action output by the target behavior network, and Q and Q' are the action-state value function and the target action-state value function, respectively. The reinforcement-learning state s is the feature map output by the convolutional layer; the action a is the normalized compensation in the x, y and z directions, each component lying between -1 and 1, and an action-mapping step converts the normalized action into the actual action compensation u'.
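The transfer step, keeping the tracking network's convolutional weights frozen and re-initializing only the fully connected head so that it outputs the 3-D compensation action, can be sketched without any deep-learning framework. Layer names and sizes here are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)

# weights W from the trained tracking network: conv part + FC part
tracking_net = {
    "conv": rng.standard_normal((64, 3, 3)),  # frozen disturbance-feature extractor
    "fc":   rng.standard_normal((4, 128)),    # old head: 4-D bounding box
}

def transfer_to_compensation(net, action_dim=3):
    """Freeze conv, replace the FC output layer with a new action head."""
    return {
        "conv": net["conv"],                                    # reused as-is
        "fc":   0.01 * rng.standard_normal((action_dim, 128)),  # trained anew by RL
    }

comp_net = transfer_to_compensation(tracking_net)
print(comp_net["fc"].shape)  # (3, 128): outputs the x, y, z compensation
```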
The specific embodiments of the present invention are described with reference to a practical example:
As shown in fig. 2, the autonomous drone flies under the wind disturbance of the industrial fan. In the figure C_i is a horizontal inertial coordinate system and x, y, z are its orthogonal coordinate basis satisfying the right-hand rule; C_b is the body-fixed coordinate system of the drone and x_b, y_b, z_b are its orthogonal coordinate basis satisfying the right-hand rule; \phi, \theta, \psi are the attitude Euler angles of the drone. The resolution of the onboard camera is 640 x 480. The drone model under disturbance is:

\dot{x}_1 = x_2
\dot{x}_2 = (\cos\phi \sin\theta \cos\psi + \sin\phi \sin\psi) F/m + d_x
\dot{x}_3 = x_4
\dot{x}_4 = (\cos\phi \sin\theta \sin\psi - \sin\phi \cos\psi) F/m + d_y
\dot{x}_5 = x_6
\dot{x}_6 = (\cos\phi \cos\theta) F/m - g + d_z

where m is the drone mass, g is the gravitational acceleration, F is the resultant force of the four motors, and d_x, d_y, d_z are the components of the disturbance experienced by the drone in three directions.
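A quick consistency check of the standard quadrotor translational model (the patent's model equation is an image; this assumes the usual form with Euler angles \phi, \theta, \psi, thrust F and disturbances d): with level attitude and F = mg the vertical acceleration (\cos\phi\cos\theta)F/m - g vanishes, so the undisturbed drone hovers:

```python
import numpy as np

m_uav, g = 1.5, 9.81
phi = theta = psi = 0.0
F = m_uav * g                  # hover thrust
d = np.zeros(3)                # no wind disturbance

x = np.zeros(6)                # [x1..x6]: positions and velocities
dt, steps = 0.01, 500
for _ in range(steps):
    ax = (np.cos(phi)*np.sin(theta)*np.cos(psi) + np.sin(phi)*np.sin(psi)) * F/m_uav + d[0]
    ay = (np.cos(phi)*np.sin(theta)*np.sin(psi) - np.sin(phi)*np.cos(psi)) * F/m_uav + d[1]
    az = np.cos(phi)*np.cos(theta) * F/m_uav - g + d[2]
    x[0] += dt*x[1]; x[1] += dt*ax
    x[2] += dt*x[3]; x[3] += dt*ay
    x[4] += dt*x[5]; x[5] += dt*az

print(x)  # all zeros: the drone holds its hover state
```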
The basic controller of the drone uses back-stepping control; the control block diagram is shown in fig. 1, where u' = [u'_x, u'_y, u'_z]^T is the compensation amount, u = [u_x, u_y, u_z]^T is the final control output, x = [x_1, x_2, x_3, x_4, x_5, x_6]^T is the state of the drone (specifically the position and its derivative in the inertial frame), and r_d = [x_d, y_d, z_d]^T is the target trajectory. The following error variables are defined:

e_1 = x_d - x_1,  e_2 = x_2 - (\dot{x}_d + k_1 e_1)
e_3 = y_d - x_3,  e_4 = x_4 - (\dot{y}_d + k_3 e_3)
e_5 = z_d - x_5,  e_6 = x_6 - (\dot{z}_d + k_5 e_5)

where k_1, k_2, k_3, k_4, k_5, k_6 are controller parameters. The back-stepping controller is given by:

u_x = m(\ddot{x}_d + k_1 \dot{e}_1 + e_1 - k_2 e_2)
u_y = m(\ddot{y}_d + k_3 \dot{e}_3 + e_3 - k_4 e_4)
u_z = m(\ddot{z}_d + k_5 \dot{e}_5 + e_5 - k_6 e_6 + g)
acquiring a wind source image through a camera carried on the autonomous unmanned aerial vehicle, tracking a target interference source in the image by using a tracking network to obtain the position of the target interference source, and obtaining an action compensation amount u' according to the position of the target interference source by using a compensation network;
adding the control compensation amount u' and the control amount output by the unmanned aerial vehicle controller to obtain u which is used as the input of the controlled autonomous unmanned aerial vehicle;
the compensation network in fig. 3 includes a behavior network, a target behavior network, an evaluation network, and a target evaluation network, all of which are fully connected layers;
action values output by a behavior network
Figure BDA0003926836510000078
Figure BDA0003926836510000079
Feature diagram of disturbance source at time t, W a Weights representing the behavioral network;
action values output by the target behavior network
Figure BDA0003926836510000081
Figure BDA0003926836510000082
Represents a disturbance source feature map, W ', at time t +1 after the execution of the operation at time t' a Weights representing the target behavior network;
evaluating an action State value output by a network
Figure BDA0003926836510000083
W c A weight representing an evaluation network;
action state value output by target evaluation network
Figure BDA0003926836510000084
W′ c Representing the weight of the target evaluation network.
The behavior-network gradient is designed as follows:

\nabla_{W_a} J(W_a) = (1/M) \sum_{i=1}^{M} \nabla_a Q(s_i, a_i; W_c) \nabla_{W_a} \mu(s_i; W_a)

where \nabla_{W_a} denotes the gradient with respect to W_a, \nabla_a denotes the gradient with respect to a, J(W_a) is the objective function of the behavior network, and M is the minibatch size.

The weight update of the behavior network is as follows:

m^t = \beta_1 m^{t-1} + (1 - \beta_1) \nabla_{W_a} J(W_a)
v^t = \beta_2 v^{t-1} + (1 - \beta_2) (\nabla_{W_a} J(W_a))^2
\hat{m}^t = m^t / (1 - \beta_1^t)
\hat{v}^t = v^t / (1 - \beta_2^t)
W_a \leftarrow W_a + \eta_a \hat{m}^t / (\sqrt{\hat{v}^t} + \xi)

where m and v are intermediate variables whose superscript denotes time, \beta_1 and \beta_2 are hyperparameters, \eta_a is the learning rate, and \xi is an infinitesimal quantity.
The evaluation-network gradient is \nabla_{W_c} L(W_c), where \nabla_{W_c} denotes the gradient with respect to W_c and L(W_c) is the objective function of the evaluation network:

L(W_c) = (1/M) \sum_{i=1}^{M} (y_i - Q(s_i, a_i; W_c))^2,  y_i = r_i + \gamma Q'(s_{i+1}, a'_{i+1}; W_c')

where r_i is the reward, \gamma is the decay coefficient, M is the minibatch size, and s_{i+1} is the state (the new feature map) obtained after the action is executed. The weight update of the evaluation network is the same as for the behavior network, as follows:

m^t = \beta_1 m^{t-1} + (1 - \beta_1) \nabla_{W_c} L(W_c)
v^t = \beta_2 v^{t-1} + (1 - \beta_2) (\nabla_{W_c} L(W_c))^2
\hat{m}^t = m^t / (1 - \beta_1^t)
\hat{v}^t = v^t / (1 - \beta_2^t)
W_c \leftarrow W_c - \eta_c \hat{m}^t / (\sqrt{\hat{v}^t} + \xi)

where m and v are intermediate variables whose superscript denotes time, \beta_1 and \beta_2 are hyperparameters, \eta_c is the learning rate, and \xi is an infinitesimal quantity.
The weights of the target behavior network and the target evaluation network are updated as follows:

W_a' \leftarrow \tau W_a + (1 - \tau) W_a'
W_c' \leftarrow \tau W_c + (1 - \tau) W_c'

where \tau is the soft update rate; the larger \tau is, the faster the target networks track the online networks.
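The soft (Polyak) target update W' <- τW + (1-τ)W' can be sketched directly; with small τ the target weights trail the online weights slowly:

```python
import numpy as np

def soft_update(W_target, W, tau=0.01):
    """W' <- tau*W + (1 - tau)*W' for a parameter array."""
    return tau * W + (1 - tau) * W_target

W = np.ones(4)      # online network weights (behavior or evaluation network)
W_t = np.zeros(4)   # target network weights
for _ in range(100):
    W_t = soft_update(W_t, W)
print(W_t)  # approaches W geometrically: 1 - 0.99**100, roughly 0.634
```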
A reward function is designed in this embodiment with the goal that the drone is affected by the industrial fan as little as possible, so the smaller the position-following error, the higher the score. The reward is designed as a negative correlation function of the trajectory-following error:

r_i = -(e_1^2 + e_3^2 + e_5^2) / C

where e_1, e_3, e_5 are the x, y and z axis errors between the target trajectory point and the current position, and C is a constant whose purpose is to keep the value of the reward function within a relatively reasonable range.
A simulation scene is set up; fig. 4 shows the trajectory-following performance in this scene with and without the feedforward compensation network. It can clearly be seen that the tracking accuracy with feedforward compensation is far better than with the PID algorithm alone.
And (4) conclusion: the example shows that the method can effectively realize feedforward compensation of the similar-field wind disturbance generated by the visual artificial wind source, improve the track tracking precision and greatly increase the flight safety of the autonomous unmanned aerial vehicle among modern urban buildings.
Although the invention herein has been described with reference to particular embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It should be understood that features described in different dependent claims and herein may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with individual embodiments may be used in other described embodiments.

Claims (10)

1. An unmanned aerial vehicle control method for suppressing wind disturbance of a similar field, the method comprising:
s1, acquiring a wind source image through a camera carried on an autonomous unmanned aerial vehicle, tracking a target interference source in the image by using a tracking network to acquire the position of the target interference source, and acquiring an action compensation amount u' according to the position of the target interference source by using a compensation network;
the tracking network comprises a feature extractor and a convolutional layer;
the wind source images are sequentially input into the feature extractor and the convolution layer to obtain a disturbance source feature map;
the compensation network is realized by adopting a deep reinforcement learning network, the input of the compensation network is a disturbance source characteristic diagram, the output of the compensation network is an action a output by a behavior network, the action a represents the normalization result of the control compensation quantity of the unmanned aerial vehicle in the x, y and z directions, and the normalized action a is mapped into an action compensation quantity u';
s2, adding the control compensation amount u' and the control amount output by the unmanned aerial vehicle controller to obtain u which is used as the input of the controlled autonomous unmanned aerial vehicle;
s3, updating network parameters of the deep reinforcement learning network;
and S4, repeating the steps from S1 to S3 until the unmanned aerial vehicle flies away from the wind field area.
2. The unmanned aerial vehicle control method for suppressing field-like wind disturbance according to claim 1, wherein in S1 the industrial fan is identified by the camera carried on the autonomous unmanned aerial vehicle, a set of image samples labeled with bounding boxes of the industrial fan constitutes the training set, a feature extractor and a model predictor are trained on this set, and after training the feature extractor is used as the feature extractor of the tracking network while a convolutional layer extracted from the trained model predictor is used as the convolutional layer of the tracking network.
3. The unmanned aerial vehicle control method for suppressing field-like wind disturbance according to claim 1, wherein the feature extractor is a ResNet-50 backbone network.
4. The unmanned aerial vehicle control method for suppressing farm-like wind disturbance according to claim 1 or 2, wherein the compensation network comprises a behavior network, a target behavior network, an evaluation network and a target evaluation network, all of which are fully connected networks;
the action value output by the behavior network is a_t = μ(s_t; W_a), where s_t is the disturbance-source feature map at time t and W_a represents the weights of the behavior network;
the action value output by the target behavior network is a'_{t+1} = μ'(s_{t+1}; W'_a), where s_{t+1} represents the disturbance-source feature map at time t+1, after the action at time t has been executed, and W'_a represents the weights of the target behavior network;
the action-state value output by the evaluation network is Q(s_t, a_t; W_c), where W_c represents the weights of the evaluation network;
the action-state value output by the target evaluation network is Q'(s_{t+1}, a'_{t+1}; W'_c), where W'_c represents the weights of the target evaluation network.
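The four-network structure of claim 4 can be illustrated with small fully connected networks. This sketch is not from the patent; the layer sizes, activations and initialization are assumptions, and the target networks are initialized as copies of their counterparts, as is usual for this actor-critic architecture:

```python
import numpy as np

def mlp(sizes, rng):
    """Initialize a small fully connected network as (W, b) layer pairs."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Tanh hidden layers, linear output layer."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

rng = np.random.default_rng(0)
feat_dim, act_dim = 16, 3                                # assumed dimensions
actor         = mlp([feat_dim, 64, act_dim], rng)        # behavior network mu(s; W_a)
target_actor  = [(W.copy(), b.copy()) for W, b in actor] # target behavior network
critic        = mlp([feat_dim + act_dim, 64, 1], rng)    # evaluation network Q(s, a; W_c)
target_critic = [(W.copy(), b.copy()) for W, b in critic]

s = rng.standard_normal(feat_dim)
a = np.tanh(forward(actor, s))               # a_t = mu(s_t; W_a), squashed to [-1, 1]
q = forward(critic, np.concatenate([s, a]))  # Q(s_t, a_t; W_c)
```

Before any update, the target networks produce exactly the same outputs as the networks they shadow.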
5. The unmanned aerial vehicle control method for suppressing farm-like wind disturbance according to claim 4, wherein in S3 the gradient of the behavior network is:

∇_{W_a} J(W_a) ≈ (1/M) Σ_{i=1}^{M} ∇_a Q(s_i, a; W_c)|_{a=μ(s_i; W_a)} ∇_{W_a} μ(s_i; W_a)

where M is the mini-batch size of the data samples; the weights of the behavior network are updated by:

m^t = β_1 m^{t-1} + (1 - β_1) ∇_{W_a} J(W_a)
v^t = β_2 v^{t-1} + (1 - β_2) (∇_{W_a} J(W_a))^2
m̂^t = m^t / (1 - β_1^t)
v̂^t = v^t / (1 - β_2^t)
W_a ← W_a + η_a m̂^t / (√(v̂^t) + ξ)

wherein m, v, m̂, v̂ are intermediate variables and the superscript t denotes the time step; β_1, β_2 are hyperparameters, η_a is the learning rate, and ξ is an infinitesimal quantity; ∇_{W_a} denotes the gradient with respect to W_a, ∇_a denotes the gradient with respect to a, and J(W_a) represents the objective function of the behavior network;
the weights of the target behavior network are updated as follows:
W'_a ← τ W_a + (1 - τ) W'_a
where τ is the soft update rate.
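The weight update of claim 5 combines a bias-corrected moment estimate (an Adam-style step in the ascent direction of the objective) with a soft update of the target weights. The sketch below is illustrative, not the patent's implementation; the default values of β_1, β_2 and the test gradient are assumptions:

```python
import numpy as np

def adam_ascent_step(W, g, m, v, t, eta, beta1=0.9, beta2=0.999, xi=1e-8):
    """One step in the ascent direction of J(W_a):
    W_a <- W_a + eta_a * m_hat / (sqrt(v_hat) + xi)."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate
    v = beta2 * v + (1 - beta2) * g**2       # second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias correction
    v_hat = v / (1 - beta2**t)
    return W + eta * m_hat / (np.sqrt(v_hat) + xi), m, v

def soft_update(W_target, W, tau):
    """W'_a <- tau * W_a + (1 - tau) * W'_a."""
    return tau * W + (1 - tau) * W_target

W, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
g = np.array([1.0, -1.0, 0.5])               # stand-in policy gradient
W, m, v = adam_ascent_step(W, g, m, v, t=1, eta=0.001)
Wt = soft_update(np.zeros(3), W, tau=0.01)
```

At t = 1 the bias correction makes the step magnitude equal to the learning rate regardless of the gradient scale, which is a known property of this update.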
6. The unmanned aerial vehicle control method for suppressing farm-like wind disturbance according to claim 4, wherein in S3 the gradient of the evaluation network is:

∇_{W_c} L(W_c) = (2/M) Σ_{i=1}^{M} (Q(s_i, a_i; W_c) - y_i) ∇_{W_c} Q(s_i, a_i; W_c)

wherein ∇_{W_c} denotes the gradient with respect to W_c, and L(W_c) = (1/M) Σ_{i=1}^{M} (y_i - Q(s_i, a_i; W_c))^2 represents the objective function of the evaluation network, with target value y_i = r_i + γ Q'(s_{i+1}, a'_{i+1}; W'_c);
r_i represents the reward, γ is the decay coefficient, and M is the mini-batch size of the data samples;
the weights of the evaluation network are updated by:

m^t = β_1 m^{t-1} + (1 - β_1) ∇_{W_c} L(W_c)
v^t = β_2 v^{t-1} + (1 - β_2) (∇_{W_c} L(W_c))^2
m̂^t = m^t / (1 - β_1^t)
v̂^t = v^t / (1 - β_2^t)
W_c ← W_c - η_c m̂^t / (√(v̂^t) + ξ)

wherein m, v, m̂, v̂ are intermediate variables and the superscript t denotes the time step; β_1, β_2 are hyperparameters, η_c is the learning rate, and ξ is an infinitesimal quantity;
the weights of the target evaluation network are updated as follows:
W'_c ← τ W_c + (1 - τ) W'_c
where τ is the soft update rate.
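The temporal-difference target y_i = r_i + γQ' and the mean-squared objective of claim 6 can be computed directly. This is an illustrative sketch, not the patent's code; the per-sample gradient matrix `grad_q_Wc` is a stand-in for the backpropagated gradient of Q with respect to W_c:

```python
import numpy as np

def critic_loss_and_grad(q, q_next_target, r, gamma, grad_q_Wc):
    """MSE TD objective L(W_c) = (1/M) sum_i (y_i - Q_i)^2, with
    y_i = r_i + gamma * Q'_{i+1}, and its gradient w.r.t. W_c via
    the chain rule (grad_q_Wc holds dQ_i/dW_c, one row per sample)."""
    y = r + gamma * q_next_target            # TD targets from target networks
    err = y - q                              # per-sample TD errors
    loss = np.mean(err**2)
    grad = (-2.0 / len(q)) * err @ grad_q_Wc # d loss / d W_c
    return loss, grad

q      = np.array([1.0, 0.5])                # Q(s_i, a_i; W_c)
q_next = np.array([2.0, 1.0])                # Q'(s_{i+1}, a'_{i+1}; W'_c)
r      = np.array([0.1, 0.2])                # rewards
loss, grad = critic_loss_and_grad(q, q_next, r, gamma=0.9,
                                  grad_q_Wc=np.ones((2, 4)))
```

For this toy batch the targets are y = [1.9, 1.1], the errors [0.9, 0.6], and the loss (0.81 + 0.36) / 2 = 0.585.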
7. The unmanned aerial vehicle control method for suppressing farm-like wind disturbance according to claim 6, wherein the reward r_i is given by an expression in the tracking errors [equation image not reproduced in the text extraction], where e_1, e_3, e_5 respectively represent the x-, y- and z-axis errors between the target trajectory point and the current position, and C is a constant.
8. The unmanned aerial vehicle control method for suppressing farm-like wind disturbance according to claim 1, wherein u = [u_x, u_y, u_z]^T, with u_x, u_y and u_z given by the controller equations [equation images not reproduced in the text extraction];
where m is the mass of the drone and the target trajectory is r_d = [x_d, y_d, z_d]^T [derivative equation images not reproduced];
k_1, k_2, k_3, k_4, k_5, k_6 are controller parameters;
the error variables are defined by [equation images not reproduced];
x_1, x_3, x_5 are the position components of the unmanned aerial vehicle along the x, y and z axes in the inertial coordinate system, and x_2, x_4, x_6 respectively represent the velocity components along the x, y and z axes in the inertial coordinate system;
g is the acceleration of gravity, F is the resultant force of the four motors, d_x, d_y, d_z are the components of the disturbance on the unmanned aerial vehicle in the three directions, and φ, θ, ψ are the attitude Euler angles of the drone.
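The symbols listed in claim 8 (mass m, gravity g, total thrust F, Euler angles φ, θ, ψ and disturbances d_x, d_y, d_z) are exactly those of the standard quadrotor point-mass translational model. The legible equations are not in the extraction, so the sketch below shows that standard model as an assumption, not the patent's own control law:

```python
import numpy as np

def translational_accel(F, m, phi, theta, psi, d, g=9.81):
    """Standard quadrotor point-mass translational dynamics (assumed model):
    F is the resultant thrust of the four motors, (phi, theta, psi) the
    attitude Euler angles, d = (d_x, d_y, d_z) the disturbance accelerations."""
    ax = (F / m) * (np.cos(phi) * np.sin(theta) * np.cos(psi)
                    + np.sin(phi) * np.sin(psi)) + d[0]
    ay = (F / m) * (np.cos(phi) * np.sin(theta) * np.sin(psi)
                    - np.sin(phi) * np.cos(psi)) + d[1]
    az = (F / m) * np.cos(phi) * np.cos(theta) - g + d[2]
    return np.array([ax, ay, az])

# level hover with thrust balancing gravity and no wind: zero acceleration
acc = translational_accel(F=9.81 * 1.5, m=1.5, phi=0.0, theta=0.0,
                          psi=0.0, d=np.zeros(3))
```

In this model a wind disturbance enters additively through d, which is what the compensation amount u' of claim 1 is meant to cancel.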
9. A computer-readable storage device storing a computer program, wherein the computer program, when executed, implements the unmanned aerial vehicle control method for suppressing farm-like wind disturbance according to any one of claims 1 to 8.
10. An unmanned aerial vehicle control apparatus for suppressing farm-like wind disturbance, comprising a storage device, a processor, and a computer program stored in the storage device and executable on the processor, wherein the processor, when executing the computer program, implements the unmanned aerial vehicle control method for suppressing farm-like wind disturbance according to any one of claims 1 to 8.
CN202211381428.6A 2022-11-04 2022-11-04 Unmanned aerial vehicle control method for inhibiting wind disturbance of quasi-field Active CN115562330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211381428.6A CN115562330B (en) 2022-11-04 2022-11-04 Unmanned aerial vehicle control method for inhibiting wind disturbance of quasi-field


Publications (2)

Publication Number Publication Date
CN115562330A true CN115562330A (en) 2023-01-03
CN115562330B CN115562330B (en) 2023-08-22

Family

ID=84768647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211381428.6A Active CN115562330B (en) 2022-11-04 2022-11-04 Unmanned aerial vehicle control method for inhibiting wind disturbance of quasi-field

Country Status (1)

Country Link
CN (1) CN115562330B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109816695A (en) * 2019-01-31 2019-05-28 中国人民解放军国防科技大学 Target detection and tracking method for infrared small unmanned aerial vehicle under complex background
CN110398720A (en) * 2019-08-21 2019-11-01 深圳耐杰电子技术有限公司 A kind of anti-unmanned plane detection tracking interference system and photoelectric follow-up working method
US20200312163A1 (en) * 2019-03-26 2020-10-01 Sony Corporation Concept for designing and using an uav controller model for controlling an uav
KR20210088142A (en) * 2020-01-06 2021-07-14 세종대학교산학협력단 System for detecting and tracking target of unmanned aerial vehicle
CN114527776A (en) * 2022-01-07 2022-05-24 鹏城实验室 Unmanned aerial vehicle wind disturbance resisting control method and device, terminal and storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AN Hang; XIAN Bin: "Reinforcement-learning-based attitude control design and verification for an unmanned helicopter", Control Theory &amp; Applications, vol. 36, no. 4, pages 516-524 *

Also Published As

Publication number Publication date
CN115562330B (en) 2023-08-22

Similar Documents

Publication Publication Date Title
Hong et al. Energy-efficient online path planning of multiple drones using reinforcement learning
CN110806756B (en) Unmanned aerial vehicle autonomous guidance control method based on DDPG
Ergezer et al. Path planning for UAVs for maximum information collection
Bouffard et al. Learning-based model predictive control on a quadrotor: Onboard implementation and experimental results
WO2019076044A1 (en) Mobile robot local motion planning method and apparatus and computer storage medium
He et al. Deep reinforcement learning based local planner for UAV obstacle avoidance using demonstration data
CN113268074B (en) Unmanned aerial vehicle flight path planning method based on joint optimization
CN113848984B (en) Unmanned aerial vehicle cluster control method and system
CN115033022A (en) DDPG unmanned aerial vehicle landing method based on expert experience and oriented to mobile platform
Magree et al. Monocular visual mapping for obstacle avoidance on UAVs
Song et al. Learning perception-aware agile flight in cluttered environments
CN109870906A (en) A kind of high-speed rotor aircraft paths planning method based on BBO optimization Artificial Potential Field
CN111624875A (en) Visual servo control method and device and unmanned equipment
CN109375642B (en) Energy-saving control method for unmanned aerial vehicle
Fu et al. Memory-enhanced deep reinforcement learning for UAV navigation in 3D environment
Silva et al. Landing area recognition by image applied to an autonomous control landing of VTOL aircraft
Sandström et al. Fighter pilot behavior cloning
CN117215197B (en) Four-rotor aircraft online track planning method, four-rotor aircraft online track planning system, electronic equipment and medium
Orsag et al. State estimation, robust control and obstacle avoidance for multicopter in cluttered environments: Euroc experience and results
CN115562330B (en) Unmanned aerial vehicle control method for inhibiting wind disturbance of quasi-field
CN112161626B (en) High-flyability route planning method based on route tracking mapping network
CN114609925B (en) Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
CN116009583A (en) Pure vision-based distributed unmanned aerial vehicle cooperative motion control method and device
CN117130383B (en) Unmanned aerial vehicle vision tracking method and system, unmanned aerial vehicle and readable storage medium
Yin et al. Online joint control approach to formation flying simulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant