CN112506210A - Unmanned aerial vehicle control method for autonomous target tracking - Google Patents

Unmanned aerial vehicle control method for autonomous target tracking

Info

Publication number
CN112506210A
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
reward
robot
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011402067.XA
Other languages
Chinese (zh)
Other versions
CN112506210B (en)
Inventor
徐乐玏
孙长银
陆科林
王腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202011402067.XA priority Critical patent/CN112506210B/en
Publication of CN112506210A publication Critical patent/CN112506210A/en
Application granted granted Critical
Publication of CN112506210B publication Critical patent/CN112506210B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/08: Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808: Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions specially adapted for aircraft

Abstract

The invention relates to an unmanned aerial vehicle control method for autonomous target tracking. A neural network outputs a four-dimensional action, which a PID (proportional-integral-derivative) controller converts into low-level motor commands, so that the unmanned aerial vehicle flies more stably; in later improvements the PID controller can be replaced by other, more optimized control methods. The hierarchical control system makes it easy to transfer a strategy trained in the simulation environment to the real environment, and it has good generalization ability. The method first performs CNN pre-training on images collected in the simulated environment to obtain the relative distance between the unmanned aerial vehicle and the target object in the three dimensions x, y and h. Then, taking the attitude of the unmanned aerial vehicle into account, a strategy is selected and a four-dimensional action is output; the action is passed through the PID controller to the low-level motors of the unmanned aerial vehicle, a reward is obtained through the DDPG (deep deterministic policy gradient) reinforcement learning method, and the strategy is updated through learning and training.

Description

Unmanned aerial vehicle control method for autonomous target tracking
Technical Field
The invention relates to an unmanned aerial vehicle control method for autonomous target tracking, and belongs to the technical field of unmanned aerial vehicle tracking.
Background
Technology for unmanned aerial vehicles to track moving objects and people is increasingly needed in military, surveillance and inspection applications. This requires the drone to combine visual perception techniques with suitable control methods. However, the unmanned aerial vehicle is a very fragile system, so the strategy needs to be updated and optimized through model-free reinforcement learning while the stability of the controller is guaranteed.
Disclosure of Invention
The invention aims to provide a control method for an unmanned aerial vehicle autonomously tracking a moving robot, which combines the self-improvement capability of a model-free reinforcement learning method with the stability of a conventional PID controller: a neural network outputs a four-dimensional action, which the PID controller converts into low-level motor commands, so that the unmanned aerial vehicle flies more stably.
In order to achieve the purpose, the invention adopts the following technical scheme: a drone control method for autonomous target tracking, the control method comprising the steps of:
s1: building a simulation platform environment, adding models of an unmanned aerial vehicle and a robot by modifying a launch file, and setting initial positions of the unmanned aerial vehicle and the robot in the simulation environment;
s2: the robot and the unmanned aerial vehicle can each be moved through instructions or code; the lidar of the robot is used to realize a simple obstacle-avoidance function; the movement of the unmanned aerial vehicle and the robot is controlled by issuing commands that set the linear velocities in the x, y and z directions and the angular velocities about the x, y and z axes, and images of the environment and of the robot on the ground are acquired through the camera at the bottom of the unmanned aerial vehicle;
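As an illustration of step S2, the following is a minimal sketch of issuing velocity commands in a ROS-based simulation using standard geometry_msgs/Twist messages; the topic names "/cmd_vel" and "/drone/cmd_vel" are placeholders and are not specified in the patent.

```python
# Minimal sketch of publishing velocity commands to the robot and the drone in
# a ROS simulation. Topic names are illustrative assumptions.
import rospy
from geometry_msgs.msg import Twist

def send_velocity(pub, vx, vy, vz, wx, wy, wz):
    """Publish linear velocities (x, y, z) and angular velocities about x, y, z."""
    cmd = Twist()
    cmd.linear.x, cmd.linear.y, cmd.linear.z = vx, vy, vz
    cmd.angular.x, cmd.angular.y, cmd.angular.z = wx, wy, wz
    pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("velocity_commander")
    robot_pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
    drone_pub = rospy.Publisher("/drone/cmd_vel", Twist, queue_size=1)
    rate = rospy.Rate(10)  # 10 Hz command loop
    while not rospy.is_shutdown():
        send_velocity(robot_pub, 0.3, 0.0, 0.0, 0.0, 0.0, 0.1)  # robot: forward, slow turn
        send_velocity(drone_pub, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)  # drone: hold position
        rate.sleep()
```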
s3: images are collected and the perception layer is pre-trained in the simulated environment. The unmanned aerial vehicle is kept at a fixed height above the environment, and the pictures collected by the fixed bottom camera have a resolution of 256 x 256 pixels; with the unmanned aerial vehicle fixed, the relative size of the environment visible to the bottom camera, i.e. the area of the visible region, is fixed. The robot moves randomly within the x and y field of view while the unmanned aerial vehicle stays at that height, and 10000 images are collected, each of which contains the robot; these are used to train the CNN neural network shown in figure 1.
S4: the real-time positions of the robot and the unmanned aerial vehicle in the simulated environment are obtained through a topic subscriber (get position and the like), and the relative position of the robot and the unmanned aerial vehicle is computed from them; the way picture names are stored is modified so that the relative position of the robot and the unmanned aerial vehicle becomes the name of the image acquired by the unmanned aerial vehicle, so that pictures and labels correspond one to one for subsequent image processing. Meanwhile, while the unmanned aerial vehicle collects images it is controlled by the PID controller, so that it stays within a small area and does not shake noticeably.
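A minimal sketch of the labelling scheme in step S4 is given below: each captured frame is saved under a file name that encodes the drone-robot relative position, so images and labels stay in one-to-one correspondence. The file-name format and the use of OpenCV are assumptions for illustration.

```python
# Minimal sketch of saving each frame with its relative-position label encoded
# in the file name (format "x_y_h.png" is an assumption, not from the patent).
import os
import cv2  # OpenCV, assumed available for writing images

def save_labelled_image(frame, drone_pos, robot_pos, out_dir="dataset"):
    """Save an image named after the relative (x, y, h) offset between drone and robot."""
    os.makedirs(out_dir, exist_ok=True)
    rel_x = robot_pos[0] - drone_pos[0]
    rel_y = robot_pos[1] - drone_pos[1]
    rel_h = drone_pos[2] - robot_pos[2]  # drone height above the robot
    name = f"{rel_x:.3f}_{rel_y:.3f}_{rel_h:.3f}.png"
    cv2.imwrite(os.path.join(out_dir, name), frame)
    return name
```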
S5: with images and labels known in the simulation environment, supervised pre-training is carried out so that, in the real environment, the relative position can be predicted from an input image. The training range of x and y is about 6 m, the loss after 2000 episodes is 0.03 m, and the average distance between the relative positions predicted on the test set and the actual labels is about 0.1 m.
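A minimal sketch of the supervised pre-training in step S5 is given below, regressing the relative position (x, y, h) from a 256 x 256 image with an L2 loss; the network architecture, dataset handling and hyperparameters are illustrative assumptions rather than the patent's exact configuration (the patent's perception network uses a spatial softmax layer, described later).

```python
# Minimal sketch of supervised pre-training: regress (x, y, h) from an image.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class PerceptionCNN(nn.Module):
    """Illustrative placeholder network, not the patent's architecture."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 3)  # predicts (x, y, h)

    def forward(self, img):
        return self.head(self.features(img).flatten(1))

def pretrain(model, images, labels, epochs=10, lr=1e-3):
    """Supervised regression of the relative position with an L2 (MSE) loss."""
    loader = DataLoader(TensorDataset(images, labels), batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for img, target in loader:
            loss = loss_fn(model(img), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```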
S6: quad-rotor aircraft have complex nonlinear aerodynamics that are difficult to learn with model-free RL methods. This challenge can be addressed by incorporating a conventional PID controller. Figure 2 shows the proposed hierarchical control system: at each time step t, given the observed image, the policy network generates a four-dimensional high-level reference action u_t.
S7: by means of the reinforcement learning method of the DDPG, a reward function is shaped, and the reward function simultaneously considers the four-rotor state and the target related state.
S8: transferring to the real environment.
Wherein, the environment in step S1 includes obstacles such as a football field, a football, a cleaning cart, a garbage bin, a ladder rack, a colored fence, a scooter, warning posts and desks.
Wherein, in step S5, the last convolutional layer is merged with a spatial softmax layer to convert each pixel-wise feature map into spatial coordinates in image space. The spatial softmax layer consists of a spatial softmax function applied to the last convolutional feature map, followed by a fixed, sparse fully connected layer that computes the expected image position of each feature map. The spatial feature points are then regressed, through another fully connected layer, into a three-dimensional vector s_{o,t} = (x_t, y_t, h_t), which represents the 2D position and the height of the target on the image plane (the height is kept fixed here). To achieve stable flight, the quad-rotor state s_{q,t} = (z_t, v_t, q_t, w_t), comprising altitude, linear velocity, orientation and angular velocity, is used as an additional input to the neural network. After the perception layer, the target-related state s_{o,t} and the quad-rotor state s_{q,t} are merged together and followed by a fully connected layer that outputs the action.
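A minimal sketch of a spatial softmax layer of the kind described above is shown below; it converts each feature map into the expected image coordinates of its activations. This is a generic implementation under the usual definition of spatial softmax, not the patent's code.

```python
# Minimal sketch of a spatial softmax layer: each channel of the feature map is
# reduced to the expected (x, y) image coordinate of its activation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialSoftmax(nn.Module):
    def __init__(self, height, width):
        super().__init__()
        ys, xs = torch.meshgrid(
            torch.linspace(-1.0, 1.0, height),
            torch.linspace(-1.0, 1.0, width),
            indexing="ij",
        )
        self.register_buffer("xs", xs.reshape(-1))
        self.register_buffer("ys", ys.reshape(-1))

    def forward(self, feat):                      # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        attn = F.softmax(feat.view(b, c, h * w), dim=-1)
        ex = (attn * self.xs).sum(-1)             # expected x per channel
        ey = (attn * self.ys).sum(-1)             # expected y per channel
        return torch.cat([ex, ey], dim=1)         # (B, 2C) spatial feature points
```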
Wherein, in the step S6,

u_t = (p_x, p_y, p_z, p_ψ)

where u_t denotes the four-dimensional high-level reference action, divided into four components: p_x corresponds to the relative position offset in the x direction, p_y to the relative position offset in the y direction, p_z to the relative position offset in the z direction, and p_ψ to the relative angular offset about the yaw axis.
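The following is a minimal sketch of the hierarchical idea: each component of the high-level reference action u_t = (p_x, p_y, p_z, p_ψ) is tracked by an independent PID loop whose output becomes a low-level command. The gains, time step and command interface are illustrative assumptions, not values given in the patent.

```python
# Minimal sketch of converting the 4-D high-level reference action into
# low-level velocity commands with one PID loop per axis. Gains are placeholders.
class PID:
    def __init__(self, kp, ki, kd, dt=0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, error):
        self.integral += error * self.dt
        deriv = (error - self.prev_err) / self.dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# one PID loop per high-level axis: x, y, z offsets and yaw offset
pids = {axis: PID(kp=1.0, ki=0.0, kd=0.2) for axis in ("x", "y", "z", "yaw")}

def high_level_to_low_level(u_t):
    """Track the policy's reference action and return low-level commands."""
    px, py, pz, pyaw = u_t
    return (
        pids["x"].step(px),     # forward/backward velocity command
        pids["y"].step(py),     # lateral velocity command
        pids["z"].step(pz),     # vertical velocity command
        pids["yaw"].step(pyaw)  # yaw-rate command
    )
```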
In step S7, it is assumed that the environment is fully observed, so the observation o_t of the environment at time t equals the state s_t. The Q function Q^π(s_t, a_t) denotes the expected return after taking action a_t in state s_t and then following policy π:

Q^π(s_t, a_t) = E_π[R_t | s_t, a_t]

where Q^π(s_t, a_t) is the Q function and R_t is the return obtained by taking action a_t in state s_t at time t, from which the expectation yields the Q value. A Q-function approximator parameterized by θ^Q is considered, which can be optimized by minimizing the loss L(θ^Q), where Q(s_t, a_t | θ^Q) is the Q value of the critic network:

L(θ^Q) = E_π[(Q(s_t, a_t | θ^Q) - y_t)^2]

where

y_t = r(s_t, a_t) + γ max_a Q(s_{t+1}, a | θ^{Q'})

is the target Q value estimated by the target Q network, θ^Q denotes the weights of the critic network, θ^{Q'} denotes the weights of the critic's target network, and γ is the discount factor. The gradient computed from the loss L(θ^Q) is used to update the weights of the critic network.
For continuous-action problems Q-learning becomes difficult, and continuous domains are usually handled with actor-critic (AC) methods. DDPG combines AC with DQN and learns more efficiently on continuous actions; its deterministic policy maps each state to a unique action, and the actor is updated by performing gradient ascent on the policy gradient.
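A minimal sketch of one DDPG update consistent with the loss above is given below; in DDPG the maximization over actions in y_t is realized by a target actor network. The network classes, optimizers and replay batch are assumptions for illustration, not the patent's implementation.

```python
# Minimal sketch of a DDPG update: critic regression toward the target Q value
# and actor gradient ascent on Q(s, mu(s)), followed by soft target updates.
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    s, a, r, s_next = batch  # tensors sampled from a replay buffer

    # ---- critic update: minimize (Q(s, a) - y)^2 ----
    with torch.no_grad():
        y = r + gamma * target_critic(s_next, target_actor(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # ---- actor update: gradient ascent on Q(s, mu(s)) ----
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # ---- soft update of the target networks ----
    for net, target in ((critic, target_critic), (actor, target_actor)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```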
To this end, the reward function is designed as a combination of a goal-oriented target reward and an auxiliary quad-rotor reward, where r denotes the total reward, r_g(s_g) the goal-oriented target reward and r_q(s_q) the auxiliary quad-rotor reward:

r = r_g(s_g) + r_q(s_q)
for simplicity of notation, the subscripted time step t is omitted. The target prize is expressed as the sum of two components:
rg(sg)=rg(x,y)+rg(h)
corresponding to the position reward and the scale reward, respectively. Let s_part denote the component of the state s_g corresponding to (x, y) or h; the corresponding reward then takes a piecewise form defined by the thresholds τ_1 and τ_2 (given in the original as an equation image), where

Δs_part = ||s_part - s*_part||_2

denotes the 2-norm between the current state and the desired target state and τ_1, τ_2 denote different thresholds. The desired state is that the drone stays at a fixed height h above the robot at all times with the relative distances in x and y equal to 0, and the reward is computed by comparing the desired state with the current state. Likewise, with a slight abuse of notation, the auxiliary reward is expressed as follows:
r_q(s_q) = r_q(z) + r_q(q_1, q_2, q_3, q_4);
These terms correspond to the height and the orientation (as a quaternion) of the quad-rotor, respectively. Since the height is kept constant, the quaternion of the drone, i.e. its roll-pitch-yaw attitude, is considered; unlike the target reward, this term is used to impose additional constraints on the attitude, so only penalty terms are introduced. Using the same notation as in the target reward, the auxiliary reward takes a penalty form defined by the threshold τ_1 and the penalty weight c (given in the original as an equation image), where τ_1 denotes the same threshold as in the formula above and c denotes a penalty weight; τ_1 = 0.05, τ_2 = 0.2 and c = 0.5. It is desirable that the attitude of the drone does not change much from its previous attitude. In step S8, the positions and state information of the unmanned aerial vehicle and the robot over the whole map cannot be known in the real scene, so training, parameter updating and strategy determination need to be carried out in the simulation environment; the resulting strategy is finally applied to the real world, and the generalization capability is good.
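The following is a minimal sketch of the reward shaping described above, using the stated values τ_1 = 0.05, τ_2 = 0.2 and c = 0.5. Because the patent gives the piecewise formulas only as equation images, the exact shaping below (full reward inside τ_1, linear decay up to τ_2, a fixed penalty for large attitude changes) is an assumed form for illustration only.

```python
# Minimal sketch of the shaped reward: target reward on (x, y) and h plus an
# auxiliary attitude penalty. The piecewise shaping is an assumption.
import numpy as np

TAU1, TAU2, C = 0.05, 0.2, 0.5

def component_reward(s_part, s_target):
    """Assumed thresholded target-reward term for (x, y) or h."""
    delta = np.linalg.norm(np.asarray(s_part) - np.asarray(s_target))  # 2-norm
    if delta < TAU1:
        return 1.0
    if delta < TAU2:
        return 1.0 - (delta - TAU1) / (TAU2 - TAU1)  # linear decay (assumption)
    return 0.0

def attitude_penalty(quat, prev_quat):
    """Assumed auxiliary penalty discouraging large attitude changes."""
    delta = np.linalg.norm(np.asarray(quat) - np.asarray(prev_quat))
    return -C if delta > TAU1 else 0.0

def total_reward(rel_xy, rel_h_error, quat, prev_quat):
    r_goal = component_reward(rel_xy, (0.0, 0.0)) + component_reward([rel_h_error], [0.0])
    r_aux = attitude_penalty(quat, prev_quat)  # height term omitted: height is kept constant
    return r_goal + r_aux
```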
Compared with the prior art, the invention has the following advantages: the invention combines the self-improvement capability of the model-free reinforcement learning method with the stability of a conventional PID controller; a neural network outputs a four-dimensional action, which the PID controller converts into low-level motor commands, so that the unmanned aerial vehicle flies more stably, and in later improvements the PID controller can be replaced by other, more optimized control methods. The hierarchical control system makes it easy to transfer a strategy trained in the simulation environment to the real environment, and it has good generalization ability. The method first performs CNN pre-training on images collected in the simulated environment to obtain the relative distance between the unmanned aerial vehicle and the target object in the three dimensions x, y and h. Then, taking the attitude of the unmanned aerial vehicle into account, a strategy is selected and a four-dimensional action is output; the action is passed through the PID controller to the low-level motors of the unmanned aerial vehicle, a reward is obtained through the DDPG reinforcement learning method, and the strategy is updated through learning and training.
Drawings
FIG. 1 is a block diagram of a policy network architecture of the present invention, with a sensing layer estimating target states and a control layer learning control behavior;
FIG. 2 is a hierarchical control system of the present invention incorporating a policy network and a PID controller;
Detailed Description
For the purposes of promoting an understanding and appreciation of the invention, reference will now be made in detail to the embodiments illustrated in the drawings. Example 1: referring to fig. 1-2, a drone control method for autonomous target tracking, the method comprising the steps of:
s1: building a simulation platform environment, adding models of an unmanned aerial vehicle and a robot by modifying a launch file, and setting initial positions of the unmanned aerial vehicle and the robot in the simulation environment;
s2: the robot and the unmanned aerial vehicle can each be moved through instructions or code. The lidar of the robot is used to realize a simple obstacle-avoidance function; the movement of the unmanned aerial vehicle and the robot is controlled by issuing commands that set the linear velocities in the x, y and z directions and the angular velocities about the x, y and z axes, and the camera at the bottom of the unmanned aerial vehicle is used to acquire images of the environment and of the robot on the ground;
s3: images are collected and the perception layer is pre-trained in the simulation environment. The unmanned aerial vehicle is kept at a fixed height above the environment and the pictures collected by the bottom camera have a fixed resolution of 256 x 256 pixels; with the unmanned aerial vehicle fixed, the relative size of the environment visible to the bottom camera, i.e. the area of the visible region, is fixed. The robot moves randomly within the x and y field of view while the unmanned aerial vehicle stays at the same height, and 10000 images are collected, each of which contains the robot, for training the CNN neural network shown in figure 1;
s4: the real-time positions of the robot and the unmanned aerial vehicle in the simulated environment are obtained through a topic subscriber (get position and the like), and the relative position of the robot and the unmanned aerial vehicle is computed from them; the way picture names are stored is modified so that the relative position of the robot and the unmanned aerial vehicle becomes the name of the image acquired by the unmanned aerial vehicle, so that pictures and labels correspond one to one for subsequent image processing. Meanwhile, while the unmanned aerial vehicle collects images it is controlled by the PID controller, so that it stays within a small area and does not shake noticeably.
S5: with images and labels known in the simulation environment, supervised pre-training is carried out so that, in the real environment, the relative position can be predicted from an input image. The training range of x and y is about 6 m, the loss after 2000 episodes is 0.03 m, and the average distance between the relative positions predicted on the test set and the actual labels is about 0.1 m.
S6: quad-rotor aircraft have complex nonlinear aerodynamics that are difficult to learn with model-free RL methods. This challenge can be addressed by incorporating a conventional PID controller. Figure 2 shows the proposed hierarchical control system: at each time step t, given the observed image, the policy network generates a four-dimensional high-level reference action u_t.
S7: by means of the reinforcement learning method of the DDPG, a reward function is shaped, and the reward function simultaneously considers the four-rotor state and the target related state.
S8: and transferring to a real environment.
Wherein, the environment in step S1 includes obstacles such as a football field, a football, a cleaning cart, a garbage bin, a ladder rack, a colored fence, a scooter, warning posts and desks.
Wherein, in step S5, the last convolutional layer is merged with a spatial softmax layer to convert each pixel-wise feature map into spatial coordinates in image space. The spatial softmax layer consists of a spatial softmax function applied to the last convolutional feature map, followed by a fixed, sparse fully connected layer that computes the expected image position of each feature map. The spatial feature points are then regressed, through another fully connected layer, into a three-dimensional vector s_{o,t} = (x_t, y_t, h_t), which represents the 2D position and the height of the target on the image plane (the height is kept fixed here). To achieve stable flight, the quad-rotor state s_{q,t} = (z_t, v_t, q_t, w_t), comprising altitude, linear velocity, orientation and angular velocity, is used as an additional input to the neural network. After the perception layer, the target-related state s_{o,t} and the quad-rotor state s_{q,t} are merged together and followed by a fully connected layer that outputs the action.
Wherein, in the step S6,

u_t = (p_x, p_y, p_z, p_ψ)

where u_t denotes the four-dimensional high-level reference action, divided into four components: p_x corresponds to the relative position offset in the x direction, p_y to the relative position offset in the y direction, p_z to the relative position offset in the z direction, and p_ψ to the relative angular offset about the yaw axis.
In step S7, it is assumed that the environment is fully observed, so the observation o_t of the environment at time t equals the state s_t. The Q function Q^π(s_t, a_t) denotes the expected return after taking action a_t in state s_t and then following policy π:

Q^π(s_t, a_t) = E_π[R_t | s_t, a_t]

where Q^π(s_t, a_t) is the Q function and R_t is the return obtained by taking action a_t in state s_t at time t, from which the expectation yields the Q value. A Q-function approximator parameterized by θ^Q is considered, which can be optimized by minimizing the loss L(θ^Q), where Q(s_t, a_t | θ^Q) is the Q value of the critic network:

L(θ^Q) = E_π[(Q(s_t, a_t | θ^Q) - y_t)^2]

where

y_t = r(s_t, a_t) + γ max_a Q(s_{t+1}, a | θ^{Q'})

is the target Q value estimated by the target Q network, θ^Q denotes the weights of the critic network, θ^{Q'} denotes the weights of the critic's target network, and γ is the discount factor. The gradient computed from the loss L(θ^Q) is used to update the weights of the critic network.
For continuous-action problems Q-learning becomes difficult, and continuous domains are usually handled with actor-critic (AC) methods. DDPG combines AC with DQN and learns more efficiently on continuous actions; its deterministic policy maps each state to a unique action, and the actor is updated by performing gradient ascent on the policy gradient.
To this end, the reward function is designed as a combination of a goal-oriented target reward and an auxiliary quad-rotor reward, where r denotes the total reward, r_g(s_g) the goal-oriented target reward and r_q(s_q) the auxiliary quad-rotor reward:

r = r_g(s_g) + r_q(s_q)

For simplicity of notation, the subscripted time step t is omitted. The target reward is expressed as the sum of two components:

r_g(s_g) = r_g(x, y) + r_g(h)
corresponding to the position reward and the scale reward, respectively. Let s_part denote the component of the state s_g corresponding to (x, y) or h; the corresponding reward then takes a piecewise form defined by the thresholds τ_1 and τ_2 (given in the original as an equation image), where

Δs_part = ||s_part - s*_part||_2

denotes the 2-norm between the current state and the desired target state and τ_1, τ_2 denote different thresholds. The desired state is that the drone stays at a fixed height h above the robot at all times with the relative distances in x and y equal to 0, and the reward is computed by comparing the desired state with the current state. Likewise, with a slight abuse of notation, the auxiliary reward is expressed as follows:
r_q(s_q) = r_q(z) + r_q(q_1, q_2, q_3, q_4);
These terms correspond to the height and the orientation (as a quaternion) of the quad-rotor, respectively. Since the height is kept constant, the quaternion of the drone, i.e. its roll-pitch-yaw attitude, is considered; unlike the target reward, this term is used to impose additional constraints on the attitude, so only penalty terms are introduced. Using the same notation as in the target reward, the auxiliary reward takes a penalty form defined by the threshold τ_1 and the penalty weight c (given in the original as an equation image), where τ_1 denotes the same threshold as in the formula above and c denotes a penalty weight; τ_1 = 0.05, τ_2 = 0.2 and c = 0.5. It is desirable that the attitude of the drone does not change much from its previous attitude. In step S8, the positions and state information of the unmanned aerial vehicle and the robot over the whole map cannot be known in the real scene, so training, parameter updating and strategy determination need to be carried out in the simulation environment; the resulting strategy is finally applied to the real world, and the method has good generalization capability.
It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and all equivalent modifications or substitutions based on the above-mentioned technical solutions are within the scope of the present invention.

Claims (6)

1. An unmanned aerial vehicle control method for autonomous target tracking is characterized in that: the method comprises the following steps:
s1: building a simulation platform environment, adding models of an unmanned aerial vehicle and a robot by modifying a launch file, and setting initial positions of the unmanned aerial vehicle and the robot in the simulation environment;
s2: the robot and the unmanned aerial vehicle can each be moved through instructions or code; the lidar of the robot is used to realize a simple obstacle-avoidance function; the movement of the unmanned aerial vehicle and the robot is controlled by issuing commands that set the linear velocities in the x, y and z directions and the angular velocities about the x, y and z axes, and images of the environment and of the robot on the ground are acquired through the camera at the bottom of the unmanned aerial vehicle;
s3: images are collected and the perception layer is pre-trained in the simulation environment. The unmanned aerial vehicle is kept at a fixed height above the environment and the pictures collected by the bottom camera have a fixed resolution of 256 x 256 pixels; with the unmanned aerial vehicle fixed, the relative size of the environment visible to the bottom camera, i.e. the area of the visible region, is fixed. The robot moves randomly within the x and y field of view while the unmanned aerial vehicle stays at the same height, and 10000 images are collected, each of which contains the robot, to train the CNN neural network;
s4: the real-time positions of the robot and the unmanned aerial vehicle in the simulated environment are obtained through a topic subscriber (get position and the like), the relative position of the robot and the unmanned aerial vehicle is computed from them, and the way picture names are stored is modified so that the relative position of the robot and the unmanned aerial vehicle becomes the name of the image acquired by the unmanned aerial vehicle;
s5: with images and labels known in the simulation environment, supervised pre-training is carried out so that, in the real environment, the relative position can be predicted from an input image; the training range of x and y is about 6 m, the loss after 2000 episodes is 0.03 m, and the average distance between the relative positions predicted on the test set and the actual labels is about 0.1 m;
s6: at each time step t, given the observed image, the hierarchical control system generates a four-dimensional high-level reference action u_t;
S7: building a reward function through the DDPG (deep deterministic policy gradient) reinforcement learning method, wherein the reward function simultaneously considers the quad-rotor state and the target-related state;
s8: and transferring to a real environment.
2. The unmanned aerial vehicle control method for autonomous target tracking according to claim 1, characterized in that: the environment in step S1 includes obstacles such as a football field, a football, a cleaning cart, a garbage bin, a ladder rack, a colored fence, a scooter, warning posts and desks.
3. The unmanned aerial vehicle control method for autonomous target tracking according to claim 1, characterized in that: in step S5, the last convolutional layer is merged with a spatial softmax layer to convert each pixel-wise feature map into spatial coordinates in image space; the spatial softmax layer consists of a spatial softmax function applied to the last convolutional feature map and a fixed, sparse fully connected layer that computes the expected image position of each feature map; the spatial feature points are then regressed, through another fully connected layer, into a three-dimensional vector s_{o,t} = (x_t, y_t, h_t), which represents the 2D position and the height of the target on the image plane; the quad-rotor state s_{q,t} = (z_t, v_t, q_t, w_t), comprising altitude, linear velocity, orientation and angular velocity, is used as an additional input to the neural network; after the perception layer, the target-related state s_{o,t} and the quad-rotor state s_{q,t} are merged together and followed by a fully connected layer that outputs the action.
4. The unmanned aerial vehicle control method for autonomous target tracking according to claim 1, characterized in that: in step S6,

u_t = (p_x, p_y, p_z, p_ψ)

where u_t denotes the four-dimensional high-level reference action, divided into four components: p_x corresponds to the relative position offset in the x direction, p_y to the relative position offset in the y direction, p_z to the relative position offset in the z direction, and p_ψ to the relative angular offset about the yaw axis.
5. The unmanned aerial vehicle control method for autonomous target tracking according to claim 1, characterized in that: in step S7, it is assumed that the environment is fully observed, so the observation o_t of the environment at time t equals the state s_t; the Q function Q^π(s_t, a_t) denotes the expected return after taking action a_t in state s_t and then following policy π:

Q^π(s_t, a_t) = E_π[R_t | s_t, a_t]

where Q^π(s_t, a_t) is the Q function and R_t is the return obtained by taking action a_t in state s_t at time t, from which the expectation yields the Q value; a Q-function approximator parameterized by θ^Q is considered, which can be optimized by minimizing the loss L(θ^Q), where Q(s_t, a_t | θ^Q) is the Q value of the critic network:

L(θ^Q) = E_π[(Q(s_t, a_t | θ^Q) - y_t)^2]

where

y_t = r(s_t, a_t) + γ max_a Q(s_{t+1}, a | θ^{Q'})

is the target Q value estimated by the target Q network, θ^Q denotes the weights of the critic network, θ^{Q'} denotes the weights of the critic's target network, and γ is the discount factor; the gradient computed from the loss L(θ^Q) is used to update the weights of the critic network;
for continuous-action problems Q-learning becomes difficult, and continuous domains are usually handled with actor-critic (AC) methods; DDPG combines AC with DQN and learns more efficiently on continuous actions; its deterministic policy maps each state to a unique action, and the actor is updated by performing gradient ascent on the policy gradient.
To this end, the reward function is designed as a combination of a goal-oriented target reward and an auxiliary quad-rotor reward, where r denotes the total reward, r_g(s_g) the goal-oriented target reward and r_q(s_q) the auxiliary quad-rotor reward:

r = r_g(s_g) + r_q(s_q)

For simplicity of notation, the subscripted time step t is omitted. The target reward is expressed as the sum of two components:

r_g(s_g) = r_g(x, y) + r_g(h)
corresponding to the position reward and the scale reward, respectively. Let s_part denote the component of the state s_g corresponding to (x, y) or h; the corresponding reward then takes a piecewise form defined by the thresholds τ_1 and τ_2 (given in the original as an equation image), where

Δs_part = ||s_part - s*_part||_2

denotes the 2-norm between the current state and the desired target state and τ_1, τ_2 denote different thresholds. The desired state is that the drone stays at a fixed height h above the robot at all times with the relative distances in x and y equal to 0, and the reward is computed by comparing the desired state with the current state. Likewise, with a slight abuse of notation, the auxiliary reward is expressed as follows:
r_q(s_q) = r_q(z) + r_q(q_1, q_2, q_3, q_4)
These terms correspond to the height and the orientation (as a quaternion) of the quad-rotor, respectively. Since the height is kept constant, the quaternion of the drone, i.e. its roll-pitch-yaw attitude, is considered; unlike the target reward, this term is used to impose additional constraints on the attitude, so only penalty terms are introduced. Using the same notation as in the target reward, the auxiliary reward takes a penalty form defined by the threshold τ_1 and the penalty weight c (given in the original as an equation image), where τ_1 denotes the same threshold as in the formula above, c denotes a penalty weight, τ_1 = 0.05, τ_2 = 0.2 and c = 0.5. It is desirable that the attitude of the drone does not change much from its previous attitude.
6. The drone controlling method for autonomous target tracking according to claim 1, characterized in that: in step S8, training needs to be performed in a simulation environment, parameters are updated, strategies are determined, and the method is finally applied to the real world, and has a good generalization capability.
CN202011402067.XA 2020-12-04 2020-12-04 Unmanned aerial vehicle control method for autonomous target tracking Active CN112506210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011402067.XA CN112506210B (en) 2020-12-04 2020-12-04 Unmanned aerial vehicle control method for autonomous target tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011402067.XA CN112506210B (en) 2020-12-04 2020-12-04 Unmanned aerial vehicle control method for autonomous target tracking

Publications (2)

Publication Number Publication Date
CN112506210A true CN112506210A (en) 2021-03-16
CN112506210B CN112506210B (en) 2022-12-27

Family

ID=74969812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011402067.XA Active CN112506210B (en) 2020-12-04 2020-12-04 Unmanned aerial vehicle control method for autonomous target tracking

Country Status (1)

Country Link
CN (1) CN112506210B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107479368A (en) * 2017-06-30 2017-12-15 北京百度网讯科技有限公司 A kind of method and system of the training unmanned aerial vehicle (UAV) control model based on artificial intelligence
US20190325584A1 (en) * 2018-04-18 2019-10-24 Tg-17, Llc Systems and Methods for Real-Time Adjustment of Neural Networks for Autonomous Tracking and Localization of Moving Subject
CN108803321A (en) * 2018-05-30 2018-11-13 清华大学 Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
CN109164821A (en) * 2018-09-26 2019-01-08 中科物栖(北京)科技有限责任公司 A kind of UAV Attitude training method and device
CN110879595A (en) * 2019-11-29 2020-03-13 江苏徐工工程机械研究院有限公司 Unmanned mine card tracking control system and method based on deep reinforcement learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114721412A (en) * 2022-03-16 2022-07-08 北京理工大学 Unmanned aerial vehicle trajectory tracking obstacle avoidance method based on model predictive control
CN115098941A (en) * 2022-05-31 2022-09-23 复旦大学 Unmanned aerial vehicle digital twin control method and platform for agile deployment of intelligent algorithm
CN115098941B (en) * 2022-05-31 2023-08-04 复旦大学 Unmanned aerial vehicle digital twin control method and platform for smart deployment of intelligent algorithm
CN116203992A (en) * 2023-04-28 2023-06-02 北京航空航天大学 Tailstock type unmanned aerial vehicle high-dynamic target tracking method for multi-mode flight control

Also Published As

Publication number Publication date
CN112506210B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
CN112506210B (en) Unmanned aerial vehicle control method for autonomous target tracking
Singla et al. Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge
Chen et al. Deep imitation learning for autonomous driving in generic urban scenarios with enhanced safety
Amini et al. Vista 2.0: An open, data-driven simulator for multimodal sensing and policy learning for autonomous vehicles
Ross et al. Learning monocular reactive uav control in cluttered natural environments
Sampedro et al. Image-based visual servoing controller for multirotor aerial robots using deep reinforcement learning
Zhou et al. A deep Q-network (DQN) based path planning method for mobile robots
Hong et al. Energy-efficient online path planning of multiple drones using reinforcement learning
US11561544B2 (en) Indoor monocular navigation method based on cross-sensor transfer learning and system thereof
Kelchtermans et al. How hard is it to cross the room?--Training (Recurrent) Neural Networks to steer a UAV
Olivares-Mendez et al. Vision based fuzzy control autonomous landing with UAVs: From V-REP to real experiments
Bipin et al. Autonomous navigation of generic monocular quadcopter in natural environment
Xu et al. Monocular vision based autonomous landing of quadrotor through deep reinforcement learning
Wang et al. Learning interactive driving policies via data-driven simulation
CN108288038A (en) Night robot motion's decision-making technique based on scene cut
Olivares-Mendez et al. Setting up a testbed for UAV vision based control using V-REP & ROS: A case study on aerial visual inspection
Fu et al. Memory-enhanced deep reinforcement learning for UAV navigation in 3D environment
Doukhi et al. Deep reinforcement learning for autonomous map-less navigation of a flying robot
Chen et al. Deep reinforcement learning of map-based obstacle avoidance for mobile robot navigation
CN111611869B (en) End-to-end monocular vision obstacle avoidance method based on serial deep neural network
Zhou et al. Vision-based navigation of uav with continuous action space using deep reinforcement learning
Walvekar et al. Vision based autonomous navigation of quadcopter using reinforcement learning
CN116679710A (en) Robot obstacle avoidance strategy training and deployment method based on multitask learning
Shi et al. Path Planning of Unmanned Aerial Vehicle Based on Supervised Learning
Prazenica et al. Multiresolution and adaptive path planning for maneuver of micro-air-vehicles in urban environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant