CN112506210A - Unmanned aerial vehicle control method for autonomous target tracking - Google Patents
Unmanned aerial vehicle control method for autonomous target tracking
- Publication number
- CN112506210A (Application CN202011402067.XA)
- Authority
- CN
- China
- Prior art keywords
- aerial vehicle
- unmanned aerial
- reward
- robot
- environment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/08—Control of attitude, i.e. control of roll, pitch, or yaw
- G05D1/0808—Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for aircraft
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Abstract
The invention relates to an unmanned aerial vehicle control method for autonomous target tracking. Four-dimensional actions are output by a neural network and then converted into low-level motor commands by a PID (proportional-integral-derivative) controller, so that the unmanned aerial vehicle flies more stably; later improvements can replace the PID controller with other, more refined control methods. The layered control system allows a strategy trained in simulation to be transferred easily to the real environment, giving good generalization ability. The method first pre-trains a CNN on images collected in a simulated environment to obtain the relative distance between the unmanned aerial vehicle and a target object in three dimensions (x, y and h). Then, taking the attitude of the unmanned aerial vehicle into account, a policy is selected that outputs four-dimensional actions; these are passed through the PID controller to the low-level motors, a reward is obtained by the DDPG (deep deterministic policy gradient) reinforcement learning method, and the policy is updated through learning and training.
Description
Technical Field
The invention relates to an unmanned aerial vehicle control method for autonomous target tracking, and belongs to the technical field of unmanned aerial vehicle tracking.
Background
Technologies for unmanned aerial vehicles that track moving objects and people are increasingly needed in military, surveillance and inspection applications. This requires the drone to combine visual perception techniques with control methods. However, an unmanned aerial vehicle is a fragile system: its policy must be updated and optimized through model-free reinforcement learning while the stability of the controller is guaranteed.
Disclosure of Invention
The invention aims to provide a control method by which an unmanned aerial vehicle autonomously tracks a moving robot. It combines the self-improving capability of a model-free reinforcement learning method with the stability of a conventional PID controller: a neural network outputs four-dimensional actions, which the PID controller converts into low-level motor commands, so that the unmanned aerial vehicle flies more stably.
In order to achieve the purpose, the invention adopts the following technical scheme: a drone control method for autonomous target tracking, the control method comprising the steps of:
s1: building a simulation platform environment, adding models of an unmanned aerial vehicle and a robot by modifying a launch file, and setting initial positions of the unmanned aerial vehicle and the robot in the simulation environment;
s2: the robot and the unmanned aerial vehicle can each be moved by instructions or code. The robot's laser radar provides a simple obstacle-avoidance function. The movement of the unmanned aerial vehicle and the robot is controlled by sending commands that set linear velocities in the x, y and z directions and angular velocities about the x, y and z axes, and images of the environment and the robot on the ground are acquired by the camera on the bottom of the unmanned aerial vehicle;
s3: acquire images and pre-train the perception layer in the simulated environment. The unmanned aerial vehicle is kept at a fixed height above the environment, and the bottom camera captures pictures at a fixed resolution of 256 × 256 pixels; with the unmanned aerial vehicle fixed, the relative size of the environment visible to the bottom camera, i.e. the area of its field of view, is also fixed. The robot moves randomly within this field of view, at different x and y, while the unmanned aerial vehicle stays at the same height. 10000 images are collected, each containing the robot, and the CNN neural network shown in figure 1 is trained.
S4: the real-time positions of the robot and the unmanned aerial vehicle in the simulated environment are obtained through topics, subscribers, get_position and the like, and processed to obtain the relative position of the robot and the unmanned aerial vehicle. The picture-naming scheme is modified so that this relative position becomes the file name of each image acquired by the unmanned aerial vehicle, putting pictures and labels in one-to-one correspondence for subsequent image processing. Meanwhile, while the unmanned aerial vehicle is held in place to acquire images, it is controlled by the PID so that it stays within a small area with no noticeable shaking.
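The label-in-filename scheme of step S4 can be sketched as follows; the exact format (an index plus signed offsets) is an assumption for illustration, since the patent only states that the relative position serves as the image name:

```python
import os

def encode_label_filename(index, dx, dy, h, ext=".png"):
    """Encode the UAV-robot relative position (dx, dy, h) into an image
    file name so each picture carries its own label.
    The naming scheme itself is an illustrative assumption."""
    return f"{index:05d}_x{dx:+.3f}_y{dy:+.3f}_h{h:.3f}{ext}"

def decode_label_filename(name):
    """Recover the (dx, dy, h) label from a file name produced above."""
    stem = os.path.splitext(name)[0]
    _, x_part, y_part, h_part = stem.split("_")
    return float(x_part[1:]), float(y_part[1:]), float(h_part[1:])
```

With such a scheme, the supervised pre-training loader never needs a separate label file: the target vector is parsed back out of each file name.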
S5: with images and labels known in the simulation environment, supervised pre-training is carried out so that, given an input image, the relative position can be predicted in the real environment. The training range of x and y is about 6 m; the loss after 2000 episodes is 0.03 m, and the average distance between the relative positions predicted on the test set and the actual labels is about 0.1 m.
S6: quad-rotor aircraft have complex nonlinear aerodynamics that are difficult to learn by model-free RL methods. This challenge is addressed by incorporating a conventional PID controller. Figure 2 shows the proposed hierarchical control system: at each time step t, given the observed image, the policy network generates a four-dimensional high-level reference action u_t.
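As a rough sketch of the hierarchical idea, the following shows how a conventional PID loop could turn the policy's four-dimensional reference action into low-level commands; the gains, the per-axis decomposition into velocity commands and the class interface are illustrative assumptions, not the patent's implementation:

```python
class PID:
    """Discrete PID controller; gains here are illustrative only."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        # Accumulate the integral term and difference the error for D.
        self.integral += error * self.dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

def low_level_command(u_t, pids):
    """Convert the policy's 4D reference action u_t = (p_x, p_y, p_z, yaw offset),
    interpreted as per-axis tracking errors, into commands via one PID per axis."""
    return [pid.step(err) for pid, err in zip(pids, u_t)]
```

The point of the layering is that the learned policy only ever emits bounded reference offsets, while the PID loop absorbs the quad-rotor's fast nonlinear dynamics.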
S7: by means of the DDPG reinforcement learning method, a reward function is shaped that considers both the quad-rotor state and the target-related state.
S8: transfer to the real environment.
Wherein, the environment in step S1 includes obstacles such as a football field, a football, a cleaning cart, a garbage bin, a ladder rack, colored fences, a scooter, warning posts and desks.
Wherein, in step S5: the last convolutional layer is merged with a spatial softmax layer to convert each per-pixel feature map into spatial coordinates in image space. The spatial softmax layer consists of a spatial softmax function applied to the last convolutional feature map, followed by a fixed, sparse fully connected layer that computes the expected image position for each feature map. Through another fully connected layer, the spatial feature points are then regressed into a three-dimensional vector s_{o,t} = (x_t, y_t, h_t), which represents the 2D position and the height of the target on the image plane (the height is fixed here). To achieve stable flight, the quad-rotor state s_{q,t} = (z_t, v_t, q_t, w_t), comprising altitude, linear velocity, orientation and angular velocity, must be used as an additional input to the neural network. After the perception layer, the target-related state s_{o,t} and the quad-rotor state s_{q,t} are merged together, followed by a fully connected layer that outputs the action.
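The expected-coordinate computation performed by the spatial softmax layer can be sketched for a single feature map as follows (NumPy, with a normalized [-1, 1] coordinate convention assumed for illustration):

```python
import numpy as np

def spatial_softmax(feature_map):
    """Spatial softmax over one HxW feature map: softmax the activations,
    then return the expected (x, y) image coordinate of the feature."""
    h, w = feature_map.shape
    flat = feature_map.ravel()
    probs = np.exp(flat - flat.max())   # subtract max for numerical stability
    probs /= probs.sum()
    probs = probs.reshape(h, w)
    # Normalized pixel coordinates in [-1, 1].
    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)
    exp_x = (probs.sum(axis=0) * xs).sum()  # marginal over columns
    exp_y = (probs.sum(axis=1) * ys).sum()  # marginal over rows
    return exp_x, exp_y
```

A sharp activation peak thus yields coordinates near that peak, which is why the layer can feed directly into the (x_t, y_t, h_t) regression described above.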
Wherein, in step S6, u_t denotes the four-dimensional high-level reference action, divided into four components: p_x corresponds to a relative position offset in the x direction, p_y to a relative position offset in the y direction, p_z to a relative position offset in the z direction, and the fourth component corresponds to a relative angular offset about the yaw axis.
In step S7, the environment is assumed to be fully observed, so the observation o_t of the environment at time t equals the state s_t. The Q function Q^π(s_t, a_t) denotes the expected return after taking action a_t in state s_t and thereafter following policy π:

Q^π(s_t, a_t) = E_π[R_t | s_t, a_t]

where Q^π(s_t, a_t) is the Q function and R_t is the return obtained by taking action a_t in state s_t at time t; taking the expectation yields the Q value. Consider a Q-function approximator parameterized by θ^Q, which can be optimized by minimizing the loss L(θ^Q), where Q(s_t, a_t | θ^Q) is the Q value of the critic network:

L(θ^Q) = E_π[(Q(s_t, a_t | θ^Q) - y_t)^2]

wherein

y_t = r(s_t, a_t) + γ max_a Q(s_{t+1}, a | θ^{Q'})

is the target Q value estimated by the target Q network, θ^Q denotes the weights of the critic network, θ^{Q'} the weights of the critic's target network, and γ is the discount factor. The gradient of the loss L(θ^Q) is computed to update the weights of the critic network.
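A minimal numeric sketch of the critic-update quantities defined above; it follows the max-based target formula as written here, although DDPG implementations in practice evaluate the target critic at the target actor's action rather than maximizing:

```python
def td_target(reward, next_q_values, gamma=0.99):
    """Target Q value y_t = r(s_t, a_t) + gamma * max_a Q(s_{t+1}, a | theta^Q'),
    with next_q_values the target network's Q estimates for candidate actions."""
    return reward + gamma * max(next_q_values)

def critic_loss(q_pred, targets):
    """Mean squared TD error L(theta^Q) = E[(Q(s, a | theta^Q) - y)^2],
    estimated over a batch of predictions and targets."""
    errs = [(q - y) ** 2 for q, y in zip(q_pred, targets)]
    return sum(errs) / len(errs)
```

In training, the gradient of this loss with respect to θ^Q would drive the critic update, while the target network's weights θ^{Q'} track θ^Q slowly for stability.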
For continuous-action problems Q-learning becomes difficult, and continuous domains are usually handled by actor-critic (AC) methods. DDPG combines AC with DQN to learn more efficiently on continuous actions: the policy is deterministic, mapping each state to a unique action, and the actor is updated by gradient ascent on the policy gradient.
To this end, the reward function is designed as the combination of a goal-oriented target reward and an auxiliary quad-rotor reward, where r denotes the total reward, r_g(s_g) the goal-oriented target reward and r_q(s_q) the auxiliary quad-rotor reward:

r = r_g(s_g) + r_q(s_q)

For simplicity of notation, the time-step subscript t is omitted. The target reward is expressed as the sum of two components:

r_g(s_g) = r_g(x, y) + r_g(h)
corresponding to the position reward and the scale reward, respectively. Let s_part denote the component of the state s_g corresponding to (x, y) or h; the corresponding reward then takes the form:

wherein

Δs_part = ||s_part - s*_part||_2

denotes the 2-norm between the current state and the desired target state, and τ1, τ2 denote different thresholds. The desired state keeps the unmanned aerial vehicle at a fixed height h above the robot at all times, with a relative distance of 0 in x and y; the reward is computed by comparing the desired state with the current state. Likewise, with a slight abuse of notation, the auxiliary reward is expressed as follows:
r_q(s_q) = r_q(z) + r_q(q_1, q_2, q_3, q_4)

These correspond to the altitude and the orientation (in quaternion form) of the quad-rotor, respectively. Since the altitude remains unchanged, the quaternion of the unmanned aerial vehicle, i.e. its attitude (roll-pitch-yaw) parameters, is considered. Unlike the target reward, this term is used to impose additional constraints on the attitude, so only penalty terms are introduced. Using the same notation as in the target reward, the auxiliary reward takes the form:

wherein τ1 denotes the same threshold as in the formula above and c denotes a penalty weight; τ1 = 0.05, τ2 = 0.2 and c = 0.5. It is desirable that the attitude of the drone does not change much from the previous attitude. In step S8, the positions and state information of the unmanned aerial vehicle and the robot over the whole map cannot be known in a real scene, so training, parameter updating and strategy determination must be performed in the simulation environment; the resulting strategy is finally applied to the real world, with good generalization capability.
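Since the piecewise reward formulas themselves appear only as figures, the following sketch shows one plausible thresholded form consistent with the surrounding text; the linear falloff between τ1 and τ2 and the exact penalty form are assumptions, while τ1 = 0.05, τ2 = 0.2 and c = 0.5 come from the text:

```python
import math

def part_reward(s_part, s_target, tau1=0.05, tau2=0.2):
    """One plausible piecewise reward for a state component: full reward
    within tau1 of the target, a linear falloff up to tau2, zero beyond.
    Only the thresholds tau1, tau2 come from the patent; the shape is assumed."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(s_part, s_target)))  # 2-norm
    if d <= tau1:
        return 1.0
    if d <= tau2:
        return 1.0 - (d - tau1) / (tau2 - tau1)
    return 0.0

def total_reward(s_xy, s_xy_target, h, h_target, att_change, c=0.5):
    """Total reward r = r_g(x, y) + r_g(h) minus an attitude penalty; the
    penalty weight c = 0.5 is from the patent, the penalty's exact form is assumed."""
    r_g = part_reward(s_xy, s_xy_target) + part_reward([h], [h_target])
    return r_g - c * float(att_change)
```

Under this shaping, hovering exactly one h above the robot with no attitude change yields the maximum reward, and drifting past τ2 in any component removes that component's contribution entirely.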
Compared with the prior art, the invention has the following advantages: it combines the self-improving capability of a model-free reinforcement learning method with the stability of a conventional PID controller. Four-dimensional actions output by the neural network are converted by the PID (proportional-integral-derivative) controller into low-level motor commands, so that the unmanned aerial vehicle flies more stably, and later improvements can replace the PID controller with other, more refined control methods. The layered control system allows a strategy trained in simulation to be transferred easily to the real environment, giving good generalization ability. The method first pre-trains a CNN on images collected in a simulated environment to obtain the relative distance between the unmanned aerial vehicle and a target object in three dimensions (x, y and h). Then, taking the attitude of the unmanned aerial vehicle into account, a policy is selected that outputs four-dimensional actions; these are passed through the PID controller to the low-level motors, a reward is obtained by the DDPG (deep deterministic policy gradient) reinforcement learning method, and the policy is updated through learning and training.
Drawings
FIG. 1 is a block diagram of a policy network architecture of the present invention, with a sensing layer estimating target states and a control layer learning control behavior;
FIG. 2 is a hierarchical control system of the present invention incorporating a policy network and a PID controller;
Detailed Description
For the purposes of promoting an understanding and appreciation of the invention, reference will now be made in detail to the embodiments illustrated in the drawings. Example 1: referring to fig. 1-2, a drone control method for autonomous target tracking, the method comprising the steps of:
s1: building a simulation platform environment, adding models of an unmanned aerial vehicle and a robot by modifying a launch file, and setting initial positions of the unmanned aerial vehicle and the robot in the simulation environment;
s2: the robot and the unmanned aerial vehicle can each be moved by instructions or code. The robot's laser radar provides a simple obstacle-avoidance function. The movement of the unmanned aerial vehicle and the robot is controlled by sending commands that set linear velocities in the x, y and z directions and angular velocities about the x, y and z axes, and images of the environment and the robot on the ground are acquired by the camera on the bottom of the unmanned aerial vehicle;
s3: acquire images and pre-train the perception layer in the simulated environment. The unmanned aerial vehicle is kept at a fixed height above the environment, and the bottom camera captures pictures at a fixed resolution of 256 × 256 pixels; with the unmanned aerial vehicle fixed, the relative size of the environment visible to the bottom camera, i.e. the area of its field of view, is also fixed. The robot moves randomly within this field of view, at different x and y, while the unmanned aerial vehicle stays at the same height. 10000 images are collected, each containing the robot, and the CNN neural network shown in figure 1 is trained;
s4: the real-time positions of the robot and the unmanned aerial vehicle in the simulated environment are obtained through topics, subscribers, get_position and the like, and processed to obtain the relative position of the robot and the unmanned aerial vehicle. The picture-naming scheme is modified so that this relative position becomes the file name of each image acquired by the unmanned aerial vehicle, putting pictures and labels in one-to-one correspondence for subsequent image processing. Meanwhile, while the unmanned aerial vehicle is held in place to acquire images, it is controlled by the PID so that it stays within a small area with no noticeable shaking.
S5: with images and labels known in the simulation environment, supervised pre-training is carried out so that, given an input image, the relative position can be predicted in the real environment. The training range of x and y is about 6 m; the loss after 2000 episodes is 0.03 m, and the average distance between the relative positions predicted on the test set and the actual labels is about 0.1 m.
S6: quad-rotor aircraft have complex nonlinear aerodynamics that are difficult to learn by model-free RL methods. This challenge is addressed by incorporating a conventional PID controller. Figure 2 shows the proposed hierarchical control system: at each time step t, given the observed image, the policy network generates a four-dimensional high-level reference action u_t.
S7: by means of the DDPG reinforcement learning method, a reward function is shaped that considers both the quad-rotor state and the target-related state.
S8: transfer to the real environment.
Wherein, the environment in step S1 includes obstacles such as a football field, a football, a cleaning cart, a garbage bin, a ladder rack, colored fences, a scooter, warning posts and desks.
Wherein, in step S5: the last convolutional layer is merged with a spatial softmax layer to convert each per-pixel feature map into spatial coordinates in image space. The spatial softmax layer consists of a spatial softmax function applied to the last convolutional feature map, followed by a fixed, sparse fully connected layer that computes the expected image position for each feature map. Through another fully connected layer, the spatial feature points are then regressed into a three-dimensional vector s_{o,t} = (x_t, y_t, h_t), which represents the 2D position and the height of the target on the image plane (the height is fixed here). To achieve stable flight, the quad-rotor state s_{q,t} = (z_t, v_t, q_t, w_t), comprising altitude, linear velocity, orientation and angular velocity, must be used as an additional input to the neural network. After the perception layer, the target-related state s_{o,t} and the quad-rotor state s_{q,t} are merged together, followed by a fully connected layer that outputs the action.
Wherein, in step S6, u_t denotes the four-dimensional high-level reference action, divided into four components: p_x corresponds to a relative position offset in the x direction, p_y to a relative position offset in the y direction, p_z to a relative position offset in the z direction, and the fourth component corresponds to a relative angular offset about the yaw axis.
In step S7, the environment is assumed to be fully observed, so the observation o_t of the environment at time t equals the state s_t. The Q function Q^π(s_t, a_t) denotes the expected return after taking action a_t in state s_t and thereafter following policy π:

Q^π(s_t, a_t) = E_π[R_t | s_t, a_t]

where Q^π(s_t, a_t) is the Q function and R_t is the return obtained by taking action a_t in state s_t at time t; taking the expectation yields the Q value. Consider a Q-function approximator parameterized by θ^Q, which can be optimized by minimizing the loss L(θ^Q), where Q(s_t, a_t | θ^Q) is the Q value of the critic network:

L(θ^Q) = E_π[(Q(s_t, a_t | θ^Q) - y_t)^2]

wherein

y_t = r(s_t, a_t) + γ max_a Q(s_{t+1}, a | θ^{Q'})

is the target Q value estimated by the target Q network, θ^Q denotes the weights of the critic network, θ^{Q'} the weights of the critic's target network, and γ is the discount factor. The gradient of the loss L(θ^Q) is computed to update the weights of the critic network.

For continuous-action problems Q-learning becomes difficult, and continuous domains are usually handled by actor-critic (AC) methods. DDPG combines AC with DQN to learn more efficiently on continuous actions: the policy is deterministic, mapping each state to a unique action, and the actor is updated by gradient ascent on the policy gradient.
To this end, the reward function is designed as the combination of a goal-oriented target reward and an auxiliary quad-rotor reward, where r denotes the total reward, r_g(s_g) the goal-oriented target reward and r_q(s_q) the auxiliary quad-rotor reward:

r = r_g(s_g) + r_q(s_q)

For simplicity of notation, the time-step subscript t is omitted. The target reward is expressed as the sum of two components:

r_g(s_g) = r_g(x, y) + r_g(h)

corresponding to the position reward and the scale reward, respectively. Let s_part denote the component of the state s_g corresponding to (x, y) or h; the corresponding reward then takes the form:

wherein

Δs_part = ||s_part - s*_part||_2

denotes the 2-norm between the current state and the desired target state, and τ1, τ2 denote different thresholds. The desired state keeps the unmanned aerial vehicle at a fixed height h above the robot at all times, with a relative distance of 0 in x and y; the reward is computed by comparing the desired state with the current state. Likewise, with a slight abuse of notation, the auxiliary reward is expressed as follows:

r_q(s_q) = r_q(z) + r_q(q_1, q_2, q_3, q_4)

These correspond to the altitude and the orientation (in quaternion form) of the quad-rotor, respectively. Since the altitude remains unchanged, the quaternion of the unmanned aerial vehicle, i.e. its attitude (roll-pitch-yaw) parameters, is considered. Unlike the target reward, this term is used to impose additional constraints on the attitude, so only penalty terms are introduced. Using the same notation as in the target reward, the auxiliary reward takes the form:

wherein τ1 denotes the same threshold as in the formula above and c denotes a penalty weight; τ1 = 0.05, τ2 = 0.2 and c = 0.5. It is desirable that the attitude of the drone does not change much from the previous attitude. In step S8, the positions and state information of the unmanned aerial vehicle and the robot over the whole map cannot be known in a real scene, so training, parameter updating and strategy determination must be performed in the simulation environment; the resulting strategy is finally applied to the real world, and the method has good generalization capability.
It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and all equivalent modifications or substitutions based on the above-mentioned technical solutions are within the scope of the present invention.
Claims (6)
1. An unmanned aerial vehicle control method for autonomous target tracking is characterized in that: the method comprises the following steps:
s1: building a simulation platform environment, adding models of an unmanned aerial vehicle and a robot by modifying a launch file, and setting initial positions of the unmanned aerial vehicle and the robot in the simulation environment;
s2: the robot and the unmanned aerial vehicle can each be moved by instructions or code; the robot's laser radar provides a simple obstacle-avoidance function; the movement of the unmanned aerial vehicle and the robot is controlled by sending commands that set linear velocities in the x, y and z directions and angular velocities about the x, y and z axes, and images of the environment and the robot on the ground are acquired by the camera on the bottom of the unmanned aerial vehicle;
s3: acquire images and pre-train the perception layer in the simulated environment; the unmanned aerial vehicle is kept at a fixed height above the environment, and the bottom camera captures pictures at a fixed resolution of 256 × 256 pixels; with the unmanned aerial vehicle fixed, the relative size of the environment visible to the bottom camera, i.e. the area of its field of view, is also fixed; the robot moves randomly within this field of view, at different x and y, while the unmanned aerial vehicle stays at the same height, and 10000 images are collected, each containing the robot, to train the CNN neural network;
s4: the real-time positions of the robot and the unmanned aerial vehicle in the simulated environment are obtained through topics, subscribers, get_position and the like, and processed to obtain the relative position of the robot and the unmanned aerial vehicle; the picture-naming scheme is modified so that this relative position becomes the file name of each image acquired by the unmanned aerial vehicle;
s5: with images and labels known in the simulation environment, supervised pre-training is carried out so that, given an input image, the relative position can be predicted in the real environment; the training range of x and y is about 6 m, the loss after 2000 episodes is 0.03 m, and the average distance between the relative positions predicted on the test set and the actual labels is about 0.1 m;
s6: given the observed image, the hierarchical control system generates a four-dimensional high-level reference action u_t at each time step t;
S7: a reward function is built by the DDPG (deep deterministic policy gradient) reinforcement learning method, considering both the quad-rotor state and the target-related state;
s8: and transferring to a real environment.
2. The drone control method for autonomous target tracking according to claim 1, characterized in that: the environment in step S1 includes obstacles such as a football field, a football, a cleaning cart, a garbage bin, a ladder rack, colored fences, a scooter, warning posts and desks.
3. The drone control method for autonomous target tracking according to claim 1, characterized in that: in step S5, the last convolutional layer is merged with a spatial softmax layer to convert each per-pixel feature map into spatial coordinates in image space; the spatial softmax layer consists of a spatial softmax function applied to the last convolutional feature map and a fixed, sparse fully connected layer that computes the expected image position for each feature map; through another fully connected layer, the spatial feature points are then regressed into a three-dimensional vector s_{o,t} = (x_t, y_t, h_t), which represents the 2D position and the height of the target on the image plane; the quad-rotor state s_{q,t} = (z_t, v_t, q_t, w_t), comprising altitude, linear velocity, orientation and angular velocity, is used as an additional input to the neural network; after the perception layer, the target-related state s_{o,t} and the quad-rotor state s_{q,t} are merged together, followed by a fully connected layer that outputs the action.
4. The drone control method for autonomous target tracking according to claim 1, characterized in that: in step S6, u_t denotes the four-dimensional high-level reference action, divided into four components: p_x corresponds to a relative position offset in the x direction, p_y to a relative position offset in the y direction, p_z to a relative position offset in the z direction, and the fourth component corresponds to a relative angular offset about the yaw axis.
5. The drone control method for autonomous target tracking according to claim 1, characterized in that: in step S7, it is assumed that the environment is fully observed, so the observation o_t of the environment at time t equals the state s_t; the Q function Q^π(s_t, a_t) denotes the expected return after taking action a_t in state s_t and thereafter following policy π:

Q^π(s_t, a_t) = E_π[R_t | s_t, a_t]

where Q^π(s_t, a_t) is the Q function and R_t is the return obtained by taking action a_t in state s_t at time t; taking the expectation yields the Q value. Consider a Q-function approximator parameterized by θ^Q, which can be optimized by minimizing the loss L(θ^Q), where Q(s_t, a_t | θ^Q) is the Q value of the critic network:

L(θ^Q) = E_π[(Q(s_t, a_t | θ^Q) - y_t)^2]

wherein

y_t = r(s_t, a_t) + γ max_a Q(s_{t+1}, a | θ^{Q'})

is the target Q value estimated by the target Q network, θ^Q denotes the weights of the critic network, θ^{Q'} the weights of the critic's target network, and γ is the discount factor; the gradient of the loss L(θ^Q) is computed to update the weights of the critic network;
for the continuous action problem, Q-learning becomes difficult, the continuous domain is usually solved by AC, DDPG is a combination of AC and DQN, making more efficient learning on continuous action, deterministic makes the function deterministically specify that the current policy maps each state to a unique action, and the operator is updated by performing gradient ascent based on the following policy gradient.
To this end, the reward function is designed as a combination of a goal-oriented goal reward and an auxiliary quad-rotor reward, where r represents the total reward and r represents the total rewardg(sg) Representing goal-oriented reward of goals, rq(sq) Representing an auxiliary four-rotor reward:
r=rg(sg)+rq(sq)
for simplicity of notation, the subscripted time step t is omitted. The target prize is expressed as the sum of two components:
rg(sg)=rg(x,y)+rg(h)
corresponding to position reward and scale reward, respectively, spartRepresents a state sgWhich corresponds to (x, y) or h, and then the corresponding reward takes the form:
wherein
Δspart=||spart-s* part||2
Representing the "2-norm", τ, between the current state and the desired target state1,τ2Representing different thresholds, the desired state is that the drone is kept one h above the robot at all times, the relative distance of x and y is 0, and the reward is calculated comparing the desired state with the current state. Also, in the case of a slight use of symbols, the auxiliary prize is expressed as follows:
rq(sq)=rq(z)+rq(q1,q2,q3,q4)
they correspond to the height and direction (in quaternion) of the quadrotors, respectively, where the height remains unchanged, so considering the quaternion of the drone, i.e. the RPY parameter, unlike the target reward, which is used to impose other constraints on the attitude, only the penalty terms are introduced. By using the same symbols as in the target prize, the auxiliary prize takes the form:
where τ_1 is the same threshold as in the equation above and c is a penalty weight, with τ_1 = 0.05, τ_2 = 0.2, and c = 0.5. The intent is that the drone's attitude should not change much from its previous attitude.
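Putting the pieces together, the total reward r = r_g(s_g) + r_q(s_q) can be sketched as below. The exact per-component reward values inside each threshold band are not fully recoverable from the text, so a common piecewise shape is assumed (full reward within τ_1, partial within τ_2, a penalty otherwise); the thresholds and penalty weight follow the values stated above.

```python
import numpy as np

TAU1, TAU2, C = 0.05, 0.2, 0.5   # thresholds and penalty weight from the text

def component_reward(s_part, s_star, tau1=TAU1, tau2=TAU2):
    """Reward for one state component from the 2-norm distance
    delta = ||s_part - s*_part||_2 to the desired state (assumed shape)."""
    delta = np.linalg.norm(np.atleast_1d(s_part) - np.atleast_1d(s_star))
    if delta < tau1:
        return 1.0       # on target
    if delta < tau2:
        return 0.5       # close to target
    return -0.5          # far from target: penalise

def attitude_penalty(q_now, q_prev, tau1=TAU1, c=C):
    """Auxiliary penalty term: only large quaternion changes are penalised."""
    delta = np.linalg.norm(np.asarray(q_now, float) - np.asarray(q_prev, float))
    return 0.0 if delta < tau1 else -c * delta

def total_reward(xy, h, q_now, q_prev, h_des=1.0):
    """r = r_g(s_g) + r_q(s_q); h_des is an assumed desired hover height."""
    # Goal reward: keep the (x, y) offset at 0 and hover h_des above the robot.
    r_g = component_reward(xy, [0.0, 0.0]) + component_reward(h, h_des)
    # Auxiliary reward: height is held constant, so only the attitude
    # (quaternion) penalty term is active here.
    r_q = attitude_penalty(q_now, q_prev)
    return r_g + r_q
```

With the drone exactly on target and an unchanged attitude, e.g. `total_reward([0.0, 0.0], 1.0, [1, 0, 0, 0], [1, 0, 0, 0])`, both goal components return their full reward of 1.0 and the penalty is zero, giving a total of 2.0.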
6. The unmanned aerial vehicle control method for autonomous target tracking according to claim 1, characterized in that: in step S8, training is performed in a simulation environment, the parameters are updated, and the policy is determined; the method is finally applied in the real world and has good generalization capability.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011402067.XA CN112506210B (en) | 2020-12-04 | 2020-12-04 | Unmanned aerial vehicle control method for autonomous target tracking |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112506210A true CN112506210A (en) | 2021-03-16 |
CN112506210B CN112506210B (en) | 2022-12-27 |
Family
ID=74969812
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011402067.XA Active CN112506210B (en) | 2020-12-04 | 2020-12-04 | Unmanned aerial vehicle control method for autonomous target tracking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112506210B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107479368A (en) * | 2017-06-30 | 2017-12-15 | 北京百度网讯科技有限公司 | A kind of method and system of the training unmanned aerial vehicle (UAV) control model based on artificial intelligence |
CN108803321A (en) * | 2018-05-30 | 2018-11-13 | 清华大学 | Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study |
CN109164821A (en) * | 2018-09-26 | 2019-01-08 | 中科物栖(北京)科技有限责任公司 | A kind of UAV Attitude training method and device |
US20190325584A1 (en) * | 2018-04-18 | 2019-10-24 | Tg-17, Llc | Systems and Methods for Real-Time Adjustment of Neural Networks for Autonomous Tracking and Localization of Moving Subject |
CN110879595A (en) * | 2019-11-29 | 2020-03-13 | 江苏徐工工程机械研究院有限公司 | Unmanned mine card tracking control system and method based on deep reinforcement learning |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114721412A (en) * | 2022-03-16 | 2022-07-08 | 北京理工大学 | Unmanned aerial vehicle trajectory tracking obstacle avoidance method based on model predictive control |
CN115098941A (en) * | 2022-05-31 | 2022-09-23 | 复旦大学 | Unmanned aerial vehicle digital twin control method and platform for agile deployment of intelligent algorithm |
CN115098941B (en) * | 2022-05-31 | 2023-08-04 | 复旦大学 | Unmanned aerial vehicle digital twin control method and platform for smart deployment of intelligent algorithm |
CN116203992A (en) * | 2023-04-28 | 2023-06-02 | 北京航空航天大学 | Tailstock type unmanned aerial vehicle high-dynamic target tracking method for multi-mode flight control |
Also Published As
Publication number | Publication date |
---|---|
CN112506210B (en) | 2022-12-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112506210B (en) | Unmanned aerial vehicle control method for autonomous target tracking | |
Singla et al. | Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge | |
Chen et al. | Deep imitation learning for autonomous driving in generic urban scenarios with enhanced safety | |
Amini et al. | Vista 2.0: An open, data-driven simulator for multimodal sensing and policy learning for autonomous vehicles | |
Ross et al. | Learning monocular reactive uav control in cluttered natural environments | |
Sampedro et al. | Image-based visual servoing controller for multirotor aerial robots using deep reinforcement learning | |
Zhou et al. | A deep Q-network (DQN) based path planning method for mobile robots | |
Hong et al. | Energy-efficient online path planning of multiple drones using reinforcement learning | |
US11561544B2 (en) | Indoor monocular navigation method based on cross-sensor transfer learning and system thereof | |
Kelchtermans et al. | How hard is it to cross the room?--Training (Recurrent) Neural Networks to steer a UAV | |
Olivares-Mendez et al. | Vision based fuzzy control autonomous landing with UAVs: From V-REP to real experiments | |
Bipin et al. | Autonomous navigation of generic monocular quadcopter in natural environment | |
Xu et al. | Monocular vision based autonomous landing of quadrotor through deep reinforcement learning | |
Wang et al. | Learning interactive driving policies via data-driven simulation | |
CN108288038A (en) | Night robot motion's decision-making technique based on scene cut | |
Olivares-Mendez et al. | Setting up a testbed for UAV vision based control using V-REP & ROS: A case study on aerial visual inspection | |
Fu et al. | Memory-enhanced deep reinforcement learning for UAV navigation in 3D environment | |
Doukhi et al. | Deep reinforcement learning for autonomous map-less navigation of a flying robot | |
Chen et al. | Deep reinforcement learning of map-based obstacle avoidance for mobile robot navigation | |
CN111611869B (en) | End-to-end monocular vision obstacle avoidance method based on serial deep neural network | |
Zhou et al. | Vision-based navigation of uav with continuous action space using deep reinforcement learning | |
Walvekar et al. | Vision based autonomous navigation of quadcopter using reinforcement learning | |
CN116679710A (en) | Robot obstacle avoidance strategy training and deployment method based on multitask learning | |
Shi et al. | Path Planning of Unmanned Aerial Vehicle Based on Supervised Learning | |
Prazenica et al. | Multiresolution and adaptive path planning for maneuver of micro-air-vehicles in urban environments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||