CN111026147A - Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning

Info

Publication number
CN111026147A
Authority
CN
China
Prior art keywords: aerial vehicle, unmanned aerial, control, speed, model
Prior art date
Legal status
Granted
Application number
CN201911363490.0A
Other languages
Chinese (zh)
Other versions
CN111026147B (en)
Inventor
单光存
张一楠
Current Assignee
Everlasting Technology Hangzhou Co ltd
Beihang University
Original Assignee
Everlasting Technology Hangzhou Co ltd
Beihang University
Priority date
Filing date
Publication date
Application filed by Everlasting Technology Hangzhou Co ltd and Beihang University
Priority to CN201911363490.0A
Publication of CN111026147A
Application granted
Publication of CN111026147B
Legal status: Active

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05D: SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00: Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/08: Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0808: Control of attitude, i.e. control of roll, pitch, or yaw, specially adapted for aircraft
    • G05D1/10: Simultaneous control of position or course in three dimensions
    • G05D1/101: Simultaneous control of position or course in three dimensions, specially adapted for aircraft

Abstract

The present disclosure provides a zero-overshoot unmanned aerial vehicle (UAV) position control method based on deep reinforcement learning, comprising: S1, constructing a reinforcement learning training framework for UAV speed control based on the proximal policy optimization algorithm, and training a UAV control model in combination with a feature extraction network to obtain a UAV speed control model; S2, controlling the UAV by adding a PID control loop outside the UAV control model, performing an optimal search over the PID parameters, and using the PID control algorithm to convert the UAV speed control model into a UAV position control model, thereby eliminating overshoot in position control. The disclosure can achieve effective UAV speed control within an allowable static error range and, building on that effective speed control, can further achieve zero-overshoot UAV position control.

Description

Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning
Technical Field
The disclosure relates to the field of unmanned aerial vehicles, in particular to a zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning.
Background
As a typical instance of an unmanned system, the unmanned aerial vehicle (UAV) has, after years of development, found many practical applications in a variety of fields and can help solve many real-world problems as society becomes increasingly intelligent. Quadrotor UAVs in particular have attracted considerable attention in recent years owing to their low cost, high maneuverability and lightweight structure, and great progress has been made in theory, application and industrial production.
In military applications, the quadrotor UAV has very high priority in special scenarios such as personnel search and rescue and reconnaissance, owing to its high maneuverability and small size. On the civilian side, mass-produced quadrotor UAVs also perform remarkably well in logistics and transportation, fire early warning, crop protection and aerial photography.
Disclosure of Invention
Technical problem to be solved
The present disclosure provides a zero overshoot drone position control method and apparatus based on deep reinforcement learning to at least partially solve the above-mentioned technical problems.
(II) technical scheme
According to one aspect of the disclosure, a zero overshoot unmanned aerial vehicle position control method based on deep reinforcement learning is provided, which includes:
S1, constructing a reinforcement learning training framework for unmanned aerial vehicle (UAV) speed control based on the proximal policy optimization (PPO) algorithm, and training the UAV control model in combination with a feature extraction network to obtain a UAV speed control model;
S2, controlling the UAV by adding a PID control loop outside the UAV control model, performing an optimal search over the PID parameters, and using the PID control algorithm to convert the UAV speed control model into a UAV position control model, thereby eliminating overshoot in position control.
In some embodiments, in step S1, the control model for UAV speed control is a Markov model whose observation state is:

$$s_t = \left[\dot{x}_t,\ \dot{y}_t,\ \dot{z}_t,\ \phi_t,\ \theta_t,\ \psi_t,\ \dot{\phi}_t,\ \dot{\theta}_t,\ \dot{\psi}_t\right]$$

where ẋ_t, ẏ_t, ż_t are the velocities in the x, y and z directions respectively, φ_t, θ_t, ψ_t are the UAV attitude expressed in Euler angles, and φ̇_t, θ̇_t, ψ̇_t are the corresponding Euler-angle angular velocities.
In some embodiments, step S1 includes:
alternately training the UAV control algorithm unit and the state evaluation unit of the reinforcement learning training framework for UAV speed control, and fitting the optimal mapping from the current UAV state to the UAV control signal.
In some embodiments, the drone control algorithm unit maps the current state of the drone to a control signal, the control signal being a drone speed control signal;
the state evaluation unit evaluates the current state of the unmanned aerial vehicle, and the evaluation standard is the difference between the current state and the target state.
In some embodiments, the state evaluation unit determines the difference between the current speed and the target control speed; the larger the difference, or the longer the time required to achieve the speed-state transition, the smaller the output value of the state evaluation function, and vice versa.
In some embodiments, determining the difference between the current speed and the target control speed comprises:
computing the integral, over a time window, of the difference between the UAV speed and the UAV target speed; the faster the UAV reaches the target speed, the sooner effective control is achieved.
In some embodiments, the state evaluation unit employs a reward mechanism under whose incentive the final control model converges to speed-optimal control; the reward function adopted by the reward mechanism is of the form:

$$r_t = r_{scalar} - k\, f\!\left(\Delta v_x,\ \Delta v_y,\ \Delta v_z\right) + r_{bonus}$$

where Δv_x, Δv_y, Δv_z are the differences between the current UAV speed and the target speed on the x, y and z coordinate axes respectively; f(Δv_x, Δv_y, Δv_z) is a penalty function constructed from the UAV speed differences; k is the proportionality coefficient between the speed gap and the penalty amount; r_scalar is a constant that keeps the reward positive; and the bonus term r_bonus rewards the control agent when the speed control is within the allowable error.
In some embodiments, the UAV control algorithm unit and the state evaluation unit both adopt a feature extraction network structure. The feature extraction network of each unit comprises an input layer, a first fully-connected hidden layer, a second fully-connected hidden layer and an output layer; the dimensions of the input layer, the first hidden layer and the second hidden layer are the same for both units, and the output dimension of the UAV control algorithm unit is determined by the control strategy parameters of the UAV power plant.
In some embodiments, step S2 includes:

providing a desired target position for the UAV, and resolving the UAV target speed by a PID method, i.e.

$$v_t^{ref} = K_p\, e_t + K_d\, \dot{e}_t$$

where K_p and K_d are the proportional and derivative term coefficients, e_t is the difference between the UAV position and the target position, and v_t^{ref} is the UAV target speed;

obtaining, through reinforcement learning, the mapping from the target speed v_t^{ref} to the speeds of the UAV power plant, so that the control model for UAV speed control is converted into a control model for UAV position control; the UAV position control model is combined with the PID algorithm model, the integral term of the PID algorithm model is deleted and the proportional and derivative terms are retained, thereby achieving zero-overshoot UAV position control.
According to another aspect of the present disclosure, there is provided a zero overshoot drone position control device based on deep reinforcement learning, including:
a readable storage medium to store executable instructions;
one or more processors executing the control method as described above according to the executable instructions.
(III) advantageous effects
According to the technical scheme, the zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning at least have one of the following beneficial effects:
(1) by constructing a reinforcement learning training framework for UAV speed control, once a specific allowable static error range is set, effective speed control of the UAV within that range can be achieved;
(2) building on the effective UAV speed control and combining it with a PID algorithm model from which the integral term has been removed, zero-overshoot UAV position control can further be achieved.
Drawings
Fig. 1 is a flowchart of a method for controlling a position of a zero overshoot unmanned aerial vehicle based on deep reinforcement learning according to an embodiment of the present disclosure.
FIG. 2 is a block diagram of a reinforcement learning algorithm of the present disclosure.
Fig. 3 is a schematic structural diagram of a feature extraction network of the unmanned aerial vehicle control algorithm unit and the state evaluation unit in the embodiment of the present disclosure.
Fig. 4 is a flowchart of the unmanned aerial vehicle position control according to the embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a three-axis response curve controlled by the drone according to an embodiment of the present disclosure.
Fig. 6 is a flowchart of a method for arranging and placing articles by using an unmanned aerial vehicle according to an embodiment of the present disclosure.
Detailed Description
The UAV is a highly nonlinear system, so solving for its optimal control is a very complex problem, and in most cases only an approximate solution can be obtained. Over extremely long time horizons, when a discrete system is solved for the long-horizon optimum, the deviation caused by approximate solutions becomes even more serious, so the control problem of such a complex system is difficult to solve effectively. In recent years, deep learning methods have performed excellently on complex nonlinear mapping problems in fields such as images and speech, providing a very good tool for complex nonlinear system problems: the excellent fitting capability of neural networks is used to represent the complex mapping of a nonlinear system, converting the search for the optimal solution of the nonlinear system into a data-sample-driven fitting problem.
On the application of reinforcement learning to UAV control, many researchers have already done related work, for example on stable hand-launch takeoff, speed control and obstacle avoidance, so applying reinforcement learning to UAV control problems is efficient and feasible. However, prior studies and a large number of training experiments show that when a UAV control agent is trained with a reinforcement learning algorithm, the overshoot of the UAV during flight control cannot be constrained. As a result, a control agent obtained through reinforcement learning training often exhibits a large overshoot, which cannot satisfy overshoot-sensitive position control tasks such as UAV landing and obstacle avoidance and can easily put the UAV in a dangerous situation.
To solve these problems, the present disclosure provides a zero-overshoot UAV position control method based on deep reinforcement learning combined with a PID control algorithm.
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
Certain embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the disclosure are shown. Indeed, various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.
In one exemplary embodiment of the disclosure, a zero overshoot drone position control method based on deep reinforcement learning is provided.
First, UAV speed control is formulated as a Markov model, whose observation state can be written as:

$$s_t = \left[\dot{x}_t,\ \dot{y}_t,\ \dot{z}_t,\ \phi_t,\ \theta_t,\ \psi_t,\ \dot{\phi}_t,\ \dot{\theta}_t,\ \dot{\psi}_t\right]$$

where ẋ_t, ẏ_t, ż_t are the velocities in the x, y and z directions respectively, φ_t, θ_t, ψ_t are the UAV attitude expressed in Euler angles, and φ̇_t, θ̇_t, ψ̇_t are the corresponding Euler-angle angular velocities. With this state, complete information for UAV speed control is available, and the UAV control model is then trained through reinforcement learning by defining an evaluation mechanism. The evaluation mechanism adopted in this embodiment is mainly the integral, over a time window, of the gap between the UAV speed and the UAV target speed, so the faster the UAV reaches the target speed, the sooner effective control can be achieved.
The UAV control model of this embodiment outputs the rotational speeds of the four UAV propellers:

$$[\Omega_1,\ \Omega_2,\ \Omega_3,\ \Omega_4]$$

By comparing the difference between the current state and the target state, the controlled propeller speeds are output through the action output network, and the network is optimized with the reinforcement learning algorithm so that the desired goal, effective speed control, is achieved. Once the speed is effectively controlled, the UAV position can then be effectively controlled by a PID method.
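As an illustration only (not part of the original filing), the 9-dimensional observation and the 4-dimensional propeller-speed action described above can be packed as follows; the helper names, the data layout and the rotor-speed limit are hypothetical choices made for this sketch.

```python
import numpy as np

def make_observation(vel, euler, euler_rates):
    """Packs the observation s_t: velocities (vx, vy, vz), Euler angles
    (phi, theta, psi) and Euler-angle rates (dphi, dtheta, dpsi)."""
    return np.concatenate([np.asarray(vel, dtype=np.float32),
                           np.asarray(euler, dtype=np.float32),
                           np.asarray(euler_rates, dtype=np.float32)])

def clip_action(omegas, omega_max):
    """Clips the commanded propeller speeds [Omega_1..Omega_4] to the rotor limits."""
    return np.clip(np.asarray(omegas, dtype=np.float32), 0.0, omega_max)

# Example: a 9-dim state and a clipped 4-dim action (numeric values are arbitrary).
obs = make_observation((0.1, -0.2, 0.0), (0.01, 0.02, 1.57), (0.0, 0.0, 0.0))
act = clip_action([450.0, 455.0, 448.0, 452.0], omega_max=800.0)
```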
Fig. 1 is a flowchart of a method for controlling a position of a zero overshoot unmanned aerial vehicle based on deep reinforcement learning according to an embodiment of the present disclosure. As shown in fig. 1, the zero overshoot unmanned aerial vehicle position control method based on deep reinforcement learning of the present disclosure includes:
S1, constructing a reinforcement learning training framework for UAV speed control based on the proximal policy optimization (PPO) algorithm, and training the UAV control model in combination with a feature extraction network to obtain a UAV speed control model;
S2, controlling the UAV by adding a PID control loop outside the UAV control model, performing an optimal search over the PID parameters, and using the PID control algorithm to convert the UAV speed control model into a UAV position control model, thereby eliminating overshoot in position control.
Each step of the zero overshoot unmanned aerial vehicle position control method based on deep reinforcement learning according to the embodiment of the present disclosure is specifically described below.
FIG. 2 is a block diagram of the reinforcement learning algorithm of the present disclosure. The reinforcement learning algorithm is based on a proximal policy optimization (PPO) framework, and as shown in Fig. 2, the algorithm framework comprises a UAV control algorithm unit and a state evaluation unit.
For a quadrotor UAV, the UAV control algorithm unit maps the current UAV state to a control signal, and this control signal consists of the four propeller rotational speeds of the quadrotor model.
The state evaluation unit is used for evaluating the current state of the unmanned aerial vehicle, and the evaluation standard is the difference between the current speed and the target control speed. If the difference is large or it takes a long time to achieve the speed state transition, the output value of the state evaluation function will be small, and conversely, it will be large.
Specifically, the reinforcement learning algorithm of the present disclosure includes two steps:
the first step S1 is a training phase, in which the unmanned aerial vehicle control algorithm unit and the state evaluation unit are trained, and the optimal mapping from the current state of the unmanned aerial vehicle to the four control signals of the unmanned aerial vehicle is finally fitted through the alternate training of the unmanned aerial vehicle control algorithm unit and the state evaluation unit, that is, the unmanned aerial vehicle control algorithm unit obtains the control model for controlling the speed of the unmanned aerial vehicle.
In the reinforcement learning algorithm of the present disclosure, the features that the control algorithm unit needs to extract are similar to those needed by the state evaluation unit, so feature extraction networks with a similar structure are designed for both units. Fig. 3 is a schematic structural diagram of the feature extraction networks of the UAV control algorithm unit and the state evaluation unit in the embodiment of the present disclosure. As shown in Fig. 3, each feature extraction network comprises an input layer, a first fully-connected hidden layer, a second fully-connected hidden layer and an output layer. The two networks differ only in the output-layer dimension: the output dimension of the UAV control algorithm unit is 8, while that of the state evaluation unit is 1. The specific configuration of the two units is shown in Table 1.
TABLE 1 (layer configuration of the two feature extraction networks; presented as an image in the original publication)
Specifically, the 8-dimensional output of the UAV control algorithm unit consists of the parameters of the four propeller control strategies: the strategy of each propeller is represented by a Beta distribution, and a Beta distribution is described by two parameters a and b, so the total output dimension is 8.
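As a minimal sketch only, the two networks described above might be implemented as follows; the hidden-layer widths are placeholders (the actual sizes are given in Table 1 of the original filing, which is an image), and the PyTorch implementation and activation choices are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Input layer plus two fully-connected hidden layers (widths are placeholders)."""
    def __init__(self, obs_dim=9, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

class Actor(nn.Module):
    """Control algorithm unit: 8 outputs = Beta parameters (a, b) for each of the 4 propellers."""
    def __init__(self, obs_dim=9, hidden=64):
        super().__init__()
        self.features = FeatureExtractor(obs_dim, hidden)
        self.head = nn.Linear(hidden, 8)

    def distribution(self, obs):
        # softplus + 1 keeps both Beta parameters above 1 (unimodal policy)
        ab = torch.nn.functional.softplus(self.head(self.features(obs))) + 1.0
        a, b = ab[..., :4], ab[..., 4:]
        # Beta samples lie in (0, 1); they are later rescaled to the rotor-speed range.
        return torch.distributions.Beta(a, b)

    def log_prob(self, obs, actions):
        return self.distribution(obs).log_prob(actions).sum(-1)

class Critic(nn.Module):
    """State evaluation unit: a single scalar evaluation of the current state."""
    def __init__(self, obs_dim=9, hidden=64):
        super().__init__()
        self.features = FeatureExtractor(obs_dim, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, obs):
        return self.head(self.features(obs))
```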
Further, the state evaluation unit adopts a reward mechanism of the form:

$$r_t = r_{scalar} - k\, f\!\left(\Delta v_x,\ \Delta v_y,\ \Delta v_z\right) + r_{bonus}$$

where Δv_x, Δv_y, Δv_z are the differences between the current UAV speed and the target speed on the x, y and z coordinate axes respectively, f(Δv_x, Δv_y, Δv_z) is a penalty function constructed from the UAV speed differences, and k is the proportionality coefficient between the speed gap and the penalty amount; r_scalar is a constant used to keep the reward positive and thus ensure convergence of the algorithm; finally, the bonus term r_bonus rewards the control agent when the speed control reaches the allowable error range. Under the incentive of this reward function, the final control model converges to speed-optimal control.
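For illustration only, a reward of this shape could be computed as in the sketch below; the absolute-error penalty, the numerical constants and the per-axis tolerance are assumptions of this sketch, since the exact expression in the original filing is reproduced only as an image.

```python
import numpy as np

def speed_reward(v, v_ref, k=1.0, r_scalar=5.0, r_bonus=2.0, tol=0.1):
    """Illustrative reward of the shape described above.

    v, v_ref : current and target velocity, each an iterable [vx, vy, vz]
    k        : proportionality coefficient between the speed gap and the penalty
    r_scalar : constant offset that keeps the reward positive
    r_bonus  : bonus granted when every axis error is within the allowed error tol
    The absolute-error penalty and all numeric defaults are assumptions of this sketch.
    """
    dv = np.abs(np.asarray(v, dtype=float) - np.asarray(v_ref, dtype=float))
    reward = r_scalar - k * dv.sum()
    if np.all(dv <= tol):
        reward += r_bonus
    return float(reward)
```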
The second step, S2, of the reinforcement learning algorithm of this embodiment is the UAV position control phase, i.e. actively controlling the UAV position. At this stage the state evaluation unit in the algorithm framework is discarded, and the UAV is controlled through the speed control model of the UAV control algorithm unit; at the same time, an optimal search is performed over the PID parameters of the control algorithm, and the PID control algorithm converts the UAV speed control algorithm into a position control algorithm, eliminating overshoot in position control.
Specifically, fig. 4 is a flowchart of the unmanned aerial vehicle position control according to the embodiment of the present disclosure. As shown in fig. 4, the drone position control process includes:
First, a desired target position is provided for the UAV, and the UAV target speed is resolved by a PID method, i.e.

$$v_t^{ref} = K_p\, e_t + K_d\, \dot{e}_t$$

where K_p and K_d are the proportional and derivative term coefficients, e_t is the difference between the UAV position and the target position, and v_t^{ref} is the UAV target speed.

The mapping from the target speed v_t^{ref} to the UAV rotor speeds is obtained from the reinforcement learning model, so the control model for UAV speed control can be converted into a control model for UAV position control.

If position control were used directly as the training target of reinforcement learning, the UAV control algorithm would suffer from overshoot, which poses a considerable challenge to UAV safety; this embodiment therefore adopts the improved scheme of combining UAV speed control with a PID model. For the PID control algorithm, it is the integral term that causes UAV overshoot, so the integral term is deleted here and only the proportional and derivative terms are retained. By reducing the proportional term, the integral effect already present in the UAV speed control can be cancelled out, and effective UAV position control is thus achieved.
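A minimal sketch of one cycle of this outer PD loop is given below; the gains, the update rate and the `speed_agent` interface (standing in for the trained reinforcement learning speed controller) are illustrative assumptions, not values from the original filing.

```python
import numpy as np

def position_step(p, p_ref, prev_error, dt, speed_agent, kp=1.0, kd=0.5):
    """One cycle of the outer PD loop that turns a position goal into rotor speeds.

    The integral term is deliberately omitted (PD only) to avoid overshoot, as
    described above. speed_agent stands for the trained reinforcement learning
    speed controller: it maps a target velocity to the four propeller speeds.
    The gains kp, kd, the sign convention and the interface are illustrative.
    """
    error = np.asarray(p_ref, dtype=float) - np.asarray(p, dtype=float)  # position error e_t
    d_error = (error - prev_error) / dt                                  # derivative term
    v_target = kp * error + kd * d_error                                 # target velocity for the inner loop
    omegas = speed_agent(v_target)                                       # learned mapping to rotor speeds
    return omegas, error
```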
Fig. 5 is a schematic diagram of the three-axis response curves under the UAV control of the embodiment of the present disclosure. In Fig. 5 the abscissa is time t and the ordinate shows the response curves of the three coordinate axes. As can be seen from Fig. 5, the control method of this embodiment achieves UAV position control in about 10 s, and no overshoot occurs on any of the three coordinate axes during the position control.
In a second exemplary embodiment of the present disclosure, a zero overshoot drone position control device based on deep reinforcement learning is provided. The control device comprises a readable storage medium and one or more processors, wherein the readable storage medium is used for storing executable instructions; the one or more processors execute the control method according to the previous embodiment according to the executable instructions.
The following specifically describes the algorithm with an embodiment that applies the unmanned aerial vehicle control algorithm of the present disclosure to an unmanned aerial vehicle article placement (or delivery) scene.
Example one
In this embodiment, position control is applied to a quadrotor UAV: the article sorting and placing (or delivery) task is combined with the UAV position control algorithm, so that articles are placed at preset positions. Fig. 6 is a flowchart of the method for sorting and placing articles with a UAV according to the embodiment of the present disclosure. As shown in Fig. 6, the method for sorting and placing articles with the UAV includes:
s101, acquiring the position of an article to be sorted and placed (or delivered);
s102, controlling the position of the unmanned aerial vehicle by adopting a reinforcement learning unmanned aerial vehicle position control algorithm, so that the unmanned aerial vehicle hovers above the current position of the articles to be sorted and placed;
s103, grabbing the articles to be arranged (or delivered) by the mechanical claws;
s104, controlling the position of the unmanned aerial vehicle by adopting a reinforcement learning unmanned aerial vehicle position control algorithm, so that the unmanned aerial vehicle hovers above the target position of the articles to be sorted and placed (or delivered);
S105, releasing the mechanical claw to finish the sorting and placement (or delivery) of the article.
By controlling the rotational speeds of the four UAV propellers and converting that speed control into UAV position control in combination with the PID algorithm, the article can be accurately placed (or delivered) at the target position.
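Purely as an illustration of this flow, the mission could be scripted as in the sketch below; the `goto`, `grip` and `release` method names on the `uav` object are hypothetical and are not taken from the original filing.

```python
def place_item(uav, pick_position, place_position):
    """Illustrative scripting of the Fig. 6 flow; method names are hypothetical."""
    uav.goto(pick_position)    # S102: hover above the item using the zero-overshoot position controller
    uav.grip()                 # S103: grab the item with the mechanical claw
    uav.goto(place_position)   # S104: hover above the target position
    uav.release()              # S105: release the claw to finish placement (or delivery)
```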
So far, the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings. It should be noted that implementations not shown or described in the drawings or the text are all forms known to those of ordinary skill in the art and are not described in detail. Furthermore, the above definitions of the various elements and methods are not limited to the specific structures, shapes or arrangements mentioned in the embodiments, which may be easily modified or substituted by those of ordinary skill in the art.
And the shapes and sizes of the respective components in the drawings do not reflect actual sizes and proportions, but merely illustrate the contents of the embodiments of the present disclosure. Furthermore, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.
Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
The use of ordinal numbers such as "first," "second," "third," etc., in the specification and claims to modify a corresponding element does not by itself connote any ordinal number of the element or any ordering of one element from another or the order of manufacture, and the use of the ordinal numbers is only used to distinguish one element having a certain name from another element having a same name.
In addition, unless steps are specifically described or must occur in sequence, the order of the steps is not limited to that listed above and may be changed or rearranged as desired by the desired design. The embodiments described above may be mixed and matched with each other or with other embodiments based on design and reliability considerations, i.e., technical features in different embodiments may be freely combined to form further embodiments.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, this disclosure is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the present disclosure as described herein, and any descriptions above of specific languages are provided for disclosure of enablement and best mode of the present disclosure.
The disclosure may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. Various component embodiments of the disclosure may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in the relevant apparatus according to embodiments of the present disclosure. The present disclosure may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present disclosure may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Also in the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (10)

1. A zero-overshoot unmanned aerial vehicle position control method based on deep reinforcement learning, comprising the following steps:
S1, constructing a reinforcement learning training framework for unmanned aerial vehicle (UAV) speed control based on the proximal policy optimization (PPO) algorithm, and training the UAV control model in combination with a feature extraction network to obtain a UAV speed control model;
S2, controlling the UAV by adding a PID control loop outside the UAV control model, performing an optimal search over the PID parameters, and using the PID control algorithm to convert the UAV speed control model into a UAV position control model, thereby eliminating overshoot in position control.
2. The zero-overshoot UAV position control method according to claim 1, wherein in step S1 the control model for UAV speed control is a Markov model whose observation state is:

$$s_t = \left[\dot{x}_t,\ \dot{y}_t,\ \dot{z}_t,\ \phi_t,\ \theta_t,\ \psi_t,\ \dot{\phi}_t,\ \dot{\theta}_t,\ \dot{\psi}_t\right]$$

wherein ẋ_t, ẏ_t, ż_t are the velocities in the x, y and z directions respectively, φ_t, θ_t, ψ_t are the UAV attitude expressed in Euler angles, and φ̇_t, θ̇_t, ψ̇_t are the corresponding Euler-angle angular velocities.
3. The method according to claim 1, wherein step S1 comprises:
alternately training the UAV control algorithm unit and the state evaluation unit of the reinforcement learning training framework, and fitting the optimal mapping from the current UAV state to the UAV control signal.
4. The zero overshoot drone position control method of claim 3 wherein,
the unmanned aerial vehicle control algorithm unit maps the current state of the unmanned aerial vehicle into a control signal, and the control signal is an unmanned aerial vehicle speed control signal;
the state evaluation unit evaluates the current state of the unmanned aerial vehicle, and the evaluation standard is the difference between the current state and the target state.
5. The zero overshoot drone position control method of claim 4 wherein,
the state evaluation unit evaluating the current state of the unmanned aerial vehicle comprises: and judging the difference between the current speed and the target control speed, wherein if the difference is larger or the time required for realizing the speed state transition is longer, the output value of the state evaluation function is smaller, otherwise, the output value is larger.
6. The zero overshoot drone position control method of claim 5 wherein said determining the difference between the current speed and the target control speed comprises:
and calculating the integral of the difference between the current speed of the unmanned aerial vehicle and the target speed of the unmanned aerial vehicle in a time range.
7. The zero-overshoot UAV position control method according to claim 4, wherein the state evaluation unit adopts a reward mechanism under whose incentive the final control model converges to speed-optimal control, and the reward function adopted by the reward mechanism is of the form:

$$r_t = r_{scalar} - k\, f\!\left(\Delta v_x,\ \Delta v_y,\ \Delta v_z\right) + r_{bonus}$$

wherein Δv_x, Δv_y, Δv_z are the differences between the current UAV speed and the target speed on the x, y and z coordinate axes respectively, f(Δv_x, Δv_y, Δv_z) is a penalty function constructed from the UAV speed differences, and k is the proportionality coefficient between the speed gap and the penalty amount; r_scalar is a constant that keeps the reward positive; and the bonus term r_bonus rewards the control agent when the speed control is within the allowable error.
8. The zero-overshoot UAV position control method according to claim 2, wherein the UAV control algorithm unit and the state evaluation unit adopt a feature extraction network structure, the feature extraction network of each unit comprising an input layer, a first fully-connected hidden layer, a second fully-connected hidden layer and an output layer; the dimensions of the input layer, the first hidden layer and the second hidden layer are the same, and the output dimension of the UAV control algorithm unit is determined by the control strategy parameters of the UAV power plant.
9. The zero-overshoot UAV position control method according to claim 1, wherein step S2 comprises:

providing a desired target position for the UAV, and resolving the UAV target speed by a PID method, i.e.

$$v_t^{ref} = K_p\, e_t + K_d\, \dot{e}_t$$

wherein K_p and K_d are the proportional and derivative term coefficients, e_t is the difference between the UAV position and the target position, and v_t^{ref} is the UAV target speed;

obtaining, through reinforcement learning, the mapping from the target speed v_t^{ref} to the speeds of the UAV power plant, so as to convert the control model for UAV speed control into a control model for UAV position control; and combining the UAV position control model with the PID algorithm model, deleting the integral term of the PID algorithm model and retaining the proportional and derivative terms, thereby achieving zero-overshoot UAV position control.
10. A zero-overshoot unmanned aerial vehicle position control device based on deep reinforcement learning, comprising:
a readable storage medium to store executable instructions;
one or more processors executing the control method of any one of claims 1-9 in accordance with the executable instructions.
CN201911363490.0A 2019-12-25 2019-12-25 Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning Active CN111026147B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911363490.0A CN111026147B (en) 2019-12-25 2019-12-25 Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911363490.0A CN111026147B (en) 2019-12-25 2019-12-25 Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN111026147A true CN111026147A (en) 2020-04-17
CN111026147B CN111026147B (en) 2021-01-08

Family

ID=70213642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911363490.0A Active CN111026147B (en) 2019-12-25 2019-12-25 Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN111026147B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN113268081A (en) * 2021-05-31 2021-08-17 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning
CN113359703A (en) * 2021-05-13 2021-09-07 浙江工业大学 Mobile robot line-following system suitable for various complex paths
CN113359704A (en) * 2021-05-13 2021-09-07 浙江工业大学 Self-adaptive SAC-PID method suitable for complex unknown environment
CN114237267A (en) * 2021-11-02 2022-03-25 中国人民解放军海军航空大学航空作战勤务学院 Flight maneuver decision auxiliary method based on reinforcement learning
CN114428517A (en) * 2022-01-26 2022-05-03 海南大学 Unmanned aerial vehicle unmanned ship cooperation platform end-to-end autonomous landing control method

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6751529B1 (en) * 2002-06-03 2004-06-15 Neural Robotics, Inc. System and method for controlling model aircraft
US9189730B1 (en) * 2012-09-20 2015-11-17 Brain Corporation Modulated stochasticity spiking neuron network controller apparatus and methods
US20170118688A1 (en) * 2015-10-23 2017-04-27 The Florida International University Board Of Trustees Interference and mobility management in uav-assisted wireless networks
CN106843245A (en) * 2016-12-01 2017-06-13 北京京东尚科信息技术有限公司 A kind of UAV Attitude control method, device and unmanned plane
CN107943022A (en) * 2017-10-23 2018-04-20 清华大学 A kind of PID locomotive automatic Pilot optimal control methods based on intensified learning
EP3422130A1 (en) * 2017-06-29 2019-01-02 The Boeing Company Method and system for autonomously operating an aircraft
CN109739090A (en) * 2019-01-15 2019-05-10 哈尔滨工程大学 A kind of autonomous type underwater robot neural network intensified learning control method
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
CN110083168A (en) * 2019-05-05 2019-08-02 天津大学 Small-sized depopulated helicopter based on enhancing study determines high control method
CN110275432A (en) * 2019-05-09 2019-09-24 中国电子科技集团公司电子科学研究院 Unmanned plane based on intensified learning hangs load control system
KR102032067B1 (en) * 2018-12-05 2019-10-14 세종대학교산학협력단 Remote control device and method of uav based on reforcement learning
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6751529B1 (en) * 2002-06-03 2004-06-15 Neural Robotics, Inc. System and method for controlling model aircraft
US9189730B1 (en) * 2012-09-20 2015-11-17 Brain Corporation Modulated stochasticity spiking neuron network controller apparatus and methods
US20170118688A1 (en) * 2015-10-23 2017-04-27 The Florida International University Board Of Trustees Interference and mobility management in uav-assisted wireless networks
CN106843245A (en) * 2016-12-01 2017-06-13 北京京东尚科信息技术有限公司 A kind of UAV Attitude control method, device and unmanned plane
JP2019059461A (en) * 2017-06-29 2019-04-18 ザ・ボーイング・カンパニーThe Boeing Company Method and system for autonomously operating aircraft
EP3422130A1 (en) * 2017-06-29 2019-01-02 The Boeing Company Method and system for autonomously operating an aircraft
CN107943022A (en) * 2017-10-23 2018-04-20 清华大学 A kind of PID locomotive automatic Pilot optimal control methods based on intensified learning
KR102032067B1 (en) * 2018-12-05 2019-10-14 세종대학교산학협력단 Remote control device and method of uav based on reforcement learning
CN109739090A (en) * 2019-01-15 2019-05-10 哈尔滨工程大学 A kind of autonomous type underwater robot neural network intensified learning control method
CN109992000A (en) * 2019-04-04 2019-07-09 北京航空航天大学 A kind of multiple no-manned plane path collaborative planning method and device based on Hierarchical reinforcement learning
CN110083168A (en) * 2019-05-05 2019-08-02 天津大学 Small-sized depopulated helicopter based on enhancing study determines high control method
CN110275432A (en) * 2019-05-09 2019-09-24 中国电子科技集团公司电子科学研究院 Unmanned plane based on intensified learning hangs load control system
CN110488861A (en) * 2019-07-30 2019-11-22 北京邮电大学 Unmanned plane track optimizing method, device and unmanned plane based on deeply study

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ERLEND M. COATES et al.: "Deep Reinforcement Learning Attitude Control of Fixed-Wing UAVs Using Proximal Policy Optimization", 2019 International Conference on Unmanned Aircraft Systems (ICUAS) *
WILLIAM KOCH et al.: "Reinforcement Learning for UAV Attitude Control", ACM Transactions on Cyber-Physical Systems *
LI TING: "Research on Control of a UAV Suspended-Load System Based on Reinforcement Learning", China Master's Theses Full-text Database, Engineering Science and Technology II *
HU TIANQI et al.: "Research and Verification of an Intelligent Flight Control System for a Quadrotor Aircraft", Science and Technology Innovation Herald *
CAI WENLAN et al.: "Design of an Attitude Controller for an Unmanned Helicopter Based on Reinforcement Learning", Journal of Projectiles, Rockets, Missiles and Guidance *
HAO CHUANCHUAN et al.: "Output-Feedback Reinforcement Learning Control Based on a Reference Model", Journal of Zhejiang University (Engineering Science) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111786713A (en) * 2020-06-04 2020-10-16 大连理工大学 Unmanned aerial vehicle network hovering position optimization method based on multi-agent deep reinforcement learning
CN113359703A (en) * 2021-05-13 2021-09-07 浙江工业大学 Mobile robot line-following system suitable for various complex paths
CN113359704A (en) * 2021-05-13 2021-09-07 浙江工业大学 Self-adaptive SAC-PID method suitable for complex unknown environment
CN113268081A (en) * 2021-05-31 2021-08-17 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning
CN113268081B (en) * 2021-05-31 2021-11-09 中国人民解放军32802部队 Small unmanned aerial vehicle prevention and control command decision method and system based on reinforcement learning
CN114237267A (en) * 2021-11-02 2022-03-25 中国人民解放军海军航空大学航空作战勤务学院 Flight maneuver decision auxiliary method based on reinforcement learning
CN114237267B (en) * 2021-11-02 2023-11-24 中国人民解放军海军航空大学航空作战勤务学院 Flight maneuver decision assisting method based on reinforcement learning
CN114428517A (en) * 2022-01-26 2022-05-03 海南大学 Unmanned aerial vehicle unmanned ship cooperation platform end-to-end autonomous landing control method

Also Published As

Publication number Publication date
CN111026147B (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN111026147B (en) Zero overshoot unmanned aerial vehicle position control method and device based on deep reinforcement learning
Shaobo et al. A collision avoidance decision-making system for autonomous ship based on modified velocity obstacle method
Moon et al. Challenges and implemented technologies used in autonomous drone racing
Floreano et al. Science, technology and the future of small autonomous drones
Naidoo et al. Quad-Rotor unmanned aerial vehicle helicopter modelling & control
US20180164124A1 (en) Robust and stable autonomous vision-inertial navigation system for unmanned vehicles
Qi et al. Autonomous landing solution of low-cost quadrotor on a moving platform
Zhao et al. Route planning for autonomous vessels based on improved artificial fish swarm algorithm
DE102020120357A1 (en) SYSTEM AND PROCEDURE FOR SIMULATIONS OF VEHICLE-BASED ITEM DELIVERY
Levin et al. Agile maneuvering with a small fixed-wing unmanned aerial vehicle
Xie et al. Application of improved Cuckoo search algorithm to path planning unmanned aerial vehicle
Zhou et al. Design and implementation of a novel obstacle avoidance scheme based on combination of CNN-based deep learning method and liDAR-based image processing approach
CN106292297A (en) Based on PID controller and the attitude control method of L1 adaptive controller
Kownacki et al. Precision landing tests of tethered multicopter and VTOL UAV on moving landing pad on a lake
Zhang et al. Bio-inspired vision based robot control using featureless estimations of time-to-contact
Roberge et al. Fast path planning for unmanned aerial vehicle using embedded GPU System
Trunov Transformation of operations with fuzzy sets for solving the problems on optimal motion of crewless unmanned vehicles
Serhat Development stages of a semi-autonomous underwater vehicle experiment platform
Kang et al. Autonomous waypoint guidance for tilt-rotor unmanned aerial vehicle that has nacelle-fixed auxiliary wings
Zufferey et al. Optic flow to steer and avoid collisions in 3D
Doyle et al. The vulnerability of UAVs: An adversarial machine learning perspective
Leung et al. A second generation micro-vehicle testbed for cooperative control and sensing strategies
Wang et al. Architecture design and flight control of a novel octopus shaped multirotor vehicle
Marcu Fuzzy logic approach in real-time UAV control
CN107632166A (en) A kind of historical wind speed based on unmanned plane big data obtains system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant