CN109655066A - UAV path planning method based on the Q(λ) algorithm - Google Patents
UAV path planning method based on the Q(λ) algorithm Download PDF Info
- Publication number
- CN109655066A CN109655066A CN201910071929.6A CN201910071929A CN109655066A CN 109655066 A CN109655066 A CN 109655066A CN 201910071929 A CN201910071929 A CN 201910071929A CN 109655066 A CN109655066 A CN 109655066A
- Authority
- CN
- China
- Prior art keywords
- state
- unmanned plane
- value
- space
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000000034 method Methods 0.000 title claims abstract description 39
- 230000009187 flying Effects 0.000 claims abstract description 19
- 230000008569 process Effects 0.000 claims abstract description 18
- 230000007704 transition Effects 0.000 claims abstract description 10
- 238000004387 environmental modeling Methods 0.000 claims abstract description 8
- 238000010276 construction Methods 0.000 claims abstract description 7
- 238000013461 design Methods 0.000 claims abstract description 6
- 230000006870 function Effects 0.000 claims description 29
- 238000004364 calculation method Methods 0.000 claims description 23
- 230000009471 action Effects 0.000 claims description 17
- 238000012546 transfer Methods 0.000 claims description 13
- 230000006399 behavior Effects 0.000 claims description 7
- 230000008859 change Effects 0.000 claims description 6
- 230000007613 environmental effect Effects 0.000 claims description 4
- 238000011156 evaluation Methods 0.000 claims description 3
- 239000011159 matrix material Substances 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims description 3
- 238000009825 accumulation Methods 0.000 claims description 2
- 238000011217 control strategy Methods 0.000 claims description 2
- 238000005516 engineering process Methods 0.000 description 3
- 230000003993 interaction Effects 0.000 description 2
- 230000008447 perception Effects 0.000 description 2
- 238000013459 approach Methods 0.000 description 1
- 230000004888 barrier function Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000012804 iterative process Methods 0.000 description 1
- 230000006386 memory function Effects 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 238000005457 optimization Methods 0.000 description 1
- 230000010355 oscillation Effects 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
Landscapes
- Engineering & Computer Science (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Aviation & Aerospace Engineering (AREA)
- Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
- Catching Or Destruction (AREA)
Abstract
The present invention provides a UAV path planning method based on the Q(λ) algorithm, comprising an environmental modeling step, a Markov decision process model initialization step, a Q(λ) algorithm iteration step, and a step of computing the optimal path from the state value function. First, a grid space is initialized according to the UAV's minimum track-segment length, grid space coordinates are mapped to waypoints, and circular and polygonal threat areas are represented. A Markov decision process model is then established, covering the representation of the UAV flight action space, the design of the state transition probability, and the construction of the reward function. Iterative computation is then performed on the constructed model with the Q(λ) algorithm, and the optimal path that lets the UAV safely avoid the threat areas is computed from the converged state value function. By combining traditional Q-learning with eligibility traces, the present invention improves the convergence speed and accuracy of the value function and guides the UAV to avoid threat areas and plan its path autonomously.
Description
Technical field
The present invention relates to UAVs, and specifically to a UAV path planning method. It belongs to the technical field of heuristic algorithms.
Background art
UAV path planning is an important component of UAV mission planning and a key stage in enabling a UAV to execute tasks autonomously. It requires planning, in an environment whose information is fully known, partially known, or entirely unknown, a flight track from a start point to a target point that circumvents threat areas and obstacles, is safe, reliable, and collision-free, and simultaneously satisfies various constraints. According to how much battlefield environment information is available to the UAV, path planning is divided into global path planning and local path planning.
In practical applications, if the UAV can obtain global environmental knowledge, dynamic programming can be used to realize path planning. However, as battlefield environments grow more complex and uncertain, the UAV has little prior knowledge of the environment, so in practice it must have a strong ability to adapt to dynamic environments. In this situation, techniques that rely on sensor information to perceive threat-area information in real time and perform local path planning show great superiority.
Current local path planning techniques suffer from problems such as easily falling into local minima or local oscillation, high algorithmic time cost, large computational storage requirements, and rules that are difficult to determine. Behavior-based UAV path planning methods have become a current research hot spot; their essence is to map the environmental states perceived by sensors to actuator actions. In behavior-based methods, however, designing state feature vectors and obtaining supervised samples are often extremely difficult in real complex environments. These problems therefore urgently need to be solved.
Summary of the invention
The object of the present invention is to provide a UAV path planning method based on the Q(λ) algorithm. It combines Q-learning with eligibility traces, assigns quantized reward and punishment signals to the environmental states perceived by sensors, and, through continuous interaction with the environment, guides the UAV to plan its path autonomously and safely avoid threat areas. The method responds quickly to changes in the external environment, has the advantages of speed and real-time operation, and improves the UAV's adaptability in unknown or partially unknown environments.
The present invention provides a UAV path planning method based on the Q(λ) algorithm, characterized by comprising the following steps:
Step 1, environmental modeling: using the environmental information collected by sensors, identify the threat areas; model the UAV flight environment with the grid method, discretize the continuous space, generate a uniform grid according to the set space size, and take the grid vertices as the discretized waypoints;
Step 2, initialize the Markov decision process model: initialize a Markov decision process model suitable for solving the UAV path planning problem. The model is expressed by the four-tuple <S, A, P, R>, where S is the state space of the UAV, A is the action space of the UAV, P is the state transition matrix, and R is the reward function. Initializing the Markov decision process model comprises representing the UAV flight action space, designing the state transition probability, and constructing the reward function;
Step 3, iterate with the Q(λ) algorithm on the established model: on the basis of the model established in Steps 1 and 2, perform iterative computation with the Q(λ) algorithm, which combines the Q-learning algorithm with eligibility traces. A state-action value function Q(s, a) is introduced to characterize the value of the UAV taking action a in state s, and a Q table is established to store the value of each state-action pair <s, a>. An eligibility trace function E(s, a) is introduced to express the causal relationship between the terminal state and the state-action pair <s, a>. The Q and E values are initialized first; then, in each learning cycle, the action a taken in state s is chosen by the Boltzmann selection policy. After executing action a and transferring to the next state s′, the value of Q(s, a) is updated by the Q-value update formula, and the E values of all state-action pairs are updated by the E-value update formula. When the terminal state is reached, the current learning cycle ends; after the maximum number of learning cycles is reached, the Q(λ) algorithm iteration process terminates;
Step 4, compute the optimal path from the state value function: after Step 3, a converged state value function is obtained. At each state s, the action a* with the maximum Q value is selected; after taking a*, this deterministic policy is followed until the terminal state is reached. Finally, the nodes in the grid are mapped to longitude and latitude, yielding the optimal path.
As a further refinement of the present invention, the specific steps of the Step 1 environmental modeling are as follows:
Step 1.1: initialize the grid space according to the UAV's minimum track-segment length.
The UAV flies in straight lines between waypoints and changes flight attitude at certain waypoints according to the track requirements. The minimum track-segment length is the shortest distance the UAV must fly straight before it can start changing its flight attitude. Using the minimum track-segment length as the step size yields a discrete grid space that satisfies the UAV's own constraints.
Let the longitude/latitude of the UAV start position be S = (lonS, latS), the longitude/latitude of the target point be T = (lonT, latT), the minimum track-segment length be dmin, and the grid size be m*n. With dmin set as the grid step, m and n are calculated as follows:
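The m, n formula itself is elided in the published text. As an illustration only, a minimal sketch under the assumption that the grid simply spans the start-to-target extent along each axis, divided by the step dmin (the exact formula in the patent may differ):

```python
import math

def grid_size(lon_s, lat_s, lon_t, lat_t, d_min):
    """Assumed form of the m*n grid-size computation: the start-to-target
    extent along each axis divided by the grid step d_min."""
    m = math.ceil(abs(lon_t - lon_s) / d_min)
    n = math.ceil(abs(lat_t - lat_s) / d_min)
    return m, n
```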
Step 1.2: map grid space coordinates to waypoints.
With the grid vertices as the discretized waypoints, a coordinate in the grid space is expressed as (x, y). Let the longitude/latitude corresponding to the grid origin (0, 0) be (lono, lato); then the waypoint longitude/latitude (lonxy, latxy) corresponding to (x, y) is calculated as: lonxy = lono + dmin*x, latxy = lato + dmin*y.
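The Step 1.2 mapping can be sketched directly from the formula above; note that it treats dmin as a step already expressed in degrees of arc:

```python
def grid_to_waypoint(x, y, lon_o, lat_o, d_min):
    """Map grid node (x, y) to its waypoint (lonxy, latxy) per Step 1.2:
    lonxy = lono + dmin*x, latxy = lato + dmin*y."""
    return (lon_o + d_min * x, lat_o + d_min * y)
```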
Step 1.3: represent the threat-area information.
The UAV must account for the spatial positions of threat sources during flight. According to the threat-source category, threat areas are divided into circular areas and polygonal areas. In the grid space, a node containing a threat area is labeled 1 and represents a no-fly region; a node containing no threat area is labeled 0 and represents a flyable region. For a circular threat area, let the center be (lonc, latc) and the radius be r (km). For each node (x, y) in the grid, the distance dxyo from the corresponding waypoint to the threat-area center is calculated with the haversine formula, which computes the distance between two points on a sphere from their longitude/latitude coordinates.
If dxyo ≤ r, the node corresponding to (x, y) is labeled 1; otherwise it is labeled 0. For a polygonal threat area, a horizontal ray is cast from the waypoint (lonxy, latxy) to the right (or to the left), and the number of intersections between the ray and the polygonal region is counted. If the number of intersections is odd, the waypoint lies inside the polygonal threat area, and the node (x, y) is labeled 1; if the number of intersections is even, the waypoint lies outside the polygonal threat area, and the node is labeled 0.
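The two labeling tests of Step 1.3 can be sketched as follows: a haversine distance check for circular threat areas, and a ray-casting parity check for polygonal ones (the Earth-radius constant is an assumption; the patent does not give one):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in km."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_circular_threat(lon, lat, lon_c, lat_c, r_km):
    """Label 1 if the waypoint lies inside the circular threat area."""
    return 1 if haversine_km(lon, lat, lon_c, lat_c) <= r_km else 0

def in_polygon_threat(lon, lat, polygon):
    """Ray casting: cast a ray to the right and count edge crossings;
    an odd count means the waypoint is inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > lon:
                inside = not inside
    return 1 if inside else 0
```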
As a further refinement of the present invention, the specific steps of the Step 2 Markov decision process model initialization are as follows:
Step 2.1: represent the UAV flight action space.
With the grid vertices as waypoints, each vertex has eight transfer directions to neighboring vertices (except boundary points). The transfer directions are restricted according to the UAV's own constraints and the distribution of threats in the space, and the UAV's behavior is generalized into a discrete action space. Discretizing the heading state at 45° intervals yields 8 discrete heading states. Based on these discretized heading states, 5 UAV flight actions are defined: fly straight (denoted 0), turn right 45° (denoted 1), turn left 45° (denoted 2), turn right 90° (denoted 3), and turn left 90° (denoted 4). The action space is thus expressed as A = [0, 1, 2, 3, 4], each number denoting one action.
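The heading discretization and five manoeuvres of Step 2.1 can be sketched as heading-index arithmetic; the convention that a right turn increments the heading index is an assumption for illustration:

```python
# Step 2.1 sketch: heading discretised at 45-degree intervals (8 states,
# indexed 0..7) and the five manoeuvres A = [0, 1, 2, 3, 4] expressed as
# heading changes: straight, right 45, left 45, right 90, left 90.
ACTIONS = {0: 0, 1: +1, 2: -1, 3: +2, 4: -2}

def apply_action(heading, action):
    """Return the heading state (0..7) after executing a manoeuvre."""
    return (heading + ACTIONS[action]) % 8
```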
Step 2.2: design the state transition probability.
The state transition probability is the conditional probability of reaching another waypoint state after the UAV executes an action in a given waypoint state; it is denoted P(s′|s, a) and represents the probability that executing action a in state s transfers the UAV to state s′.
At the early stage of learning the UAV knows nothing about the environment and easily enters a threat area; entering a threat area ends the current learning cycle, so the exploration of the environment stays confined near the initial state. Therefore, when the action taken by the UAV would lead it into a threat area or out of the state space, no state transition occurs, i.e., the UAV state does not change; under all other conditions the UAV transfers with probability 100% to the state the action points to. Let the state space of the UAV be S and the threat-area space be O; then P(s′|s, a) is calculated as follows:
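The deterministic transition rule of Step 2.2 can be sketched as a simple guard on the pointed-to node (grid bounds and threat set as arguments are illustrative choices):

```python
def transition(s, s_pointed, threat_nodes, m, n):
    """Step 2.2 transition: if the node the action points to lies in a
    threat area or outside the m*n grid, the state does not change;
    otherwise the pointed-to state is reached with probability 1."""
    x, y = s_pointed
    if not (0 <= x < m and 0 <= y < n) or s_pointed in threat_nodes:
        return s  # no state transition
    return s_pointed
```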
Step 2.3: construct the reward function.
The UAV obtains an immediate reward each time it transfers from one waypoint state to the next; the learning objective of the Q(λ) algorithm is to maximize the accumulated immediate reward. The construction of the reward function must consider the various indices that affect track performance, including the distance to the target point, flight safety, and threat degree. The immediate reward obtained when the UAV takes action a in state s and transfers to state s′ is calculated by the following formula, where w1, w2, w3 are weighting coefficients and fd, fo, fa are normalized route evaluation factors:
fd expresses visibility and is taken as the inverse of the distance from state s′ to the target point. With the longitude/latitude of s′ being s′ = (lons′, lats′) and the target point being T = (lonT, latT), fd is calculated as follows:
fo expresses the threat degree of the threat areas toward state s′, where Io denotes the set of threat areas that threaten the current state transfer of the UAV and foi denotes the threat degree of threat area oi toward s′. With the longitude/latitude of threat area oi given, foi is calculated as follows:
fa expresses the penalty term on the UAV flight action; the maneuver the UAV takes is a key factor affecting UAV flight safety. According to the UAV flight action space defined in Step 2.1, fa is treated as a discrete function:
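The reward formulas themselves are elided in the published text. A minimal sketch follows, assuming a weighted sum of the three factors; the particular weights, the inverse-distance form of each per-threat term, and the manoeuvre penalty values are all illustrative assumptions, since the patent only states that fd is an inverse distance, fo aggregates per-threat degrees, and fa is discrete:

```python
def reward(d_to_target_km, threat_dists_km, action, w=(0.6, 0.3, 0.1)):
    """Sketch of the Step 2.3 reward as a weighted sum of the three route
    evaluation factors; fo, fa, and the weights are assumptions."""
    fd = 1.0 / d_to_target_km                    # inverse distance to target
    fo = -sum(1.0 / d for d in threat_dists_km)  # assumed per-threat term
    fa = [0.0, -0.5, -0.5, -1.0, -1.0][action]   # assumed manoeuvre penalty
    w1, w2, w3 = w
    return w1 * fd + w2 * fo + w3 * fa
```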
As a further refinement of the present invention, the specific steps of Step 3, iterating with the Q(λ) algorithm on the established model, are as follows:
Step 3.1: initialize the Q table.
Each state-action pair Q(s, a) in the Q table is given an initial Q value. Let Q(s, ~) denote the initial value of all state-action pairs at state s and sT denote the terminal state; then Q(s, a) is calculated as follows:
If s is the terminal state, the initial Q value is 0; otherwise the Q value is set to the inverse of the distance between s and sT. With the coordinate of state s being (x, y) and that of sT being (xT, yT), the distance dssT is calculated as follows:
Step 3.2: initialize the E values.
At the start of each learning cycle, the E value E(s, a) of every state-action pair <s, a> is initialized to 0.
Step 3.3: select actions with the Boltzmann distribution strategy.
In each learning cycle, the initial state is set first; then actions are selected according to the Boltzmann distribution strategy to perform state transfers. The probability p(a|s) of taking action a in state s is calculated as p(a|s) = exp(Q(s, a)/T) / Σa′ exp(Q(s, a′)/T), where T is the temperature coefficient controlling the exploration intensity of the policy. A larger temperature coefficient can be used at the early stage of learning to guarantee a strong exploration ability, and the temperature coefficient is gradually decreased afterwards. Action a is then selected according to p(a|s) by the roulette-wheel method, and E(s, a) is incremented by one.
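The Boltzmann selection with roulette-wheel sampling in Step 3.3 can be sketched as follows (the random source is passed in explicitly for reproducibility):

```python
import math
import random

def boltzmann_select(q_row, temperature, rng):
    """Step 3.3: sample an action index from the Boltzmann distribution
    p(a|s) = exp(Q(s,a)/T) / sum_a' exp(Q(s,a')/T) via roulette wheel."""
    prefs = [math.exp(q / temperature) for q in q_row]
    total = sum(prefs)
    r, cum = rng.random(), 0.0
    for a, p in enumerate(prefs):
        cum += p / total
        if r <= cum:
            return a
    return len(q_row) - 1  # guard against floating-point underflow
```

With a low temperature the highest-valued action dominates; a high temperature approaches uniform exploration.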
Step 3.4: update the Q values.
The UAV takes the action a selected in Step 3.3 at state s, transfers to state s′, and obtains the immediate reward r. The update formula for Q(s, a) is:
Q(s, a) = Q(s, a) + α * (r + γ * maxa Q(s′, a) − Q(s, a)) * E(s, a)
where α is the learning rate, γ is the discount factor expressing the weight given to future rewards, and maxa Q(s′, a) is the maximum Q value at state s′.
Step 3.5: update the E values.
The update formula for all state-action pairs is E(s, a) = λ * E(s, a), where λ is a weight parameter. If state s′ is the terminal state, the current learning cycle ends and the next learning cycle begins; otherwise the UAV transfers to state s′ and returns to Step 3.3 to continue the learning process.
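One iteration of Steps 3.4-3.5 can be sketched as follows. The code implements the formulas exactly as stated in the text (the visited pair's Q is adjusted by the TD error scaled by its eligibility, then every trace decays by λ); note that the standard Watkins Q(λ) instead applies the TD error to all pairs and decays traces by γλ, so this is the text's variant, not the textbook one:

```python
def q_lambda_step(Q, E, s, a, r, s_next, alpha, gamma, lam):
    """One Step 3.4-3.5 iteration, following the text's stated formulas:
    Q(s,a) += alpha * (r + gamma * max_a Q(s',a) - Q(s,a)) * E(s,a),
    then E(s,a) = lam * E(s,a) for all pairs."""
    E[(s, a)] = E.get((s, a), 0.0) + 1.0          # trace incremented on visit
    td_error = r + gamma * max(Q[s_next]) - Q[s][a]
    Q[s][a] += alpha * td_error * E[(s, a)]
    for key in E:                                  # decay all traces
        E[key] *= lam
```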
As a further refinement of the present invention, the specific steps of Step 4, computing the optimal path from the state value function, are as follows:
Step 4.1: perform state transfers with the deterministic policy.
After Step 3, the state value Q has converged. The initial state s is set first; the action a* with the maximum Q value at state s is selected by a* = argmaxa∈A Q(s, a), and the corresponding state transfer is performed. After taking action a and transferring to the next state s′, the deterministic policy continues to select actions until the terminal state is reached.
Step 4.2: map the grid space to waypoint longitude/latitude coordinates.
The optimal path coordinates in the grid obtained in Step 4.1 are mapped to waypoint longitude/latitude coordinates by the formula in Step 1.2, yielding the optimal path of the UAV.
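The greedy path extraction of Step 4.1 can be sketched as follows (the step function standing in for the deterministic state transfer, and the length cap, are illustrative):

```python
def extract_path(Q, start, terminal, step_fn, max_len=100):
    """Step 4.1: follow the deterministic policy a* = argmax_a Q(s, a)
    from the start state until the terminal state is reached."""
    path, s = [start], start
    while s != terminal and len(path) < max_len:
        a_star = max(range(len(Q[s])), key=lambda a: Q[s][a])
        s = step_fn(s, a_star)
        path.append(s)
    return path
```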
Compared with the prior art, the above technical scheme of the present invention has the following technical effects:
1. The UAV's minimum track-segment length is used as the discretization step, taking the UAV's own constraints into account. This remedies the lack of basis in the discretization step of environmental modeling and produces a discrete planning space that fully exploits the UAV's flight capability.
2. The state transition probability is set so that when an action taken by the UAV would lead it into a threat area, no state transition occurs: the UAV keeps its current state and continues the current learning cycle. This overcomes the drawback that UAV-environment interaction at the early stage of learning stays confined near the initial state, and improves the convergence speed of the algorithm.
3. The Q-learning algorithm needs no global environmental knowledge; through trial-and-error interaction with the environment, it approaches the optimal policy by optimizing the action-value function. It suits UAVs operating in dynamic environments that are unknown or partially unknown, and guides the UAV to plan its path autonomously.
4. During iteration, the traditional Q-learning algorithm looks only one step ahead from the current state. Introducing the eligibility trace function into Q-learning takes the predictions over all step counts into account, making the computation of the value function more accurate. Moreover, it supports efficient online updating: Q values can be updated without waiting for a learning cycle to end, previous learning data can be discarded, and the convergence of the algorithm is accelerated.
Brief description of the drawings
Fig. 1: discrete UAV actions and their transfer results in the grid space.
Fig. 2: flow chart of the algorithm iteration within each learning cycle.
Specific embodiments
The present invention is further explained below with reference to the drawings.
For convenience of narration, the main variables in the algorithm are briefly defined:
The longitude/latitude of the UAV start position is S = (lonS, latS), the longitude/latitude of the target point is T = (lonT, latT), the size of the grid space is m*n, and a point in the grid space has coordinate (x, y). The Markov model is expressed by the four-tuple <S, A, P, R>, where S is the UAV state space, A is the UAV action space, R is the reward function, and P is the state transition probability matrix.
The present invention proposes a UAV path planning method based on the Q(λ) algorithm, comprising an environmental modeling step, a Markov decision process model initialization step, a Q(λ) algorithm iteration step, and a step of computing the optimal path from the state value function.
The specific steps are as follows:
Step 1) environmental modeling:
Step 1.1) set the step size of the grid space to the UAV's minimum track-segment length dmin;
Step 1.2) compute the grid space size m*n according to the corresponding formula;
Step 1.3) map the grid space coordinates to waypoint longitude/latitude coordinates according to lonxy = lono + dmin*x, latxy = lato + dmin*y, where (lono, lato) is the longitude/latitude corresponding to the grid origin (0, 0);
Step 1.4) in the grid space, label nodes containing a threat area 1, representing no-fly regions, and label nodes containing no threat area 0, representing flyable regions;
Step 2) Markov decision process model initialization:
Step 2.1) according to the UAV transfer directions shown in Fig. 1, define 5 UAV flight actions: fly straight (denoted 0), turn right 45° (denoted 1), turn left 45° (denoted 2), turn right 90° (denoted 3), and turn left 90° (denoted 4). The UAV flight action space is expressed as A = [0, 1, 2, 3, 4], each number denoting one action;
Step 2.2) set the state transition probability so that when an action taken by the UAV would lead it into a threat area or out of the state space, no state transition occurs, i.e., the UAV state does not change; under all other conditions the UAV transfers with probability 100% to the state the action points to. The state transition probability is calculated as follows, where O is the threat-area space:
Step 2.3) the immediate reward obtained when the UAV takes action a at state s and transfers to state s′ is calculated by the following formula, where w1, w2, w3 are weighting coefficients and fd, fo, fa are normalized route evaluation factors:
Step 2.4) fd expresses visibility and is taken as the inverse of the distance from state s′ to the target point. With the longitude/latitude of s′ being s′ = (lons′, lats′) and the target point being T = (lonT, latT), fd is calculated as follows:
Step 2.5) fo expresses the threat degree of the threat areas toward state s′, where Io denotes the set of threat areas that threaten the current state transfer of the UAV and foi denotes the threat degree of threat area oi toward s′. With the longitude/latitude of threat area oi given, foi is calculated as follows:
Step 2.6) fa expresses the penalty term on the UAV flight action; the maneuver the UAV takes is a key factor affecting UAV flight safety. According to the UAV flight action space defined in Step 2.1, fa is treated as a discrete function;
Step 3) iterate on the established model with the Q(λ) algorithm; the iteration flow of the algorithm within each learning cycle is shown in Fig. 2:
Step 3.1) give each state-action pair Q(s, a) in the Q table an initial Q value. Q(s, ~) denotes the initial value of all state-action pairs at state s and sT denotes the terminal state; Q(s, a) is calculated as follows:
Step 3.2) at the start of each learning cycle, initialize the E value E(s, a) of every state-action pair <s, a> to 0;
Step 3.3) set the initial state;
Step 3.4) select an action according to the Boltzmann distribution strategy; the probability p(a|s) of taking action a in state s is calculated as follows:
Step 3.5) update Q(s, a) according to the formula:
Q(s, a) = Q(s, a) + α * (r + γ * maxa Q(s′, a) − Q(s, a)) * E(s, a)
Step 3.6) update the E values according to E(s, a) = λ * E(s, a);
Step 3.7) take action a and transfer to the next state s′. If s′ is the terminal state, the current learning cycle ends; return to Step 3.2) and begin the next learning cycle. Otherwise return to Step 3.4) and continue the iteration.
Step 4) compute the optimal path from the state value function:
Step 4.1) after Step 3), the state value Q has converged. Set the initial state s first, select the action a* with the maximum Q value at state s by a* = argmaxa∈A Q(s, a), and perform the state transfer. After taking action a and transferring to the next state s′, continue selecting actions with the deterministic policy until the terminal state is reached;
Step 4.2) map the optimal path coordinates in the grid obtained in Step 4.1) to waypoint longitude/latitude coordinates according to the formula in Step 1.3), yielding the optimal path of the UAV.
The above is only a specific embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any person familiar with the art can readily conceive of transformations or replacements within the technical scope disclosed by the present invention, and all such changes shall fall within the scope of the present invention. The scope of protection of the present invention shall therefore be subject to the scope of protection specified in the claims.
Claims (5)
1. being based on the unmanned plane paths planning method of Q (λ) algorithm, it is characterised in that: the following steps are included:
Step 1, environmental modeling: environmental information is acquired using sensor, identifies threatening area, using Grid Method by unmanned plane during flying
Environment is modeled, and by continuous spatial discretization, uniform grid chart is generated according to the space size of setting, by grid vertex
As the way point after discrete;
Step 2, initialize markov decision process model: initialization is suitable for solving the Ma Er of the unmanned plane path planning
Section husband decision process model, the markov decision process model four-tuple<S, A, P, R>expression, S are locating for unmanned plane
State space, A be unmanned plane motion space, P is state-transition matrix, and R is reward function, markov decision process mould
Type initialization includes the construction of the expression to unmanned plane during flying motion space, the design of state transition probability and reward function;
Step 3, it on the model established, is calculated using Q (λ) algorithm iteration: in the model basis that step 1 and step 2 are established
On, calculating is iterated using Q (λ) algorithm for combining Q-learning algorithm and effectiveness to track;It introduces state action and is worth letter
Number Q (s takes the value of movement a a) to characterize unmanned plane in state s, establishes Q table and stores each state action to<s, a>valence
Value;Introduce effectiveness tracking function E (s, a) indicate final state and state behavior to<s, a>causality;Q value is carried out first
It is initialized with E value, then in each learning cycle, the movement a that is taken under s state by Boltzmann policy selection;It holds
After action is transferred to NextState s' as a, Q (s, value a), and update by E value more new formula are updated by Q value more new formula
The E value of all state actions pair, when reaching final state, when secondary learning cycle terminates, until reaching maximum learning cycle number
Afterwards, Q (λ) algorithm iteration calculating process terminates;
Step 4, optimal path is calculated according to state value function: obtains convergent state value function after step 3, then may be used
To select the movement a* with maximum Q value at state s, continue after taking movement a* using deterministic strategy, until reaching
Node in grid is finally mapped to longitude and latitude and then obtains optimal path by final state.
2. The UAV path planning method based on the Q (λ) algorithm according to claim 1, wherein the environmental modeling of step 1 comprises the following specific steps:
Step 1.1, initialising the grid space according to the UAV's minimum track-segment length;
The UAV flies in straight lines between waypoints and changes flight attitude at certain waypoints as the track requires; the minimum track-segment length is the shortest distance the UAV must fly straight before it may begin changing attitude; taking the minimum track-segment length as the grid step yields a discrete grid space that satisfies the UAV's own constraints;
The longitude/latitude coordinates of the UAV start position are S = (lonS, latS) and those of the target point are T = (lonT, latT); the minimum track-segment length is dmin and the grid space has size m*n; with dmin as the grid step, m and n are calculated as follows:
Step 1.2, mapping grid-space coordinates to waypoints;
The grid vertices serve as the discretised waypoints, and coordinates in the grid space are written (x, y); if the grid origin (0, 0) corresponds to the longitude/latitude coordinates (lono, lato), the waypoint coordinates (lonxy, latxy) corresponding to node (x, y) are calculated as: lonxy = lono + dmin*x, latxy = lato + dmin*y.
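The mapping of step 1.2 is simple enough to sketch directly; the Python function below assumes the origin coordinates and grid step are given in decimal degrees (illustrative values, not from the patent):

```python
def grid_to_waypoint(x, y, lon_o, lat_o, d_min):
    """Map grid node (x, y) to waypoint lon/lat per step 1.2:
    lon_xy = lon_o + d_min * x, lat_xy = lat_o + d_min * y."""
    return (lon_o + d_min * x, lat_o + d_min * y)
```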
Step 1.3, representing the threat-area information;
The UAV must account for the spatial positions of threat sources during flight; according to the threat-source type, threat areas are divided into circular regions and polygonal regions; in the grid space, a node inside a threat area is labelled 1, denoting a no-fly region, and a node outside every threat area is labelled 0, denoting a flyable region; for a circular threat area with centre coordinates (lonc, latc) and radius r (km), the distance dxyo from the waypoint corresponding to each grid node (x, y) to the centre of the threat area is calculated with the haversine formula, which gives the distance between two points on a sphere from their longitude/latitude coordinates;
If dxyo ≤ r, node (x, y) is labelled 1, otherwise 0; for a polygonal threat area, a horizontal ray is drawn from the waypoint (lonxy, latxy) to the right (or left) and the number of its intersections with the polygonal region is counted: if the number of intersections is odd, the waypoint lies inside the polygonal threat area and node (x, y) is labelled 1; if the number of intersections is even, the waypoint lies outside the polygonal threat area and the node is labelled 0.
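The circular-threat labelling of step 1.3 can be sketched with the standard haversine formula; the polygon ray-casting test is omitted here, and the function names are illustrative rather than taken from the patent:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance (km) between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def label_circular_threat(lon, lat, lon_c, lat_c, r_km):
    """Label a waypoint 1 (no-fly) if it lies within r_km of the circle centre, else 0."""
    return 1 if haversine_km(lon, lat, lon_c, lat_c) <= r_km else 0
```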
3. The UAV path planning method based on the Q (λ) algorithm according to claim 2, wherein the Markov decision process model initialisation of step 2 comprises the following specific steps:
Step 2.1, representing the UAV flight action space;
With the grid vertices as waypoints, each vertex has eight transfer directions to other vertices (except boundary points); the transfer directions are restricted according to the UAV's own constraints and the distribution of threats in space, so the UAV's behaviour is generalised to a discrete action space: the heading state is discretised at 45° intervals into 8 discrete states, and over these 5 flight actions are defined: fly straight, denoted 0; turn right 45°, denoted 1; turn left 45°, denoted 2; turn right 90°, denoted 3; and turn left 90°, denoted 4; the action space is thus expressed as A = [0, 1, 2, 3, 4], each number denoting one action;
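The heading discretisation and five-action space of step 2.1 can be sketched as a lookup table; the signed heading-change encoding below is an assumption consistent with the claim's action labels:

```python
HEADINGS = [i * 45 for i in range(8)]                   # 8 discrete headings: 0°..315°
HEADING_CHANGE = {0: 0, 1: 45, 2: -45, 3: 90, 4: -90}   # action code -> heading change (deg)

def next_heading(heading_deg, action):
    """Apply an action's heading change, wrapping the result into [0, 360)."""
    return (heading_deg + HEADING_CHANGE[action]) % 360
```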
Step 2.2, designing the state transition probability;
The state transition probability is the conditional probability that, after the UAV executes an action in one waypoint state, it reaches another waypoint state; it represents the probability that the UAV, executing action a in state s, transfers to state s';
In the early learning stage the UAV has no knowledge of the environment and easily enters threat areas; entering a threat area ends the learning episode, so exploration stays confined near the initial state; therefore, when an action taken by the UAV would cause it to enter a threat area or to leave the state space, no state transfer occurs, i.e. the UAV's state does not change; in all other cases the UAV transfers with probability 100% to the state the action points to; with UAV state space S and threat-area space O, the transition probability is calculated as follows:
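The transition rule of step 2.2 (stay put on an illegal move, deterministic transfer otherwise) can be sketched as follows; `move` is a hypothetical kinematics helper, not part of the patent:

```python
def transition(state, action, move, state_space, threat_space):
    """Step 2.2 rule: if the targeted node lies outside the state space S or
    inside the threat space O, the state does not change; otherwise the UAV
    transfers with probability 1 to the node the action points to."""
    nxt = move(state, action)
    if nxt not in state_space or nxt in threat_space:
        return state
    return nxt
```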
Step 2.3, constructing the reward function;
The UAV obtains an immediate reward each time it transfers from one waypoint state to the next, and the learning objective of the Q (λ) algorithm is to maximise the accumulated immediate reward; the construction of the reward function must consider the indices that affect track performance, including distance to the target point, flight safety and threat degree; the immediate reward obtained when the UAV takes action a in state s and transfers to state s' is calculated by the following formula, where w1, w2, w3 are weighting coefficients and fd, fo, fa are the normalised track evaluation factors;
fd denotes visibility of the target, taken as the inverse of the distance from state s' to the target point; with s' = (lons', lats') and target point T = (lonT, latT), fd is calculated as follows:
fo denotes the threat degree imposed on state s' by the threat areas, where Io denotes the set of threat areas that threaten the UAV's current state transfer and each term denotes the threat degree of threat area oi on s'; with the longitude/latitude coordinates of threat area oi given, the calculation formula is as follows:
fa denotes the penalty on the UAV's flight action; the manoeuvre the UAV takes is a key factor affecting flight safety; according to the UAV flight action space set in step 2.1, fa is treated as a discrete function:
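The claim's combining formula for the reward is not reproduced in this text, so the sketch below is only one plausible form under stated assumptions: goal proximity (fd) is rewarded while threat exposure (fo) and manoeuvring (fa) are penalised, with the weights of step 2.3:

```python
def immediate_reward(f_d, f_o, f_a, w1=1.0, w2=1.0, w3=1.0):
    """Assumed combination of the step 2.3 factors: a weighted sum in which
    f_o and f_a enter with negative sign. The exact formula and signs are
    assumptions, since the claim's formula image is not shown here."""
    return w1 * f_d - w2 * f_o - w3 * f_a
```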
4. The UAV path planning method based on the Q (λ) algorithm according to claim 3, wherein the iterative calculation with the Q (λ) algorithm on the established model in step 3 comprises the following specific steps:
Step 3.1, initialising the Q table;
Each state-action pair Q (s, a) in the Q table is initialised; Q (s, ~) denotes the initial value of all state-action pairs in state s, and sT denotes the terminal state; Q (s, a) is calculated as follows:
If s is the terminal state, the initial Q value is 0; otherwise the Q value is set to the inverse of the distance between s and sT; with s at grid coordinates (x, y) and sT at (xT, yT), the distance dssT is calculated as follows:
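Step 3.1 can be sketched directly; because the claim's distance formula is not reproduced in this text, Euclidean grid distance is assumed here:

```python
import math

def init_q_table(states, actions, s_T):
    """Step 3.1: Q is 0 at the terminal state s_T and the inverse of the
    distance to s_T elsewhere (Euclidean grid distance assumed)."""
    q = {}
    for s in states:
        for a in actions:
            q[(s, a)] = 0.0 if s == s_T else 1.0 / math.hypot(s[0] - s_T[0], s[1] - s_T[1])
    return q
```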
Step 3.2, initialising the E values;
At the start of each learning episode, the eligibility trace E (s, a) of every state-action pair <s, a> is initialised to 0;
Step 3.3, selecting actions with the Boltzmann distribution strategy;
In each learning episode, the initial state is set first, and then actions are selected according to the Boltzmann distribution strategy to perform state transfers; the probability p(a | s) of taking action a in state s is calculated as follows:
Here T is the temperature coefficient, which controls the exploration intensity of the policy; a larger temperature coefficient can be used in the early learning stage to guarantee strong exploratory ability, and the temperature coefficient is gradually reduced afterwards; action a is then sampled according to p(a | s) by the roulette-wheel method, and E (s, a) is incremented by one;
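The Boltzmann selection of step 3.3 can be sketched as a softmax over the Q values of the current state, sampled by the roulette-wheel method; subtracting the maximum before exponentiating is a standard numerical-stability trick, not part of the claim:

```python
import math
import random

def boltzmann_select(q_row, temperature, rng=random.random):
    """Step 3.3: p(a|s) proportional to exp(Q(s, a)/T), sampled by the
    roulette-wheel method over the cumulative weights."""
    m = max(q_row.values())
    weights = {a: math.exp((v - m) / temperature) for a, v in q_row.items()}
    total = sum(weights.values())
    r, acc = rng() * total, 0.0
    for a, w in weights.items():
        acc += w
        if r <= acc:
            return a
    return a  # guard against floating-point shortfall
```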
Step 3.4, updating the Q values;
The UAV takes the action a selected in step 3.3 at state s, transfers to state s' and obtains the immediate reward r; Q (s, a) is then updated by:
Q (s, a) = Q (s, a) + α * (r + γ * maxa Q (s', a) − Q (s, a)) * E (s, a)
where α is the learning rate, γ is the discount factor expressing the weight given to future reward, and maxa Q (s', a) is the maximum Q value in state s';
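The step 3.4 update can be sketched literally as written in the claim, with the update of Q(s, a) scaled by the eligibility trace E(s, a):

```python
def q_update(q, e, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Step 3.4: Q(s, a) += alpha * (r + gamma * max_b Q(s', b) - Q(s, a)) * E(s, a)."""
    target = r + gamma * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (target - q[(s, a)]) * e[(s, a)]
    return q[(s, a)]
```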
Step 3.5, updating the E values;
All eligibility traces E (s, a) are updated by the formula E (s, a) = λ * E (s, a), where λ is the trace-decay parameter; if state s' is the terminal state, the current learning episode ends and the next learning episode begins; otherwise the UAV transfers to state s' and returns to step 3.3 to continue the learning process.
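Steps 3.2 through 3.5 form one learning episode, which can be sketched end to end as follows; `step` and `reward` are hypothetical helpers standing in for the transition rule of step 2.2 and the reward of step 2.3, and the Q update is applied only to the current pair, as the claim states it:

```python
import math
import random

def run_episode(q, states, actions, step, reward, s0, s_T,
                alpha=0.1, gamma=0.9, lam=0.8, temperature=1.0, rng=None):
    """One Q(lambda) learning episode: traces reset to zero (step 3.2),
    Boltzmann action selection (step 3.3), trace-scaled Q update (step 3.4),
    and decay of every trace by lambda (step 3.5)."""
    rng = rng or random.Random()
    e = {(s, a): 0.0 for s in states for a in actions}      # step 3.2
    s = s0
    while s != s_T:
        w = [math.exp(q[(s, a)] / temperature) for a in actions]
        a = rng.choices(actions, weights=w)[0]              # step 3.3
        e[(s, a)] += 1.0
        s_next = step(s, a)
        r = reward(s, a, s_next)
        target = r + gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)]) * e[(s, a)]   # step 3.4
        for k in e:                                         # step 3.5
            e[k] *= lam
        s = s_next
    return q
```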
5. The UAV path planning method based on the Q (λ) algorithm according to claim 4, wherein the calculation of the optimal path from the state value function in step 4 comprises the following specific steps:
Step 4.1, performing state transfers with the deterministic policy;
After step 3, the state-action values Q have converged; the initial state s is set first, the action a* with the maximum Q value in state s is selected by a* = argmaxa∈A Q (s, a), and the corresponding state transfer is carried out; after transferring to the next state s', the deterministic policy continues to select actions until the terminal state is reached;
Step 4.2, mapping the grid space to waypoint longitude/latitude coordinates;
The optimal path coordinates in the grid obtained in step 4.1 are mapped to waypoint longitude/latitude coordinates by the formula of step 1.2, yielding the UAV's optimal path.
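The greedy path extraction of steps 4.1 and 4.2 can be sketched as follows; `step` is the deterministic transition rule, and the returned grid nodes would then be mapped to lon/lat with the step 1.2 formula (a `max_len` guard, not in the claim, prevents infinite loops on an unconverged table):

```python
def extract_path(q, s0, s_T, actions, step, max_len=10000):
    """Steps 4.1-4.2: follow a* = argmax_a Q(s, a) from the start state
    until the terminal state is reached, collecting the visited nodes."""
    path, s = [s0], s0
    while s != s_T and len(path) < max_len:
        a_star = max(actions, key=lambda b: q[(s, b)])
        s = step(s, a_star)
        path.append(s)
    return path
```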
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910071929.6A CN109655066B (en) | 2019-01-25 | 2019-01-25 | Unmanned aerial vehicle path planning method based on Q (lambda) algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109655066A true CN109655066A (en) | 2019-04-19 |
CN109655066B CN109655066B (en) | 2022-05-17 |
Family
ID=66121623
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910071929.6A Active CN109655066B (en) | 2019-01-25 | 2019-01-25 | Unmanned aerial vehicle path planning method based on Q (lambda) algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109655066B (en) |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134140A (en) * | 2019-05-23 | 2019-08-16 | 南京航空航天大学 | A kind of unmanned plane paths planning method based on potential function award DQN under the unknown continuous state of environmental information |
CN110324805A (en) * | 2019-07-03 | 2019-10-11 | 东南大学 | A kind of radio sensor network data collection method of unmanned plane auxiliary |
CN110320931A (en) * | 2019-06-20 | 2019-10-11 | 西安爱生技术集团公司 | Unmanned plane avoidance Route planner based on Heading control rule |
CN110428115A (en) * | 2019-08-13 | 2019-11-08 | 南京理工大学 | Maximization system benefit method under dynamic environment based on deeply study |
CN110673637A (en) * | 2019-10-08 | 2020-01-10 | 福建工程学院 | Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning |
CN110726416A (en) * | 2019-10-23 | 2020-01-24 | 西安工程大学 | Reinforced learning path planning method based on obstacle area expansion strategy |
CN110879610A (en) * | 2019-10-24 | 2020-03-13 | 北京航空航天大学 | Reinforced learning method for autonomous optimizing track planning of solar unmanned aerial vehicle |
CN111006693A (en) * | 2019-12-12 | 2020-04-14 | 中国人民解放军陆军工程大学 | Intelligent aircraft track planning system and method thereof |
CN111026157A (en) * | 2019-12-18 | 2020-04-17 | 四川大学 | Intelligent aircraft guiding method based on reward remodeling reinforcement learning |
CN111123963A (en) * | 2019-12-19 | 2020-05-08 | 南京航空航天大学 | Unknown environment autonomous navigation system and method based on reinforcement learning |
CN111160755A (en) * | 2019-12-26 | 2020-05-15 | 西北工业大学 | DQN-based real-time scheduling method for aircraft overhaul workshop |
CN111328023A (en) * | 2020-01-18 | 2020-06-23 | 重庆邮电大学 | Mobile equipment multitask competition unloading method based on prediction mechanism |
CN111340324A (en) * | 2019-09-25 | 2020-06-26 | 中国人民解放军国防科技大学 | Multilayer multi-granularity cluster task planning method based on sequential distribution |
CN111399541A (en) * | 2020-03-30 | 2020-07-10 | 西北工业大学 | Unmanned aerial vehicle whole-region reconnaissance path planning method of unsupervised learning type neural network |
CN111479216A (en) * | 2020-04-10 | 2020-07-31 | 北京航空航天大学 | Unmanned aerial vehicle cargo conveying method based on UWB positioning |
CN111538059A (en) * | 2020-05-11 | 2020-08-14 | 东华大学 | Self-adaptive rapid dynamic positioning system and method based on improved Boltzmann machine |
CN111612162A (en) * | 2020-06-02 | 2020-09-01 | 中国人民解放军军事科学院国防科技创新研究院 | Reinforced learning method and device, electronic equipment and storage medium |
CN111736461A (en) * | 2020-06-30 | 2020-10-02 | 西安电子科技大学 | Unmanned aerial vehicle task collaborative allocation method based on Q learning |
CN111880563A (en) * | 2020-07-17 | 2020-11-03 | 西北工业大学 | Multi-unmanned aerial vehicle task decision method based on MADDPG |
CN112130124A (en) * | 2020-09-18 | 2020-12-25 | 北京北斗天巡科技有限公司 | Rapid calibration and error processing method for unmanned aerial vehicle management and control equipment in civil aviation airport |
CN112356031A (en) * | 2020-11-11 | 2021-02-12 | 福州大学 | On-line planning method based on Kernel sampling strategy under uncertain environment |
CN112525213A (en) * | 2021-02-10 | 2021-03-19 | 腾讯科技(深圳)有限公司 | ETA prediction method, model training method, device and storage medium |
CN113033815A (en) * | 2021-02-07 | 2021-06-25 | 广州杰赛科技股份有限公司 | Intelligent valve cooperation control method, device, equipment and storage medium |
CN113093803A (en) * | 2021-04-03 | 2021-07-09 | 西北工业大学 | Unmanned aerial vehicle air combat motion control method based on E-SAC algorithm |
CN113176786A (en) * | 2021-04-23 | 2021-07-27 | 成都凯天通导科技有限公司 | Q-Learning-based hypersonic aircraft dynamic path planning method |
CN113867369A (en) * | 2021-12-03 | 2021-12-31 | 中国人民解放军陆军装甲兵学院 | Robot path planning method based on alternating current learning seagull algorithm |
CN114020009A (en) * | 2021-10-20 | 2022-02-08 | 中国航空工业集团公司洛阳电光设备研究所 | Terrain penetration planning method for small-sized fixed-wing unmanned aerial vehicle |
CN114115340A (en) * | 2021-11-15 | 2022-03-01 | 南京航空航天大学 | Airspace cooperative control method based on reinforcement learning |
CN114153213A (en) * | 2021-12-01 | 2022-03-08 | 吉林大学 | Deep reinforcement learning intelligent vehicle behavior decision method based on path planning |
CN115562357A (en) * | 2022-11-23 | 2023-01-03 | 南京邮电大学 | Intelligent path planning method for unmanned aerial vehicle cluster |
WO2024020923A1 (en) * | 2022-07-27 | 2024-02-01 | 苏州泽达兴邦医药科技有限公司 | Granulation process for traditional chinese medicine production, and process strategy calculation method |
CN117806340A (en) * | 2023-11-24 | 2024-04-02 | 中国电子科技集团公司第十五研究所 | Airspace training flight path automatic planning method and device based on reinforcement learning |
CN117928559A (en) * | 2024-01-26 | 2024-04-26 | 兰州理工大学 | Unmanned aerial vehicle path planning method under threat avoidance based on reinforcement learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108170147A (en) * | 2017-12-31 | 2018-06-15 | 南京邮电大学 | UAV mission planning method based on a self-organising neural network |
CN108171315A (en) * | 2017-12-27 | 2018-06-15 | 南京邮电大学 | Multi-UAV task allocation method based on SMC particle swarm algorithms |
CN108319286A (en) * | 2018-03-12 | 2018-07-24 | 西北工业大学 | UAV air combat manoeuvring decision method based on reinforcement learning |
CN108413959A (en) * | 2017-12-13 | 2018-08-17 | 南京航空航天大学 | UAV path planning based on improved chaos ant colony optimisation |
US20180308371A1 (en) * | 2017-04-19 | 2018-10-25 | Beihang University | Joint search method for uav multiobjective path planning in urban low altitude environment |
2019-01-25: CN201910071929.6A granted as CN109655066B (status: active)
Non-Patent Citations (3)
Title |
---|
YANG GAO等: "Multi-UAV Task Allocation Based on Improved Algorithm of Multi-objective Particle Swarm Optimization", 《2018 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC)》 * |
HAO Chuanchuan et al.: "Q-learning-based three-dimensional trajectory planning algorithm for UAVs", Journal of Shanghai Jiao Tong University * |
CHEN Xia et al.: "Three-dimensional UAV trajectory planning using an improved neural network", Electronics Optics & Control * |
Also Published As
Publication number | Publication date |
---|---|
CN109655066B (en) | 2022-05-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109655066A (en) | Unmanned aerial vehicle path planning method based on the Q (λ) algorithm | |
Wang et al. | Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach | |
Singla et al. | Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge | |
Choi et al. | Unmanned aerial vehicles using machine learning for autonomous flight; state-of-the-art | |
Sun et al. | Motion planning for mobile robots—Focusing on deep reinforcement learning: A systematic review | |
Dong et al. | A review of mobile robot motion planning methods: from classical motion planning workflows to reinforcement learning-based architectures | |
CN106483852B (en) | Stratospheric airship control method based on the Q-Learning algorithm and a neural network | |
CN110362089A (en) | Autonomous navigation method for unmanned surface vehicles based on deep reinforcement learning and a genetic algorithm | |
Xie et al. | Learning with stochastic guidance for robot navigation | |
CN113268074B (en) | Unmanned aerial vehicle flight path planning method based on joint optimization | |
CN109597425A (en) | UAV navigation and obstacle avoidance method based on reinforcement learning | |
Alkowatly et al. | Bioinspired autonomous visual vertical control of a quadrotor unmanned aerial vehicle | |
Li et al. | A behavior-based mobile robot navigation method with deep reinforcement learning | |
Valasek et al. | Intelligent motion video guidance for unmanned air system ground target surveillance | |
Xue et al. | A UAV navigation approach based on deep reinforcement learning in large cluttered 3D environments | |
CN116679711A (en) | Robot obstacle avoidance method based on model-based reinforcement learning and model-free reinforcement learning | |
Wu et al. | Multi-objective reinforcement learning for autonomous drone navigation in urban areas with wind zones | |
Li et al. | A warm-started trajectory planner for fixed-wing unmanned aerial vehicle formation | |
Olaz et al. | Quadcopter neural controller for take-off and landing in windy environments | |
CN117387635A (en) | Unmanned aerial vehicle navigation method based on deep reinforcement learning and PID controller | |
Zhang et al. | A state-decomposition DDPG algorithm for UAV autonomous navigation in 3D complex environments | |
Hua et al. | A novel learning-based trajectory generation strategy for a quadrotor | |
Cui | Multi-target points path planning for fixed-wing unmanned aerial vehicle performing reconnaissance missions | |
Qu et al. | USV Path Planning Under Marine Environment Simulation Using DWA and Safe Reinforcement Learning | |
Mahé et al. | Trajectory-control using deep system identification and model predictive control for drone control under uncertain load |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||