CN113283827A - Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning - Google Patents

Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning

Info

Publication number
CN113283827A
CN113283827A (application CN202110413367.6A; granted as CN113283827B)
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
logistics
reinforcement learning
deep reinforcement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110413367.6A
Other languages
Chinese (zh)
Other versions
CN113283827B (en)
Inventor
于滨 (Yu Bin)
张力 (Zhang Li)
崔少华 (Cui Shaohua)
刘家铭 (Liu Jiaming)
单文轩 (Shan Wenxuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Innovation Research Institute of Beihang University
Original Assignee
Hefei Innovation Research Institute of Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Innovation Research Institute of Beihang University filed Critical Hefei Innovation Research Institute of Beihang University
Priority to CN202110413367.6A priority Critical patent/CN113283827B/en
Publication of CN113283827A publication Critical patent/CN113283827A/en
Application granted granted Critical
Publication of CN113283827B publication Critical patent/CN113283827B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/08Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083Shipping
    • G06Q10/0835Relationships between shipper or supplier and carriers
    • G06Q10/08355Routing methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Hardware Design (AREA)
  • Tourism & Hospitality (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning. First, a deep reinforcement learning model is constructed by preprocessing the logistics distribution area and related data; the corresponding flight state space, action space and return value function of the unmanned aerial vehicle are established, and the model is trained with a combination of offline and online learning. Second, a two-stage optimization method plans the logistics distribution path and the flight path of the unmanned aerial vehicle, and the flight path planning stage is completed mainly by real-time action selection based on deep reinforcement learning. Because the distribution cost of the unmanned aerial vehicle is estimated through deep reinforcement learning in the logistics path planning stage, the optimized logistics path better matches the actual flight process of the unmanned aerial vehicle; real-time flight path planning is realized with deep reinforcement learning, which gives the method higher calculation speed and robustness.

Description

Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning
Technical Field
The invention belongs to the field of intelligent logistics, and particularly relates to an unmanned aerial vehicle logistics path planning method based on deep reinforcement learning.
Background
With the rapid development of unmanned aerial vehicle technology in recent years, more and more logistics enterprises are trying to use unmanned aerial vehicles as a supplement to urban logistics. Compared with traditional ground distribution, delivery by unmanned aerial vehicle is more flexible, reduces manual labor and improves delivery coverage, so unmanned aerial vehicle logistics is regarded as a reasonable way to solve the last-kilometer problem of logistics. However, using unmanned aerial vehicles for logistics transportation requires not only planning a reasonable distribution path but also considering the safe flight trajectory of the unmanned aerial vehicle during distribution. A corresponding unmanned aerial vehicle logistics path planning method must therefore consider both path optimization in the logistics distribution process and airspace management during flight, and reducing distribution cost as far as possible while guaranteeing safe operation is an important goal of unmanned aerial vehicle logistics path planning.
Compared with the traditional vehicle routing problem, path planning for unmanned aerial vehicle logistics also includes planning the takeoff and landing positions of the unmanned aerial vehicle and real-time path planning during flight. Existing logistics path planning methods at home and abroad mainly study the vehicle routing problem with heuristic or exact algorithms and do not involve the flight path and flight control process of the unmanned aerial vehicle. Therefore, a method that considers both logistics path planning and flight path planning is needed when unmanned aerial vehicles are used for logistics distribution.
Disclosure of Invention
The invention provides a two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning. The method divides the unmanned-aerial-vehicle-based logistics path planning problem into two processes: (i) preprocessing and model training, and (ii) two-stage unmanned aerial vehicle path planning. Specifically, the two-stage path planning comprises an unmanned aerial vehicle logistics path planning stage and an unmanned aerial vehicle flight path planning stage.
The preprocessing and model training process collects the data needed to train the deep reinforcement learning model and trains the model by combining offline and online learning. It comprises the following steps:
1) First, perform a spatial rasterization operation on the interior of the logistics service area, set the initial grid state according to the distribution of obstacles in the space, mark grids that are forbidden to enter, and construct a simulation environment based on the actual space. Combining the spatial rasterization result, construct an offline training data set from existing manually operated unmanned aerial vehicle trajectory data;
2) Determine the state space S and action space A of the deep reinforcement learning, and set its return value r according to the distribution task. The return value consists of two parts, r = r_l + r_s, where r_l denotes the distance return between the current position of the drone and the target position and r_s denotes the safety return value of the drone's action;
3) Construct a training experience pool during training to store experience data (s, a, r, s'); sample data in batches from the experience pool, and train the neural network parameters that provide the Q value with a gradient descent algorithm.
4) In the simulation environment, generate logistics paths at random, simulate the flight path planning stage of the unmanned aerial vehicle with the trained deep reinforcement learning model, and train the model online. At the same time, simulation serves as the means of estimating the flight cost of the unmanned aerial vehicle in the first stage of the two-stage logistics path planning method.
The unmanned aerial vehicle logistics path planning stage determines the customer visiting sequence and the start and stop positions of each unmanned aerial vehicle during logistics distribution, and determines the optimal delivery strategy under the condition of safe delivery by combining the unmanned aerial vehicle flight paths. It comprises the following steps:
1) Collect the location l_i of each customer point i to be served inside the service area, its delivery demand q_i, service time s_i and serviceable time window [a_i, b_i], and construct a customer data set;
2) Based on the number of unmanned aerial vehicles N_m at each start/stop point m, the maximum number of unmanned aerial vehicles the point can accommodate, the maximum cargo capacity Q and the endurance time T of the unmanned aerial vehicle, construct an initial logistics distribution path scheme by greedy insertion. The scheme mainly determines: (1) the assignment of customers to be served to available unmanned aerial vehicles n_i; (2) the order in which unmanned aerial vehicle n_i visits the customer locations; (3) the takeoff and landing positions of unmanned aerial vehicle n_i. The safety, cost and time consumption of logistics distribution during construction of the initial scheme are obtained by simulation in the simulation environment with the deep reinforcement learning model trained in the preprocessing and model training process.
3) Optimize the initial logistics distribution path scheme with a neighborhood-search-based algorithm, which mainly comprises the following steps: (1) perform a customer deletion operation on the existing logistics distribution path scheme, i.e. delete part of the customer nodes according to a given deletion strategy and put them into a set of customers to be inserted; (2) select customers from this set and insert them back into the logistics distribution path scheme according to a given insertion strategy until all customers are assigned; (3) perform a local neighborhood search on the new scheme obtained after deletion and insertion to find a lower-cost logistics distribution path; (4) judge whether the neighborhood search has converged; if not, return to step (1) and continue the loop, otherwise adopt the logistics distribution path scheme with the lowest distribution cost.
The unmanned aerial vehicle flight path planning stage plans and adjusts the flight path of the unmanned aerial vehicle in real time based on deep reinforcement learning, guaranteeing safe flight during distribution. It comprises the following steps:
1) Construct the unmanned aerial vehicle flight path task set: based on the unmanned aerial vehicle logistics distribution path scheme obtained in the logistics path planning stage, generate for each unmanned aerial vehicle n_i a service sequence q_{n_i} = {m, …, i, …, m'}, where m and m' respectively denote the stop points at which unmanned aerial vehicle n_i takes off and lands;
2) Based on the deep reinforcement learning model, select the flight actions of each unmanned aerial vehicle on its assigned logistics distribution path in real time, and update the state space and the accessibility of the surrounding grid cells. When the position of unmanned aerial vehicle n_i coincides with the destination and all its distribution tasks are completed, the flight path planning process of n_i terminates;
3) Repeat step 2) until all unmanned aerial vehicles reach their preset destinations and complete their distribution tasks.
The invention has the following advantages:
1. The invention integrates a deep-reinforcement-learning-based flight path planning process into the unmanned aerial vehicle logistics distribution path planning method, optimizes the two path planning dimensions of unmanned aerial vehicle logistics simultaneously, and designs a corresponding two-stage unmanned aerial vehicle path planning method. The two-stage method effectively guarantees the safety and efficiency of the optimized unmanned aerial vehicle logistics paths.
2. In the first-stage distribution path planning adopted by the invention, the distribution cost, distribution time and path safety of the unmanned aerial vehicle are obtained from simulation results produced by running the deep-reinforcement-learning-based flight path planning model in the simulation environment. The first-stage logistics path planning results therefore better match the actual flight process of the unmanned aerial vehicle, the discrepancy between the cost estimates of the two stages is reduced, and the accuracy of the invention in practical use is improved.
3. The invention constructs the deep-reinforcement-learning-based flight path planning method by combining static training on existing unmanned aerial vehicle flight trajectory data with a dynamic training process in the simulation environment. In practical use the distribution process is controlled by the trained deep reinforcement learning model; compared with traditional path planning algorithms this saves the time needed to compute the optimal strategy in real time, while the match with the actual distribution environment and the safety of the distribution process are guaranteed.
Drawings
FIG. 1 is a basic flow chart of a two-stage unmanned aerial vehicle path planning method based on deep reinforcement learning;
FIG. 2 is a schematic diagram of an optional action in a flight path planning phase of an unmanned aerial vehicle;
fig. 3 is a schematic diagram of a planning stage of a logistics distribution path of an unmanned aerial vehicle.
Detailed Description
The following detailed description of specific embodiments of the invention is provided in conjunction with the accompanying drawings:
the invention adopts a two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning, which comprises the following specific steps as shown in figure 1:
1) a preprocessing and model training stage:
(1) First, perform the spatial rasterization operation on the distribution area and construct the simulation environment: set inaccessible airspace according to the distribution of obstacles in the distribution airspace, and assign each grid cell an initial value, where 1 means the unmanned aerial vehicle may enter and 0 means it may not. Collect existing manually operated unmanned aerial vehicle trajectory data and construct the offline training data set;
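As an illustration of the rasterization step above, the following minimal Python sketch builds a 3-D occupancy grid with the 1/0 convention described (1 = enterable, 0 = forbidden). The grid dimensions, obstacle cells and the helper name build_grid_environment are illustrative assumptions, not part of the patent.

import numpy as np

def build_grid_environment(nx, ny, nz, obstacle_cells):
    """Rasterize the delivery airspace into an nx * ny * nz grid.

    Every cell starts as 1 (the drone may enter); cells listed in
    obstacle_cells are set to 0 (forbidden), mirroring the 0/1
    convention used in step (1).
    """
    grid = np.ones((nx, ny, nz), dtype=np.int8)
    for (x, y, z) in obstacle_cells:
        grid[x, y, z] = 0
    return grid

# Illustrative example: a 20 x 20 x 10 grid with two blocked cells.
grid = build_grid_environment(20, 20, 10, [(5, 5, 2), (5, 6, 2)])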
(2) Determine the state space S and the action space A of the deep reinforcement learning. The state mainly reflects the spatial position of the unmanned aerial vehicle, its load state and its remaining endurance; specifically, the state at time t contains the coordinates and altitude (x_t, y_t, h_t) of the drone, its cargo load q_t, its remaining endurance time, and the completion status of the delivery task of each customer i at time t, where 0 indicates that the delivery task has not yet been completed and 1 indicates that it has been completed. The action space A contains the 7 actions that can be selected at time t: {climb, descend, advance, retreat, turn left, turn right, keep original position}, as shown in fig. 2, where the basic unit of climbing, descending, advancing and retreating is one cell of the spatial grid.
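A minimal sketch of how the state and action spaces described above could be represented in Python; the class and field names (DroneState, Action, and so on) are illustrative assumptions, and the exact state encoding used by the invention is not specified beyond the elements listed in step (2).

from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Tuple

class Action(Enum):
    CLIMB = 0
    DESCEND = 1
    ADVANCE = 2
    RETREAT = 3
    TURN_LEFT = 4
    TURN_RIGHT = 5
    HOLD = 6          # keep original position

@dataclass
class DroneState:
    position: Tuple[int, int, int]   # (x_t, y_t, h_t): grid coordinates and altitude
    cargo: float                     # q_t: cargo carried at time t
    endurance_left: float            # remaining endurance time at time t
    delivered: Dict[int, int] = field(default_factory=dict)  # customer i -> 0/1 completion flag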
(3) Set the deep reinforcement learning return value r according to the distribution task. The return value consists of two parts, r = r_l + r_s, where r_l denotes the distance return between the current position of the drone and the target position and r_s denotes the safety return value of the drone's action (the specific calculation formulas for r_l and r_s appear as equation images in the original filing).
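Because the exact formulas for r_l and r_s are only available as equation images, the following Python sketch is a stand-in consistent with the description: r_l rewards progress towards the target position and r_s penalizes entering a forbidden (0-marked) cell. The penalty magnitude and the use of straight-line distance are assumptions made for illustration only.

import math

def distance_return(prev_pos, pos, goal):
    """Illustrative r_l: positive when the chosen action moved the drone closer to the goal."""
    return math.dist(prev_pos, goal) - math.dist(pos, goal)

def safety_return(grid, pos):
    """Illustrative r_s: penalize entering a grid cell marked 0 (forbidden)."""
    x, y, z = pos
    return -10.0 if grid[x, y, z] == 0 else 0.0

def reward(grid, prev_pos, pos, goal):
    # r = r_l + r_s, as stated in step (3)
    return distance_return(prev_pos, pos, goal) + safety_return(grid, pos)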
(4) Establish a training experience pool during training to store experience data (s, a, r, s'); sample data in batches from the pool and train the neural network parameters θ that provide the Q value, updating them with a gradient descent algorithm. The loss function is (y_t - Q_t(s, a; θ))², where the target y_t equals the current return plus the discounted maximum Q value of the next state, i.e. y_t = r + γ max_{a'} Q(s', a'; θ) for non-terminal states and y_t = r when the episode terminates. The parameter γ is the discount factor of the return value and is taken as 0.95 in the specific example. The termination conditions include returning to the unmanned aerial vehicle stop point after completing the delivery task, entering a cell marked 0 or occupied by another unmanned aerial vehicle, and reaching the endurance limit of the unmanned aerial vehicle. Action selection during training follows an ε-greedy strategy: with probability ε the action with the maximum return value is selected, and with probability 1-ε an action is selected at random from the action space A.
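A compact sketch of the experience pool and training update described in step (4), using the stated loss (y_t - Q_t(s, a; θ))², the discount factor γ = 0.95 and the ε-greedy rule as worded in the text. The network architecture, state dimension (8), learning rate and buffer size are assumptions introduced only for this sketch.

import random
from collections import deque

import torch
import torch.nn as nn

GAMMA = 0.95                          # discount factor of the return value, as in the text
replay = deque(maxlen=100_000)        # experience pool of (s, a, r, s_next, done) tuples

q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 7))   # 7 actions
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def select_action(state, epsilon):
    """epsilon-greedy as worded in step (4): greedy with probability epsilon, random otherwise."""
    if random.random() < epsilon:
        with torch.no_grad():
            return int(q_net(state).argmax())
    return random.randrange(7)

def train_step(batch_size=32):
    """One gradient-descent update on the loss (y_t - Q_t(s, a; theta))^2."""
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s_next, done = zip(*batch)
    s, s_next = torch.stack(s), torch.stack(s_next)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = r + GAMMA * q_net(s_next).max(1).values * (1.0 - done)   # y_t, no bootstrap at termination
    loss = ((y - q_sa) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()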
(5) In the simulation environment, randomly generate logistics path data to obtain an online training data set; each path specifically includes its starting position, its end position, the intermediate customer point locations to be served and their expected arrival times. Based on the deep reinforcement learning model trained in step (4), simulate the flight path planning stage of the unmanned aerial vehicle, plan flight paths for the online training data set according to the ε-greedy strategy, and train the model online. At the same time, simulation serves as the means of estimating the flight cost, flight time and flight safety of the unmanned aerial vehicle in the first stage of the two-stage logistics path planning method; the flight cost and flight time are obtained by recording, in real time during the simulation, the energy consumption and flight time of the unmanned aerial vehicle.
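The following sketch shows how flight cost and flight time could be accumulated by rolling out the trained policy in the simulated grid, as step (5) describes. The env.step hook, the policy callable and the constant per-step energy model are assumptions introduced only for illustration.

def estimate_flight_cost(env, policy, start_state, goal, energy_per_step=1.0, max_steps=500):
    """Roll out the trained policy in the simulation environment and record
    the energy consumption and flight time of one leg (step (5)).

    env.step(state, action) is an assumed simulator interface returning the
    next state; the per-step energy model is illustrative only.
    """
    state, steps, energy = start_state, 0, 0.0
    while state.position != goal and steps < max_steps:
        action = policy(state)
        state = env.step(state, action)
        steps += 1
        energy += energy_per_step
    return energy, steps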
2) Unmanned aerial vehicle logistics distribution route planning stage:
(1) Construct the customer data set: collect the position l_i of each customer point i to be served in the distribution area, its delivery demand q_i, service time s_i and serviceable time window [a_i, b_i]; customer nodes are denoted by index i. Determine, for each unmanned aerial vehicle start/stop point (denoted by index m), the number of unmanned aerial vehicles N_m and the maximum number of unmanned aerial vehicles it can accommodate. Determine the maximum cargo capacity Q and endurance time T of each unmanned aerial vehicle (denoted by n_i).
(2) Construct the initial logistics distribution path scheme by greedy insertion; the scheme specifically includes the takeoff and landing positions m and m' of each unmanned aerial vehicle n_i and the sequence of customers {…, i, …} it serves. As shown in fig. 3, m and m' respectively represent the start and end of an unmanned aerial vehicle logistics path, and three customers i, j and k are served in turn during the distribution process. The greedy insertion method can be summarized as follows: customers are taken one at a time from the set of customers to be served and inserted into the current set of unmanned aerial vehicle logistics distribution paths according to the insertion rule, choosing the insertion for which the distribution cost of the new path exceeds that of the path before insertion by the least amount. This insertion operation is repeated until all customers are assigned to a path. In particular, the unmanned aerial vehicle distribution cost is obtained as the flight cost consumed under the temporary path constructed by simulation in the simulation environment, with the actions of the unmanned aerial vehicle generated according to the maximum-return selection strategy.
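A minimal sketch of the greedy insertion heuristic described above. It assumes each route already starts as [m, m'] (takeoff and landing points) and that route_cost(route) returns the simulated flight cost of a route; capacity, endurance and time-window feasibility checks are omitted for brevity.

def greedy_insertion(customers, routes, route_cost):
    """Insert each unassigned customer at the position that increases
    total distribution cost the least (greedy construction, step (2))."""
    for c in customers:
        best = None                                # (cost increase, route index, position)
        for r_idx, route in enumerate(routes):
            base = route_cost(route)
            for pos in range(1, len(route)):       # keep m first and m' last
                candidate = route[:pos] + [c] + route[pos:]
                delta = route_cost(candidate) - base
                if best is None or delta < best[0]:
                    best = (delta, r_idx, pos)
        _, r_idx, pos = best
        routes[r_idx].insert(pos, c)
    return routes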
(3) Optimize the logistics distribution path with a neighborhood-search-based algorithm. The specific process is as follows:
Step 1: delete a proportion α of the customers in the existing path set according to a given deletion strategy, where α lies in the range 0-1 in the example. The available deletion strategies include: randomly selecting a proportion α of the customers in the existing path set for deletion; selecting the proportion α of customers whose deletion reduces the path cost the most; selecting for deletion the customer whose removal yields the k-th largest reduction in path cost (k-regret deletion, with k chosen as 2, 3 and 4 in the example); randomly deleting all customers served by one unmanned aerial vehicle; and selecting the unmanned aerial vehicle with the highest current cost and deleting all of its customers. All deleted customers are put into the set of customers to be inserted.
Step 2: select positions at which to insert the customers from the set of customers to be inserted according to a given insertion strategy, such that the flight cost after insertion is minimal; the flight cost is obtained from the unmanned aerial vehicle flight simulation environment combined with the trained deep reinforcement learning model, with actions chosen by maximum return. The specific insertion strategies include: randomly selecting a customer from the set of customers to be inserted; selecting the customer whose insertion increases the cost the least; and selecting for insertion the customer whose cost increase is the k-th smallest (k-regret insertion, with k chosen as 2, 3 and 4 in the example).
Each deletion and insertion operation i has a selection weight w_i. In each iteration the selection probability of each operation is computed as p_i = w_i / Σ_j w_j, and the deletion and insertion operations are selected according to these probabilities.
Step 3: if the maximum number of inner iterations L_1 has not been reached, set l_1 = l_1 + 1 and return to Step 1 to continue the loop. If the maximum number of iterations has been reached, call the local neighborhood search strategy to optimize the current result. The local neighborhood search strategies include: exchanging the order of two customers within a path, exchanging two customers between paths, and exchanging several customers occupying the same positions in the service sequences of two paths. The number of local neighborhood search iterations in the example is L_2.
Step 4: judge whether the maximum number of search cycles L has been reached. If not, update the deletion and insertion operation weights, set the cycle counter l = l + 1 and return to Step 1; otherwise, output the current best result to the second-stage unmanned aerial vehicle flight path planning model. Specifically, the weight of each deletion and insertion operation is updated as w_i ← (1 - η) w_i + η π_i / ρ_i, where the parameter η (with value range 0 to 1 in the example) controls how strongly the weights are adjusted according to the operation scores, ρ_i is the number of times operation i was used during the iterations, and π_i is its accumulated score. The scores are assigned as follows: an operation receives a score of 33 when it yields a new best solution, a score of 9 when it yields a solution that is not the best but is better than the solution before the operation, and a score of 13 when it yields a solution worse than the solution before the operation that is nevertheless accepted by the simulated annealing mechanism.
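A sketch of the operator management used in Steps 1-4: roulette-wheel selection with p_i = w_i / Σ_j w_j and the adaptive weight update w_i ← (1 - η)·w_i + η·π_i/ρ_i, using the 33/9/13 scores quoted above. The default value of η and the segment bookkeeping are assumptions; the patent gives the update rule only as an equation image, and this follows the common adaptive large-neighborhood-search convention.

import random

SCORE_NEW_BEST = 33      # operation produced a new best solution
SCORE_IMPROVED = 9       # better than the incumbent, but not a new best
SCORE_ACCEPTED = 13      # worse, yet accepted by the simulated-annealing rule

def select_operator(weights):
    """Roulette-wheel selection: operator i is chosen with probability w_i / sum_j w_j."""
    total = sum(weights)
    threshold = random.uniform(0.0, total)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if threshold <= acc:
            return i
    return len(weights) - 1

def update_weights(weights, scores, uses, eta=0.5):
    """Adaptive update w_i <- (1 - eta) * w_i + eta * pi_i / rho_i.

    scores[i] is the accumulated score pi_i and uses[i] the usage count rho_i
    of operator i since the last update; eta in (0, 1) is assumed.
    """
    return [(1.0 - eta) * w + eta * (s / u if u > 0 else 0.0)
            for w, s, u in zip(weights, scores, uses)]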
3) An unmanned aerial vehicle flight path planning stage based on deep reinforcement learning:
(1) Construct the unmanned aerial vehicle flight path task set. Specifically, based on the logistics distribution paths obtained by the neighborhood algorithm in the first stage, build for each unmanned aerial vehicle n_i a flight path sequence q_{n_i} = {m, …, i, …, m'}, where m and m' denote the takeoff and landing positions of n_i. The start and end points of the flight path planning stage are determined from this sequence, namely (x_m, y_m, h_m) and (x_{m'}, y_{m'}, h_{m'}).
(2) Unmanned aerial vehicle action selection. Specifically, unmanned aerial vehicle n_i starts from the starting point (x_m, y_m, h_m) at the initial time t_0. At any time t it is first judged whether the end point (x_{m'}, y_{m'}, h_{m'}) has been reached; if so, the path planning of n_i is complete. If the end point has not been reached, the action corresponding to the maximum Q value output by the neural network is selected, producing the unmanned aerial vehicle state s' at time t+1. In particular, during flight the radar carried by the unmanned aerial vehicle detects in real time whether obstacles are present in adjacent grid cells; if an obstacle is detected, the state of that cell is marked 0 in real time, i.e. it may not be entered.
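A sketch of the real-time action selection loop in step (2): at every step the action with the largest Q value is executed until the end point is reached, and adjacent cells are re-marked when the onboard radar reports an obstacle. env.sense_obstacles, env.step and state.as_tensor are assumed interfaces introduced only for illustration.

import torch

def fly_leg(q_net, env, state, goal, max_steps=1000):
    """Execute one flight leg by greedy (max-Q) action selection."""
    for _ in range(max_steps):
        if state.position == goal:
            return True                    # end point reached, leg completed
        env.sense_obstacles(state)         # radar: mark newly detected obstacle cells as 0
        with torch.no_grad():
            action = int(q_net(state.as_tensor()).argmax())
        state = env.step(state, action)    # generates the state s' at time t+1
    return False                           # step / endurance limit reached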
(3) Repeat step (2) until the unmanned aerial vehicle completes the distribution tasks arranged in its sequence and finally reaches the preset landing position.
The above is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiment; modifications and improvements made without departing from the principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning is characterized by comprising the following stages:
a preprocessing and model training stage: performing spatial rasterization on the space within the distribution range, collecting flight trajectory data of manually operated unmanned aerial vehicles to construct an offline training set, constructing an unmanned aerial vehicle flight simulation environment in combination with the airspace characteristics, designing the unmanned aerial vehicle state space, action space and return value function, and training the model offline and online in combination with the unmanned aerial vehicle distribution process;
unmanned aerial vehicle logistics path planning stage: collecting logistics distribution data in a distribution range, constructing an initial distribution path scheme, evaluating the flight path of the unmanned aerial vehicle by combining a trained model, and optimizing the logistics path of the unmanned aerial vehicle;
unmanned aerial vehicle flight path planning stage: and determining an unmanned aerial vehicle task sequence by combining the output of the unmanned aerial vehicle path planning stage, and outputting the flight path of the unmanned aerial vehicle based on deep reinforcement learning.
2. The deep reinforcement learning-based two-phase unmanned aerial vehicle logistics path planning method of claim 1, wherein the preprocessing phase comprises
carrying out the spatial rasterization operation in the distribution area, constructing the simulation environment, and setting an entry state for each grid cell;
determining a state space S and an action space A of deep reinforcement learning;
setting a deep reinforcement learning return value function r according to a distribution task;
collecting flight trajectory data of the manually operated unmanned aerial vehicle, and constructing an offline training data set.
3. The scheme of claim 2, wherein the state space comprises four types of state information: the spatial position of the unmanned aerial vehicle, its load state, its remaining endurance, and the customer point service states; and the action space comprises the selectable actions {climb, descend, advance, retreat, turn left, turn right, keep original position}.
4. The solution of claim 2, wherein the deep reinforcement learning return value function consists of two parts: the distance return value r_l between the unmanned aerial vehicle and the target position, and the unmanned aerial vehicle action safety return value r_s.
5. The deep reinforcement learning-based two-phase unmanned aerial vehicle logistics path planning method according to claim 1, wherein for the unmanned aerial vehicle logistics path planning phase, the method comprises the following steps:
collecting unmanned aerial vehicle logistics demand data and constructing an unmanned aerial vehicle logistics demand data set;
determining an unmanned aerial vehicle logistics distribution path planning initial scheme;
and optimizing the unmanned aerial vehicle logistics distribution path planning scheme, and taking the optimized unmanned aerial vehicle logistics distribution path planning scheme as the input of the unmanned aerial vehicle flight path planning stage.
6. The scheme of claim 5, wherein a neighborhood-search-based method is adopted to optimize the unmanned aerial vehicle logistics distribution path, and the specific neighborhood search process comprises:
a large neighborhood search process based on delete and insert operations;
and (3) optimizing based on local neighborhood searching.
7. The scheme of claim 5, wherein, when the unmanned aerial vehicle logistics distribution path is optimized, the flight path cost is obtained based on the deep reinforcement learning model using the simulation environment of claim 2.
8. The deep reinforcement learning-based two-phase unmanned aerial vehicle logistics path planning method according to claim 1, is characterized in that the unmanned aerial vehicle flight path planning phase comprises the following steps:
constructing a flight path task set of the unmanned aerial vehicle;
and carrying out unmanned aerial vehicle flight action selection based on deep reinforcement learning.
CN202110413367.6A 2021-04-16 2021-04-16 Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning Active CN113283827B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110413367.6A CN113283827B (en) 2021-04-16 2021-04-16 Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110413367.6A CN113283827B (en) 2021-04-16 2021-04-16 Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN113283827A true CN113283827A (en) 2021-08-20
CN113283827B CN113283827B (en) 2024-03-12

Family

ID=77276893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110413367.6A Active CN113283827B (en) 2021-04-16 2021-04-16 Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113283827B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201925A (en) * 2022-02-17 2022-03-18 佛山科学技术学院 Unmanned aerial vehicle cluster cooperative task planning method, electronic equipment and readable storage medium
US20220397403A1 (en) * 2021-05-24 2022-12-15 Ocado Innovation Limited System and method for determining a route for a multi-depot vehicle network
CN117806340A (en) * 2023-11-24 2024-04-02 中国电子科技集团公司第十五研究所 Airspace training flight path automatic planning method and device based on reinforcement learning
CN118170013A (en) * 2024-02-26 2024-06-11 无锡学院 Unmanned aerial vehicle auxiliary distribution system and method based on reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110155328A (en) * 2019-05-21 2019-08-23 上海理工大学 The method that unmanned plane carries out medical material dispatching for the mobile clinic in earthquake-stricken area
CN110673637A (en) * 2019-10-08 2020-01-10 福建工程学院 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN111142557A (en) * 2019-12-23 2020-05-12 清华大学 Unmanned aerial vehicle path planning method and system, computer equipment and readable storage medium
CN112034887A (en) * 2020-09-10 2020-12-04 南京大学 Optimal path training method for unmanned aerial vehicle to avoid cylindrical barrier to reach target point
CN112148008A (en) * 2020-09-18 2020-12-29 中国航空无线电电子研究所 Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110155328A (en) * 2019-05-21 2019-08-23 上海理工大学 The method that unmanned plane carries out medical material dispatching for the mobile clinic in earthquake-stricken area
CN110673637A (en) * 2019-10-08 2020-01-10 福建工程学院 Unmanned aerial vehicle pseudo path planning method based on deep reinforcement learning
CN111142557A (en) * 2019-12-23 2020-05-12 清华大学 Unmanned aerial vehicle path planning method and system, computer equipment and readable storage medium
CN112034887A (en) * 2020-09-10 2020-12-04 南京大学 Optimal path training method for unmanned aerial vehicle to avoid cylindrical barrier to reach target point
CN112148008A (en) * 2020-09-18 2020-12-29 中国航空无线电电子研究所 Real-time unmanned aerial vehicle path prediction method based on deep reinforcement learning

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220397403A1 (en) * 2021-05-24 2022-12-15 Ocado Innovation Limited System and method for determining a route for a multi-depot vehicle network
CN114201925A (en) * 2022-02-17 2022-03-18 佛山科学技术学院 Unmanned aerial vehicle cluster cooperative task planning method, electronic equipment and readable storage medium
CN117806340A (en) * 2023-11-24 2024-04-02 中国电子科技集团公司第十五研究所 Airspace training flight path automatic planning method and device based on reinforcement learning
CN117806340B (en) * 2023-11-24 2024-08-30 中国电子科技集团公司第十五研究所 Airspace training flight path automatic planning method and device based on reinforcement learning
CN118170013A (en) * 2024-02-26 2024-06-11 无锡学院 Unmanned aerial vehicle auxiliary distribution system and method based on reinforcement learning

Also Published As

Publication number Publication date
CN113283827B (en) 2024-03-12

Similar Documents

Publication Publication Date Title
CN113283827A (en) Two-stage unmanned aerial vehicle logistics path planning method based on deep reinforcement learning
CN107169608B (en) Distribution method and device for multiple unmanned aerial vehicles to execute multiple tasks
CN113159432B (en) Multi-agent path planning method based on deep reinforcement learning
CN110544296B (en) Intelligent planning method for three-dimensional global track of unmanned aerial vehicle in uncertain enemy threat environment
CN107103164B (en) Distribution method and device for unmanned aerial vehicle to execute multiple tasks
CN111678524B (en) Rescue aircraft path planning method and system based on flight safety
CN114186924A (en) Collaborative distribution path planning method and device, electronic equipment and storage medium
CN115730700A (en) Self-adaptive multi-target task planning method, system and equipment based on reference point
CN116225046A (en) Unmanned aerial vehicle autonomous path planning method based on deep reinforcement learning under unknown environment
CN114372415B (en) Method, device and equipment for designing manned moon-boarding track based on reinforcement learning
Ding et al. Improved GWO algorithm for UAV path planning on crop pest monitoring
CN114815891A (en) PER-IDQN-based multi-unmanned aerial vehicle enclosure capture tactical method
CN115237157A (en) Road network constraint-based air-to-ground unmanned cluster multi-task point path planning method
CN114578845B (en) Unmanned aerial vehicle track planning method based on improved ant colony algorithm
CN114237282A (en) Intelligent unmanned aerial vehicle flight path planning method for intelligent industrial park monitoring
CN117170408A (en) Photovoltaic panel site inspection path intelligent planning system and method based on unmanned aerial vehicle
WO2024164367A1 (en) Safe-reinforcement-learning-based unmanned aerial vehicle path planning method in urban airspace
CN115759328B (en) Helicopter mission planning method, system and equipment based on multi-objective optimization
CN115479608B (en) Terminal area approach aircraft four-dimensional track planning method based on time attribute
CN114637331A (en) Unmanned aerial vehicle multi-task path planning method and system based on ant colony algorithm
CN114967748A (en) Unmanned aerial vehicle path planning method based on space deformation
Zhang et al. A UAV autonomous maneuver decision-making algorithm for route guidance
CN114924593B (en) Quick planning method for vehicle and multi-unmanned aerial vehicle combined route
EP4177865A1 (en) Method for determining a flight plan
CN118114850A (en) Unmanned aerial vehicle passive positioning task-oriented autonomous path planning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant