CN114964268A - Unmanned aerial vehicle navigation method and device - Google Patents

Unmanned aerial vehicle navigation method and device

Info

Publication number
CN114964268A
Authority
CN
China
Prior art keywords
simulation
navigation
model
unmanned aerial
aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210902202.XA
Other languages
Chinese (zh)
Other versions
CN114964268B (en)
Inventor
李唯
张宁远
曹一丁
郭伟
杨雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baiyang Times Beijing Technology Co ltd
Original Assignee
Baiyang Times Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baiyang Times Beijing Technology Co ltd filed Critical Baiyang Times Beijing Technology Co ltd
Priority to CN202210902202.XA priority Critical patent/CN114964268B/en
Publication of CN114964268A publication Critical patent/CN114964268A/en
Application granted granted Critical
Publication of CN114964268B publication Critical patent/CN114964268B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Automation & Control Theory (AREA)
  • Navigation (AREA)

Abstract

The application discloses an unmanned aerial vehicle navigation method and device. The method comprises the following steps: constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments; constructing a deep reinforcement learning model for unmanned aerial vehicle navigation based on simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment; and when the target unmanned aerial vehicle runs, navigating by using the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle. When the target unmanned aerial vehicle actually operates, an actual operation scheme can be obtained from the model and the real operation information of the target unmanned aerial vehicle, so that the target unmanned aerial vehicle can be navigated without becoming familiar with the environment in advance, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.

Description

Unmanned aerial vehicle navigation method and device
Technical Field
The application relates to the technical field of unmanned aerial vehicles, in particular to an unmanned aerial vehicle navigation method and device.
Background
In recent years, with the continuous development of technologies such as intelligent control and robotics, the autonomous control technology of unmanned aerial vehicles has made great progress. As a flight platform capable of carrying various sensing and computing devices, the unmanned aerial vehicle has the advantages of small size, low manufacturing cost and high flexibility, and can be widely applied to tasks such as regional reconnaissance and disaster search and rescue.
At present, existing unmanned aerial vehicle navigation methods generally use the environment sensing devices carried by the unmanned aerial vehicle to survey a known environment in advance, construct an environment model corresponding to that known environment, and then realize autonomous navigation based on the environment model. Therefore, in existing unmanned aerial vehicle navigation methods, a highly accurate navigation scheme places high requirements on the precision of the environment model. Moreover, if the known environment changes or the unmanned aerial vehicle enters an unknown environment, it is difficult to achieve accurate autonomous navigation based on a previously generated environment model.
Disclosure of Invention
The embodiment of the application provides an unmanned aerial vehicle navigation method and device, and aims to solve the problem that the traditional unmanned aerial vehicle navigation method is difficult to meet the requirement of accurate navigation.
In a first aspect, an embodiment of the present application provides an unmanned aerial vehicle navigation method, including:
constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments;
constructing a deep reinforcement learning model for unmanned aerial vehicle navigation based on simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment;
and when the target unmanned aerial vehicle runs, navigating by using the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle.
Optionally, the constructing a deep reinforcement learning model for drone navigation based on the simulation operation information of the simulation target drone in the simulation environment includes:
based on the simulation operation information, a navigation strategy model for planning navigation information is constructed by utilizing a deep learning algorithm;
based on the navigation strategy model, a navigation evaluation model for evaluating the navigation information is constructed by utilizing a reinforcement learning algorithm;
and optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model converges, and taking the converged navigation strategy model as the deep reinforcement learning model.
Optionally, the constructing a navigation strategy model for planning navigation information based on the simulation operation information and by using a deep learning algorithm includes:
taking simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment as input of a navigation prediction model, and taking simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as output of the navigation prediction model, and constructing the navigation prediction model;
the simulation task information of the simulation target unmanned aerial vehicle in the simulation environment and the output of the navigation prediction model are jointly used as the input of a navigation matching model, and the matching degree between the output of the navigation prediction model and the simulation task information is used as the output of the navigation matching model to construct the navigation matching model;
and constructing the navigation strategy model based on the navigation prediction model and the navigation matching model.
Optionally, the constructing a navigation evaluation model for evaluating the navigation information based on the navigation policy model and by using a reinforcement learning algorithm includes:
acquiring navigation information matched with the simulation task information from the simulation navigation information output by the navigation prediction model as target navigation information;
and taking the output of the navigation strategy model and the target navigation information as the input of the navigation evaluation model, and taking the reward evaluation value corresponding to the target navigation information as the output of the navigation evaluation model to construct the navigation evaluation model.
Optionally, before navigating with the deep reinforcement learning model according to the real running information of the target drone when the target drone runs, the method further includes:
constructing a navigation test environment of the target unmanned aerial vehicle, and performing navigation test on the deep reinforcement learning model in the navigation test environment to obtain a test result;
determining test random information based on the simulation environment and the navigation test environment;
and updating the deep reinforcement learning model according to the test result and the test random information.
Optionally, when the target drone operates, navigating according to the real operation information of the target drone and by using the deep reinforcement learning model, including:
acquiring real visual information and real task information of the target unmanned aerial vehicle during operation;
inputting the real visual information and the real task information into the deep reinforcement learning model;
acquiring the predicted navigation information output by the deep reinforcement learning model; the predicted navigation information is navigation information matched with the real task information;
and controlling the target unmanned aerial vehicle to operate based on the predicted navigation information.
Optionally, the constructing a simulation environment corresponding to the simulation target drone based on the device parameters of the target drone and the environment parameters of different known environments includes:
according to the equipment parameters of the target unmanned aerial vehicle, constructing a digital twin model corresponding to the target unmanned aerial vehicle as the simulation target unmanned aerial vehicle;
building a simulation environment set corresponding to the different known environments according to the environment parameters of the different known environments;
and constructing the simulation environment based on the digital twin model and the simulation environment set.
Optionally, the constructing a digital twin model corresponding to the target drone as the simulated target drone according to the device parameter of the target drone includes:
constructing a control and state estimation system simulation model of the simulation target unmanned aerial vehicle based on the control and state estimation system test parameters in the equipment parameters;
constructing a power system simulation model of the simulation target unmanned aerial vehicle based on simulation control parameters output by the control and state estimation system simulation model and power system test parameters in the equipment parameters;
constructing a dynamic simulation model of the simulation target unmanned aerial vehicle based on the simulation power system parameters output by the power system simulation model and the dynamic model test parameters in the equipment parameters;
constructing a rigid motion simulation model of the simulation target unmanned aerial vehicle based on simulation dynamic parameters output by the dynamic simulation model and rigid motion model test parameters in the equipment parameters;
and constructing the digital twin model according to the control and state estimation system simulation model, the power system simulation model, the dynamics simulation model and the rigid motion simulation model.
Optionally, the drone navigation method further includes:
acquiring simulation motion parameters output by the rigid motion simulation model;
and updating the control and state estimation system simulation model and/or the dynamic simulation model according to the simulation motion parameters.
In a second aspect, an embodiment of the present application provides an unmanned aerial vehicle navigation device, which includes:
the simulation environment construction module is used for constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments;
the model building module is used for building a deep reinforcement learning model for unmanned aerial vehicle navigation based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment;
and the navigation module is used for navigating according to the real operation information of the target unmanned aerial vehicle and by utilizing the deep reinforcement learning model when the target unmanned aerial vehicle operates.
According to the technical scheme, the embodiment of the application has the following advantages:
In the embodiment of the application, the simulation environment corresponding to the simulation target unmanned aerial vehicle can first be established through the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments, and the deep reinforcement learning model for unmanned aerial vehicle navigation is then established based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle runs, navigation can be performed by using the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle. Because the deep reinforcement learning model is constructed based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, the model can provide a simulation operation scheme of the simulation target unmanned aerial vehicle in the simulation environment. In this way, when the target unmanned aerial vehicle actually operates, an actual operation scheme can be obtained from this model and the real operation information of the target unmanned aerial vehicle, so that the target unmanned aerial vehicle can be navigated without becoming familiar with the environment in advance, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.
Drawings
Fig. 1 is a flowchart of an unmanned aerial vehicle navigation method according to an embodiment of the present application;
FIG. 2 is a flowchart of an implementation of building a deep reinforcement learning model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of an unmanned aerial vehicle navigation apparatus provided in an embodiment of the present application.
Detailed Description
As described above, the inventors found in their research on unmanned aerial vehicle navigation methods that: existing unmanned aerial vehicle navigation methods generally use the environment sensing devices carried by the unmanned aerial vehicle to survey a known environment in advance, construct an environment model corresponding to that known environment, and then realize autonomous navigation based on the environment model. Therefore, in existing unmanned aerial vehicle navigation methods, a highly accurate navigation scheme places high requirements on the precision of the environment model. Moreover, if the known environment changes or the unmanned aerial vehicle enters an unknown environment, it is difficult to achieve accurate autonomous navigation based on a previously generated environment model.
In order to solve the above problem, an embodiment of the application provides an unmanned aerial vehicle navigation method. The method can comprise the following steps: through the device parameters of the target unmanned aerial vehicle and the environment parameters of different known environments, the simulation environment corresponding to the simulation target unmanned aerial vehicle can be established firstly, and then the deep reinforcement learning model for unmanned aerial vehicle navigation is established based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle runs, navigation can be performed by using the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle.
Because the deep reinforcement learning model is constructed based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, the model can provide a simulation operation scheme of the simulation target unmanned aerial vehicle in the simulation environment. In this way, when the target unmanned aerial vehicle actually operates, an actual operation scheme can be formed from this model and the real operation information of the target unmanned aerial vehicle to realize the navigation of the target unmanned aerial vehicle, without needing to become familiar with the environment in advance, thereby improving the efficiency and accuracy of unmanned aerial vehicle navigation and allowing the unmanned aerial vehicle to be used to its full potential.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Fig. 1 is a flowchart of an unmanned aerial vehicle navigation method according to an embodiment of the present application. With reference to fig. 1, an unmanned aerial vehicle navigation method provided in an embodiment of the present application may include:
s101: and constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments.
Because different types of unmanned aerial vehicles, such as multi-rotor, single-rotor and fixed-wing unmanned aerial vehicles, are configured with different equipment, they are also affected differently by environmental parameters such as airflow and air pressure. Based on this, the simulation environment of the simulated unmanned aerial vehicle can be constructed by jointly considering the equipment parameters of the target unmanned aerial vehicle and the parameters of different known environments, so as to obtain a simulation environment with higher precision. The embodiments of the present application do not specifically limit how the device parameters of the target unmanned aerial vehicle are acquired. For example, the device parameters of the target drone may be obtained from the manufacturer of the target drone, or, if the control system of the target drone is configured with a drone database, the device parameters may be obtained directly from that database. Likewise, the embodiments of the present application do not specifically limit how the environmental parameters of different known environments are acquired. For example, the environmental parameters of the corresponding environment may be determined based on an existing environment model, or the environmental parameters of the operating environment may be collected by the environment sensing devices mounted on the target drone each time the target drone operates.
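For intuition only, the two kinds of input described above might be collected into simple configuration records before the simulation is assembled. The following is a minimal Python sketch under that assumption; all class and field names (DeviceParameters, EnvironmentParameters and the individual parameter keys) are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class DeviceParameters:
    """Equipment parameters of the target drone (hypothetical field names)."""
    drone_type: str                              # e.g. "quad-rotor", "single-rotor", "fixed-wing"
    control_test_params: Dict[str, float]        # control and state estimation system test parameters
    power_test_params: Dict[str, float]          # power system test parameters
    dynamics_test_params: Dict[str, float]       # dynamics model test parameters
    rigid_motion_test_params: Dict[str, float]   # rigid motion model test parameters

@dataclass
class EnvironmentParameters:
    """Parameters of one known environment (hypothetical field names)."""
    name: str
    airflow_mps: float                           # mean airflow speed, m/s
    air_pressure_hpa: float                      # air pressure, hPa
    obstacles: List[dict] = field(default_factory=list)

# Device parameters might come from the manufacturer or a drone database;
# environment parameters from an existing environment model or onboard sensors.
quadrotor = DeviceParameters(
    drone_type="quad-rotor",
    control_test_params={"kp": 1.2, "ki": 0.05, "kd": 0.3},
    power_test_params={"motor_time_constant_s": 0.02, "max_rpm": 12000.0},
    dynamics_test_params={"lift_coefficient": 1.1e-6},
    rigid_motion_test_params={"mass_kg": 1.5, "arm_length_m": 0.22},
)
known_environments = [
    EnvironmentParameters("indoor_office", airflow_mps=0.2, air_pressure_hpa=1013.0),
    EnvironmentParameters("urban_street", airflow_mps=3.5, air_pressure_hpa=1008.0),
]
```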
In addition, as for the construction method of the simulation environment corresponding to the simulation target unmanned aerial vehicle, the embodiment of the present application may not be specifically limited, and for convenience of understanding, the following description is made in conjunction with a possible implementation method.
In a possible implementation, S101 may specifically include: according to the equipment parameters of the target unmanned aerial vehicle, constructing a digital twin model corresponding to the target unmanned aerial vehicle as a simulation target unmanned aerial vehicle; building simulation environment sets corresponding to different known environments according to environment parameters of the different known environments; and constructing the simulation environment of the unmanned aerial vehicle based on the digital twin model and the simulation environment set. Therefore, high-precision simulation of the target unmanned aerial vehicle is achieved through the digital twin technology, and a simulation environment set is combined, so that the deep reinforcement learning model with high accuracy can be constructed in the follow-up process, and accurate unmanned aerial vehicle navigation is achieved.
Specifically, the construction process of the digital twin model corresponding to the target drone may include: constructing a control and state estimation system simulation model corresponding to the simulation target unmanned aerial vehicle based on the control and state estimation system test parameters in the equipment parameters; constructing a power system simulation model corresponding to the simulation target unmanned aerial vehicle based on the simulation control parameters output by the control and state estimation system simulation model and the power system test parameters in the equipment parameters; constructing a dynamics simulation model corresponding to the simulation target unmanned aerial vehicle based on the simulation power system parameters output by the power system simulation model and the dynamics model test parameters in the equipment parameters; constructing a rigid motion simulation model corresponding to the simulation target unmanned aerial vehicle based on the simulation dynamics parameters output by the dynamics simulation model and the rigid motion model test parameters in the equipment parameters; and constructing the digital twin model from the control and state estimation system simulation model, the power system simulation model, the dynamics simulation model and the rigid motion simulation model.
In practical applications, if the target drone is a quad-rotor drone: for the control and state estimation system model, a flight control simulation model may be determined by a serial (cascaded) PID (Proportional-Integral-Derivative) control algorithm based on the control and state estimation system test parameters, a sensor model with added sensor noise may be used as the state estimation simulation model, and the control and state estimation system simulation model is then constructed from the flight control simulation model and the state estimation simulation model. For the power system simulation model, a second-order system model may be constructed based on the power system test parameters and the simulation control signals output by the control and state estimation system simulation model. For the dynamics simulation model, a rotational-speed-to-lift simulation model may be constructed based on the power system test parameters and the simulated rotational speed output by the power system simulation model. For the rigid body motion simulation model, the Bullet3 physics engine may be adopted, and a 6-degree-of-freedom motion model may be constructed based on the simulated force and simulated moment output by the dynamics simulation model.
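To make the cascade concrete, the following is a minimal, heavily simplified Python sketch of the four sub-models chained together for a quad-rotor: a serial (cascaded) PID altitude controller with a noisy state estimate, a second-order motor model for the power system, a quadratic rotor-speed-to-lift model for the dynamics, and a rigid-motion model reduced to the vertical axis, with plain Euler integration standing in for the Bullet3 6-degree-of-freedom engine mentioned above. All gains, coefficients and the one-dimensional simplification are illustrative assumptions, not values from the patent.

```python
import numpy as np

class CascadedPID:
    """Serial PID flight controller: an outer altitude loop commands a climb rate,
    an inner velocity loop turns it into a normalized collective thrust command."""
    def __init__(self, kp_outer=1.0, kp_inner=2.0, ki_inner=0.5, dt=0.01):
        self.kp_outer, self.kp_inner, self.ki_inner, self.dt = kp_outer, kp_inner, ki_inner, dt
        self.integral = 0.0

    def step(self, target_alt, est_alt, est_vz):
        vz_cmd = self.kp_outer * (target_alt - est_alt)              # outer (position) loop
        err = vz_cmd - est_vz                                        # inner (velocity) loop
        self.integral += err * self.dt
        return self.kp_inner * err + self.ki_inner * self.integral   # simulation control parameter

class SecondOrderMotor:
    """Power system model: rotor speed follows the command as a damped second-order system."""
    def __init__(self, wn=40.0, zeta=0.7, dt=0.01):
        self.wn, self.zeta, self.dt = wn, zeta, dt
        self.speed = self.speed_dot = 0.0

    def step(self, speed_cmd):
        acc = self.wn ** 2 * (speed_cmd - self.speed) - 2 * self.zeta * self.wn * self.speed_dot
        self.speed_dot += acc * self.dt
        self.speed += self.speed_dot * self.dt
        return self.speed                                            # rotor speed, rad/s

def rotor_lift(speed, k_lift=1.1e-6):
    """Dynamics model: lift grows with the square of rotor speed."""
    return k_lift * speed ** 2

class RigidBodyVertical:
    """Rigid-motion model reduced to vertical motion (the patent uses a full 6-DoF
    model in the Bullet3 engine; Euler integration stands in here)."""
    def __init__(self, mass=1.5, dt=0.01):
        self.mass, self.dt = mass, dt
        self.z = self.vz = 0.0

    def step(self, total_thrust):
        az = total_thrust / self.mass - 9.81
        self.vz += az * self.dt
        self.z += self.vz * self.dt
        return self.z, self.vz

# Wire the four sub-models together in the order described above.
dt = 0.01
pid = CascadedPID(dt=dt)
motors = [SecondOrderMotor(dt=dt) for _ in range(4)]
body = RigidBodyVertical(dt=dt)
for _ in range(1000):                                                # 10 s of simulated flight
    noisy_alt = body.z + np.random.normal(0.0, 0.02)                 # sensor model with added noise
    thrust_cmd = pid.step(target_alt=5.0, est_alt=noisy_alt, est_vz=body.vz)
    speed_cmd = min(max(thrust_cmd, 0.0) * 800.0, 2500.0)            # crude command-to-speed mapping
    total_thrust = sum(rotor_lift(m.step(speed_cmd)) for m in motors)
    body.step(total_thrust)
print(f"simulated altitude after 10 s: {body.z:.2f} m")
```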
In addition, in order to improve the accuracy of the simulation target unmanned aerial vehicle obtained based on the digital twin technology, in the embodiment of the application, the simulation target unmanned aerial vehicle can be optimized by using the simulation motion parameters output by the rigid motion simulation model. Specifically, the simulation motion parameters output by the rigid motion simulation model can be obtained; and updating the control and state estimation system simulation model and/or the dynamic simulation model according to the simulation motion parameters. Here, the simulated motion parameter may be a simulated air flow rate in a simulation environment in which the simulation target drone is located.
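As a purely illustrative example of this feedback, a hypothetical update rule might fold the simulated airflow reported by the rigid motion simulation model back into a drag term of the dynamics model and a feed-forward term of the controller; the formulas and parameter names below are assumptions chosen only to show the direction of the data flow.

```python
def update_twin_from_motion(sim_airflow_mps, dynamics_params, control_params):
    """Hypothetical update: larger simulated airflow increases the drag coefficient
    used by the dynamics simulation model and the feed-forward thrust used by the
    control and state estimation system simulation model."""
    dynamics_params["drag_coefficient"] = 0.10 * (1.0 + 0.05 * sim_airflow_mps)
    hover_thrust = 9.81 * dynamics_params.get("mass_kg", 1.5)
    control_params["feedforward_thrust"] = hover_thrust * (1.0 + 0.02 * sim_airflow_mps)
    return dynamics_params, control_params

dynamics_params, control_params = {"mass_kg": 1.5}, {}
dynamics_params, control_params = update_twin_from_motion(3.0, dynamics_params, control_params)
```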
S102: and constructing a deep reinforcement learning model for unmanned aerial vehicle navigation based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment.
Here, the simulation operation information may include simulation visual information of the simulation target drone in the simulation environment, simulation navigation information of the simulation target drone in the simulation environment, and simulation task information of the simulation target drone in the simulation environment. In addition, for the construction process of the deep reinforcement learning model, please refer to the introduction made below for technical details.
In addition, in the embodiment of the present application, in order to improve the accuracy of the deep reinforcement learning model, the model can be optimized through a virtual-to-real (sim-to-real) transfer technique. Specifically, a navigation test environment of the target unmanned aerial vehicle can be established, a navigation test can be performed on the deep reinforcement learning model in the navigation test environment to obtain a test result, test random information can be determined based on the simulation environment and the navigation test environment, and the deep reinforcement learning model can then be updated according to the test result and the test random information. The navigation test environment is a real environment, so the test random information reflects the small differences between the simulation environment and the real environment. In practical applications, the test random information may be random environment information, such as random illumination information or random wind speed information, and may also be a motion blur error model driven by the angular velocity of the camera of the unmanned aerial vehicle, a randomized dynamic response model of the unmanned aerial vehicle, or the like. Therefore, by modeling the differences between the virtual and real environments and optimizing the deep reinforcement learning model against them, these differences can be reduced and the accuracy of the deep reinforcement learning model improved.
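A minimal Python sketch of drawing and applying such test random information is shown below; the parameter ranges, the penalty formula and the single-scalar model update are placeholders for the real fine-tuning procedure and are assumptions rather than details from the patent.

```python
import random

def sample_test_random_info(seed=None):
    """One draw of test random information: small simulation-vs-reality gaps
    (illumination, wind, camera motion blur, actuator lag) randomized so the
    model becomes insensitive to them. The ranges are illustrative."""
    rng = random.Random(seed)
    return {
        "illumination_scale": rng.uniform(0.6, 1.4),  # random lighting gain on rendered images
        "wind_speed_mps": rng.uniform(0.0, 6.0),      # random wind speed
        "motion_blur_px": rng.uniform(0.0, 3.0),      # blur driven by camera angular velocity
        "motor_lag_jitter": rng.uniform(0.9, 1.1),    # jitter on the power-system time constant
    }

def update_model(model_params, test_result, random_info, lr=1e-3):
    """Placeholder update: the navigation test result becomes a penalty that is
    weighted by the sampled randomization and used to nudge the model."""
    penalty = test_result["tracking_error_m"] * (1.0 + 0.1 * random_info["wind_speed_mps"])
    model_params["bias"] = model_params.get("bias", 0.0) - lr * penalty
    return model_params

model_params = {}
for episode in range(3):
    info = sample_test_random_info(seed=episode)
    result = {"tracking_error_m": 0.5 / (episode + 1)}   # stand-in for a real navigation test
    model_params = update_model(model_params, result, info)
```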
S103: and when the target unmanned aerial vehicle runs, navigating by using the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle.
Here, the real operation information may include real visual information and real task information when the target drone operates.
For the real visual information, it may represent the environmental information captured by the target unmanned aerial vehicle during operation. Specifically, if the target unmanned aerial vehicle is configured with an RGB (Red-Green-Blue) camera and a depth camera, the real visual information may be obtained as follows: images of the operating environment are captured by the RGB camera and the depth camera respectively; feature processing and image recognition are performed on the captured images to obtain the processed images and the image recognition results; and the processed images and the image recognition results are used together as the real visual information. In addition, the real visual information may be acquired in real time or at a preset acquisition frequency, for example 60 frames per second, which is not specifically limited in this embodiment of the present application.
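The following Python sketch shows one plausible shape of that acquisition pipeline: per-frame feature processing of the RGB and depth images plus a stand-in recognition step, bundled into a single visual-information record. The normalization constants, the obstacle-mask "recognition" and the function names are illustrative assumptions, not the patent's processing chain.

```python
import numpy as np

def preprocess_rgb(rgb):
    """Feature processing for the RGB frame: scale to [0, 1] and normalize per
    channel (the statistics below are the common ImageNet values, assumed here)."""
    rgb = rgb.astype(np.float32) / 255.0
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return (rgb - mean) / std

def preprocess_depth(depth, max_range_m=10.0):
    """Feature processing for the depth frame: clip to the sensor range and scale to [0, 1]."""
    return (np.clip(depth, 0.0, max_range_m) / max_range_m).astype(np.float32)

def recognize(depth_norm):
    """Stand-in for image recognition on the depth frame; a real system would run a
    detector or segmenter here. This only flags pixels closer than 20 % of range."""
    return (depth_norm > 0.0) & (depth_norm < 0.2)

def acquire_visual_info(rgb_frame, depth_frame):
    """Bundle the processed images and the recognition result as the real visual
    information consumed by the navigation model; acquisition could run in real
    time or at a fixed rate such as 60 frames per second."""
    depth_norm = preprocess_depth(depth_frame)
    return {
        "rgb": preprocess_rgb(rgb_frame),
        "depth": depth_norm,
        "recognition": recognize(depth_norm),
    }

# Usage with dummy frames standing in for the RGB and depth cameras.
visual_info = acquire_visual_info(
    rgb_frame=np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8),
    depth_frame=np.random.uniform(0.0, 10.0, (224, 224)),
)
```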
For the real task information, it may represent the task objective of the target drone during operation. The real task information can be obtained from an instruction issued by the unmanned aerial vehicle operator: the operator may directly use an information entry module configured on the target unmanned aerial vehicle and issue an instruction containing the real task information by entering it. For example, the information entry module may be a keyboard, and the operator manually types an instruction containing the real task information so that the target unmanned aerial vehicle obtains the instruction. Alternatively, the information entry module may be a voice acquisition module; the operator issues an instruction containing the real task information by voice, and the target unmanned aerial vehicle performs speech recognition to determine the real task information.
In addition, for the implementation process of using the deep reinforcement learning model to perform actual navigation of the target unmanned aerial vehicle, the embodiment of the application may not be specifically limited. For ease of understanding, the following description is made in connection with one possible embodiment.
In a possible implementation manner, S103 may specifically include: acquiring real visual information and real task information of the target unmanned aerial vehicle during operation; inputting the real visual information and the real task information into the deep reinforcement learning model; acquiring the predicted navigation information output by the deep reinforcement learning model; and controlling the target unmanned aerial vehicle to operate based on the predicted navigation information. The predicted navigation information is the navigation information matched with the real task information. In this way, the deep reinforcement learning model, constructed from the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, can produce an actual operation scheme once it is given the real visual information and real task information, thereby realizing the navigation of the target unmanned aerial vehicle without the drone having to become familiar with the environment in advance, improving the efficiency and accuracy of unmanned aerial vehicle navigation and allowing the unmanned aerial vehicle to be used to its full potential.
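A minimal deployment-time sketch of S103 follows, reusing acquire_visual_info from the earlier sketch. The model, tokenizer, camera.read() and flight_controller interfaces are hypothetical placeholders for the onboard stack, and the predicted navigation information is assumed to be a small velocity/heading command vector.

```python
import torch

@torch.no_grad()
def navigate_step(model, visual_info, task_text, tokenizer, device="cpu"):
    """One inference step: feed real visual information and real task information
    into the trained model and read back the predicted navigation information."""
    rgb = torch.from_numpy(visual_info["rgb"]).permute(2, 0, 1).unsqueeze(0).to(device)
    depth = torch.from_numpy(visual_info["depth"]).unsqueeze(0).unsqueeze(0).to(device)
    task = tokenizer(task_text).to(device)                 # hypothetical text/voice-command encoder
    predicted_navigation = model(rgb, depth, task)         # navigation matched to the task
    return predicted_navigation.squeeze(0).cpu().numpy()   # e.g. [vx, vy, vz, yaw_rate]

def control_loop(model, camera, tokenizer, flight_controller, task_text):
    """Closed loop: acquire visual info, query the model, send the command.
    camera.read() and flight_controller.send_velocity()/task_done() are
    hypothetical interfaces, not part of any real library."""
    while not flight_controller.task_done():
        rgb_frame, depth_frame = camera.read()
        visual_info = acquire_visual_info(rgb_frame, depth_frame)   # from the sketch above
        command = navigate_step(model, visual_info, task_text, tokenizer)
        flight_controller.send_velocity(command)
```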
It can be understood that, in the above unmanned aerial vehicle navigation process, the relevant operations, such as constructing the unmanned aerial vehicle simulation environment and constructing the deep reinforcement learning model for unmanned aerial vehicle navigation, are not limited to the target unmanned aerial vehicle; the same operations can also be performed separately for different unmanned aerial vehicles, so as to realize autonomous navigation for various types of unmanned aerial vehicles. To facilitate understanding of the navigation method for a specific drone, the embodiments of the present application take the target unmanned aerial vehicle as an example for a detailed description.
Based on the relevant contents of S101 to S103, in the embodiment of the application, a simulation environment corresponding to the simulation target unmanned aerial vehicle may first be constructed from the device parameters of the target unmanned aerial vehicle and the environment parameters of different known environments, and a deep reinforcement learning model for unmanned aerial vehicle navigation may then be constructed based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment. Therefore, when the target unmanned aerial vehicle runs, navigation can be performed by using the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle. Because the deep reinforcement learning model is constructed based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment, the model can provide a simulation operation scheme of the simulation target unmanned aerial vehicle in the simulation environment. In this way, when the target unmanned aerial vehicle actually operates, an actual operation scheme can be obtained from this model and the real operation information of the target unmanned aerial vehicle, so that the target unmanned aerial vehicle can be navigated without becoming familiar with the environment in advance, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.
In order to realize accurate autonomous navigation of the unmanned aerial vehicle, the embodiment of the application can adopt deep reinforcement learning to navigate the target unmanned aerial vehicle. Based on this, the embodiment of the present application may provide one possible implementation manner of building a deep reinforcement learning model. It may specifically include S201-S203. S201 to S203 are described below with reference to the embodiments and the drawings, respectively.
Fig. 2 is a flowchart of an implementation manner of building a deep reinforcement learning model according to an embodiment of the present disclosure. As shown in fig. 2, S201 to S203 may specifically include:
s201: and constructing a navigation strategy model for planning navigation information based on the simulation operation information and by utilizing a deep learning algorithm.
For the construction process of the navigation policy model, the embodiment of the present application is not particularly limited, and for convenience of understanding, the following description is made in conjunction with a possible implementation manner.
In a possible implementation, S201 may specifically include: taking simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment as input of a navigation prediction model, and taking simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as output of the navigation prediction model to construct the navigation prediction model; the method comprises the steps that simulation task information of a simulation target unmanned aerial vehicle in a simulation environment and the output of a navigation prediction model are jointly used as the input of a navigation matching model, the matching degree between the output of the navigation prediction model and the simulation task information is used as the output of the navigation matching model, and the navigation matching model is constructed; and constructing a navigation strategy model based on the navigation prediction model and the navigation matching model. Here, the navigation prediction model may predict the navigation path based on the simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment, and the navigation matching model may determine the matching degree between the navigation path predicted by the navigation prediction model and the simulation task information, thereby facilitating task execution of the task.
The navigation prediction model can be composed of a multi-layer network structure. Specifically, taking as an example simulated visual information consisting of a feature-processed RGB image captured by an RGB camera, a feature-processed depth image captured by a depth camera, and the image recognition result of the depth image, the deep learning models corresponding to the three different types of simulated visual information jointly form the navigation prediction model. The network structure of the deep learning model corresponding to the RGB image may have a ResNet50 network as its first layer and a fully connected layer as its second layer; the network structures of the deep learning models corresponding to the depth image and to the image recognition result of the depth image may each have a CNN (Convolutional Neural Network) as the first layer and a fully connected layer as the second layer. Further, the processing of the deep learning networks corresponding to the three types of simulated visual information may include jointly training the three network structures, embedding their outputs into the same vector space for information fusion, and storing the fused representation through a memory network.
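A compact PyTorch sketch of such a three-branch prediction network is given below; the embedding size, the small CNN branches, the additive fusion into a shared vector space and the GRU used as a stand-in for the memory network are illustrative choices, not the patent's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class NavigationPredictionModel(nn.Module):
    """ResNet50 + FC branch for the RGB image, CNN + FC branches for the depth image
    and the recognition result, joint embedding into one vector space, and a GRU as a
    simple memory network; the head emits the simulated navigation output."""
    def __init__(self, embed_dim=256, action_dim=4):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()                        # keep the 2048-d ResNet50 features
        self.rgb_backbone, self.rgb_fc = backbone, nn.Linear(2048, embed_dim)

        def small_cnn(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )

        self.depth_branch, self.depth_fc = small_cnn(1), nn.Linear(32, embed_dim)
        self.recog_branch, self.recog_fc = small_cnn(1), nn.Linear(32, embed_dim)
        self.memory = nn.GRU(embed_dim, embed_dim, batch_first=True)   # memory network stand-in
        self.head = nn.Linear(embed_dim, action_dim)                   # simulated navigation output

    def forward(self, rgb, depth, recog, hidden=None):
        # Joint embedding: project all three branches into the same space and fuse them.
        fused = (self.rgb_fc(self.rgb_backbone(rgb))
                 + self.depth_fc(self.depth_branch(depth))
                 + self.recog_fc(self.recog_branch(recog)))
        out, hidden = self.memory(fused.unsqueeze(1), hidden)
        return self.head(out.squeeze(1)), hidden

policy = NavigationPredictionModel()
nav, state = policy(torch.randn(1, 3, 224, 224), torch.randn(1, 1, 224, 224), torch.randn(1, 1, 224, 224))
```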
In addition, the navigation matching model may be a classification model, such as a Transformer model. For example, if the simulation task information is to search for a boy wearing a yellow hat, then when at least one boy wearing a yellow hat appears in the simulated visual information of the target unmanned aerial vehicle, the matching degree output by the navigation matching model may be expressed as a match; when no boy wearing a yellow hat appears in the simulated visual information of the target unmanned aerial vehicle, the matching degree output by the navigation matching model may be expressed as a mismatch.
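Under those assumptions, a small Transformer-encoder classifier over a pair of inputs (a task embedding and the predicted navigation) could look like the following sketch; the two-token encoding and the binary match/mismatch head are illustrative.

```python
import torch
import torch.nn as nn

class NavigationMatchingModel(nn.Module):
    """Binary match / mismatch classifier over (task embedding, predicted navigation)
    pairs, using a small TransformerEncoder; dimensions are illustrative."""
    def __init__(self, d_model=256, action_dim=4, n_layers=2):
        super().__init__()
        self.task_proj = nn.Linear(d_model, d_model)    # simulated task information embedding
        self.nav_proj = nn.Linear(action_dim, d_model)  # output of the navigation prediction model
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, 2)         # logits: 0 = mismatch, 1 = match

    def forward(self, task_embedding, predicted_navigation):
        tokens = torch.stack(
            [self.task_proj(task_embedding), self.nav_proj(predicted_navigation)], dim=1)
        fused = self.encoder(tokens).mean(dim=1)
        return self.classifier(fused)

matcher = NavigationMatchingModel()
match_logits = matcher(torch.randn(2, 256), torch.randn(2, 4))
```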
S202: and constructing a navigation evaluation model for evaluating navigation information based on the navigation strategy model and by using a reinforcement learning algorithm.
For the construction process of the navigation evaluation model, the embodiment of the present application is not particularly limited, and for the convenience of understanding, the following description is made in conjunction with a possible implementation manner.
In a possible implementation manner, S202 may specifically include: acquiring navigation information matched with the simulation task information from the simulation navigation information output by the navigation prediction model as target navigation information; and taking the output of the navigation strategy model and the target navigation information as the input of the navigation evaluation model, and taking the reward evaluation value corresponding to the target navigation information as the output of the navigation evaluation model to construct the navigation evaluation model. Here, the reward evaluation value may be determined based on a criterion required at the time of actual application, for example, a completion time period based on the simulation task information, or the like. Therefore, the accuracy of the navigation prediction model is judged by evaluating the target navigation information, so that the navigation prediction model is optimized based on the navigation evaluation model.
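The sketch below shows one plausible form of such an evaluation network in PyTorch, together with an example reward label derived from task completion time; both the architecture and the reward formula are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class NavigationEvaluationModel(nn.Module):
    """Critic-style network: takes the navigation strategy model's output and the
    target navigation information and predicts a reward evaluation value."""
    def __init__(self, action_dim=4, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, policy_output, target_navigation):
        return self.net(torch.cat([policy_output, target_navigation], dim=-1))

def completion_time_reward(completion_time_s, budget_s=60.0):
    """Example reward label: finishing the simulated task faster than the budget earns a higher value."""
    return max(0.0, (budget_s - completion_time_s) / budget_s)

critic = NavigationEvaluationModel()
reward_estimate = critic(torch.randn(8, 4), torch.randn(8, 4))
```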
S203: and optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model converges, and taking the converged navigation strategy model as a deep reinforcement learning model.
The quality of the target navigation information can be assessed by the navigation evaluation model, so the navigation evaluation model can be used to optimize the parameters of the navigation strategy model and further improve the accuracy of the navigation prediction model, thereby improving the efficiency and accuracy of unmanned aerial vehicle navigation and allowing the unmanned aerial vehicle to be used to its full potential.
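Reusing the prediction and evaluation networks from the earlier sketches, a minimal version of this optimization loop might look as follows; maximizing the critic's evaluation as the loss, keeping the critic fixed, and testing convergence on the per-epoch score are simplifying assumptions rather than the patent's training procedure.

```python
import torch

def optimize_policy(policy, critic, batches, epochs=50, tol=1e-4, lr=1e-4):
    """Optimize the navigation strategy model (policy) against the navigation
    evaluation model (critic) and stop once the epoch score stops improving."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    prev_score = float("-inf")
    for _ in range(epochs):
        epoch_score = 0.0
        for rgb, depth, recog, target_nav in batches:
            nav, _ = policy(rgb, depth, recog)
            score = critic(nav, target_nav).mean()   # reward evaluation of the planned navigation
            loss = -score                            # maximize the critic's evaluation
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_score += score.item()
        if abs(epoch_score - prev_score) < tol:      # treated as convergence
            break
        prev_score = epoch_score
    return policy                                     # the converged navigation strategy model
```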
Based on the relevant contents of the above S201-S203, in the embodiment of the present application, the navigation information is planned by constructing the navigation policy model, the navigation evaluation model is constructed to evaluate the navigation information and update and optimize the navigation policy model, so that when the target unmanned aerial vehicle actually operates, an actual operation scheme can be obtained by using the finally obtained deep reinforcement learning model and the actual operation information of the target unmanned aerial vehicle to realize the navigation of the target unmanned aerial vehicle, and the unmanned aerial vehicle does not need to be familiar with the environment in advance, thereby improving the efficiency and accuracy of the unmanned aerial vehicle navigation, and maximally exerting the function of the unmanned aerial vehicle.
Based on the unmanned aerial vehicle navigation method provided by the embodiment, the embodiment of the application further provides an unmanned aerial vehicle navigation device. The unmanned aerial vehicle navigation device is described below with reference to the embodiments and the accompanying drawings respectively.
Fig. 3 is a schematic structural diagram of an unmanned aerial vehicle navigation apparatus provided in an embodiment of the present application. With reference to fig. 3, an unmanned aerial vehicle navigation apparatus 300 provided in an embodiment of the present application may include:
the simulation environment construction module 301 is configured to construct a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the device parameters of the target unmanned aerial vehicle and the environment parameters of different known environments;
the model building module 302 is used for building a deep reinforcement learning model for unmanned aerial vehicle navigation based on simulation operation information of a simulation target unmanned aerial vehicle in a simulation environment;
and the navigation module 303 is configured to perform navigation by using a deep reinforcement learning model according to the actual operation information of the target unmanned aerial vehicle when the target unmanned aerial vehicle operates.
In this application embodiment, through the cooperation of the simulation environment construction module 301, the model construction module 302 and the navigation module 303, when the target unmanned aerial vehicle actually operates, an actual operation scheme can be obtained from the model and the real operation information of the target unmanned aerial vehicle, so that the target unmanned aerial vehicle can be navigated without becoming familiar with the environment in advance, which improves the efficiency and accuracy of unmanned aerial vehicle navigation and allows the unmanned aerial vehicle to be used to its full potential.
As an embodiment, in order to improve the efficiency and accuracy of the drone navigation, the model building module 302 may specifically include:
the navigation strategy model building module is used for building a navigation strategy model for planning navigation information based on simulation operation information and by utilizing a deep learning algorithm;
the navigation evaluation model building module is used for building a navigation evaluation model for evaluating navigation information based on the navigation strategy model and by utilizing a reinforcement learning algorithm;
and the model optimization module is used for optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model converges, and taking the converged navigation strategy model as a deep reinforcement learning model.
As an embodiment, in order to improve the efficiency and accuracy of the unmanned aerial vehicle navigation, the navigation policy model building module may specifically include:
the first construction module is used for constructing the navigation prediction model by taking simulation visual information of the simulation target unmanned aerial vehicle in a simulation environment as the input of the navigation prediction model and taking simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as the output of the navigation prediction model;
the second construction module is used for constructing the navigation matching model by taking the simulation task information of the simulation target unmanned aerial vehicle in the simulation environment and the output of the navigation prediction model as the input of the navigation matching model and taking the matching degree between the output of the navigation prediction model and the simulation task information as the output of the navigation matching model;
and the third construction module is used for constructing a navigation strategy model based on the navigation prediction model and the navigation matching model.
As an embodiment, in order to improve the efficiency and accuracy of the unmanned aerial vehicle navigation, the navigation evaluation model building module may specifically include:
the fourth construction module is used for acquiring navigation information matched with the simulation task information from the simulation navigation information output by the navigation prediction model as target navigation information;
and the fifth construction module is used for taking the output of the navigation strategy model and the target navigation information as the input of the navigation evaluation model, and taking the reward evaluation value corresponding to the target navigation information as the output of the navigation evaluation model to construct the navigation evaluation model.
As an embodiment, in order to improve the efficiency and accuracy of drone navigation, the drone navigation device 300 further includes:
the navigation test module is used for constructing a navigation test environment of the target unmanned aerial vehicle and performing navigation test on the deep reinforcement learning model in the navigation test environment to obtain a test result;
the test random information determining module is used for determining test random information based on the simulation environment and the navigation test environment;
and the first model updating module is used for updating the deep reinforcement learning model according to the test result and the test random information.
As an embodiment, in order to improve the efficiency and accuracy of the navigation of the drone, the navigation module 303 specifically includes:
the first navigation module is used for acquiring real visual information and real task information of the target unmanned aerial vehicle during operation;
the second navigation module is used for inputting the real visual information and the real task information into the deep reinforcement learning model;
the third navigation module is used for acquiring the predicted navigation information output by the deep reinforcement learning model; the predicted navigation information is the navigation information matched with the real task information;
and the fourth navigation module is used for controlling the target unmanned aerial vehicle to operate based on the predicted navigation information.
As an embodiment, in order to improve the efficiency and accuracy of the drone navigation, the simulation environment building module 301 may specifically include:
the digital twin model building module is used for building a digital twin model corresponding to the target unmanned aerial vehicle as a simulation target unmanned aerial vehicle according to the equipment parameters of the target unmanned aerial vehicle;
the simulation environment set building module is used for building simulation environment sets corresponding to different known environments according to the environment parameters of the different known environments;
and the simulation environment construction submodule is used for constructing a simulation environment based on the digital twin model and the simulation environment set.
As an embodiment, in order to improve the efficiency and accuracy of the unmanned aerial vehicle navigation, the digital twin model building module may specifically include:
the first simulation model building module is used for building a simulation model of the control and state estimation system of the simulation target unmanned aerial vehicle based on the test parameters of the control and state estimation system in the equipment parameters;
the second simulation model building module is used for building a power system simulation model of the simulation target unmanned aerial vehicle based on simulation control parameters output by the control and state estimation system simulation model and power system test parameters in the equipment parameters;
the third simulation model building module is used for building a dynamics simulation model of the simulation target unmanned aerial vehicle based on the simulation power system parameters output by the power system simulation model and the dynamics model test parameters in the equipment parameters;
the fourth simulation model building module is used for building a rigid motion simulation model of the simulation target unmanned aerial vehicle based on the simulation dynamics parameters output by the dynamics simulation model and the rigid motion model test parameters in the equipment parameters;
and the fifth simulation model building module is used for building a digital twin model according to the control and state estimation system simulation model, the power system simulation model, the dynamics simulation model and the rigid motion simulation model.
As an embodiment, in order to improve the efficiency and accuracy of drone navigation, the drone navigation device 300 further includes:
the simulation motion parameter acquisition module is used for acquiring simulation motion parameters output by the rigid motion simulation model;
and the second model updating module is used for updating the control and state estimation system simulation model and/or the dynamic simulation model according to the simulation motion parameters.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. An unmanned aerial vehicle navigation method is characterized by comprising the following steps:
constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments;
constructing a deep reinforcement learning model for unmanned aerial vehicle navigation based on simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment;
and when the target unmanned aerial vehicle runs, navigating by using the deep reinforcement learning model according to the real running information of the target unmanned aerial vehicle.
2. The method of claim 1, wherein the building of the deep reinforcement learning model for drone navigation based on the simulation operation information of the simulation target drone in the simulation environment comprises:
based on the simulation operation information, a navigation strategy model for planning navigation information is constructed by utilizing a deep learning algorithm;
based on the navigation strategy model, a navigation evaluation model for evaluating the navigation information is constructed by utilizing a reinforcement learning algorithm;
and optimizing the navigation strategy model based on the navigation evaluation model until the navigation strategy model converges, and taking the converged navigation strategy model as the deep reinforcement learning model.
3. The method of claim 2, wherein constructing a navigation strategy model for planning navigation information based on the simulation run information and using a deep learning algorithm comprises:
taking simulation visual information of the simulation target unmanned aerial vehicle in the simulation environment as input of a navigation prediction model, and taking simulation navigation information of the simulation target unmanned aerial vehicle in the simulation environment as output of the navigation prediction model, and constructing the navigation prediction model;
the simulation task information of the simulation target unmanned aerial vehicle in the simulation environment and the output of the navigation prediction model are jointly used as the input of a navigation matching model, and the matching degree between the output of the navigation prediction model and the simulation task information is used as the output of the navigation matching model to construct the navigation matching model;
and constructing the navigation strategy model based on the navigation prediction model and the navigation matching model.
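Claim 3 splits the strategy model into two parts: a prediction network mapping simulated visual input to candidate navigation information, and a matching network scoring how well that candidate fits the simulated task. A hedged PyTorch sketch under assumed input shapes and names:

```python
# Hypothetical sketch of claim 3's two sub-models; image size, task encoding
# and output dimensions are assumptions, not values from the patent.
import torch
import torch.nn as nn

NAV_DIM, TASK_DIM = 4, 8

class NavigationPredictionModel(nn.Module):
    """Simulated visual information in, simulated navigation information out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, NAV_DIM),
        )

    def forward(self, image):
        return self.net(image)

class NavigationMatchingModel(nn.Module):
    """Task information plus predicted navigation in, matching degree out."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(TASK_DIM + NAV_DIM, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, task, nav):
        return torch.sigmoid(self.net(torch.cat([task, nav], dim=1)))

if __name__ == "__main__":
    images = torch.randn(2, 3, 64, 64)   # stand-in simulated camera frames
    tasks = torch.randn(2, TASK_DIM)     # stand-in simulated task encodings
    predictor, matcher = NavigationPredictionModel(), NavigationMatchingModel()
    nav = predictor(images)
    print(matcher(tasks, nav))           # matching degree in (0, 1)
```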
4. The method of claim 3, wherein constructing a navigation evaluation model for evaluating the navigation information based on the navigation strategy model and using a reinforcement learning algorithm comprises:
acquiring navigation information matched with the simulation task information from the simulation navigation information output by the navigation prediction model as target navigation information;
and taking the output of the navigation strategy model and the target navigation information as the input of the navigation evaluation model, and taking the reward evaluation value corresponding to the target navigation information as the output of the navigation evaluation model to construct the navigation evaluation model.
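For claim 4, the evaluation model takes the strategy model's output together with the target navigation information (the predicted navigation that matches the task) and produces a reward evaluation value. A minimal assumed-shape sketch, hypothetical names throughout:

```python
# Hypothetical sketch of claim 4's navigation evaluation model; shapes assumed.
import torch
import torch.nn as nn

NAV_DIM = 4

evaluation_model = nn.Sequential(nn.Linear(2 * NAV_DIM, 32), nn.ReLU(), nn.Linear(32, 1))

def reward_evaluation(policy_output, target_navigation):
    """Reward evaluation value for the target navigation information,
    conditioned on the strategy model's output."""
    return evaluation_model(torch.cat([policy_output, target_navigation], dim=1))

if __name__ == "__main__":
    policy_out = torch.randn(8, NAV_DIM)   # stand-in strategy-model output
    target_nav = torch.randn(8, NAV_DIM)   # stand-in task-matched navigation
    print(reward_evaluation(policy_out, target_nav).shape)   # torch.Size([8, 1])
```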
5. The method of any of claims 1 to 4, wherein before navigating with the deep reinforcement learning model according to the real operation information of the target drone when the target drone is operating, the method further comprises:
constructing a navigation test environment of the target unmanned aerial vehicle, and performing navigation test on the deep reinforcement learning model in the navigation test environment to obtain a test result;
determining test random information based on the simulation environment and the navigation test environment;
and updating the deep reinforcement learning model according to the test result and the test random information.
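Claim 5 adds a sim-to-real style check: the trained model is exercised in a separate navigation test environment, and the test random information, read here as the sampled discrepancies between simulation and test environment parameters, is folded back into the model together with the test result. That reading, and every structure below, is an assumption offered only as a toy sketch:

```python
# Illustrative sketch of claim 5; interpreting "test random information" as
# sampled sim/test parameter discrepancies is an assumption.
import random

def build_test_environment(sim_env):
    """Navigation test environment: the simulation parameters, perturbed."""
    return {k: v * random.uniform(0.9, 1.1) for k, v in sim_env.items()}

def test_random_information(sim_env, test_env):
    """Per-parameter discrepancy between simulation and test environment."""
    return {k: test_env[k] - sim_env[k] for k in sim_env}

def run_navigation_test(model, test_env, trials=20):
    """Stand-in test: fraction of trials in which navigation 'succeeds'."""
    return sum(random.random() < 0.8 for _ in range(trials)) / trials

def update_model(model, test_result, randomness):
    """Fold the test result and the sampled discrepancies back into the model."""
    model["robustness"] = test_result
    model["domain_randomization"] = randomness
    return model

if __name__ == "__main__":
    sim_env = {"wind_mps": 3.0, "air_density": 1.2}
    test_env = build_test_environment(sim_env)
    info = test_random_information(sim_env, test_env)
    model = update_model({"policy": None}, run_navigation_test(None, test_env), info)
    print(model)
```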
6. The method according to any one of claims 1 to 4, wherein navigating with the deep reinforcement learning model according to the real operation information of the target unmanned aerial vehicle when the target unmanned aerial vehicle is operating comprises:
acquiring real visual information and real task information of the target unmanned aerial vehicle during operation;
inputting the real visual information and the real task information into the deep reinforcement learning model;
acquiring the predicted navigation information output by the deep reinforcement learning model; the predicted navigation information is navigation information matched with the real task information;
and controlling the target unmanned aerial vehicle to operate based on the predicted navigation information.
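At flight time (claim 6), the loop is simply: acquire real visual and task information, query the trained model, and actuate on the predicted navigation information. A hypothetical sketch in which every interface is a placeholder:

```python
# Hypothetical flight-time loop for claim 6; sensor, model and controller
# interfaces are placeholders, not the patent's concrete implementation.

def acquire_real_inputs():
    """Real visual information and real task information from the flying UAV."""
    return {"frame": "camera_frame_bytes"}, {"goal": "reach waypoint A"}

def predict_navigation(drl_model, visual_info, task_info):
    """Query the deep reinforcement learning model for task-matched navigation."""
    return {"heading_deg": 42.0, "speed_mps": 5.0}

def control_uav(navigation_info):
    """Translate predicted navigation information into control commands."""
    print("commanding:", navigation_info)

if __name__ == "__main__":
    model = object()               # stand-in for the trained model
    for _ in range(3):             # one iteration per control cycle
        visual, task = acquire_real_inputs()
        control_uav(predict_navigation(model, visual, task))
```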
7. The method according to any one of claims 1 to 4, wherein the constructing a simulation environment corresponding to the simulated target drone based on the device parameters of the target drone and the environment parameters of different known environments comprises:
according to the equipment parameters of the target unmanned aerial vehicle, constructing a digital twin model corresponding to the target unmanned aerial vehicle as the simulation target unmanned aerial vehicle;
building a simulation environment set corresponding to the different known environments according to the environment parameters of the different known environments;
and constructing the simulation environment based on the digital twin model and the simulation environment set.
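Claim 7 composes the simulation environment from a digital twin of the target UAV and a set of simulated environments derived from the known environments' parameters. A plain-Python sketch of that composition, with all data structures assumed:

```python
# Hypothetical composition of the simulation environment in claim 7.

def build_digital_twin(device_params):
    """Digital twin of the target UAV, acting as the simulated target UAV."""
    return {"type": "digital_twin", "device_params": device_params}

def build_environment_set(known_envs):
    """One simulated environment per known environment's parameter set."""
    return [{"type": "sim_env", "params": p} for p in known_envs]

def build_simulation_environment(device_params, known_envs):
    return {"uav": build_digital_twin(device_params),
            "environment_set": build_environment_set(known_envs)}

if __name__ == "__main__":
    sim = build_simulation_environment(
        {"mass_kg": 1.5, "rotor_count": 4},
        [{"scene": "urban", "wind_mps": 3.0}, {"scene": "forest", "wind_mps": 1.0}],
    )
    print(sim["uav"]["type"], len(sim["environment_set"]))
```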
8. The method of claim 7, wherein the constructing a digital twin model corresponding to the target drone as the simulated target drone according to the device parameters of the target drone comprises:
constructing a control and state estimation system simulation model of the simulation target unmanned aerial vehicle based on the control and state estimation system test parameters in the equipment parameters;
constructing a power system simulation model of the simulation target unmanned aerial vehicle based on simulation control parameters output by the control and state estimation system simulation model and power system test parameters in the equipment parameters;
constructing a dynamic simulation model of the simulation target unmanned aerial vehicle based on the simulation power system parameters output by the power system simulation model and the dynamic model test parameters in the equipment parameters;
constructing a rigid motion simulation model of the simulation target unmanned aerial vehicle based on the simulation dynamic parameters output by the dynamic simulation model and the rigid motion model test parameters in the equipment parameters;
and constructing the digital twin model according to the control and state estimation system simulation model, the power system simulation model, the dynamic simulation model and the rigid motion simulation model.
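Claim 8 chains four sub-models — control and state estimation, power system, dynamics, rigid motion — each consuming the previous model's simulated output plus its own test parameters. A skeletal sketch of that data flow; all class names, parameter contents, and the toy physics are assumptions:

```python
# Hypothetical data-flow sketch of the four chained sub-models in claim 8.

class ControlStateEstimationModel:
    def __init__(self, test_params): self.test_params = test_params
    def step(self, setpoint):
        return {"motor_commands": [setpoint] * 4}          # simulated control parameters

class PowerSystemModel:
    def __init__(self, test_params): self.test_params = test_params
    def step(self, control_out):
        return {"thrusts_n": [c * 2.0 for c in control_out["motor_commands"]]}

class DynamicsModel:
    def __init__(self, test_params): self.test_params = test_params
    def step(self, power_out):
        return {"net_force_n": sum(power_out["thrusts_n"]) - self.test_params["weight_n"]}

class RigidMotionModel:
    def __init__(self, test_params): self.test_params = test_params
    def step(self, dyn_out, dt=0.01):
        accel = dyn_out["net_force_n"] / self.test_params["mass_kg"]
        return {"vertical_accel_mps2": accel, "dt": dt}     # simulated motion parameters

def digital_twin_step(models, setpoint):
    ctrl, power, dyn, rigid = models
    return rigid.step(dyn.step(power.step(ctrl.step(setpoint))))

if __name__ == "__main__":
    models = (ControlStateEstimationModel({}), PowerSystemModel({}),
              DynamicsModel({"weight_n": 14.7}), RigidMotionModel({"mass_kg": 1.5}))
    print(digital_twin_step(models, setpoint=2.0))
```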
9. The method of claim 8, further comprising:
acquiring simulation motion parameters output by the rigid motion simulation model;
and updating the control and state estimation system simulation model and/or the dynamic simulation model according to the simulation motion parameters.
10. An unmanned aerial vehicle navigation device, comprising:
the simulation environment construction module is used for constructing a simulation environment corresponding to the simulation target unmanned aerial vehicle based on the equipment parameters of the target unmanned aerial vehicle and the environment parameters of different known environments;
the model building module is used for building a deep reinforcement learning model for unmanned aerial vehicle navigation based on the simulation operation information of the simulation target unmanned aerial vehicle in the simulation environment;
and the navigation module is used for navigating with the deep reinforcement learning model according to the real operation information of the target unmanned aerial vehicle when the target unmanned aerial vehicle operates.
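The device of claim 10 mirrors the method of claim 1 as three cooperating modules. A trivial structural sketch, names hypothetical:

```python
# Hypothetical structural sketch of the device in claim 10.

class SimulationEnvironmentBuilder:
    def build(self, device_params, known_env_params):
        return {"uav": device_params, "environments": known_env_params}

class ModelBuilder:
    def build(self, sim_env):
        return {"drl_model": "trained on simulated operation information"}

class NavigationModule:
    def navigate(self, drl_model, real_run_info):
        return {"navigation": "planned from real operation information"}

class UavNavigationDevice:
    def __init__(self):
        self.env_builder = SimulationEnvironmentBuilder()
        self.model_builder = ModelBuilder()
        self.navigator = NavigationModule()
```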
CN202210902202.XA 2022-07-29 2022-07-29 Unmanned aerial vehicle navigation method and device Active CN114964268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210902202.XA CN114964268B (en) 2022-07-29 2022-07-29 Unmanned aerial vehicle navigation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210902202.XA CN114964268B (en) 2022-07-29 2022-07-29 Unmanned aerial vehicle navigation method and device

Publications (2)

Publication Number Publication Date
CN114964268A true CN114964268A (en) 2022-08-30
CN114964268B CN114964268B (en) 2023-05-02

Family

ID=82968688

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210902202.XA Active CN114964268B (en) 2022-07-29 2022-07-29 Unmanned aerial vehicle navigation method and device

Country Status (1)

Country Link
CN (1) CN114964268B (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160188675A1 (en) * 2014-12-29 2016-06-30 Ge Aviation Systems Llc Network for digital emulation and repository
CN107450593A (en) * 2017-08-30 2017-12-08 清华大学 A kind of unmanned plane autonomous navigation method and system
CN113474825A (en) * 2019-02-25 2021-10-01 高通股份有限公司 System and method for providing immersive augmented reality experience on a mobile platform
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
WO2021086532A1 (en) * 2019-10-29 2021-05-06 Loon Llc Navigating aerial vehicles using deep reinforcement learning
WO2021258327A1 (en) * 2020-06-22 2021-12-30 拓攻(南京)机器人有限公司 Unmanned aerial vehicle visual semi-physical simulation system and simulation method thereof
US20220004191A1 (en) * 2020-07-01 2022-01-06 Wuhan University Of Technology Usv formation path-following method based on deep reinforcement learning
CN112179367A (en) * 2020-09-25 2021-01-05 广东海洋大学 Intelligent autonomous navigation method based on deep reinforcement learning
WO2022078289A1 (en) * 2020-10-14 2022-04-21 广州小鹏自动驾驶科技有限公司 Simulation test system and method for autonomous driving
CN112965396A (en) * 2021-02-08 2021-06-15 大连大学 Hardware-in-the-loop visualization simulation method for quad-rotor unmanned aerial vehicle
CN113406957A (en) * 2021-05-19 2021-09-17 成都理工大学 Mobile robot autonomous navigation method based on immune deep reinforcement learning
CN113495578A (en) * 2021-09-07 2021-10-12 南京航空航天大学 Digital twin training-based cluster track planning reinforcement learning method
CN114329766A (en) * 2021-09-22 2022-04-12 中国人民解放军空军工程大学 Flight dynamics model reliability evaluation method for deep reinforcement learning
CN113886953A (en) * 2021-09-27 2022-01-04 中国人民解放军军事科学院国防科技创新研究院 Unmanned aerial vehicle intelligent simulation training method and device based on distributed reinforcement learning
CN113935373A (en) * 2021-10-11 2022-01-14 南京邮电大学 Human body action recognition method based on phase information and signal intensity
CN114488848A (en) * 2021-12-30 2022-05-13 北京理工大学 Unmanned aerial vehicle autonomous flight system and simulation experiment platform for indoor building space
CN114662656A (en) * 2022-03-04 2022-06-24 深圳大学 Deep neural network model training method, autonomous navigation method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
REN Bo et al.: "Simulation of Autonomous Route Planning for Reconnaissance UAVs in Uncertain Environments", Electronics Optics & Control *
HE Jin et al.: "UAV Path Planning Based on PF-DQN in Unknown Environments", Ordnance Industry Automation *
LI Jiahe et al.: "Reinforcement-Learning-Based Evolutionary Neural Networks and Their Application in Robot Navigation", Journal of Zhejiang University of Technology *
CHEN Jinyin et al.: "Design and Implementation of Open Experimental Teaching of UAV Simulation Based on ROS", Experiment Science and Technology *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308006A (en) * 2023-05-19 2023-06-23 安徽省赛达科技有限责任公司 Digital rural comprehensive service cloud platform

Also Published As

Publication number Publication date
CN114964268B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
Ruan et al. Mobile robot navigation based on deep reinforcement learning
CN107479368B (en) Method and system for training unmanned aerial vehicle control model based on artificial intelligence
Zhang et al. 2D Lidar‐based SLAM and path planning for indoor rescue using mobile robots
CN109948642B (en) Multi-agent cross-modal depth certainty strategy gradient training method based on image input
CN107450593B (en) Unmanned aerial vehicle autonomous navigation method and system
CN110587606B (en) Open scene-oriented multi-robot autonomous collaborative search and rescue method
CN111123963A (en) Unknown environment autonomous navigation system and method based on reinforcement learning
CN112435275A (en) Unmanned aerial vehicle maneuvering target tracking method integrating Kalman filtering and DDQN algorithm
Yue et al. Deep reinforcement learning and its application in autonomous fitting optimization for attack areas of UCAVs
CN112766499A (en) Method for realizing autonomous flight of unmanned aerial vehicle through reinforcement learning technology
CN113268074A (en) Unmanned aerial vehicle flight path planning method based on joint optimization
CN112414401B (en) Unmanned aerial vehicle cooperative positioning system and method based on graph neural network
CN113759901A (en) Mobile robot autonomous obstacle avoidance method based on deep reinforcement learning
CN110347035A (en) Method for autonomous tracking and device, electronic equipment, storage medium
CN114967721B (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
CN114964268B (en) Unmanned aerial vehicle navigation method and device
CN114488848A (en) Unmanned aerial vehicle autonomous flight system and simulation experiment platform for indoor building space
Ahmadinejad et al. Autonomous Flight of Quadcopters in the Presence of Ground Effect
CN115562357A (en) Intelligent path planning method for unmanned aerial vehicle cluster
CN115285143A (en) Automatic driving vehicle navigation method based on scene classification
KR20190041831A (en) Controlling mobile robot based on reinforcement learning using game environment abstraction
CN117252011A (en) Heterogeneous ground-air unmanned cluster simulation system construction method based on distributed architecture
Baldini et al. Learning pose estimation for uav autonomous navigation andlanding using visual-inertial sensor data
Mohajerin Modeling dynamic systems for multi-step prediction with recurrent neural networks
Zhou et al. Deep reinforcement learning with long-time memory capability for robot mapless navigation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant