CN117906614A - Light autonomous navigation method and device based on neural network driving

Light autonomous navigation method and device based on neural network driving

Info

Publication number
CN117906614A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
data
reinforcement learning
deep reinforcement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410102883.0A
Other languages
Chinese (zh)
Inventor
石飞
孟子阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Beijing Information Science and Technology University
Original Assignee
Tsinghua University
Beijing Information Science and Technology University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University and Beijing Information Science and Technology University
Priority to CN202410102883.0A
Publication of CN117906614A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention relates to the technical field of unmanned aerial vehicle navigation, and in particular to a lightweight autonomous navigation method and device driven by a neural network. The method comprises the following steps: acquiring sparse Lidar data, the polar coordinates of the unmanned aerial vehicle relative to a target point, and the linear velocity and angular velocity of the unmanned aerial vehicle at the previous moment, and training a preset deep reinforcement learning network with these data; quantizing the trained deep reinforcement learning network and deploying it on a computing chip of the unmanned aerial vehicle, yielding an unmanned aerial vehicle equipped with the deep reinforcement learning network; and acquiring real-time sparse Lidar ranging data, real-time polar coordinate data, and the real-time linear velocity and angular velocity of the previous moment, based on which the unmanned aerial vehicle with the deep reinforcement learning network autonomously navigates to the target position. This addresses the high computational complexity, large memory footprint, and related problems of the neural network algorithms currently adopted on nano unmanned aerial vehicles.

Description

Light autonomous navigation method and device based on neural network driving
Technical Field
The invention relates to the technical field of unmanned aerial vehicle navigation, in particular to a light-weight autonomous navigation method and device based on neural network driving.
Background
With the rapid advance of technology, the demands on the intelligence, autonomy, and miniaturization of unmanned systems keep increasing. Thanks to its small size and light weight, a nano unmanned aerial vehicle can enter many narrow and high-risk areas, such as indoor spaces, pipes, or building seams, while keeping a safe distance from people. Autonomous navigation is one of the most important and fundamental capabilities of a drone. While the traditional "sense-locate-map-plan-control" navigation framework has been validated in many applications, its high demand for computing resources makes it poorly suited to nano unmanned aerial vehicles. Levine et al. showed that an end-to-end learning approach is beneficial, so a neural-network-driven "end-to-end" navigation strategy can offer a new solution to this problem by means of lightweight deep learning.
Laser navigation technology is widely used in autonomous driving for its precision and speed. As an efficient ranging sensor, LiDAR (laser radar) provides accurate distance, position, and velocity information. However, the dense LiDAR data collected during autonomous driving is voluminous and very time-consuming to process. Mark et al. trained a convolutional neural network (CNN) on expert demonstrations using 1080-dimensional laser ranging data, learning a mapping from the raw two-dimensional laser ranging results and target location to the steering commands required by the robot; the data volume, however, is too large for use on a nano unmanned aerial vehicle. Sparse LiDAR data needs only a small number of dimensions, which greatly improves data-processing efficiency. Tai et al. took 10-dimensional laser radar data and a target position as inputs and continuous steering commands as the output of an unmanned vehicle, proving that the target position can be reached by a deep reinforcement learning network using only sparse radar data. Although that network is too large to be deployed directly on a nano unmanned aerial vehicle, it indirectly shows that a low-data-volume sparse LiDAR sensor is an ideal sensor for the nano unmanned aerial vehicle.
Nevertheless, research on autonomous navigation of nano unmanned aerial vehicles using sparse LiDAR remains limited, and neural-network-driven lightweight autonomous navigation has seen little study. Existing neural network algorithms often suffer from high computational complexity and large memory footprints. Designing a low-power neural network model that navigates using sparse LiDAR data and target position information is therefore a largely unexplored direction for the nano unmanned aerial vehicle.
Disclosure of Invention
The invention provides a lightweight autonomous navigation method and device driven by a neural network, to address the high computational complexity, large memory footprint, and related problems of the neural network algorithms currently adopted on nano unmanned aerial vehicles.
An embodiment of the first aspect of the present invention provides a lightweight autonomous navigation method driven by a neural network, comprising the following steps: acquiring sparse Lidar data, the polar coordinates of an unmanned aerial vehicle relative to a target point, and the linear velocity and angular velocity of the unmanned aerial vehicle at the previous moment, and training a preset deep reinforcement learning network with the sparse Lidar data, the polar coordinates, and the previous-moment linear and angular velocity data to obtain a trained deep reinforcement learning network; quantizing the trained deep reinforcement learning network and deploying it on a computing chip of the unmanned aerial vehicle, thereby obtaining the unmanned aerial vehicle with the deep reinforcement learning network; and collecting real-time sparse Lidar ranging data, real-time polar coordinate data of the unmanned aerial vehicle relative to a target position, and the real-time linear velocity and angular velocity of the unmanned aerial vehicle at the previous moment, the unmanned aerial vehicle with the deep reinforcement learning network autonomously navigating to the target position based on these real-time data.
Optionally, the acquiring sparse Lidar data, polar coordinates of the unmanned aerial vehicle relative to the target point, and linear velocity and angular velocity data of a previous moment thereof, and training a preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates, and the linear velocity and angular velocity data of the previous moment, to obtain a trained deep reinforcement learning network, including:
constructing a simulation scene on a Gazebo simulation platform, and determining the preset deep reinforcement learning network;
acquiring the sparse Lidar data, the polar coordinates and the linear speed and angular speed data of the previous moment in the simulation scene;
and training the preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates and the linear velocity and angular velocity data of the previous moment to obtain the trained deep reinforcement learning network.
Optionally, training a preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates, and the linear velocity and angular velocity data of the previous moment to obtain a trained deep reinforcement learning network, including:
Based on a preset action equation, solving the next moment action of the unmanned aerial vehicle according to the sparse Lidar data, the polar coordinates and the linear speed and angular speed data of the previous moment;
And based on a reward function, iteratively training the preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates, the linear speed and angular speed data of the previous moment and the action of the next moment to obtain the trained deep reinforcement learning network.
Optionally, the preset action equation is:
V_t = F(S_t, P_t, V_{t-1})
wherein V_t is the action (linear and angular velocity) of the unmanned aerial vehicle at the next moment, S_t is the sparse Lidar data, P_t is the position of the target relative to the unmanned aerial vehicle, and V_{t-1} is the linear and angular velocity of the unmanned aerial vehicle at the previous moment.
Optionally, the reward function is:
r(S_t, a_t) = r_arrive, if d_t < c_d; r_collide, if min_x < c_o; otherwise a dense reward determined by the previous-moment motion V_{t-1} and the current distance d_t
wherein r(S_t, a_t) is the return obtained under the reward-and-punishment strategy, r_arrive is the positive reward, r_collide is the negative reward, V_{t-1} is the motion of the unmanned aerial vehicle at the previous moment, d_t is the distance between the unmanned aerial vehicle and the target at the current moment, c_d is the threshold distance around the target, min_x is the minimum ranging reading in the sparse Lidar data, and c_o is the collision threshold distance to an obstacle.
Optionally, the quantizing of the trained deep reinforcement learning network to deploy it on a computing chip of the unmanned aerial vehicle and obtain the unmanned aerial vehicle with the deep reinforcement learning network includes:
Quantizing the trained deep reinforcement learning network by using the sparse Lidar data, the polar coordinates and the linear velocity and angular velocity data of the previous moment to obtain a network tensor;
partitioning the network tensor using a preset optimal tensor-partitioning scheme to obtain an optimal data partition, and generating C code for the trained deep reinforcement learning network according to the optimal data partition;
and deploying the C code on a computing chip of the unmanned aerial vehicle to obtain the unmanned aerial vehicle with the deep reinforcement learning network.
Optionally, the quantizing of the trained deep reinforcement learning network by using the sparse Lidar data, the polar coordinates, and the previous-moment linear velocity and angular velocity data to obtain a network tensor includes:
mapping the sparse Lidar data, the polar coordinates, and the previous-moment linear velocity and angular velocity data onto an N-bit pure-integer tensor t̂ by using the reference range [α_t, β_t] of the tensor t of each layer of the Actor network, to obtain the network tensor.
An embodiment of a second aspect of the present invention provides a lightweight autonomous navigation device based on neural network driving, including:
The off-line training module is used for acquiring sparse Lidar data, polar coordinates of the unmanned aerial vehicle relative to a target point and linear speed and angular speed data of the unmanned aerial vehicle at the previous moment, and training a preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates and the linear speed and angular speed data of the unmanned aerial vehicle at the previous moment to obtain a trained deep reinforcement learning network;
The online deployment module is used for quantizing the trained deep reinforcement learning network so as to deploy it on a computing chip of the unmanned aerial vehicle to obtain the unmanned aerial vehicle with the deep reinforcement learning network;
And the online navigation module is used for collecting real-time sparse Lidar ranging data, real-time polar coordinate data of the unmanned aerial vehicle relative to the target position, and the real-time linear velocity and angular velocity of the unmanned aerial vehicle at the previous moment; the unmanned aerial vehicle with the deep reinforcement learning network autonomously navigates to the target position based on the real-time sparse Lidar ranging data, the real-time polar coordinate data, the real-time linear velocity, and the real-time angular velocity.
An embodiment of a third aspect of the present invention provides an electronic device, including: the system comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the program to realize the lightweight autonomous navigation method based on the neural network drive.
A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network drive-based lightweight autonomous navigation method as above.
The lightweight neural-network-driven autonomous navigation method provided by the embodiments of the invention is suited to intelligent applications of unmanned platforms under extremely low power budgets. It uses sparse Lidar ranging data, the polar coordinates of the unmanned aerial vehicle relative to a target point, and the previous-moment linear and angular velocity of the unmanned aerial vehicle as inputs to a deep reinforcement learning network; it uses a reward function to make the unmanned aerial vehicle learn goal-driven autonomous navigation in a simulation environment; and it then quantizes the trained network parameters and decomposes each network layer for deployment on an extremely-low-power edge computing chip. Running the autonomous navigation network on such a chip is of real significance for the autonomy of nano unmanned platforms that use only Lidar as the sensing unit.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
fig. 1 is a flowchart of a lightweight autonomous navigation method based on neural network driving according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a GAP8 chip architecture according to an embodiment of the present invention;
FIG. 3 is a schematic view of a simulation environment provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the input/output of a deep reinforcement learning network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a deep reinforcement learning network according to an embodiment of the present invention;
FIG. 6 is a graph of success rate of a deep reinforcement learning network provided by an embodiment of the present invention in three scenarios;
FIG. 7 is a GAPFlow tool chain diagram for use with an embodiment of the present invention;
Fig. 8 is a schematic block diagram of a lightweight autonomous navigation device based on neural network driving according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.
The following describes a lightweight autonomous navigation method and device based on neural network driving according to an embodiment of the present invention with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a lightweight autonomous navigation method based on neural network driving according to an embodiment of the present invention.
As shown in fig. 1, the lightweight autonomous navigation method based on neural network driving includes the following steps:
In step S101, sparse Lidar data, polar coordinates of the unmanned aerial vehicle relative to the target point, and linear velocity and angular velocity data of the unmanned aerial vehicle at a previous time are obtained, and a preset deep reinforcement learning network is trained by using the sparse Lidar data, the polar coordinates, and the linear velocity and angular velocity data of the unmanned aerial vehicle at the previous time, so as to obtain a trained deep reinforcement learning network.
Further, in an embodiment of the present invention, sparse Lidar data, polar coordinates of an unmanned aerial vehicle relative to a target point, and linear velocity and angular velocity data of a previous time are obtained, and a preset deep reinforcement learning network is trained by using the sparse Lidar data, the polar coordinates, and the linear velocity and angular velocity data of the previous time, so as to obtain a trained deep reinforcement learning network, including:
constructing a simulation scene on a Gazebo simulation platform, and determining a preset deep reinforcement learning network;
acquiring sparse Lidar data, polar coordinates and linear speed and angular speed data of the previous moment in a simulation scene;
And training a preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates and the linear velocity and angular velocity data at the previous moment to obtain a trained deep reinforcement learning network.
Further, in an embodiment of the present invention, training a preset deep reinforcement learning network using sparse Lidar data, polar coordinates, and linear velocity and angular velocity data of a previous time to obtain a trained deep reinforcement learning network includes:
based on a preset action equation, solving the action of the unmanned aerial vehicle at the next moment according to sparse Lidar data, polar coordinates and linear speed and angular speed data of the previous moment;
Based on the reward function, the preset deep reinforcement learning network is trained by utilizing sparse Lidar data, polar coordinates, linear speed and angular speed data of the previous moment and actions of the next moment in an iterative mode, and the trained deep reinforcement learning network is obtained.
Specifically, a dedicated simulation scene and model are built on the Gazebo simulation platform, a suitable deep reinforcement learning algorithm is selected, and the sparse Lidar data, the polar coordinates of the unmanned aerial vehicle relative to the target point, and the previous-moment linear and angular velocity of the unmanned aerial vehicle are used as input data; after processing by the neural network, the action of the unmanned aerial vehicle at the next moment can be predicted.
The preset action equation is as follows:
V_t = F(S_t, P_t, V_{t-1})
wherein V_t is the action (linear and angular velocity) of the unmanned aerial vehicle at the next moment, S_t is the sparse Lidar data, P_t is the position of the target relative to the unmanned aerial vehicle, and V_{t-1} is the linear and angular velocity of the unmanned aerial vehicle at the previous moment.
The goal of reinforcement learning is to maximize the cumulative reward of the drone, which learns a mapping from environmental states to actions by interacting with the environment. The reward function is set to:
r(S_t, a_t) = r_arrive, if d_t < c_d
r(S_t, a_t) = r_collide, if min_x < c_o
r(S_t, a_t) = a dense reward determined by V_{t-1} and d_t, otherwise
wherein r(S_t, a_t) is the return obtained under the reward-and-punishment strategy, r_arrive is the positive reward, r_collide is the negative reward, V_{t-1} is the motion of the unmanned aerial vehicle at the previous moment, d_t is the distance between the unmanned aerial vehicle and the target at the current moment, c_d is the threshold distance around the target, min_x is the minimum ranging reading in the sparse Lidar data, and c_o is the collision threshold distance to an obstacle.
If the drone reaches the target (the distance check against the threshold c_d passes), the positive reward r_arrive is given; if the drone collides with an obstacle (the minimum ranging reading falls below c_o), the negative reward r_collide is given. Through continuous, repeated training iterations, an efficient deep reinforcement learning network model is obtained and the weight data produced by training is saved.
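For illustration, this reward can be sketched in Python as below. The two terminal branches follow exactly the checks just described; the dense progress term in the final branch and the reward magnitudes are illustrative assumptions, since the text fixes only the thresholds (c_d = 0.25 m and c_o = 0.5 m in the specific embodiment later):

    def reward(d_t, d_prev, min_x, c_d=0.25, c_o=0.5,
               r_arrive=100.0, r_collide=-100.0, c_r=1.0):
        """Piecewise reward for one step. r_arrive, r_collide and the
        progress coefficient c_r are assumed values; d_prev is the
        distance to the target at the previous step (also assumed)."""
        if d_t < c_d:      # within c_d of the target: success
            return r_arrive
        if min_x < c_o:    # minimum Lidar reading below c_o: collision
            return r_collide
        # Otherwise, reward progress toward the target (assumed shaping term).
        return c_r * (d_prev - d_t)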
Meanwhile, a suitable weight data set is selected for training the parameters of the deep reinforcement learning network so that the estimation accuracy of the network meets the requirement; the weight data generated during offline training is used.
In step S102, the trained deep reinforcement learning network is quantized, so that the trained deep reinforcement learning network is deployed on a computing chip of the unmanned aerial vehicle, and the unmanned aerial vehicle with the deep reinforcement learning network is obtained.
Further, in one embodiment of the present invention, quantizing the trained deep reinforcement learning network to deploy it on a computing chip of the unmanned aerial vehicle and obtain the unmanned aerial vehicle with the deep reinforcement learning network includes:
Quantizing the trained deep reinforcement learning network by using sparse Lidar data, polar coordinates and linear speed and angular speed data at the previous moment to obtain a network tensor;
partitioning the network tensor using a preset optimal tensor-partitioning scheme to obtain an optimal data partition, and generating C code for the trained deep reinforcement learning network according to the optimal data partition;
and deploying the C code on a computing chip of the unmanned aerial vehicle to obtain the unmanned aerial vehicle with the deep reinforcement learning network.
Specifically, sparse Lidar ranging data, the polar coordinates of the unmanned aerial vehicle relative to the target position, and the previous-moment linear and angular velocity of the unmanned aerial vehicle are taken as inputs to the trained deep reinforcement learning network, and the reference range [α_t, β_t] of the tensor t of each layer of the Actor network is used to map the sparse Lidar data, the polar coordinates, and the previous-moment linear and angular velocity data onto an N-bit pure-integer tensor t̂, obtaining the network tensor, with
ε_t = (β_t - α_t) / (2^N - 1)
where ε_t, commonly called the scaling factor, scales the tensor from a floating-point to an integer representation. The quantization flow forces all tensors in the network into quantized tensor form. All tensors in the network are post-training quantized to 8 bits using the NNTOOL tool developed by GWT (GreenWaves Technologies).
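A minimal NumPy sketch of this affine mapping follows, assuming the standard uniform-quantization formula implied by the scaling factor above; the actual calibration and conversion are performed by NNTOOL:

    import numpy as np

    def quantize(t, alpha_t, beta_t, n_bits=8):
        """Map a float tensor t with reference range [alpha_t, beta_t]
        onto an N-bit pure-integer grid using the scaling factor
        eps_t = (beta_t - alpha_t) / (2**N - 1)."""
        eps_t = (beta_t - alpha_t) / (2 ** n_bits - 1)
        q = np.round((t - alpha_t) / eps_t)
        q = np.clip(q, 0, 2 ** n_bits - 1).astype(np.uint8)
        return q, eps_t

    def dequantize(q, alpha_t, eps_t):
        """Recover an approximate float tensor from its integer form."""
        return alpha_t + eps_t * q.astype(np.float32)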
Deploying the model on the chip so as to realize and exploit the hardware performance relies mainly on generating C code that directly controls low-level memory and computation. The core problem is how to achieve maximum parallelism across all available cores while reducing time-consuming data transfers. To obtain an optimal data partition, the AutoTiler tool is used: AutoTiler generates a tiling structure based on the memory budget of the shared L1 memory, and with this tiling, data is pipelined in and out of L2 or external memory to keep the cores busy. During iteration, calls to the basic kernels with the tiling parameters are also inserted at the requested locations. AutoTiler computes the optimal data partition, generates the C code, and moves the partitions between the different memory levels available on the chip while the network model runs efficiently on the eight-core cluster, yielding the unmanned aerial vehicle with the deep reinforcement learning network.
In step S103, real-time sparse Lidar ranging data, real-time polar coordinate data of the unmanned aerial vehicle relative to the target position, and real-time linear velocity and real-time angular velocity of the unmanned aerial vehicle at the previous moment are collected, and the unmanned aerial vehicle with the deep reinforcement learning network is autonomously navigated to the target position based on the real-time sparse Lidar ranging data, the real-time polar coordinate data, the real-time linear velocity and the real-time angular velocity.
Specifically, the computation of the trained deep reinforcement learning network on the ultra-low-power edge computing chip is packaged as C code. Before this code is called, the real-time sparse Lidar ranging data, the real-time linear and angular velocity of the unmanned aerial vehicle at the previous moment, and the real-time polar coordinates of the unmanned aerial vehicle relative to the target point must first be collected; these data are retrieved and stored into an array by calling the interfaces in gap_sdk. The real-time sparse Lidar ranging data, the real-time polar coordinates relative to the target position, and the previous-moment real-time linear and angular velocity are then passed as inputs to the network, and after processing, the linear and angular velocity of the unmanned aerial vehicle at the next moment is predicted. Repeating this process ensures that the nano unmanned aerial vehicle approaches its target position with high accuracy and speed.
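Reduced to its logic, the on-board control loop looks like the Python sketch below. The sensor, controller, and state-assembly helpers are placeholders for illustration, not gap_sdk APIs, and the state layout is an assumption; the real implementation is the generated C code described above.

    import numpy as np

    def assemble_input(s_t, p_t, v_lin, v_ang):
        # Concatenate sparse ranges, polar target coordinates, and the
        # previous-moment velocities into one state vector (layout assumed).
        return np.concatenate([s_t, p_t, [v_lin, v_ang]]).astype(np.float32)

    def navigation_loop(actor, sensors, controller, at_target):
        v_lin, v_ang = 0.0, 0.0                    # previous-moment velocities
        while not at_target():
            s_t = sensors.read_sparse_lidar()      # sparse ranging readings
            p_t = sensors.read_polar_to_target()   # (distance, bearing) to goal
            x = assemble_input(s_t, p_t, v_lin, v_ang)
            v_lin, v_ang = actor(x)                # network predicts next command
            controller.send_velocity(v_lin, v_ang) # actuate, then repeat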
The following embodiment further describes a lightweight autonomous navigation method based on neural network driving.
It should be noted that this specific embodiment is deployed on the AIdeck platform; the AIdeck mainly carries GWT's GAP8 chip and a Himax camera, which enhances the computing power of the nano unmanned aerial vehicle so that complex artificial-intelligence tasks can run efficiently on the nano unmanned aerial vehicle platform. Notably, as shown in fig. 2, the GAP8 chip is a commercial-grade embedded RISC-V multi-core processor developed from the open-source PULP project and includes two computing domains: (1) a Fabric Controller (FC) with 512 KB of L2 memory for control tasks; (2) a cluster domain containing 8 cores for parallel computing of demanding workloads and 64 KB of directly accessible L1 memory. In addition, the code of this specific embodiment is implemented in Python, with PyTorch as the main deep learning framework.
As shown in fig. 3, this embodiment uses the Gazebo simulation platform on an Ubuntu 20.04 system, builds 3D models of the environment and the robot based on ROS (Robot Operating System), constructs the system framework, and creates subscription and publication mechanisms to obtain messages, services, and other information from the environment. Simulation experiments are carried out on this basis, and sparse lidar ranging data is used to train and quantize the network and to verify the final deployment result.
The training scene is a 10 m room. Scene one contains no obstacles, only a target point for the unmanned aerial vehicle to reach; scene two has four fixed obstacles and a target point; scene three is complex, with two movable obstacles, several irregular obstacles, and a target point. In scene one the unmanned aerial vehicle only needs to avoid the walls to reach the target point, so the success rate is highest; scene two is more complex, as the unmanned aerial vehicle must avoid the fixed obstacles while moving to the target point, so the obtained return is lower than in scene one; scene three, with its moving obstacles, serves as the verification scene for the network and puts the obstacle-avoidance and navigation ability of the unmanned aerial vehicle to a stronger test. Training on the first two scenes lets the algorithm generalize to the complex environment.
This embodiment uses the representative Deep Deterministic Policy Gradient (DDPG) algorithm from deep reinforcement learning; the input and output of the network are shown in fig. 4, and the network structure of DDPG is shown in fig. 5. In the Actor network, the input passes through three fully connected layers of 500 neurons each and is activated with a sigmoid function, because the linear velocity must be non-negative, before being fused into a two-dimensional output action. In the Critic network, the action is connected to a layer of 250 neurons while the other inputs are connected to another layer of 250 neurons; the two layers are joined to form the first hidden layer. The Q value of the state-action pair is then output through two fully connected layers. Finally, the output Q value is activated by a linear activation function y = kx + b, where x is the input to the last layer, k and b are the trained weight and bias of that layer, and y is the predicted Q value.
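A PyTorch sketch consistent with this description is given below. The ReLU hidden activations, the tanh on the angular-velocity head, and the 14-dimensional state (assuming 10 sparse Lidar ranges, 2 polar coordinates, and the 2 previous-moment velocities) are assumptions not fixed by the text.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        # Three 500-neuron fully connected layers as described; the sigmoid
        # keeps the linear velocity non-negative, as stated above.
        def __init__(self, state_dim=14):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(state_dim, 500), nn.ReLU(),
                nn.Linear(500, 500), nn.ReLU(),
                nn.Linear(500, 500), nn.ReLU(),
            )
            self.v_head = nn.Linear(500, 1)  # linear velocity
            self.w_head = nn.Linear(500, 1)  # angular velocity

        def forward(self, s):
            h = self.body(s)
            v = torch.sigmoid(self.v_head(h))  # non-negative linear velocity
            w = torch.tanh(self.w_head(h))     # signed angular velocity (assumed)
            return torch.cat([v, w], dim=-1)   # two-dimensional output action

    class Critic(nn.Module):
        # The action enters a 250-neuron layer and the other inputs enter
        # another 250-neuron layer; joined, they form the first hidden layer,
        # and two fully connected layers then emit the Q value linearly.
        def __init__(self, state_dim=14, action_dim=2):
            super().__init__()
            self.s_fc = nn.Linear(state_dim, 250)
            self.a_fc = nn.Linear(action_dim, 250)
            self.q = nn.Sequential(
                nn.Linear(500, 500), nn.ReLU(),
                nn.Linear(500, 1),             # linear output: the Q value
            )

        def forward(self, s, a):
            h = torch.cat([torch.relu(self.s_fc(s)),
                           torch.relu(self.a_fc(a))], dim=-1)
            return self.q(h)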
Since the reinforcement learning drone seeks to obtain the maximum return from the reward function, the reward function is set as follows:
r(S_t, a_t) = r_arrive, if d_t < c_d
r(S_t, a_t) = r_collide, if min_x < c_o
r(S_t, a_t) = a dense reward determined by V_{t-1} and d_t, otherwise
wherein r(S_t, a_t) is the return obtained under the reward-and-punishment strategy, r_arrive is the positive reward, r_collide is the negative reward, V_{t-1} is the motion of the unmanned aerial vehicle at the previous moment, d_t is the distance between the unmanned aerial vehicle and the target at the current moment, c_d is the threshold distance around the target, set to 0.25 m, min_x is the minimum ranging reading in the sparse Lidar data, and c_o is the collision threshold distance to an obstacle, set to 0.5 m.
If the drone reaches the target (the distance check against the threshold c_d passes), the positive reward r_arrive is given; if the drone collides with an obstacle (the minimum ranging reading falls below c_o), the negative reward r_collide is given. The experiment navigates from one location to another in the map, 100 times in each scene, and records the success rate of finally reaching the target point in fig. 6. Through continuous, repeated training iterations, an efficient deep reinforcement learning network model is obtained and the weight data produced by training is saved.
As shown in fig. 7, this specific embodiment uses the GAPFlow tool chain developed by GWT, which includes two tools named NNTOOL and AutoTiler. The deep reinforcement learning network is migrated to GAP8 using the neural network mapping tool NNTOOL; since the RISC-V cores of GAP8 have no floating-point unit, the inputs, weights, and biases of the neural network must be quantized to 8-bit integer or 16-bit fixed-point values, which makes NNTOOL's quantization particularly important. NNTOOL in this embodiment therefore takes the DRL architecture through ONNX, converting it into a format that AutoTiler can use and converting the weight data into a format that can be flashed to GAP8. Using NNTOOL to perform 8-bit training and quantization of the DRL increases the network's running speed on the GAP8 chip and effectively reduces memory consumption.
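As an illustration, the trained PyTorch Actor can be exported to ONNX with the standard torch.onnx.export call before being handed to NNTOOL; the file name and the 14-dimensional dummy input repeat the assumptions above.

    import torch

    actor = Actor()                   # trained Actor from the sketch above
    actor.eval()
    dummy_state = torch.zeros(1, 14)  # assumed state layout
    torch.onnx.export(actor, dummy_state, "actor.onnx",
                      input_names=["state"], output_names=["velocity_cmd"])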
The core challenge in deploying the model to the GAP8 chip is how to reduce the cost of data transfers while ensuring maximally parallel operation across all available cores. The GAP8 chip works with three memories, L1, L2, and L3. The internal memory of GAP8 is divided into two levels. The L2 memory is 512 KB in size and accessible to all cores. The L1 memory is divided into a 16 KB portion for the fabric controller and a 64 KB portion shared by the cluster cores and the HWCE. The L3 memory must be added externally and can be connected to the GAP8 chip through quad-SPI or HyperBus interfaces. GAP8 implements no data cache between the L1 and L2 memories, so all data must be copied from L2 to L1 and vice versa. The AutoTiler tool developed by GreenWaves Technologies simplifies this process. In this embodiment, AutoTiler generates a tiling structure based on the memory budget of the shared L1 memory, and this tiling structure is then used to move data in and out of L2 or external memory in a pipelined fashion so as to keep the cores busy. During iteration, calls to the basic kernels with the tiling parameters are also inserted at the requested locations. AutoTiler computes the optimal data partitions, with no need to manage complex double/triple buffering, generates the C code, and moves the partitions between the different memory levels available on the chip while the network model runs efficiently on the eight-core cluster. The final result is an unmanned aerial vehicle with a deep reinforcement learning network, which navigates autonomously to the target position according to the real-time sparse Lidar ranging data, the real-time polar coordinate data of the unmanned aerial vehicle relative to the target position, and the real-time linear and angular velocity of the unmanned aerial vehicle at the previous moment.
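A back-of-the-envelope check, under the same assumed 14-500-500-500-2 Actor layout, shows why 8-bit weights matter on this memory hierarchy.

    # Approximate Actor parameter count (the layout is an assumption).
    weights = 14*500 + 500*500 + 500*500 + 500*2  # 508,000 weights
    biases = 500 + 500 + 500 + 2                  # 1,502 biases
    params = weights + biases                     # about 509.5 K parameters

    print(params * 4 / 1024)  # float32: ~1990 KB, far beyond the 512 KB L2
    print(params / 1024)      # int8:    ~498 KB, fits within the 512 KB L2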
The lightweight neural-network-driven autonomous navigation method provided by this embodiment of the invention focuses on a lighter deep reinforcement learning network and on its quantization and deployment on a low-power chip. After training the deep reinforcement learning algorithm at the network design level, the trained network parameters are quantized to 8 bits and the network's computation is deployed on the low-power chip in a reasonable partition, realizing real-time operation of the deep reinforcement learning network on the low-power chip. This is of real significance for intelligent applications of nano robot platforms; meanwhile, experimental verification shows good navigation results with an acceptable processing speed, giving the method good engineering application value.
Next, a lightweight autonomous navigation device based on a neural network drive according to an embodiment of the present invention will be described with reference to the accompanying drawings.
Fig. 8 is a block schematic diagram of a lightweight autonomous navigation device based on neural network driving according to an embodiment of the present invention.
As shown in fig. 8, the neural network drive-based lightweight autonomous navigation device 80 includes: an offline training module 801, an online deployment module 802, and an online navigation module 803.
The offline training module 801 is configured to obtain sparse Lidar data, the polar coordinates of the unmanned aerial vehicle relative to a target point, and the linear velocity and angular velocity of the unmanned aerial vehicle at the previous moment, and to train a preset deep reinforcement learning network with these data to obtain a trained deep reinforcement learning network. The online deployment module 802 is configured to quantize the trained deep reinforcement learning network and deploy it on a computing chip of the unmanned aerial vehicle, obtaining the unmanned aerial vehicle with the deep reinforcement learning network. The online navigation module 803 is configured to collect real-time sparse Lidar ranging data, real-time polar coordinate data of the unmanned aerial vehicle relative to the target position, and the real-time linear velocity and angular velocity of the unmanned aerial vehicle at the previous moment; the unmanned aerial vehicle with the deep reinforcement learning network autonomously navigates to the target position based on these real-time data.
It should be noted that the explanation of the embodiments of the lightweight autonomous navigation method based on the neural network drive described above is also applicable to the lightweight autonomous navigation device based on the neural network drive of this embodiment, and will not be repeated here.
The lightweight neural-network-driven autonomous navigation device provided by this embodiment of the invention focuses on a lighter deep reinforcement learning network and on its quantization and deployment on a low-power chip. After training the deep reinforcement learning algorithm at the network design level, the trained network parameters are quantized to 8 bits and the network's computation is deployed on the low-power chip in a reasonable partition, realizing real-time operation of the deep reinforcement learning network on the low-power chip. This is of real significance for intelligent applications of nano robot platforms; meanwhile, experimental verification shows good navigation results with an acceptable processing speed, giving the device good engineering application value.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device may include:
Memory 901, processor 902, and a computer program stored on memory 901 and executable on processor 902.
The processor 902 implements the lightweight autonomous navigation method based on neural network driving provided in the above embodiment when executing a program.
Further, the electronic device further includes:
a communication interface 903 for communication between the memory 901 and the processor 902.
Memory 901 for storing a computer program executable on processor 902.
Memory 901 may comprise high-speed RAM memory or may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 901, the processor 902, and the communication interface 903 are implemented independently, the communication interface 903, the memory 901, and the processor 902 may be connected to each other through a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 901, the processor 902, and the communication interface 903 are integrated on a chip, the memory 901, the processor 902, and the communication interface 903 may communicate with each other through internal interfaces.
The processor 902 may be a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing embodiments of the invention.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the lightweight autonomous navigation method based on neural network driving as above.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "N" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer cartridge (magnetic device), a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber device, and a portable Compact Disc Read-Only Memory (CD-ROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, may be implemented using any one or combination of the following techniques, as known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable Gate Arrays (PGAs), field Programmable Gate Arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like. While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (10)

1. A lightweight autonomous navigation method based on neural network driving, characterized by comprising the following steps:
acquiring sparse Lidar data, polar coordinates of an unmanned aerial vehicle relative to a target point and linear speed and angular speed data of the unmanned aerial vehicle at the previous moment, and training a preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates and the linear speed and angular speed data of the unmanned aerial vehicle at the previous moment to obtain a trained deep reinforcement learning network;
quantizing the trained deep reinforcement learning network, so as to deploy the trained deep reinforcement learning network on a computing chip of the unmanned aerial vehicle, thereby obtaining the unmanned aerial vehicle with the deep reinforcement learning network;
and collecting real-time sparse Lidar ranging data, real-time polar coordinate data of the unmanned aerial vehicle relative to a target position, and the real-time linear velocity and real-time angular velocity of the unmanned aerial vehicle at the previous moment, wherein the unmanned aerial vehicle with the deep reinforcement learning network autonomously navigates to the target position based on the real-time sparse Lidar ranging data, the real-time polar coordinate data, the real-time linear velocity, and the real-time angular velocity.
2. The neural network drive-based lightweight autonomous navigation method according to claim 1, wherein the acquiring sparse Lidar data, polar coordinates of an unmanned aerial vehicle relative to a target point, and linear velocity and angular velocity data of a previous time thereof, and training a preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates, and the linear velocity and angular velocity data of the previous time, to obtain a trained deep reinforcement learning network comprises:
constructing a simulation scene on a Gazebo simulation platform, and determining the preset deep reinforcement learning network;
acquiring the sparse Lidar data, the polar coordinates and the linear speed and angular speed data of the previous moment in the simulation scene;
and training the preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates and the linear velocity and angular velocity data of the previous moment to obtain the trained deep reinforcement learning network.
3. The neural network drive-based lightweight autonomous navigation method according to claim 1, wherein the training a preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates, and the linear velocity and angular velocity data at the previous time to obtain a trained deep reinforcement learning network comprises:
Based on a preset action equation, solving the next moment action of the unmanned aerial vehicle according to the sparse Lidar data, the polar coordinates and the linear speed and angular speed data of the previous moment;
And based on a reward function, iteratively training the preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates, the linear speed and angular speed data of the previous moment and the action of the next moment to obtain the trained deep reinforcement learning network.
4. The neural network drive-based lightweight autonomous navigation method of claim 3, wherein the preset equation of motion is:
V_t = F(S_t, P_t, V_{t-1})
wherein V_t is the action (linear and angular velocity) of the unmanned aerial vehicle at the next moment, S_t is the sparse Lidar data, P_t is the position of the target relative to the unmanned aerial vehicle, and V_{t-1} is the linear and angular velocity of the unmanned aerial vehicle at the previous moment.
5. A neural network drive based lightweight autonomous navigation method according to claim 3, wherein the reward function is:
r(S_t, a_t) = r_arrive, if d_t < c_d; r_collide, if min_x < c_o; otherwise a dense reward determined by the previous-moment motion V_{t-1} and the current distance d_t;
wherein r(S_t, a_t) is the return obtained under the reward-and-punishment strategy, r_arrive is the positive reward, r_collide is the negative reward, V_{t-1} is the motion of the unmanned aerial vehicle at the previous moment, d_t is the distance between the unmanned aerial vehicle and the target at the current moment, c_d is the threshold distance around the target, min_x is the minimum ranging reading in the sparse Lidar data, and c_o is the collision threshold distance to an obstacle.
6. The neural network drive-based lightweight autonomous navigation method of claim 1, wherein the quantizing of the trained deep reinforcement learning network to deploy the trained deep reinforcement learning network on a computing chip of the unmanned aerial vehicle, resulting in an unmanned aerial vehicle with a deep reinforcement learning network, comprises:
Quantizing the trained deep reinforcement learning network by using the sparse Lidar data, the polar coordinates and the linear velocity and angular velocity data of the previous moment to obtain a network tensor;
partitioning the network tensor using a preset optimal tensor-partitioning scheme to obtain an optimal data partition, and generating C code for the trained deep reinforcement learning network according to the optimal data partition;
and deploying the C code on a computing chip of the unmanned aerial vehicle to obtain the unmanned aerial vehicle with the deep reinforcement learning network.
7. The neural network drive-based lightweight autonomous navigation method of claim 6, wherein the quantizing the trained deep reinforcement learning network using the sparse Lidar data, the polar coordinates, and the linear velocity and angular velocity data at the previous time to obtain a network tensor comprises:
mapping the sparse Lidar data, the polar coordinates, and the previous-moment linear velocity and angular velocity data onto an N-bit pure-integer tensor t̂ by using the reference range [α_t, β_t] of the tensor t of each layer of the Actor network, to obtain the network tensor.
8. A neural network drive-based lightweight autonomous navigation device, comprising:
The off-line training module is used for acquiring sparse Lidar data, polar coordinates of the unmanned aerial vehicle relative to a target point and linear speed and angular speed data of the unmanned aerial vehicle at the previous moment, and training a preset deep reinforcement learning network by using the sparse Lidar data, the polar coordinates and the linear speed and angular speed data of the unmanned aerial vehicle at the previous moment to obtain a trained deep reinforcement learning network;
The online deployment module is used for quantizing the trained deep reinforcement learning network so as to deploy the trained deep reinforcement learning network on a computing chip of the unmanned aerial vehicle to obtain the unmanned aerial vehicle with the deep reinforcement learning network;
And the online navigation module is used for collecting real-time sparse Lidar ranging data, real-time polar coordinate data of the unmanned aerial vehicle relative to the target position, and the real-time linear velocity and real-time angular velocity of the unmanned aerial vehicle at the previous moment, wherein the unmanned aerial vehicle with the deep reinforcement learning network autonomously navigates to the target position based on the real-time sparse Lidar ranging data, the real-time polar coordinate data, the real-time linear velocity, and the real-time angular velocity.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the neural network drive-based lightweight autonomous navigation method of any of claims 1-7.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that the program is executed by a processor for implementing the neural network drive-based lightweight autonomous navigation method as claimed in any one of claims 1 to 7.
CN202410102883.0A 2024-01-24 2024-01-24 Light autonomous navigation method and device based on neural network driving Pending CN117906614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410102883.0A 2024-01-24 2024-01-24 Light autonomous navigation method and device based on neural network driving

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410102883.0A 2024-01-24 2024-01-24 Light autonomous navigation method and device based on neural network driving

Publications (1)

Publication Number Publication Date
CN117906614A 2024-04-19

Family

ID=90685062

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410102883.0A Pending CN117906614A (en) 2024-01-24 2024-01-24 Light autonomous navigation method and device based on neural network driving

Country Status (1)

Country Link
CN (1) CN117906614A (en)

Similar Documents

Publication Publication Date Title
US11702105B2 (en) Technology to generalize safe driving experiences for automated vehicle behavior prediction
US11970161B2 (en) Apparatus, method and article to facilitate motion planning of an autonomous vehicle in an environment having dynamic objects
US11634126B2 (en) Apparatus, methods and articles to facilitate motion planning in environments having dynamic obstacles
US20220057803A1 (en) Apparatus, method and article to facilitate motion planning in an environment having dynamic objects
CN111971574B (en) Deep learning based feature extraction for LIDAR localization of autonomous vehicles
CN112154461A (en) Graph neural network system for behavior prediction and reinforcement learning in multi-agent environment
Cao et al. Target search control of AUV in underwater environment with deep reinforcement learning
CN111771135B (en) LIDAR positioning using RNN and LSTM for time smoothing in autonomous vehicles
US9630318B2 (en) Feature detection apparatus and methods for training of robotic navigation
EP3788549B1 (en) Stacked convolutional long short-term memory for model-free reinforcement learning
US20170213070A1 (en) Object-focused active three-dimensional reconstruction
CN112136141A (en) Robot based on free form natural language input control
US20220207337A1 (en) Method for artificial neural network and neural processing unit
Long et al. A multi-subpopulation bacterial foraging optimisation algorithm with deletion and immigration strategies for unmanned surface vehicle path planning
KR20220097161A (en) Method for artificial neural network and neural processing unit
US20210397195A1 (en) Robot navigation using a high-level policy model and a trained low-level policy model
CN117906614A (en) Light autonomous navigation method and device based on neural network driving
Hamzah et al. Development of Single-board Computer-based Self-Driving Car Model using CNN-Controlled RC Car
WO2020006091A1 (en) Multi-resolution maps for localization
EP3839830A1 (en) Trajectory estimation for vehicles
US20240092345A1 (en) Device and method with continuous real-time autonomous parking planning and control
US20230245336A1 (en) Distance representation and encoding
CN117928530A (en) Method and apparatus for path distribution estimation
KR20240057126A (en) Methods and devices for estimating path distribution
Martins Machine Learning Based Controller for the Robot used in Autonomous Driving Competition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination