CN115826628A - NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance system and method - Google Patents
- Publication number: CN115826628A (application number CN202310147930.9A)
- Authority
- CN
- China
- Prior art keywords
- module
- nerf
- aerial vehicle
- unmanned aerial
- depth
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a heterogeneous unmanned aerial vehicle visual obstacle avoidance system and method based on a NeRF neural network, belonging to the technical field of unmanned aerial vehicle visual obstacle avoidance. The invention solves the problem that SLAM visual obstacle avoidance with a conventional RGB-D depth camera is unstable when the unmanned aerial vehicle is in direct sunlight, and improves the reliability of autonomous flight of the unmanned aerial vehicle.
Description
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle visual obstacle avoidance, and particularly relates to a heterogeneous unmanned aerial vehicle visual obstacle avoidance system and method based on a NeRF neural network.
Background
NeRF (Neural Radiance Fields) is one of the most active current research areas; the problem it addresses is novel view synthesis: given a set of captured views of a scene, generate views from new perspectives. It differs from traditional three-dimensional reconstruction methods, which represent the scene explicitly as a point cloud, mesh, voxels or another explicit form. In this novel approach, the scene is modeled as a continuous 5D radiance field stored implicitly in a neural network: only sparse, multi-angle, posed images are needed to train a neural radiance field model, from which clear pictures can be rendered at any viewing angle.
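As an illustration of the 5D radiance-field interface described above, the sketch below uses a minimal, randomly initialised network as a stand-in for a trained radiance field (the layer sizes and all names here are illustrative assumptions, not the patent's network): it maps a 5D input, 3D position plus 2D viewing direction, to an RGB color and a volume density.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised weights stand in for a trained radiance field.
W1 = rng.normal(size=(5, 64)) * 0.1   # 5D input: position (x, y, z) + view angles (theta, phi)
b1 = np.zeros(64)
W2 = rng.normal(size=(64, 4)) * 0.1   # 4D output: RGB color + volume density sigma
b2 = np.zeros(4)

def radiance_field(xyz, view_dir):
    """Query the implicit scene representation: 5D coordinate -> (rgb, sigma)."""
    x = np.concatenate([xyz, view_dir])        # (5,)
    h = np.maximum(x @ W1 + b1, 0.0)           # one hidden layer with ReLU
    out = h @ W2 + b2
    rgb = 1.0 / (1.0 + np.exp(-out[:3]))       # colors squashed into [0, 1]
    sigma = np.maximum(out[3], 0.0)            # density is non-negative
    return rgb, sigma
```

Training would fit these weights so that volume-rendered rays reproduce the posed input images; only the input/output interface is shown here.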
Fusing deep learning with traditional geometry is a trend in SLAM development. In the prior art, some single-point modules in SLAM are replaced by neural networks, such as feature extraction, feature matching, loop closure and depth estimation. Compared with such single-point replacement, NeRF-based SLAM is a brand-new framework that can replace traditional SLAM end to end, in both design methodology and implementation architecture.
In the autonomous flight process of an unmanned aerial vehicle, flexible obstacle avoidance must be achieved; the commonly adopted method is SLAM map reconstruction based on a lidar-assisted RGB-D depth camera or a binocular camera.
Although many three-dimensional reconstruction solutions are based on RGB-D or lidar sensors, scene reconstruction from monocular images provides a more convenient solution. RGB-D cameras can fail under certain conditions, such as direct sunlight, and lidar is considerably heavier than a monocular RGB camera. In addition, stereo cameras reduce the depth estimation problem to a one-dimensional disparity search, but rely on accurate calibration of cameras that are prone to miscalibration in actual operation.
Disclosure of Invention
Aiming at the above defects in the prior art, the heterogeneous unmanned aerial vehicle visual obstacle avoidance system and method based on a NeRF neural network provided by the invention solve the problem that SLAM visual obstacle avoidance with a conventional RGB-D depth camera is unstable when the unmanned aerial vehicle is in direct sunlight.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that: a heterogeneous unmanned aerial vehicle visual obstacle avoidance system based on a NeRF neural network comprises a core module, and a laser radar module, an unmanned aerial vehicle flight control module, an unmanned aerial vehicle monocular camera module and an IMU inertial navigation module which are respectively connected with the core module;
the unmanned aerial vehicle monocular camera module is used for collecting images to obtain original images;
the IMU inertial navigation module is used for acquiring the camera pose;
the core module is used for obtaining a global map according to the original image and the camera pose;
the laser radar module is used for obtaining a path planning result by combining the global map and the core module;
and the unmanned aerial vehicle flight control module is used for controlling the unmanned aerial vehicle according to the path planning result to realize obstacle avoidance of the unmanned aerial vehicle.
The invention has the following beneficial effects. An unmanned aerial vehicle monocular camera module collects images; an image acquisition subunit arranged in the FPGA editable logic sub-module of the core module transmits the collected images to the NeRF-SLAM mapping subunit, which, combined with the IMU inertial navigation module and the attitude calculation subunit, models the scene as a continuous 5D radiance field, so that SLAM mapping can be performed through the three-dimensional scene implicitly stored in a neural network. The mapping result is uploaded through the MicroBlaze processor unit to the APU application processor sub-module to update the global map of the obstacle avoidance system; flight path planning is achieved by combining the laser radar module and the laser radar SLAM unit; and the path planning result is sent to the unmanned aerial vehicle flight control system to control the unmanned aerial vehicle, realizing visual obstacle avoidance of the unmanned aerial vehicle. Using a monocular camera module reduces the weight and size of the unmanned aerial vehicle. At the same time, the idea of heterogeneous processing improves the generalization ability of the NeRF-SLAM neural radiance field and prevents it from falling into a local minimum.
Furthermore, the core module comprises an APU application processor submodule and an FPGA editable logic submodule connected with the APU application processor submodule;
the APU application processor sub-module comprises a laser radar SLAM unit, a global map unit and a flight path planning unit which are connected in sequence; the laser radar SLAM unit is connected with the laser radar module; the global map unit is connected with the FPGA editable logic sub-module; the flight path planning unit is connected with the unmanned aerial vehicle flight control module;
the FPGA editable logic sub-module comprises a MicroBlaze processor unit and a NeRF-SLAM neural radiance field unit which are sequentially connected; the NeRF-SLAM neural radiance field unit comprises a NeRF-SLAM mapping subunit, an image acquisition subunit and an attitude calculation subunit, wherein the image acquisition subunit and the attitude calculation subunit are respectively connected with the NeRF-SLAM mapping subunit; the MicroBlaze processor unit is respectively connected with the NeRF-SLAM mapping subunit and the global map unit; the image acquisition subunit is connected with the unmanned aerial vehicle monocular camera module; and the attitude calculation subunit is connected with the IMU inertial navigation module.
The beneficial effects of the above further scheme are: by adopting the idea of heterogeneous processing, the generalization ability of the NeRF-SLAM neural radiance field is improved and it is prevented from falling into a local minimum.
The invention provides a NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance method, which comprises the following steps:
s1, obtaining an original image through a monocular camera module of an unmanned aerial vehicle;
s2, modeling the original image into a continuous 5D radiance field by using a NeRF-SLAM neural radiance field unit to obtain a mapping result;
s3, transmitting the mapping result to a global map unit through a MicroBlaze processor unit to obtain a global map;
s4, obtaining a path planning result by utilizing a laser radar SLAM unit and a laser radar module according to the global map;
and S5, transmitting the path planning result to an unmanned aerial vehicle flight control module by using a flight path planning unit, and realizing the visual obstacle avoidance of the unmanned aerial vehicle.
The invention has the following beneficial effects: the structure of the Hessian matrix is used to optimize the NeRF-SLAM model, and at the same time the MLP multilayer perceptron in the classical NeRF network is replaced by an SLFNN (single-hidden-layer feedforward neural network), which makes it convenient to construct an ELM (extreme learning machine) model, accelerates the solution, and improves the reliability of autonomous flight of the unmanned aerial vehicle.
Further, the step S2 is specifically as follows:
s201, according to the original image, obtaining a continuous frame RGB image by using the image acquisition subunit of the NeRF-SLAM neural radiance field unit;
s202, optimizing a NeRF-SLAM model by utilizing the structure of a Hessian matrix according to the continuous frame RGB images to obtain an improved NeRF-SLAM model;
s203, obtaining a mapping result according to the improved NeRF-SLAM model.
The beneficial effects of the above further scheme are: optimizing the NeRF-SLAM model with the structure of the Hessian matrix allows both the radiance field parameters to be optimized and the camera pose to be refined.
Further, the improved NeRF-SLAM model in step S202 includes a first single hidden layer feedforward neural network SLFNN1, a second single hidden layer feedforward neural network SLFNN2, a calculation and position coding module, a volume rendering module, and a GRU convolution module;
the calculation and position coding module is used for acquiring a scene position and an observation direction;
the first single hidden layer feedforward neural network SLFNN1 is used for obtaining voxel density and feature vectors according to the scene position;
the second single hidden layer feedforward neural network SLFNN2 is used for obtaining colors according to the observation direction and the characteristic vector;
the volume rendering module is used for obtaining voxel output according to the voxel density and the color;
and the GRU convolution module is used for obtaining the GRU convolution module output from the continuous frame RGB images.
The beneficial effects of the above further scheme are: the MLP multilayer perceptron in the classical NeRF network is replaced by an SLFNN (single hidden layer feedforward neural network), so that an ELM (extreme learning machine) model is conveniently constructed and the solution is accelerated.
Further, the step S202 specifically includes the following steps:
s2021, obtaining dense optical flow, optical flow weight and GRU convolution module output through a GRU convolution module according to the continuous frame RGB image;
s2022, obtaining a camera matrix by taking the Schur complement of the arrow-shaped sparse Hessian matrix over the depth block, according to the dense optical flow and the optical flow weights;
s2023, obtaining a first pose by Cholesky decomposition of the camera matrix;
s2024, obtaining the depth according to the first pose;
s2025, partitioning the Hessian matrix into blocks to obtain the block structure; the expression of the block structure is:

$$H\,\Delta x = b, \qquad \begin{bmatrix} C & E \\ E^{\top} & P \end{bmatrix} \begin{bmatrix} \Delta\xi \\ \Delta d \end{bmatrix} = \begin{bmatrix} v \\ w \end{bmatrix}$$

wherein $H$ is the Hessian matrix; $C$ is the block camera matrix; $E$ is the camera/depth off-diagonal block of the Hessian; $E^{\top}$ is its transpose; $P$ is the diagonal matrix corresponding to the depth of each key-frame pixel; $\Delta\xi$ is the incremental update of the camera poses on the Lie algebra of SE(3); $\Delta d$ is the incremental update of the per-pixel inverse depths; $v$ is the pose residual; $w$ is the depth residual; and $b$ is the residual vector;
s2026, obtaining the depth marginal covariance and the pose marginal covariance according to the block structure; the expressions of the depth marginal covariance and the pose marginal covariance are:

$$\Sigma_{d} = P^{-1} + P^{-\top} E^{\top}\, \Sigma_{T}\, E\, P^{-1}, \qquad \Sigma_{T} = \left(L L^{\top}\right)^{-1}$$

wherein $\Sigma_{d}$ is the depth marginal covariance; $P^{-1}$ is the inverse of the diagonal matrix corresponding to the inverse depth of each key-frame pixel; $P^{-\top}$ is its inverse transpose; $\Sigma_{T}$ is the pose marginal covariance; $L$ is the lower-triangular Cholesky factor of the camera matrix (the Schur complement $C - E P^{-1} E^{\top}$); and $L^{\top}$ is its transpose;
s2027, obtaining an improved NeRF-SLAM model according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the continuous frame RGB images.
The beneficial effects of the above further scheme are: the NeRF-SLAM model is optimized by using the structure of the Hessian matrix to obtain an improved NeRF-SLAM model, preparing for subsequently obtaining the mapping result.
Further, the step S203 specifically includes the following steps:
s2031, acquiring the camera attitude through the IMU inertial navigation module, and obtaining a second pose by using the attitude calculation subunit of the NeRF-SLAM neural radiance field unit;
s2032, according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the second pose, performing high-frequency encoding through the calculation and position coding module to obtain the scene position and the observation direction; the expression of the high-frequency encoding is:

$$\gamma(p) = \left(\sin\!\left(2^{0}\pi p\right), \cos\!\left(2^{0}\pi p\right), \ldots, \sin\!\left(2^{L-1}\pi p\right), \cos\!\left(2^{L-1}\pi p\right)\right)$$

wherein $\gamma$ is the high-frequency encoding function; $p$ is a component of the position vector; $\sin$ is the sine function; and $\cos$ is the cosine function;
s2033, obtaining voxel density and feature vectors by using a first single hidden layer feedforward neural network SLFNN1 according to the scene position;
s2034, obtaining color by using a second single hidden layer feedforward neural network SLFNN2 according to the observation direction and the characteristic vector;
s2035, obtaining the voxel output through the volume rendering module according to the voxel density and the color; the expression of the voxel output is:

$$C(\mathbf{r}) = \int_{t_{n}}^{t_{f}} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, \mathrm{d}t, \qquad T(t) = \exp\!\left(-\int_{t_{n}}^{t} \sigma(\mathbf{r}(s))\, \mathrm{d}s\right)$$

wherein $C(\mathbf{r})$ is the voxel output; $T(t)$ is the accumulated transmittance of the ray from $t_{n}$ to $t$, i.e. the probability that the ray travels from $t_{n}$ to $t$ without encountering any particle; $\sigma$ is the voxel density function; $\mathbf{c}$ is the color function; $\mathbf{r}(t)$ is the corresponding ray; $\mathbf{d}$ is the direction vector; $t_{n}$ is the near bound; $t_{f}$ is the far bound; $t$ is the distance from a sampling point on the camera ray to the optical center of the camera; and $s$ is a position on the ray;
s2036, combining the voxel output and the GRU convolution module output, and obtaining a mapping result by using the NeRF-SLAM mapping subunit of the NeRF-SLAM neural radiance field unit through local mapping, obstacle area judgment and occupancy grid mapping.
The beneficial effects of the above further scheme are: and obtaining a mapping result according to the improved NeRF-SLAM model, and preparing for flight path planning.
Further, the loss function of the mapping result is:

$$\mathcal{L}(T, \Theta) = \mathcal{L}_{\mathrm{color}}\!\left(I, \hat{I}(T, \Theta)\right) + \lambda\, \mathcal{L}_{\mathrm{depth}}\!\left(D, \Sigma_{D}, \hat{D}(T, \Theta)\right)$$

wherein $\mathcal{L}$ is the loss function of the mapping result; $T$ is the first pose; $\Theta$ is the set of neural parameters; $\mathcal{L}_{\mathrm{color}}$ is the color loss function; $I$ is the color image to be rendered; $\hat{I}$ is the rendered color image; $\lambda$ is a hyper-parameter balancing depth and color supervision; $\mathcal{L}_{\mathrm{depth}}$ is the depth loss function; $D$ is the depth; $\Sigma_{D}$ is the uncertainty marginal covariance; and $\hat{D}$ is the rendered depth.
The beneficial effects of the above further scheme are: parameters are continuously adjusted according to the loss function of the mapping result, and the reliability of autonomous flight of the unmanned aerial vehicle is improved.
Drawings
FIG. 1 is a system block diagram of the present invention.
FIG. 2 is a flow chart of the method of the present invention.
Fig. 3 is a schematic structural diagram of the basic framework of the SLFNN hardware unit design in embodiment 3 of the present invention.
Fig. 4 is a flowchart of training the first single hidden layer feedforward neural network or the second single hidden layer feedforward neural network in embodiment 3 of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art, but it should be understood that the invention is not limited to the scope of the embodiments. It will be apparent to those skilled in the art that various changes may be made without departing from the spirit and scope of the invention as defined by the appended claims, and all changes that make use of the inventive concept are intended to be protected.
Example 1
As shown in fig. 1, a heterogeneous unmanned aerial vehicle visual obstacle avoidance system based on a NeRF neural network comprises a core module, and a laser radar module, an unmanned aerial vehicle flight control module, an unmanned aerial vehicle monocular camera module and an IMU inertial navigation module which are respectively connected with the core module;
the unmanned aerial vehicle monocular camera module is used for collecting images to obtain original images;
the IMU inertial navigation module is used for acquiring the camera attitude;
the core module is used for obtaining a global map according to the original image and the camera posture;
the laser radar module is used for obtaining a path planning result by combining the global map and the core module;
and the unmanned aerial vehicle flight control module is used for controlling the unmanned aerial vehicle according to the path planning result to realize obstacle avoidance of the unmanned aerial vehicle.
The core module comprises an APU application processor submodule and an FPGA editable logic submodule connected with the APU application processor submodule;
the APU application processor sub-module comprises a laser radar SLAM unit, a global map unit and a flight path planning unit which are sequentially connected; the laser radar SLAM unit is connected with the laser radar module; the global map unit is connected with the FPGA editable logic sub-module; the flight path planning unit is connected with the unmanned aerial vehicle flight control module;
the FPGA editable logic sub-module comprises a MicroBlaze processor unit and a NeRF-SLAM neural radiance field unit which are sequentially connected; the NeRF-SLAM neural radiance field unit comprises a NeRF-SLAM mapping subunit, an image acquisition subunit and an attitude calculation subunit which are respectively connected with the NeRF-SLAM mapping subunit; the MicroBlaze processor unit is respectively connected with the NeRF-SLAM mapping subunit and the global map unit; the image acquisition subunit is connected with the unmanned aerial vehicle monocular camera module; and the attitude calculation subunit is connected with the IMU inertial navigation module.
In this embodiment, the core module adopts a development board of model Zynq UltraScale+ MPSoC XCZU5EV.
The working principle of the invention is as follows: an unmanned aerial vehicle monocular camera module collects images; an image acquisition subunit arranged in the FPGA editable logic sub-module of the core module transmits the collected images to the NeRF-SLAM mapping subunit, which, combined with the IMU inertial navigation module and the attitude calculation subunit, models the scene as a continuous 5D radiance field, so that SLAM mapping can be performed through the three-dimensional scene implicitly stored in a neural network. The mapping result is uploaded through the MicroBlaze processor unit to the APU application processor sub-module to update the global map of the obstacle avoidance system; planning of the flight path is achieved by combining the laser radar module and the laser radar SLAM unit; and the path planning result is sent to the unmanned aerial vehicle flight control system to control the unmanned aerial vehicle, achieving visual obstacle avoidance of the unmanned aerial vehicle.
Example 2
As shown in fig. 2, the invention provides a NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance method, which includes the following steps:
s1, obtaining an original image through a monocular camera module of an unmanned aerial vehicle;
s2, modeling the original image into a continuous 5D radiance field by using a NeRF-SLAM neural radiance field unit to obtain a mapping result;
s3, transmitting the mapping result to a global map unit through a MicroBlaze processor unit to obtain a global map;
s4, obtaining a path planning result by utilizing a laser radar SLAM unit and a laser radar module according to the global map;
and S5, transmitting the path planning result to an unmanned aerial vehicle flight control module by using a flight path planning unit, and realizing the visual obstacle avoidance of the unmanned aerial vehicle.
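To make the planning stage of steps S4 and S5 concrete, the sketch below plans a path on a small 2D occupancy grid of the kind maintained by the global map unit. The grid, coordinates and the choice of breadth-first search are illustrative assumptions; the patent does not specify a particular planning algorithm.

```python
import numpy as np
from collections import deque

def plan_path(occupancy, start, goal):
    """Breadth-first search over a 2D occupancy grid (1 = obstacle, 0 = free)."""
    rows, cols = occupancy.shape
    prev = {start: None}            # visited set doubling as back-pointers
    q = deque([start])
    while q:
        r, c = q.popleft()
        if (r, c) == goal:          # reconstruct path by walking back-pointers
            path, node = [], goal
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols \
                    and occupancy[nr, nc] == 0 and (nr, nc) not in prev:
                prev[(nr, nc)] = (r, c)
                q.append((nr, nc))
    return None                     # goal unreachable
```

The returned cell sequence would then be converted to flight-control setpoints by the flight path planning unit.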
The step S2 is specifically as follows:
s201, according to the original image, obtaining a continuous frame RGB image by using the image acquisition subunit of the NeRF-SLAM neural radiance field unit;
s202, optimizing a NeRF-SLAM model by utilizing the structure of a Hessian matrix according to the continuous frame RGB images to obtain an improved NeRF-SLAM model;
s203, obtaining a mapping result according to the improved NeRF-SLAM model.
The improved NeRF-SLAM model in the step S202 comprises a first single hidden layer feedforward neural network SLFNN1, a second single hidden layer feedforward neural network SLFNN2, a calculation and position coding module, a volume rendering module and a GRU convolution module;
the calculation and position coding module is used for acquiring a scene position and an observation direction;
the first single hidden layer feedforward neural network SLFNN1 is used for obtaining voxel density and feature vectors according to the scene position;
the second single hidden layer feedforward neural network SLFNN2 is used for obtaining colors according to the observation direction and the characteristic vector;
the volume rendering module is used for obtaining voxel output according to the voxel density and the color;
and the GRU convolution module is used for obtaining the GRU convolution module output from the continuous frame RGB images.
The step S202 is specifically as follows:
s2021, obtaining dense optical flow, optical flow weight and GRU convolution module output through a GRU convolution module according to the continuous frame RGB image;
s2022, obtaining a camera matrix by taking the Schur complement of the arrow-shaped sparse Hessian matrix over the depth block, according to the dense optical flow and the optical flow weights;
s2023, obtaining a first pose by Cholesky decomposition of the camera matrix;
s2024, obtaining the depth according to the first pose;
s2025, partitioning the Hessian matrix into blocks to obtain the block structure; the expression of the block structure is:

$$H\,\Delta x = b, \qquad \begin{bmatrix} C & E \\ E^{\top} & P \end{bmatrix} \begin{bmatrix} \Delta\xi \\ \Delta d \end{bmatrix} = \begin{bmatrix} v \\ w \end{bmatrix}$$

wherein $H$ is the Hessian matrix; $C$ is the block camera matrix; $E$ is the camera/depth off-diagonal block of the Hessian; $E^{\top}$ is its transpose; $P$ is the diagonal matrix corresponding to the depth of each key-frame pixel; $\Delta\xi$ is the incremental update of the camera poses on the Lie algebra of SE(3); $\Delta d$ is the incremental update of the per-pixel inverse depths; $v$ is the pose residual; $w$ is the depth residual; and $b$ is the residual vector;
s2026, obtaining the depth marginal covariance and the pose marginal covariance according to the block structure; the expressions of the depth marginal covariance and the pose marginal covariance are:

$$\Sigma_{d} = P^{-1} + P^{-\top} E^{\top}\, \Sigma_{T}\, E\, P^{-1}, \qquad \Sigma_{T} = \left(L L^{\top}\right)^{-1}$$

wherein $\Sigma_{d}$ is the depth marginal covariance; $P^{-1}$ is the inverse of the diagonal matrix corresponding to the inverse depth of each key-frame pixel; $P^{-\top}$ is its inverse transpose; $\Sigma_{T}$ is the pose marginal covariance; $L$ is the lower-triangular Cholesky factor of the camera matrix (the Schur complement $C - E P^{-1} E^{\top}$); and $L^{\top}$ is its transpose;
s2027, obtaining an improved NeRF-SLAM model according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the continuous frame RGB images.
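Steps S2022–S2026 can be sketched numerically as follows. The Hessian here is a small synthetic stand-in (dimensions and values are illustrative assumptions, not real optical-flow data): because the depth block P is diagonal, it is eliminated cheaply via the Schur complement; the reduced camera matrix is factorised by Cholesky decomposition to get the pose increment; and the marginal covariances follow from the block-inverse identity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pose, n_depth = 6, 12          # e.g. one keyframe pose (se(3)) and 12 pixel depths

# Synthetic SPD Hessian with the arrow-like block structure H = [[C, E], [E^T, P]].
E = rng.normal(size=(n_pose, n_depth))
P = np.diag(rng.uniform(5.0, 10.0, n_depth))   # diagonal: one entry per pixel depth
C = E @ np.linalg.inv(P) @ E.T + np.diag(rng.uniform(5.0, 10.0, n_pose))  # keeps S SPD
v = rng.normal(size=n_pose)                     # pose residual
w = rng.normal(size=n_depth)                    # depth residual

# Schur complement: eliminate the cheap diagonal depth block first.
P_inv = np.diag(1.0 / np.diag(P))
S = C - E @ P_inv @ E.T                         # reduced "camera matrix"
L = np.linalg.cholesky(S)                       # S = L L^T

# Solve S * dxi = v - E P^{-1} w by two triangular solves (the Cholesky step).
rhs = v - E @ P_inv @ w
dxi = np.linalg.solve(L.T, np.linalg.solve(L, rhs))   # pose increment
dd = P_inv @ (w - E.T @ dxi)                          # per-pixel inverse-depth increment

# Marginal covariances used to weight the mapping loss (P diagonal, so P^{-T} = P^{-1}).
Sigma_T = np.linalg.inv(L @ L.T)                      # pose marginal covariance
Sigma_d = P_inv + P_inv @ E.T @ Sigma_T @ E @ P_inv   # depth marginal covariance
```

Exploiting the arrow-shaped sparsity this way keeps the expensive dense solve at the size of the pose block rather than the much larger depth block.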
The step S203 is specifically as follows:
s2031, acquiring the camera attitude through the IMU inertial navigation module, and obtaining a second pose by using the attitude calculation subunit of the NeRF-SLAM neural radiance field unit;
s2032, according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the second pose, performing high-frequency encoding through the calculation and position coding module to obtain the scene position and the observation direction; the expression of the high-frequency encoding is:

$$\gamma(p) = \left(\sin\!\left(2^{0}\pi p\right), \cos\!\left(2^{0}\pi p\right), \ldots, \sin\!\left(2^{L-1}\pi p\right), \cos\!\left(2^{L-1}\pi p\right)\right)$$

wherein $\gamma$ is the high-frequency encoding function; $p$ is a component of the position vector; $\sin$ is the sine function; and $\cos$ is the cosine function;
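The high-frequency encoding can be transcribed directly; a minimal sketch, where the number of frequency bands L is an assumed default (the original NeRF paper uses 10 for positions):

```python
import numpy as np

def positional_encoding(p, L=10):
    """gamma(p): (sin(2^0*pi*p), cos(2^0*pi*p), ..., sin(2^(L-1)*pi*p), cos(2^(L-1)*pi*p))."""
    p = np.atleast_1d(p)
    freqs = (2.0 ** np.arange(L)) * np.pi          # 2^k * pi for k = 0 .. L-1
    angles = np.outer(p, freqs)                    # shape (dim, L)
    enc = np.stack([np.sin(angles), np.cos(angles)], axis=-1)   # (dim, L, 2)
    return enc.reshape(-1)                         # interleaved sin/cos, length dim * 2L
```

Each scalar coordinate is lifted from 1 to 2L dimensions, which lets the downstream network represent high-frequency scene detail that a raw coordinate input cannot.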
s2033, obtaining voxel density and feature vectors by using a first single hidden layer feedforward neural network SLFNN1 according to the scene position;
s2034, obtaining color by using a second single hidden layer feedforward neural network SLFNN2 according to the observation direction and the characteristic vector;
s2035, obtaining the voxel output through the volume rendering module according to the voxel density and the color; the expression of the voxel output is:

$$C(\mathbf{r}) = \int_{t_{n}}^{t_{f}} T(t)\, \sigma(\mathbf{r}(t))\, \mathbf{c}(\mathbf{r}(t), \mathbf{d})\, \mathrm{d}t, \qquad T(t) = \exp\!\left(-\int_{t_{n}}^{t} \sigma(\mathbf{r}(s))\, \mathrm{d}s\right)$$

wherein $C(\mathbf{r})$ is the voxel output; $T(t)$ is the accumulated transmittance of the ray from $t_{n}$ to $t$, i.e. the probability that the ray travels from $t_{n}$ to $t$ without encountering any particle; $\sigma$ is the voxel density function; $\mathbf{c}$ is the color function; $\mathbf{r}(t)$ is the corresponding ray; $\mathbf{d}$ is the direction vector; $t_{n}$ is the near bound; $t_{f}$ is the far bound; $t$ is the distance from a sampling point on the camera ray to the optical center of the camera; and $s$ is a position on the ray;
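In practice the volume-rendering integral is evaluated by numerical quadrature over discrete samples along each ray; a minimal sketch (sample spacings and densities are illustrative assumptions):

```python
import numpy as np

def render_ray(sigmas, colors, deltas):
    """Discretise C(r) = ∫ T(t) σ(t) c(t) dt over ray samples.

    sigmas: (N,) densities, colors: (N, 3) RGB, deltas: (N,) sample spacings.
    """
    seg = np.exp(-sigmas * deltas)                         # per-segment transmittance
    trans = np.cumprod(np.concatenate([[1.0], seg]))[:-1]  # T_i: product of earlier segments
    alpha = 1.0 - seg                                      # opacity of segment i
    weights = trans * alpha                                # T_i * (1 - exp(-sigma_i * delta_i))
    return weights @ colors                                # rendered pixel color
```

A rendered depth is obtained the same way by replacing the colors with the sample distances, which is how a NeRF-SLAM style system can supervise depth as well as color.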
s2036, combining the voxel output and the GRU convolution module output, and obtaining a mapping result by using the NeRF-SLAM mapping subunit of the NeRF-SLAM neural radiance field unit through local mapping, obstacle area judgment and occupancy grid mapping.
The loss function of the mapping result is:

$$\mathcal{L}(T, \Theta) = \mathcal{L}_{\mathrm{color}}\!\left(I, \hat{I}(T, \Theta)\right) + \lambda\, \mathcal{L}_{\mathrm{depth}}\!\left(D, \Sigma_{D}, \hat{D}(T, \Theta)\right)$$

wherein $\mathcal{L}$ is the loss function of the mapping result; $T$ is the first pose; $\Theta$ is the set of neural parameters; $\mathcal{L}_{\mathrm{color}}$ is the color loss function; $I$ is the color image to be rendered; $\hat{I}$ is the rendered color image; $\lambda$ is a hyper-parameter balancing depth and color supervision; $\mathcal{L}_{\mathrm{depth}}$ is the depth loss function; $D$ is the depth; $\Sigma_{D}$ is the uncertainty marginal covariance; and $\hat{D}$ is the rendered depth.
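A minimal sketch of such a loss, with a mean-squared color term and a depth term down-weighted by the per-pixel marginal covariance; the use of mean-squared errors and the default λ are assumptions for illustration, not the patent's exact formulation:

```python
import numpy as np

def mapping_loss(I, I_hat, D, D_hat, sigma_D, lam=0.5):
    """L(T, Theta) = L_color(I, I_hat) + lam * L_depth(D, Sigma_D, D_hat)."""
    color_loss = np.mean((I - I_hat) ** 2)
    # Mahalanobis-style depth term: uncertain depths (large sigma_D) contribute less.
    depth_loss = np.mean((D - D_hat) ** 2 / sigma_D)
    return color_loss + lam * depth_loss
```

Weighting the depth residual by its marginal covariance keeps poorly constrained pixels (e.g. textureless regions) from dominating the mapping objective.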
Example 3
As shown in Fig. 1, the heterogeneous unmanned aerial vehicle visual obstacle avoidance system based on the NeRF neural network mainly comprises a core module, a laser radar module, an unmanned aerial vehicle flight control module, an unmanned aerial vehicle monocular camera module and an IMU inertial navigation module. The core module mainly comprises an APU application processor sub-module and an FPGA editable logic sub-module, and adopts a development board of model Zynq UltraScale+ MPSoC XCZU5EV. The APU application processor sub-module is mainly composed of a laser radar SLAM unit, a global map unit and a flight path planning unit. The FPGA editable logic sub-module mainly comprises a MicroBlaze processor unit and a NeRF-SLAM neural radiance field unit; the NeRF-SLAM neural radiance field unit comprises a NeRF-SLAM mapping subunit, an image acquisition subunit and an attitude calculation subunit. The method adopts the idea of heterogeneous processing: in the FPGA editable logic sub-module, the MLP multilayer perceptron in the NeRF-SLAM model is replaced by the first single-hidden-layer feedforward neural network SLFNN1 and the second single-hidden-layer feedforward neural network SLFNN2, which are solved by an extreme learning machine (ELM) accelerated with the resources of the heterogeneous FPGA editable logic sub-module. This improves the generalization ability of the neural radiance field of the NeRF-SLAM model and prevents it from falling into a local minimum, while map construction is completed through the heterogeneous APU application processor sub-module.
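The extreme learning machine mentioned above fixes the hidden-layer weights of an SLFNN at random and solves only the output weights in closed form, i.e. a single least-squares problem with no iterative backpropagation, which is why it avoids local minima and maps well onto FPGA resources. A minimal numpy sketch on a toy regression task (the hidden width and test function are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def elm_train(X, Y, n_hidden=64):
    """ELM: random input weights, output weights solved in one least-squares step."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # hidden weights stay random (never trained)
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # random hidden-layer feature matrix
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)  # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

The same solve-once structure is what makes it practical to replace the iteratively trained MLP of a classical NeRF with SLFNNs on programmable logic.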
In addition, the unmanned aerial vehicle monocular camera module is used as the visual obstacle avoidance SLAM mapping sensor of the unmanned aerial vehicle, and monocular depth estimation is carried out using a dense optical flow estimation algorithm, which reduces the weight, size and cost of the unmanned aerial vehicle and improves the reliability of its autonomous flight.
In this embodiment, the MicroBlaze processor unit adopted by the improved NeRF-SLAM model is based on RISC (reduced instruction set computer), which has the advantages of high speed, reduced design cost and improved reliability.
The FPGA editable logic sub-module mainly comprises a MicroBlaze processor unit and a NeRF-SLAM radiation nerve field unit. The MicroBlaze processor unit is mainly used for managing the NeRF-SLAM radiation nerve field unit, completing communication with the general-purpose input/output (GPIO) interface and the high-speed peripheral I/O interfaces, and handling the clock; its functions comprise (1) system control, (2) data preprocessing, (3) ELM training and (4) mapping control.
Further, the MicroBlaze processor unit is connected with a BRAM through an LMB local memory bus, so that the stored neuron coefficients can be flexibly changed. Furthermore, considering that the implementation of the ELM requires a large amount of on-chip memory resources, using BRAM also helps avoid the delays caused by off-chip accesses.
Further, the MicroBlaze processor unit is connected to an EMC external memory controller via an XCL cache link to store data in external memory, such as an SD card.
Further, the MicroBlaze processor unit accesses a general purpose I/O interface (GPIO), a high speed peripheral I/O interface and a clock through a PLB peripheral local bus.
Further, after the MicroBlaze processor unit completes data processing work through the FPU floating point processor, the data processing result is transmitted to the PS through the AXI bus.
Furthermore, the NeRF-SLAM radiation nerve field unit acquires original images through the external unmanned aerial vehicle monocular camera module, and the continuous frame image input $I$ is obtained through the image acquisition subunit. Starting with a series of images img_1 to img_n, the improved NeRF-SLAM model first computes the dense optical flow between pairs of frames using the GRU convolution module: given the correlation between pairs of frames and a guess of the current dense optical flow, it computes a new dense optical flow and a weight for each optical flow measurement.
Further, using these flows and weights as measurements, the improved NeRF-SLAM model adopted by the present invention solves a dense bundle adjustment (BA) problem in which the 3D geometry is parameterized as a set of inverse depth maps, one per keyframe. This parameterization leads to a very efficient approach to solving the dense BA problem: the system of equations can be linearized into a linear least squares problem with the familiar camera/depth arrow-like block-sparse Hessian matrix $H\in\mathbb{R}^{(c+p)\times(c+p)}$, where $c$ and $p$ are the dimensions of the cameras and the points.
Further, to solve the linear least squares problem, the Schur complement of the Hessian matrix is used to compute the reduced camera matrix, which does not depend on the depths and has a much smaller dimension.
Further, the reduced camera problem is solved by Cholesky factorization, and the poses $T$ are obtained by forward and back substitution. Given these poses $T$, the sampled depths can be solved. Further, given the poses $T$ and depths $D$, the improved NeRF-SLAM model computes the induced optical flow and provides it again to the GRU convolution module as the initial guess.
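The Schur-complement elimination and Cholesky solve described above can be sketched in a few lines of NumPy. This is a minimal illustration on a small random system, not the patent's FPGA implementation; the function name `solve_ba_block` and the demo dimensions are assumptions.

```python
import numpy as np

# Block elimination of H = [[C, E], [E^T, diag(P)]]: Schur complement of the
# depth block, Cholesky factorization, forward/back substitution.
def solve_ba_block(C, E, P_diag, v, w):
    """Solve [[C, E], [E^T, diag(P)]] [dxi; dd] = [v; w] for poses and depths."""
    P_inv = 1.0 / P_diag                      # P is diagonal, so inversion is cheap
    S = C - (E * P_inv) @ E.T                 # reduced camera matrix (Schur complement)
    rhs = v - E @ (P_inv * w)
    L = np.linalg.cholesky(S)                 # Cholesky factorization of S
    y = np.linalg.solve(L, rhs)               # forward substitution
    dxi = np.linalg.solve(L.T, y)             # back substitution -> pose update
    dd = P_inv * (w - E.T @ dxi)              # back-substitute the per-pixel depths
    return dxi, dd

# Demo system with c = 4 pose variables and p = 6 depth variables, constructed
# so that the reduced camera matrix is symmetric positive definite.
rng = np.random.default_rng(1)
B = rng.normal(size=(4, 4))
E = rng.normal(size=(4, 6))
P_diag = rng.uniform(1.0, 2.0, size=6)
C = B @ B.T + (E * (1.0 / P_diag)) @ E.T + np.eye(4)
v, w = rng.normal(size=4), rng.normal(size=6)
dxi, dd = solve_ba_block(C, E, P_diag, v, w)
```

Because `P` is diagonal, eliminating the depths first costs only a vector division, which is what makes the dense BA tractable.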
Further, the improved NeRF-SLAM model is optimized by exploiting the structure of the Hessian matrix, which is partitioned into blocks as follows:
$$H=\begin{bmatrix}C & E\\ E^{\top} & P\end{bmatrix},\qquad \begin{bmatrix}C & E\\ E^{\top} & P\end{bmatrix}\begin{bmatrix}\Delta\xi\\ \Delta d\end{bmatrix}=\begin{bmatrix}v\\ w\end{bmatrix}$$

wherein $H$ is the Hessian matrix; $b=[v,\,w]^{\top}$ is the residual vector; $C$ is the block camera matrix; $P$ is the diagonal matrix corresponding to the depth of each keyframe pixel. $\Delta\xi$ is the incremental update on the Lie algebra of the camera poses in the SE(3) group, and $\Delta d$ is the incremental update to the per-pixel inverse depths. $E$ is the off-diagonal camera/depth Hessian block, and $v$ and $w$ are the pose and depth residuals.
Further, from the block partition of the Hessian matrix, the marginal covariances of the dense depths $\Sigma_D$ and of the poses $\Sigma_T$ can be efficiently calculated:

$$\Sigma_D=P^{-1}+P^{-1}E^{\top}\Sigma_T E P^{-1},\qquad \Sigma_T=\big(LL^{\top}\big)^{-1}$$

wherein $P^{-1}$ is the inverse of the diagonal matrix corresponding to the inverse depth of each keyframe pixel, and $P^{-\top}$ its inverse transpose; $E$ is the off-diagonal camera/depth Hessian block and $E^{\top}$ its transpose; $L$ is the lower-triangular Cholesky factor of the reduced camera matrix and $L^{\top}$ its transpose; $(\cdot)^{-1}$ denotes the matrix inverse.
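The two covariance formulas follow from the block-inverse of the partitioned Hessian. A minimal NumPy sketch, with illustrative names and a small random demo system, is:

```python
import numpy as np

# Marginal covariances from H = [[C, E], [E^T, diag(P)]] with diagonal P.
def marginal_covariances(C, E, P_diag):
    P_inv = 1.0 / P_diag
    S = C - (E * P_inv) @ E.T                         # reduced camera matrix
    L = np.linalg.cholesky(S)                         # lower-triangular Cholesky factor
    Sigma_T = np.linalg.inv(L).T @ np.linalg.inv(L)   # pose covariance (L L^T)^{-1}
    EPinv = E * P_inv                                 # E P^{-1}
    # Depth covariance: P^{-1} + P^{-1} E^T Sigma_T E P^{-1}
    Sigma_D = np.diag(P_inv) + EPinv.T @ Sigma_T @ EPinv
    return Sigma_T, Sigma_D

# Small well-posed demo system.
rng = np.random.default_rng(2)
B = rng.normal(size=(3, 3))
E = rng.normal(size=(3, 5))
P_diag = rng.uniform(1.0, 2.0, size=5)
C = B @ B.T + (E * (1.0 / P_diag)) @ E.T + np.eye(3)
Sigma_T, Sigma_D = marginal_covariances(C, E, P_diag)
```

Both covariances equal the corresponding diagonal blocks of $H^{-1}$, but are computed without ever inverting the full $(c+p)\times(c+p)$ matrix.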
Further, given all the information computed by the tracking module (poses, depths, their respective marginal covariances, and the input RGB images), the radiation field parameters can be optimized and the camera poses refined at the same time.
Furthermore, the MLP multilayer perceptron in the classical NeRF network is replaced by a first single-hidden-layer feedforward neural network SLFNN1 and a second single-hidden-layer feedforward neural network SLFNN2 which are realized based on FPGA editable logic sub-modules, so that an ELM (extreme learning machine) model can be conveniently constructed and the solution can be accelerated.
Further, the IMU inertial navigation module acquires the camera attitude, and a second attitude Q is obtained through the attitude resolving subunit and sent to the calculation and position coding module; the previously generated depth covariance $\Sigma_D$ together with the depth $D$, and the attitude covariance $\Sigma_T$ together with the first attitude $T$, are also connected to the calculation and position coding module. The module calculates the scene position (x, y, z) and viewing direction $(\theta,\varphi)$ of points in the scene. The scene position (x, y, z) is encoded by $\gamma(\cdot)$ and then sent into the first single-hidden-layer feedforward neural network SLFNN1; after passing through (1) the random neurons, (2) the activation function lookup table and (3) the output neurons, the maximum output selection module of SLFNN1 outputs the voxel density $\sigma$ and a 256-dimensional feature vector. The 256-dimensional feature vector is then Concat-combined with the viewing direction $(\theta,\varphi)$ and processed by the second single-hidden-layer feedforward neural network SLFNN2; after passing through (1) the random neurons, (2) the activation function lookup table and (3) the output neurons, the maximum output selection module of SLFNN2 outputs the color RGB.
Further, the MicroBlaze processor unit trains a first single hidden layer feedforward neural network SLFNN1 and a second single hidden layer feedforward neural network SLFNN2 through an FPGA editable logic sub-module fast unidirectional link FSL. The first single hidden layer feedforward neural network SLFNN1 and the second single hidden layer feedforward neural network SLFNN2 are composed of (1) random neurons, (2) an activation function lookup table, (3) output neurons and a maximum output selection module.
Further, in the calculation and position coding module, the improved NeRF-SLAM model requires high-frequency details of the scene to reconstruct a high-definition scene. The high-frequency coding function adopted in the coding process is:

$$\gamma(p)=\big(\sin(2^{0}\pi p),\cos(2^{0}\pi p),\ldots,\sin(2^{L-1}\pi p),\cos(2^{L-1}\pi p)\big)$$

wherein $p$ is the position vector; $\sin(\cdot)$ denotes taking the sine; $\cos(\cdot)$ denotes taking the cosine; $\pi$ takes 3.1415926.
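As a sketch, the encoding maps each input coordinate to sine/cosine pairs at exponentially spaced frequencies, following the standard NeRF form; the number of frequency bands `L` is a free hyper-parameter here.

```python
import numpy as np

def positional_encoding(p, L=10):
    """Map each coordinate to (sin, cos) pairs at frequencies 2^0*pi ... 2^(L-1)*pi."""
    p = np.atleast_1d(np.asarray(p, dtype=float))
    freqs = (2.0 ** np.arange(L)) * np.pi
    angles = np.outer(p, freqs)                        # shape (n_coords, L)
    enc = np.stack([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(p.shape[0], -1)                 # (n_coords, 2L), sin/cos interleaved
```

In hardware this function is a natural fit for the activation-function lookup tables mentioned above, since only sines and cosines of scaled inputs are needed.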
The first single-hidden-layer feedforward neural network SLFNN1 and the second single-hidden-layer feedforward neural network SLFNN2 use processing methods commonly combined with FPGA editable logic sub-modules, such as parallelism and pipelining, from which a basic framework of the hardware unit design can be obtained, as shown in FIG. 3. In the execution of the ELM algorithm, a series of numerical values need to be generated, such as the random weights $a$ and biases $b$, and the output weights $\beta$ need to be solved. The M+1 input data are digitally encoded with the random weights from the input module, fed into the neuron module through the multiplexer, and the calculation results are output in time. The input module and the multiplexer adopt parallel processing, while the neuron module and the output module adopt serial processing.
Further, the voxel density $\sigma$ and the color RGB, i.e. $\mathbf{c}=(r,g,b)$, are connected to the volume rendering module, and the rendered voxel output is obtained by the volume rendering module:

$$C(\mathbf{r})=\int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt,\qquad T(t)=\exp\!\Big(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\Big)$$

wherein $C(\mathbf{r})$ represents the voxel output; the function $T(t)$ represents the accumulated transmittance along the ray from $t_n$ to $t$, i.e. the probability that the ray travels from $t_n$ to $t$ without hitting any particle; $\sigma(\cdot)$ is the voxel density function; $\mathbf{c}(\cdot)$ is the color function; $\mathbf{r}(t)$ is the corresponding ray; $\mathbf{d}$ is the direction vector.
In the HLS high-level synthesis adopted in the actual FPGA editable logic sub-module, this is realized in a discretized form:

$$\hat{C}(\mathbf{r})=\sum_{i=1}^{N} T_i\big(1-\exp(-\sigma_i\delta_i)\big)\mathbf{c}_i,\qquad T_i=\exp\!\Big(-\sum_{j=1}^{i-1}\sigma_j\delta_j\Big)$$

wherein $\delta_i$ is the distance between adjacent samples along the ray.
In this embodiment, HLS high-level synthesis refers to a process of automatically converting a logic structure described in a high-level language into a circuit model described in a low-abstraction-level language.
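The discretized rendering reduces to alpha-compositing per-sample densities and colors along each ray; the same weights also give the rendered depth as the expected ray termination distance used later in the depth loss. A minimal sketch (function and argument names are illustrative):

```python
import numpy as np

def render_ray(sigmas, colors, sample_depths, deltas):
    """Alpha-composite one ray's samples into a color, a depth, and the weights."""
    alphas = 1.0 - np.exp(-sigmas * deltas)                     # per-sample opacity
    T = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])  # transmittance T_i
    weights = T * alphas
    color = (weights[:, None] * colors).sum(axis=0)             # rendered pixel color
    depth = (weights * sample_depths).sum()                     # expected termination
    return color, depth, weights

# One ray with an effectively opaque first sample at depth 2.0.
sigmas = np.array([1e8, 1.0, 1.0])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
sample_depths = np.array([2.0, 3.0, 4.0])
deltas = np.array([1.0, 1.0, 1.0])
color, depth, weights = render_ray(sigmas, colors, sample_depths, deltas)
```

Since the weights of samples behind an opaque surface vanish, the composited color and depth collapse to the first sample's values, as expected.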
Further, the result of the voxel calculation and the output of the preceding GRU convolution module are Concat-combined and output to the NeRF-SLAM mapping subunit for mapping. After the NeRF-SLAM mapping subunit performs the three steps of local mapping, obstacle area judgment and occupancy grid mapping, the mapping result is output to the MicroBlaze processor unit through the fast unidirectional link FSL of the FPGA editable logic sub-module.
Further, considering uncertainty-aware losses, the mapping loss function is expressed as:

$$\min_{T,\Theta}\;\mathcal{L}(T,\Theta)=\mathcal{L}_I\big(I,I^{*}(T,\Theta)\big)+\lambda\,\mathcal{L}_D\big(D,\Sigma_D,D^{*}(T,\Theta)\big)$$

Given the hyper-parameter $\lambda$ balancing depth and color supervision, we minimize over both the attitude $T$ and the neural parameters $\Theta$ ($\lambda$ is set to 1.0).
Further, the depth loss function may be expressed as:

$$\mathcal{L}_D(T,\Theta)=\big\lVert D-D^{*}(T,\Theta)\big\rVert_{\Sigma_D}^{2}$$

wherein $D^{*}$ is the rendered depth, and $D$ and $\Sigma_D$ are the dense depth and its uncertainty estimated by the tracking module. We render depth as the expected ray termination distance.
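A minimal sketch of this uncertainty-weighted mapping loss follows: a mean-squared color term plus a depth term weighted by the depth marginal covariance. Treating $\Sigma_D$ as diagonal (a per-pixel variance map) is a simplifying assumption here, as are the function and argument names.

```python
import numpy as np

def mapping_loss(I, I_hat, D, D_hat, sigma_D, lam=1.0):
    """L = L_color + lam * L_depth with per-pixel depth variances sigma_D."""
    L_color = np.mean((I - I_hat) ** 2)               # || I - I* ||^2
    L_depth = np.mean((D - D_hat) ** 2 / sigma_D)     # || D - D* ||^2 weighted by Sigma_D
    return L_color + lam * L_depth
```

Pixels whose tracked depth is uncertain (large `sigma_D`) contribute less to the loss, so unreliable depth estimates do not dominate the radiance-field optimization.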
Further, the depth of each pixel is evaluated by sampling three-dimensional locations along the ray of the pixel, evaluating the density $\sigma_i$ of each sample $i$, and alpha-compositing the resulting densities, similar to standard volume rendering. The expression for the pixel depth is:

$$D^{*}(T,\Theta)=\sum_{i=1}^{N} T_i\big(1-\exp(-\sigma_i\delta_i)\big)d_i$$

wherein $d_i$ is the depth of sample $i$ along the ray, and $\delta_i=d_{i+1}-d_i$ is the distance between successive samples; $T_i$ is the accumulated transmittance. The bulk density $\sigma_i$ of sample $i$ is generated by evaluating the network on the three-dimensional world coordinates.
Further, the color loss function is defined as in the original NeRF:

$$\mathcal{L}_I(T,\Theta)=\big\lVert I-I^{*}(T,\Theta)\big\rVert^{2}$$

wherein $I^{*}$ is the rendered color image, synthesized similarly to the depth image by volume rendering. The color of each pixel is likewise calculated by sampling along the ray of the pixel and alpha-compositing the resulting densities and colors:

$$I^{*}(T,\Theta)=\sum_{i=1}^{N} T_i\big(1-\exp(-\sigma_i\delta_i)\big)\mathbf{c}_i$$

wherein $T_i$ is the accumulated transmittance and $\mathbf{c}_i$ is the color estimated by the ELM (extreme learning machine); for a given sample $i$, the density $\sigma_i$ and color $\mathbf{c}_i$ are estimated simultaneously.
Further, the training process of the first single-hidden-layer feedforward neural network SLFNN1 and the second single-hidden-layer feedforward neural network SLFNN2 adopted in the improved NeRF-SLAM model is shown in FIG. 4. It mainly comprises the design of an external software unit and an internal hardware unit: first, the training process is started; second, the preprocessed data are input; third, the training labels are processed; fourth, the hardware unit design is entered and random weights are assigned; fifth, the hidden-layer output is calculated; sixth, the generalized inverse of the matrix is calculated; seventh, the network output is calculated; eighth, the data accuracy is output; ninth, whether retraining is needed is judged: if yes, steps four through eight are re-executed; if not, the training process ends.
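Steps four through seven above (random weight assignment, hidden-layer output, generalized inverse, network output) are the core of ELM training and can be sketched as follows. The tanh activation, the hidden width, and the 1-D smoke-test function are illustrative assumptions, not the patent's configuration.

```python
import numpy as np

def elm_train(X, Y, n_hidden=64, seed=0):
    """ELM: fix random input weights, solve output weights by pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # step 4: random weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # step 5: hidden-layer output matrix
    beta = np.linalg.pinv(H) @ Y                  # step 6: Moore-Penrose generalized inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta              # step 7: network output

# Smoke test: fit a simple smooth 1-D function.
X = np.linspace(0.0, 3.0, 100).reshape(-1, 1)
Y = np.sin(X)
W, b, beta = elm_train(X, Y)
mse = np.mean((elm_predict(X, W, b, beta) - Y) ** 2)   # step 8: accuracy
```

Because only the linear output weights are solved (in closed form, with no gradient descent), training is a single least-squares problem, which is what makes the FPGA acceleration and the avoid-local-minima claim plausible for this architecture.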
In conclusion, the invention provides that the unmanned aerial vehicle acquires images through its monocular camera module during autonomous flight; the FPGA editable logic sub-module of the heterogeneous processor core module contains the image acquisition subunit, which transmits the acquired images to the NeRF-SLAM mapping subunit. Combined with the IMU inertial navigation module and the attitude calculation subunit, the NeRF-SLAM mapping subunit models the scene as a continuous 5D radiation field, so that SLAM mapping can be performed implicitly through the three-dimensional scene stored in the neural network. The mapping result is uploaded through the RISC-based MicroBlaze processor unit to the APU application processor sub-module to update the global map of the obstacle avoidance system; a flight path is planned in combination with the laser radar module and the laser radar SLAM unit, and the path planning result is sent to the unmanned aerial vehicle flight control module to control the unmanned aerial vehicle.
Claims (8)
1. A heterogeneous unmanned aerial vehicle visual obstacle avoidance system based on a NeRF neural network is characterized by comprising a core module, a laser radar module, an unmanned aerial vehicle flight control module, an unmanned aerial vehicle monocular camera module and an IMU inertial navigation module, wherein the laser radar module, the unmanned aerial vehicle flight control module, the unmanned aerial vehicle monocular camera module and the IMU inertial navigation module are respectively connected with the core module;
the unmanned aerial vehicle monocular camera module is used for collecting images to obtain original images;
the IMU inertial navigation module is used for acquiring the camera attitude;
the core module is used for obtaining a global map according to the original image and the camera posture;
the laser radar module is used for combining the core module according to the global map to obtain a path planning result;
and the unmanned aerial vehicle flight control module is used for controlling the unmanned aerial vehicle according to the path planning result to realize obstacle avoidance of the unmanned aerial vehicle.
2. The NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance system according to claim 1, wherein the core module comprises an APU application processor submodule and an FPGA editable logic submodule connected with the APU application processor submodule;
the APU application processor sub-module comprises a laser radar SLAM unit, a global map unit and a flight path planning unit which are sequentially connected; the laser radar SLAM unit is connected with the laser radar module; the global map unit is connected with the FPGA editable logic sub-module; the flight path planning unit is connected with the unmanned aerial vehicle flight control module;
the FPGA editable logic sub-module comprises a MicroBlaze processor unit and a NeRF-SLAM radiation nerve field unit which are sequentially connected; the NeRF-SLAM radiation nerve field unit comprises a NeRF-SLAM mapping subunit, an image acquisition subunit and an attitude calculation subunit which are respectively connected with the NeRF-SLAM mapping subunit; the MicroBlaze processor unit is respectively connected with the NeRF-SLAM map building subunit and the global map unit; the image acquisition subunit is connected with the monocular camera module of the unmanned aerial vehicle; and the attitude resolving subunit is connected with the IMU inertial navigation module.
3. The obstacle avoidance method of the NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance system, as claimed in any one of claims 1-2, wherein the NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance method comprises the following steps:
s1, obtaining an original image through a monocular camera module of an unmanned aerial vehicle;
s2, modeling the original image into a continuous 5D radiation field by using a NeRF-SLAM radiation nerve field unit to obtain a mapping result;
s3, transmitting the mapping result to a global map unit through a MicroBlaze processor unit to obtain a global map;
s4, obtaining a path planning result by utilizing a laser radar SLAM unit and a laser radar module according to the global map;
and S5, transmitting the path planning result to an unmanned aerial vehicle flight control module by using a flight path planning unit, and realizing the visual obstacle avoidance of the unmanned aerial vehicle.
4. The obstacle avoidance method of the NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance system, as claimed in claim 3, wherein the step S2 is as follows:
s201, according to the original image, utilizing a NeRF-SLAM to radiate an image acquisition subunit of a neural field unit to obtain a continuous frame RGB image;
s202, optimizing a NeRF-SLAM model by utilizing the structure of a Hessian matrix according to the continuous frame RGB images to obtain an improved NeRF-SLAM model;
s203, obtaining a mapping result according to the improved NeRF-SLAM model.
5. The obstacle avoidance method of the NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance system, as claimed in claim 4, wherein the improved NeRF-SLAM model in step S202 comprises a first single hidden layer feed-forward neural network SLFNN1, a second single hidden layer feed-forward neural network SLFNN2, a calculation and position coding module, a volume rendering module and a GRU convolution module;
the calculation and position coding module is used for acquiring a scene position and an observation direction;
the first single hidden layer feedforward neural network SLFNN1 is used for obtaining voxel density and feature vectors according to the scene position;
the second single hidden layer feedforward neural network SLFNN2 is used for obtaining colors according to the observation direction and the characteristic vector;
the volume rendering module is used for obtaining voxel output according to the voxel density and the color;
and the GRU convolution module is used for acquiring the output of the GRU convolution module.
6. The obstacle avoidance method of the NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance system, as claimed in claim 5, wherein the step S202 is as follows:
s2021, obtaining dense optical flow, optical flow weight and GRU convolution module output through a GRU convolution module according to the continuous frame RGB image;
s2022, obtaining a camera matrix by using Schur Schur complement of the depth arrowhead-shaped sparse Hessian matrix according to the dense optical flow and the optical flow weight;
s2023, obtaining a first posture according to Cholesky Gerrisby decomposition of the camera matrix;
s2024, obtaining depth according to the first posture;
s2025, carrying out block division on the Hessian matrix to obtain a block division result; the expression of the block segmentation result is:
$$H=\begin{bmatrix}C & E\\ E^{\top} & P\end{bmatrix},\qquad \begin{bmatrix}C & E\\ E^{\top} & P\end{bmatrix}\begin{bmatrix}\Delta\xi\\ \Delta d\end{bmatrix}=\begin{bmatrix}v\\ w\end{bmatrix}$$

wherein $H$ is the Hessian matrix; $C$ is the block camera matrix; $E$ is the camera/depth off-diagonal Hessian block; $E^{\top}$ is its transpose; $P$ is the diagonal matrix corresponding to the depth of each keyframe pixel; $\Delta\xi$ is the incremental update on the Lie algebra of the camera poses in the SE(3) group; $\Delta d$ is the incremental update to the per-pixel inverse depths; $v$ is the attitude residual; $w$ is the depth residual; $b=[v,\,w]^{\top}$ is the residual vector;
s2026, obtaining a depth marginal covariance and an attitude marginal covariance according to the block segmentation result; the expressions of the depth margin covariance and the attitude margin covariance are as follows:
$$\Sigma_D=P^{-1}+P^{-1}E^{\top}\Sigma_T E P^{-1},\qquad \Sigma_T=\big(LL^{\top}\big)^{-1}$$

wherein $\Sigma_D$ is the depth marginal covariance; $P^{-1}$ is the inverse of the diagonal matrix corresponding to the inverse depth of each keyframe pixel; $P^{-\top}$ is its inverse transpose; $\Sigma_T$ is the attitude marginal covariance; $L$ is the lower-triangular Cholesky factor; $L^{\top}$ is its transpose;
s2027, obtaining an improved NeRF-SLAM model according to the first posture, the depth, the posture marginal covariance, the depth marginal covariance and the continuous frame RGB image.
7. The obstacle avoidance method of the NeRF neural network-based heterogeneous unmanned aerial vehicle visual obstacle avoidance system according to claim 6, wherein the step S203 is specifically as follows:
s2031, acquiring the posture of the camera through an IMU inertial navigation module, and obtaining a second posture by using a posture resolving subunit of a NeRF-SLAM radiation nerve field unit;
s2032, according to the first posture, the depth, the posture marginal covariance, the depth marginal covariance and the second posture, high-frequency coding is carried out through a calculation and position coding module to obtain a scene position and an observation direction; the expression of the high-frequency coding is as follows:
$$\gamma(p)=\big(\sin(2^{0}\pi p),\cos(2^{0}\pi p),\ldots,\sin(2^{L-1}\pi p),\cos(2^{L-1}\pi p)\big)$$

wherein $\gamma(\cdot)$ is the high-frequency coding function; $p$ is the position vector; $\sin(\cdot)$ is the sine function; $\cos(\cdot)$ is the cosine function;
s2033, obtaining voxel density and feature vectors by using a first single hidden layer feedforward neural network SLFNN1 according to the scene position;
s2034, obtaining the color by utilizing a second single hidden layer feedforward neural network SLFNN2 according to the observation direction and the characteristic vector;
s2035, obtaining voxel output through a volume rendering module according to the voxel density and color; the voxel output expression is:
$$C(\mathbf{r})=\int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t),\mathbf{d})\,dt,\qquad T(t)=\exp\!\Big(-\int_{t_n}^{t}\sigma(\mathbf{r}(s))\,ds\Big)$$

wherein $C(\mathbf{r})$ is the voxel output; $T(t)$ is the accumulated transmittance along the ray from $t_n$ to $t$, i.e. the probability that the ray travels from $t_n$ to $t$ without hitting any particle; $\sigma(\cdot)$ is the voxel density function; $\mathbf{c}(\cdot)$ is the color function; $\mathbf{r}(t)$ is the corresponding ray; $\mathbf{d}$ is the direction vector; $t_n$ is the near boundary; $t_f$ is the far boundary; $t$ is the distance from the sampling point on the camera ray to the optical center of the camera; $s$ is the position on the ray;
S2036, combining the voxel output and the GRU convolution module output, the NeRF-SLAM mapping subunit of the NeRF-SLAM radiation nerve field unit obtains a mapping result through local mapping, obstacle region judgment and occupancy grid mapping.
8. The obstacle avoidance method of the heterogeneous unmanned aerial vehicle visual obstacle avoidance system based on the NeRF neural network, as recited in claim 7, wherein the loss function of the mapping result is:
$$\min_{T,\Theta}\;\mathcal{L}(T,\Theta)=\mathcal{L}_I\big(I,I^{*}(T,\Theta)\big)+\lambda\,\mathcal{L}_D\big(D,\Sigma_D,D^{*}(T,\Theta)\big)$$

wherein $\mathcal{L}$ is the loss function of the mapping result; $T$ is the first attitude; $\Theta$ is the neural parameters; $\mathcal{L}_I$ is the color loss function; $I$ is the color image to be rendered; $I^{*}$ is the rendered color image; $\lambda$ is a hyper-parameter for balancing depth and color supervision; $\mathcal{L}_D$ is the depth loss function; $D$ is the depth; $\Sigma_D$ is the depth uncertainty marginal covariance; $D^{*}$ is the rendered depth.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310147930.9A CN115826628B (en) | 2023-02-22 | 2023-02-22 | Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310147930.9A CN115826628B (en) | 2023-02-22 | 2023-02-22 | Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115826628A true CN115826628A (en) | 2023-03-21 |
CN115826628B CN115826628B (en) | 2023-05-09 |
Family
ID=85522084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310147930.9A Active CN115826628B (en) | 2023-02-22 | 2023-02-22 | Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115826628B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105059533A (en) * | 2015-08-14 | 2015-11-18 | 深圳市多翼创新科技有限公司 | Aircraft and landing method thereof |
CN105759836A (en) * | 2016-03-14 | 2016-07-13 | 武汉卓拔科技有限公司 | Unmanned aerial vehicle obstacle avoidance method and device based on 3D camera |
CN106168808A (en) * | 2016-08-25 | 2016-11-30 | 南京邮电大学 | A kind of rotor wing unmanned aerial vehicle automatic cruising method based on degree of depth study and system thereof |
US20170301109A1 (en) * | 2016-04-15 | 2017-10-19 | Massachusetts Institute Of Technology | Systems and methods for dynamic planning and operation of autonomous systems using image observation and information theory |
CN107553490A (en) * | 2017-09-08 | 2018-01-09 | 深圳市唯特视科技有限公司 | A kind of monocular vision barrier-avoiding method based on deep learning |
CN207157509U (en) * | 2017-01-24 | 2018-03-30 | 刘畅 | A kind of unmanned plane for gunnery training and amusement |
CN108875912A (en) * | 2018-05-29 | 2018-11-23 | 天津科技大学 | A kind of neural network model for image recognition |
CN110456805A (en) * | 2019-06-24 | 2019-11-15 | 深圳慈航无人智能系统技术有限公司 | A kind of UAV Intelligent tracking flight system and method |
CN110873565A (en) * | 2019-11-21 | 2020-03-10 | 北京航空航天大学 | Unmanned aerial vehicle real-time path planning method for urban scene reconstruction |
US20200250850A1 (en) * | 2019-02-06 | 2020-08-06 | Ford Global Technologies, Llc | Hybrid Metric-Topological Camera-Based Localization |
CN111831010A (en) * | 2020-07-15 | 2020-10-27 | 武汉大学 | Unmanned aerial vehicle obstacle avoidance flight method based on digital space slice |
CN112435325A (en) * | 2020-09-29 | 2021-03-02 | 北京航空航天大学 | VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method |
CN113192182A (en) * | 2021-04-29 | 2021-07-30 | 山东产研信息与人工智能融合研究院有限公司 | Multi-sensor-based live-action reconstruction method and system |
CN113405547A (en) * | 2021-05-21 | 2021-09-17 | 杭州电子科技大学 | Unmanned aerial vehicle navigation method based on semantic VSLAM |
US20210350162A1 (en) * | 2020-05-07 | 2021-11-11 | Skydio, Inc. | Visual observer for unmanned aerial vehicles |
CN114170459A (en) * | 2021-11-23 | 2022-03-11 | 杭州弥拓科技有限公司 | Dynamic building identification guiding method and system based on Internet of things and electronic equipment |
- 2023-02-22 CN CN202310147930.9A patent/CN115826628B/en active Active
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105059533A (en) * | 2015-08-14 | 2015-11-18 | 深圳市多翼创新科技有限公司 | Aircraft and landing method thereof |
CN105759836A (en) * | 2016-03-14 | 2016-07-13 | 武汉卓拔科技有限公司 | Unmanned aerial vehicle obstacle avoidance method and device based on 3D camera |
US20170301109A1 (en) * | 2016-04-15 | 2017-10-19 | Massachusetts Institute Of Technology | Systems and methods for dynamic planning and operation of autonomous systems using image observation and information theory |
CN106168808A (en) * | 2016-08-25 | 2016-11-30 | 南京邮电大学 | A kind of rotor wing unmanned aerial vehicle automatic cruising method based on degree of depth study and system thereof |
CN207157509U (en) * | 2017-01-24 | 2018-03-30 | 刘畅 | A kind of unmanned plane for gunnery training and amusement |
CN107553490A (en) * | 2017-09-08 | 2018-01-09 | 深圳市唯特视科技有限公司 | A kind of monocular vision barrier-avoiding method based on deep learning |
CN108875912A (en) * | 2018-05-29 | 2018-11-23 | 天津科技大学 | A kind of neural network model for image recognition |
CN111540011A (en) * | 2019-02-06 | 2020-08-14 | 福特全球技术公司 | Hybrid metric-topology camera based positioning |
US20200250850A1 (en) * | 2019-02-06 | 2020-08-06 | Ford Global Technologies, Llc | Hybrid Metric-Topological Camera-Based Localization |
CN110456805A (en) * | 2019-06-24 | 2019-11-15 | 深圳慈航无人智能系统技术有限公司 | A kind of UAV Intelligent tracking flight system and method |
CN110873565A (en) * | 2019-11-21 | 2020-03-10 | 北京航空航天大学 | Unmanned aerial vehicle real-time path planning method for urban scene reconstruction |
US20210158009A1 (en) * | 2019-11-21 | 2021-05-27 | Beihang University | UAV Real-Time Path Planning Method for Urban Scene Reconstruction |
US20210350162A1 (en) * | 2020-05-07 | 2021-11-11 | Skydio, Inc. | Visual observer for unmanned aerial vehicles |
CN111831010A (en) * | 2020-07-15 | 2020-10-27 | 武汉大学 | Unmanned aerial vehicle obstacle avoidance flight method based on digital space slice |
CN112435325A (en) * | 2020-09-29 | 2021-03-02 | 北京航空航天大学 | VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method |
CN113192182A (en) * | 2021-04-29 | 2021-07-30 | 山东产研信息与人工智能融合研究院有限公司 | Multi-sensor-based live-action reconstruction method and system |
CN113405547A (en) * | 2021-05-21 | 2021-09-17 | 杭州电子科技大学 | Unmanned aerial vehicle navigation method based on semantic VSLAM |
CN114170459A (en) * | 2021-11-23 | 2022-03-11 | 杭州弥拓科技有限公司 | Dynamic building identification guiding method and system based on Internet of things and electronic equipment |
Non-Patent Citations (3)
Title |
---|
CHEN-HSUAN LIN; WEI-CHIU MA; ANTONIO TORRALBA; SIMON LUCEY: "BARF: Bundle-Adjusting Neural Radiance Fields" *
XU FENGFAN: "Three-dimensional modeling method based on neural radiance field optimization" *
NIE XINYU: "Research on visual SLAM algorithms based on deep learning" *
Also Published As
Publication number | Publication date |
---|---|
CN115826628B (en) | 2023-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110827415B (en) | All-weather unknown environment unmanned autonomous working platform | |
JP6745328B2 (en) | Method and apparatus for recovering point cloud data | |
Lyu et al. | Chipnet: Real-time lidar processing for drivable region segmentation on an fpga | |
CN111902826A (en) | Positioning, mapping and network training | |
AU2021200832A1 (en) | Collaborative 3d mapping and surface registration | |
CN116645649A (en) | Vehicle pose and size estimation method, device and storage medium | |
Huang et al. | Non-model-based monocular pose estimation network for uncooperative spacecraft using convolutional neural network | |
CN117214904A (en) | Intelligent fish identification monitoring method and system based on multi-sensor data | |
Almanza-Medina et al. | Deep learning architectures for navigation using forward looking sonar images | |
Deng et al. | ToF and stereo data fusion using dynamic search range stereo matching | |
Yao et al. | Vision-based environment perception and autonomous obstacle avoidance for unmanned underwater vehicle | |
JP2023079022A (en) | Information processing device and information generation method | |
CN116152442B (en) | Three-dimensional point cloud model generation method and device | |
Zhu et al. | Autonomous reinforcement control of visual underwater vehicles: Real-time experiments using computer vision | |
CN116452654B (en) | BEV perception-based relative pose estimation method, neural network and training method thereof | |
CN112750155A (en) | Panoramic depth estimation method based on convolutional neural network | |
CN113534189A (en) | Weight detection method, human body characteristic parameter detection method and device | |
CN116486038A (en) | Three-dimensional construction network training method, three-dimensional model generation method and device | |
CN115826628B (en) | Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network | |
CN116079727A (en) | Humanoid robot motion simulation method and device based on 3D human body posture estimation | |
CN116363205A (en) | Space target pose resolving method based on deep learning and computer program product | |
Heintz et al. | Spacecraft State Estimation Using Neural Radiance Fields | |
CN113808186A (en) | Training data generation method and device and electronic equipment | |
Lee et al. | Radar translation network between sunny and rainy domains by combination of KP-convolution and CycleGAN | |
Zhang et al. | A Self-Supervised Monocular Depth Estimation Approach Based on UAV Aerial Images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |