CN115826628B - Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network - Google Patents

Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network Download PDF

Info

Publication number
CN115826628B
CN115826628B · Application CN202310147930.9A
Authority
CN
China
Prior art keywords
module
nerf
depth
slam
aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310147930.9A
Other languages
Chinese (zh)
Other versions
CN115826628A (en)
Inventor
周仁建
王聪
姚慧敏
张继花
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Aeronautic Polytechnic
Original Assignee
Chengdu Aeronautic Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Aeronautic Polytechnic filed Critical Chengdu Aeronautic Polytechnic
Priority to CN202310147930.9A priority Critical patent/CN115826628B/en
Publication of CN115826628A publication Critical patent/CN115826628A/en
Application granted granted Critical
Publication of CN115826628B publication Critical patent/CN115826628B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Abstract

The invention discloses a heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on a NeRF neural network, belonging to the technical field of unmanned aerial vehicle vision obstacle avoidance. The invention solves the problem that SLAM-based visual obstacle avoidance with a conventional RGB-D depth camera becomes unstable when the unmanned aerial vehicle operates in direct sunlight, and improves the reliability of autonomous unmanned aerial vehicle flight.

Description

Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network
Technical Field
The invention belongs to the technical field of unmanned aerial vehicle vision obstacle avoidance, and particularly relates to a heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on a NeRF neural network.
Background
NeRF (Neural Radiance Fields) is one of the current important research directions; the problem it addresses is how to synthesize images from new viewpoints given a set of captured photographs. Unlike traditional three-dimensional reconstruction methods, which represent a scene explicitly as point clouds, meshes, voxels and the like, NeRF takes a new approach and models the scene as a continuous 5D radiance field stored implicitly in a neural network. Only sparse multi-view images with camera poses are needed for training to obtain a neural radiance field model, from which clear pictures at arbitrary viewing angles can be rendered.
Fusing deep learning with traditional geometry is a trend in SLAM development. In previous applications, individual modules of SLAM were replaced by neural networks, such as feature extraction, feature matching, loop closure, and depth estimation. Compared with such single-point substitution, NeRF-based SLAM is a completely new framework that can replace traditional SLAM end to end, both in design methodology and in implementation architecture.
During autonomous flight, an unmanned aerial vehicle must flexibly avoid obstacles; the commonly adopted approach is SLAM map reconstruction based on a lidar-assisted RGB-D depth camera or a binocular camera.
While many three-dimensional reconstruction solutions are based on RGB-D or lidar sensors, scene reconstruction from monocular images offers a more convenient alternative. RGB-D cameras may fail under certain conditions, such as direct sunlight, and lidar is still heavier than a monocular RGB camera. In addition, a stereoscopic camera simplifies depth estimation to a one-dimensional disparity search, but relies on accurate camera calibration that easily degrades in actual operation.
Disclosure of Invention
Aiming at the above defects in the prior art, the heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on the NeRF neural network provided by the invention solve the problem that SLAM visual obstacle avoidance with a conventional RGB-D depth camera is unstable when the unmanned aerial vehicle operates in direct sunlight.
In order to achieve the aim of the invention, the invention adopts the following technical scheme: a heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on a NeRF neural network comprises a core module, and a laser radar module, an unmanned aerial vehicle flight control module, an unmanned aerial vehicle monocular camera module and an IMU inertial navigation module which are respectively connected with the core module;
the unmanned aerial vehicle monocular camera module is used for acquiring images to obtain original images;
the IMU inertial navigation module is used for acquiring the camera pose;
the core module is used for obtaining a global map according to the original image and the camera gesture;
the laser radar module is used for combining the core module according to the global map to obtain a path planning result;
and the unmanned aerial vehicle flight control module is used for controlling the unmanned aerial vehicle according to the path planning result to realize the obstacle avoidance of the unmanned aerial vehicle.
The beneficial effects of the invention are as follows: the invention acquires images through the unmanned aerial vehicle monocular camera module; an image acquisition subunit is arranged in the FPGA programmable logic submodule of the core module, and the acquired images are transmitted to the NeRF-SLAM mapping subunit. The NeRF-SLAM mapping subunit, in combination with the IMU inertial navigation module and the pose solving subunit, models the scene as a continuous 5D radiance field, so that SLAM mapping can be performed implicitly through a three-dimensional scene stored in a neural network. The mapping result is uploaded through the MicroBlaze processor unit to the APU application processor submodule to update the global map of the obstacle avoidance system; combined with the laser radar module and the laser radar SLAM unit, flight path planning is realized, and the path planning result is sent to the unmanned aerial vehicle flight control system to control the unmanned aerial vehicle, thereby realizing unmanned aerial vehicle vision obstacle avoidance. Using the unmanned aerial vehicle monocular camera module reduces the weight and volume of the unmanned aerial vehicle. Meanwhile, by adopting the concept of heterogeneous processing, the generalization capability of the NeRF-SLAM neural radiance field is improved and it does not get trapped in local minima.
Further, the core module comprises an APU application processor submodule and an FPGA programmable logic submodule connected with the APU application processor submodule;
the APU application processor submodule comprises a laser radar SLAM unit, a global map unit and a flight path planning unit which are sequentially connected; the laser radar SLAM unit is connected with the laser radar module; the global map unit is connected with the FPGA programmable logic submodule; the flight path planning unit is connected with the unmanned aerial vehicle flight control module;
the FPGA programmable logic submodule comprises a MicroBlaze processor unit and a NeRF-SLAM neural radiance field unit which are connected in sequence; the NeRF-SLAM neural radiance field unit comprises a NeRF-SLAM mapping subunit, and an image acquisition subunit and a pose solving subunit which are respectively connected with the NeRF-SLAM mapping subunit; the MicroBlaze processor unit is respectively connected with the NeRF-SLAM mapping subunit and the global map unit; the image acquisition subunit is connected with the unmanned aerial vehicle monocular camera module; and the pose solving subunit is connected with the IMU inertial navigation module.
The beneficial effects of the above further scheme are as follows: by adopting the concept of heterogeneous processing, the generalization capability of the NeRF-SLAM neural radiance field is improved, and it does not get trapped in local minima.
The invention further provides a heterogeneous unmanned aerial vehicle vision obstacle avoidance method based on a NeRF neural network, which comprises the following steps:
S1, obtaining an original image through the unmanned aerial vehicle monocular camera module;
S2, modeling the original image into a continuous 5D radiance field by utilizing the NeRF-SLAM neural radiance field unit to obtain a mapping result;
S3, transmitting the mapping result to the global map unit through the MicroBlaze processor unit to obtain a global map;
S4, obtaining a path planning result by utilizing the laser radar SLAM unit and the laser radar module according to the global map;
S5, transmitting the path planning result to the unmanned aerial vehicle flight control module by using the flight path planning unit, thereby realizing unmanned aerial vehicle vision obstacle avoidance.
The beneficial effects of the invention are as follows: the NeRF-SLAM model is optimized by utilizing the structure of the Hessian matrix, and the MLP multi-layer perceptron in the classical NeRF network is replaced by an SLFNN (single hidden layer feedforward neural network), so that an ELM (extreme learning machine) model can be conveniently constructed and the solution accelerated, improving the reliability of autonomous flight of the unmanned aerial vehicle.
Further, the step S2 specifically comprises the following steps:
S201, obtaining continuous-frame RGB images from the original image by utilizing the image acquisition subunit of the NeRF-SLAM neural radiance field unit;
S202, optimizing the NeRF-SLAM model by utilizing the structure of the Hessian matrix according to the continuous-frame RGB images to obtain an improved NeRF-SLAM model;
S203, obtaining a mapping result according to the improved NeRF-SLAM model.
The beneficial effects of the above further scheme are as follows: optimizing the NeRF-SLAM model with the structure of the Hessian matrix makes it possible to optimize the radiance field parameters while simultaneously refining the camera poses.
Further, the improved NeRF-SLAM model in step S202 includes a first single hidden layer feedforward neural network SLFNN1, a second single hidden layer feedforward neural network SLFNN2, a calculation and position encoding module, a volume rendering module, and a GRU convolution module;
the calculation and position encoding module is used for acquiring the scene position and the observation direction;
the first single hidden layer feedforward neural network SLFNN1 is used for obtaining voxel density and feature vectors according to the scene position;
the second single hidden layer feedforward neural network SLFNN2 is used for obtaining color according to the observation direction and the feature vector;
the volume rendering module is used for obtaining voxel output according to the voxel density and the color;
and the GRU convolution module is used for acquiring the output of the GRU convolution module.
The beneficial effects of the above further scheme are as follows: the MLP multi-layer perceptron in the classical NeRF network is replaced by an SLFNN (single hidden layer feedforward neural network), so that an ELM (extreme learning machine) model can be conveniently constructed and the solution accelerated.
Further, the step S202 specifically comprises the following steps:
S2021, obtaining dense optical flow, optical flow weights and the GRU convolution module output through the GRU convolution module according to the continuous-frame RGB images;
S2022, obtaining a reduced camera matrix by utilizing the Schur complement of the depth arrow-shaped block-sparse Hessian matrix according to the dense optical flow and the optical flow weights;
S2023, obtaining a first pose according to the Cholesky decomposition of the camera matrix;
S2024, obtaining depth according to the first pose;
S2025, performing block division on the Hessian matrix to obtain a block division result; the expression of the block division result is:

$$H\,\Delta x = b, \qquad H = \begin{bmatrix} C & E \\ E^{T} & P \end{bmatrix}, \quad \Delta x = \begin{bmatrix} \Delta\xi \\ \Delta d \end{bmatrix}, \quad b = \begin{bmatrix} v \\ w \end{bmatrix}$$

wherein H is the Hessian matrix; C is the block camera matrix; E is the camera/depth off-diagonal Hessian block and E^{T} is its transpose; P is the diagonal matrix corresponding to the per-pixel depths of the key frames; \Delta\xi is the incremental update on the Lie algebra of the camera poses in the SE(3) group; \Delta d is the incremental update of the per-pixel inverse depths; v is the pose residual; w is the depth residual; and b is the residual vector;
S2026, obtaining the depth marginal covariance and the pose marginal covariance according to the block division result; the expressions of the depth marginal covariance and the pose marginal covariance are as follows:

$$\Sigma_{d} = P^{-1} + P^{-T} E^{T}\,\Sigma_{T}\,E\,P^{-1}$$

$$\Sigma_{T} = \left(L L^{T}\right)^{-1}$$

wherein \Sigma_{d} is the depth marginal covariance; P^{-1} is the inverse of the diagonal matrix corresponding to the per-pixel inverse depths of the key frames and P^{-T} is its inverse transpose; \Sigma_{T} is the pose marginal covariance; L is the lower-triangular Cholesky factor and L^{T} is its transpose;
S2027, obtaining the improved NeRF-SLAM model according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the continuous-frame RGB images.
The beneficial effects of the above further scheme are as follows: the NeRF-SLAM model is optimized with the structure of the Hessian matrix to obtain the improved NeRF-SLAM model, preparing for the subsequent mapping result.
Further, the step S203 specifically comprises the following steps:
S2031, acquiring the camera pose through the IMU inertial navigation module, and obtaining a second pose by utilizing the pose solving subunit of the NeRF-SLAM neural radiance field unit;
S2032, performing high-frequency encoding through the calculation and position encoding module according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the second pose to obtain the scene position and the observation direction; the expression of the high-frequency encoding is:

$$\gamma(p) = \left(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\right)$$

wherein \gamma(\cdot) is the high-frequency encoding function; p is the position vector; \sin(\cdot) is the sine function; and \cos(\cdot) is the cosine function;
S2033, obtaining the voxel density and the feature vector by using the first single hidden layer feedforward neural network SLFNN1 according to the scene position;
S2034, obtaining the color by using the second single hidden layer feedforward neural network SLFNN2 according to the observation direction and the feature vector;
S2035, obtaining the voxel output through the volume rendering module according to the voxel density and the color; the expression of the voxel output is:

$$C(r) = \int_{t_{n}}^{t_{f}} T(t)\,\sigma(r(t))\,c(r(t), d)\,dt, \qquad T(t) = \exp\!\left(-\int_{t_{n}}^{t}\sigma(r(s))\,ds\right)$$

wherein C(r) is the voxel output; T(t) is the accumulated transmittance of the ray from t_{n} to t, i.e. the probability that the ray travels from t_{n} to t without hitting any particle; \sigma(\cdot) is the voxel density function; c(\cdot,\cdot) is the color function; r is the corresponding ray; d is the direction vector; t_{n} is the near boundary; t_{f} is the far boundary; t is the distance from a sample point on the camera ray to the camera optical center; and s is the position along the ray;
S2036, combining the voxel output and the GRU convolution module output, and obtaining the mapping result by utilizing the NeRF-SLAM mapping subunit of the NeRF-SLAM neural radiance field unit through local mapping, obstacle region judgment and occupancy grid mapping.
The beneficial effects of the above further scheme are as follows: the mapping result is obtained according to the improved NeRF-SLAM model, preparing for flight route planning.
Further, the loss function of the mapping result is:

$$\mathcal{L}(T, \Theta) = \mathcal{L}_{rgb}(T, \Theta) + \lambda_{D}\,\mathcal{L}_{D}(T, \Theta)$$

$$\mathcal{L}_{rgb}(T, \Theta) = \left\lVert I - \hat{I}(T, \Theta) \right\rVert^{2}$$

$$\mathcal{L}_{D}(T, \Theta) = \left\lVert D - \hat{D}(T, \Theta) \right\rVert_{\Sigma_{D}}$$

wherein \mathcal{L}(T, \Theta) is the loss function of the mapping result; T is the first pose; \Theta are the neural parameters; \mathcal{L}_{rgb} is the color loss function; I is the color image to be rendered; \hat{I} is the rendered color image; \lambda_{D} is a hyperparameter for balancing depth and color supervision; \mathcal{L}_{D} is the depth loss function; D is the depth; \Sigma_{D} is the uncertainty marginal covariance; and \hat{D} is the rendered depth.
The beneficial effects of the above further scheme are as follows: parameters are continuously adjusted according to the loss function of the mapping result, improving the reliability of autonomous flight of the unmanned aerial vehicle.
Drawings
Fig. 1 is a system configuration diagram of the present invention.
Fig. 2 is a flow chart of the method of the present invention.
Fig. 3 is a schematic structural diagram of the basic framework of the SLFNN hardware unit design in embodiment 3 of the present invention.
FIG. 4 is a training flowchart of the first single hidden layer feedforward neural network or the second single hidden layer feedforward neural network in embodiment 3 of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the present invention by those skilled in the art. It should be understood, however, that the present invention is not limited to the scope of these embodiments; for those skilled in the art, all inventions making use of the inventive concept fall within the protection scope of the present invention as defined by the appended claims.
Example 1
As shown in fig. 1, the heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on the NeRF neural network comprises a core module, and a laser radar module, an unmanned aerial vehicle flight control module, an unmanned aerial vehicle monocular camera module and an IMU inertial navigation module which are respectively connected with the core module;
the unmanned aerial vehicle monocular camera module is used for acquiring images to obtain original images;
the IMU inertial navigation module is used for acquiring the camera pose;
the core module is used for obtaining a global map according to the original image and the camera gesture;
the laser radar module is used for combining the core module according to the global map to obtain a path planning result;
and the unmanned aerial vehicle flight control module is used for controlling the unmanned aerial vehicle according to the path planning result to realize the obstacle avoidance of the unmanned aerial vehicle.
The core module comprises an APU application processor submodule and an FPGA programmable logic submodule connected with the APU application processor submodule;
the APU application processor submodule comprises a laser radar SLAM unit, a global map unit and a flight path planning unit which are sequentially connected; the laser radar SLAM unit is connected with the laser radar module; the global map unit is connected with the FPGA programmable logic submodule; the flight path planning unit is connected with the unmanned aerial vehicle flight control module;
the FPGA programmable logic submodule comprises a MicroBlaze processor unit and a NeRF-SLAM neural radiance field unit which are connected in sequence; the NeRF-SLAM neural radiance field unit comprises a NeRF-SLAM mapping subunit, and an image acquisition subunit and a pose solving subunit which are respectively connected with the NeRF-SLAM mapping subunit; the MicroBlaze processor unit is respectively connected with the NeRF-SLAM mapping subunit and the global map unit; the image acquisition subunit is connected with the unmanned aerial vehicle monocular camera module; and the pose solving subunit is connected with the IMU inertial navigation module.
In this embodiment, the core module is a development board of model Zynq UltraScale+ MPSoC XCZU5EV.
The working principle of the invention is as follows: the invention acquires images through the unmanned aerial vehicle monocular camera module; an image acquisition subunit is arranged in the FPGA programmable logic submodule of the core module, and the acquired images are transmitted to the NeRF-SLAM mapping subunit. The NeRF-SLAM mapping subunit, in combination with the IMU inertial navigation module and the pose solving subunit, models the scene as a continuous 5D radiance field, so that SLAM mapping can be performed implicitly through a three-dimensional scene stored in a neural network. The mapping result is uploaded through the MicroBlaze processor unit to the APU application processor submodule to update the global map of the obstacle avoidance system; combined with the laser radar module and the laser radar SLAM unit, flight path planning is realized, and the path planning result is sent to the unmanned aerial vehicle flight control system to control the unmanned aerial vehicle, thereby realizing unmanned aerial vehicle vision obstacle avoidance.
Example 2
As shown in fig. 2, the invention provides a heterogeneous unmanned aerial vehicle vision obstacle avoidance method based on a NeRF neural network, which comprises the following steps:
S1, obtaining an original image through the unmanned aerial vehicle monocular camera module;
S2, modeling the original image into a continuous 5D radiance field by utilizing the NeRF-SLAM neural radiance field unit to obtain a mapping result;
S3, transmitting the mapping result to the global map unit through the MicroBlaze processor unit to obtain a global map;
S4, obtaining a path planning result by utilizing the laser radar SLAM unit and the laser radar module according to the global map;
S5, transmitting the path planning result to the unmanned aerial vehicle flight control module by using the flight path planning unit, thereby realizing unmanned aerial vehicle vision obstacle avoidance.
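For illustration, a minimal control-flow sketch of steps S1 to S5 follows; all class and method names (camera.capture, nerf_slam.update, planner.plan and so on) are hypothetical stand-ins, not interfaces defined by the patent.

```python
# Minimal sketch of the S1-S5 loop; every class and method name is hypothetical.
def obstacle_avoidance_step(camera, imu, nerf_slam, global_map, lidar_slam, planner, flight_ctrl):
    frame = camera.capture()                            # S1: raw image from the monocular camera module
    pose_prior = imu.read_pose()                        # IMU pose used by the pose solving subunit
    local_map = nerf_slam.update(frame, pose_prior)     # S2: model the scene as a continuous 5D radiance field
    global_map.merge(local_map)                         # S3: MicroBlaze uploads the mapping result
    path = planner.plan(global_map, lidar_slam.scan())  # S4: lidar SLAM + global map -> path planning result
    flight_ctrl.execute(path)                           # S5: flight control module performs obstacle avoidance
```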
The step S2 specifically comprises the following steps:
S201, obtaining continuous-frame RGB images from the original image by utilizing the image acquisition subunit of the NeRF-SLAM neural radiance field unit;
S202, optimizing the NeRF-SLAM model by utilizing the structure of the Hessian matrix according to the continuous-frame RGB images to obtain an improved NeRF-SLAM model;
S203, obtaining a mapping result according to the improved NeRF-SLAM model.
The improved NeRF-SLAM model in the step S202 comprises a first single hidden layer feedforward neural network SLFNN1, a second single hidden layer feedforward neural network SLFNN2, a calculation and position coding module, a volume rendering module and a GRU convolution module;
the calculation and position encoding module is used for acquiring the scene position and the observation direction;
the first single hidden layer feedforward neural network SLFNN1 is used for obtaining voxel density and feature vectors according to the scene position;
the second single hidden layer feedforward neural network SLFNN2 is used for obtaining color according to the observation direction and the feature vector;
the volume rendering module is used for obtaining voxel output according to the voxel density and the color;
and the GRU convolution module is used for acquiring the output of the GRU convolution module.
The step S202 specifically comprises the following steps:
S2021, obtaining dense optical flow, optical flow weights and the GRU convolution module output through the GRU convolution module according to the continuous-frame RGB images;
S2022, obtaining a reduced camera matrix by utilizing the Schur complement of the depth arrow-shaped block-sparse Hessian matrix according to the dense optical flow and the optical flow weights;
S2023, obtaining a first pose according to the Cholesky decomposition of the camera matrix;
S2024, obtaining depth according to the first pose;
S2025, performing block division on the Hessian matrix to obtain a block division result; the expression of the block division result is:

$$H\,\Delta x = b, \qquad H = \begin{bmatrix} C & E \\ E^{T} & P \end{bmatrix}, \quad \Delta x = \begin{bmatrix} \Delta\xi \\ \Delta d \end{bmatrix}, \quad b = \begin{bmatrix} v \\ w \end{bmatrix}$$

wherein H is the Hessian matrix; C is the block camera matrix; E is the camera/depth off-diagonal Hessian block and E^{T} is its transpose; P is the diagonal matrix corresponding to the per-pixel depths of the key frames; \Delta\xi is the incremental update on the Lie algebra of the camera poses in the SE(3) group; \Delta d is the incremental update of the per-pixel inverse depths; v is the pose residual; w is the depth residual; and b is the residual vector;
S2026, obtaining the depth marginal covariance and the pose marginal covariance according to the block division result; the expressions of the depth marginal covariance and the pose marginal covariance are as follows:

$$\Sigma_{d} = P^{-1} + P^{-T} E^{T}\,\Sigma_{T}\,E\,P^{-1}$$

$$\Sigma_{T} = \left(L L^{T}\right)^{-1}$$

wherein \Sigma_{d} is the depth marginal covariance; P^{-1} is the inverse of the diagonal matrix corresponding to the per-pixel inverse depths of the key frames and P^{-T} is its inverse transpose; \Sigma_{T} is the pose marginal covariance; L is the lower-triangular Cholesky factor and L^{T} is its transpose;
S2027, obtaining the improved NeRF-SLAM model according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the continuous-frame RGB images.
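A NumPy/SciPy sketch of one linearized solve corresponding to steps S2022 to S2026 follows, assuming the block structure H = [[C, E], [E^T, P]] described above with a diagonal P; the function name, the array shapes, and the choice to return only the diagonal of the depth marginal covariance are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_dense_ba_step(C, E, P_diag, v, w):
    """One linearized BA step on H = [[C, E], [E^T, diag(P_diag)]], b = [v, w].
    Returns the pose update, the inverse-depth update and the two marginal covariances."""
    P_inv = 1.0 / P_diag                                   # P is diagonal, so inversion is elementwise
    S = C - E @ (P_inv[:, None] * E.T)                     # reduced camera matrix (Schur complement)
    rhs = v - E @ (P_inv * w)
    L, lower = cho_factor(S, lower=True)                   # S = L L^T
    dxi = cho_solve((L, lower), rhs)                       # pose increments on the Lie algebra of SE(3)
    dd = P_inv * (w - E.T @ dxi)                           # per-pixel inverse-depth increments
    Sigma_T = cho_solve((L, lower), np.eye(S.shape[0]))    # pose marginal covariance (L L^T)^-1
    M = P_inv[:, None] * E.T                               # P^-1 E^T (P symmetric, so P^-T = P^-1)
    Sigma_d = P_inv + np.einsum('ij,jk,ik->i', M, Sigma_T, M)  # diag of P^-1 + P^-T E^T Sigma_T E P^-1
    return dxi, dd, Sigma_T, Sigma_d
```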
The step S203 specifically comprises the following steps:
S2031, acquiring the camera pose through the IMU inertial navigation module, and obtaining a second pose by utilizing the pose solving subunit of the NeRF-SLAM neural radiance field unit;
S2032, performing high-frequency encoding through the calculation and position encoding module according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the second pose to obtain the scene position and the observation direction; the expression of the high-frequency encoding is:

$$\gamma(p) = \left(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\right)$$

wherein \gamma(\cdot) is the high-frequency encoding function; p is the position vector; \sin(\cdot) is the sine function; and \cos(\cdot) is the cosine function;
S2033, obtaining the voxel density and the feature vector by using the first single hidden layer feedforward neural network SLFNN1 according to the scene position;
S2034, obtaining the color by using the second single hidden layer feedforward neural network SLFNN2 according to the observation direction and the feature vector;
S2035, obtaining the voxel output through the volume rendering module according to the voxel density and the color; the expression of the voxel output is:

$$C(r) = \int_{t_{n}}^{t_{f}} T(t)\,\sigma(r(t))\,c(r(t), d)\,dt, \qquad T(t) = \exp\!\left(-\int_{t_{n}}^{t}\sigma(r(s))\,ds\right)$$

wherein C(r) is the voxel output; T(t) is the accumulated transmittance of the ray from t_{n} to t, i.e. the probability that the ray travels from t_{n} to t without hitting any particle; \sigma(\cdot) is the voxel density function; c(\cdot,\cdot) is the color function; r is the corresponding ray; d is the direction vector; t_{n} is the near boundary; t_{f} is the far boundary; t is the distance from a sample point on the camera ray to the camera optical center; and s is the position along the ray;
S2036, combining the voxel output and the GRU convolution module output, and obtaining the mapping result by utilizing the NeRF-SLAM mapping subunit of the NeRF-SLAM neural radiance field unit through local mapping, obstacle region judgment and occupancy grid mapping.
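The following sketch illustrates how steps S2033 and S2034 could be evaluated with two single hidden layer feedforward networks; the hidden widths, output dimensions, activation functions and the ReLU/sigmoid squashing of density and color are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def slfnn_forward(x, W, b, beta, act=np.tanh):
    """Single hidden layer feedforward network: y = beta^T g(W x + b)."""
    return beta.T @ act(W @ x + b)

def radiance_query(pos_enc, dir_enc, params):
    """Sketch of S2033-S2034: SLFNN1 maps the encoded position to (sigma, 256-d features),
    SLFNN2 maps (features, encoded direction) to RGB. `params` holds the two weight sets."""
    out1 = slfnn_forward(pos_enc, params["W1"], params["b1"], params["beta1"])  # beta1 -> 257 outputs
    sigma, feat = np.maximum(out1[0], 0.0), out1[1:257]       # density kept non-negative (assumption)
    rgb_in = np.concatenate([feat, dir_enc])                  # Concat merge before SLFNN2
    rgb = 1.0 / (1.0 + np.exp(-slfnn_forward(rgb_in, params["W2"], params["b2"], params["beta2"])))
    return sigma, feat, rgb
```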
The loss function of the mapping result is as follows:

$$\mathcal{L}(T, \Theta) = \mathcal{L}_{rgb}(T, \Theta) + \lambda_{D}\,\mathcal{L}_{D}(T, \Theta)$$

$$\mathcal{L}_{rgb}(T, \Theta) = \left\lVert I - \hat{I}(T, \Theta) \right\rVert^{2}$$

$$\mathcal{L}_{D}(T, \Theta) = \left\lVert D - \hat{D}(T, \Theta) \right\rVert_{\Sigma_{D}}$$

wherein \mathcal{L}(T, \Theta) is the loss function of the mapping result; T is the first pose; \Theta are the neural parameters; \mathcal{L}_{rgb} is the color loss function; I is the color image to be rendered; \hat{I} is the rendered color image; \lambda_{D} is a hyperparameter for balancing depth and color supervision; \mathcal{L}_{D} is the depth loss function; D is the depth; \Sigma_{D} is the uncertainty marginal covariance; and \hat{D} is the rendered depth.
Example 3
As shown in fig. 1, a heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on a NeRF neural network mainly comprises a core module, a laser radar module, an unmanned aerial vehicle flight control module, an unmanned aerial vehicle monocular camera module and an IMU inertial navigation module. The core module mainly comprises an APU application processor submodule and an FPGA programmable logic submodule; the core module adopts a development board of model Zynq UltraScale+ MPSoC XCZU5EV. The APU application processor submodule mainly comprises a laser radar SLAM unit, a global map unit and a flight path planning unit. The FPGA programmable logic submodule mainly comprises a MicroBlaze processor unit and a NeRF-SLAM neural radiance field unit; the NeRF-SLAM neural radiance field unit comprises a NeRF-SLAM mapping subunit, an image acquisition subunit and a pose solving subunit. The invention adopts the concept of heterogeneous processing: the FPGA programmable logic submodule is used to replace the MLP multi-layer perceptron in the NeRF-SLAM model, and the heterogeneous FPGA programmable logic resources accelerate the extreme learning machine (ELM) solution of the first single hidden layer feedforward neural network SLFNN1 and the second single hidden layer feedforward neural network SLFNN2, which improves the generalization capability of the neural radiance field of the NeRF-SLAM model and avoids getting trapped in local minima, while mapping is completed through the heterogeneous APU application processor submodule. In addition, the unmanned aerial vehicle monocular camera module is used as the SLAM mapping sensor for unmanned aerial vehicle vision obstacle avoidance, and monocular depth estimation is carried out with a dense optical flow estimation algorithm, which reduces the weight, volume and cost of the unmanned aerial vehicle and improves the reliability of autonomous flight.
In this embodiment, the MicroBlaze processor unit adopted with the improved NeRF-SLAM model is based on RISC (reduced instruction set computing), which has the advantages of high speed, reduced design cost and improved reliability.
The FPGA programmable logic submodule mainly comprises a MicroBlaze processor unit and a NeRF-SLAM neural radiance field unit. The MicroBlaze processor unit is mainly used for managing the NeRF-SLAM neural radiance field unit, completing communication with the general-purpose I/O interface (GPIO) and the high-speed peripheral I/O interface, and handling clocks; its functions comprise (1) system control, (2) data preprocessing, (3) ELM training and (4) mapping control.
Further, the MicroBlaze processor unit is connected with the BRAM through the LMB local memory bus, so that the stored neuron coefficients can be changed flexibly. Moreover, given that the ELM implementation requires a large amount of on-chip memory resources, using BRAM also helps avoid the latency caused by off-chip accesses.
Further, the MicroBlaze processor unit is connected to the EMC external memory controller through the XCL cache link to store data to external memory, such as an SD card.
Further, the MicroBlaze processor unit accesses a general purpose I/O interface (GPIO), a high speed peripheral I/O interface, and a clock via the PLB peripheral local bus.
Further, after the MicroBlaze processor unit completes data processing through the FPU floating-point unit, the data processing results are transmitted to the PS through the AXI bus.
Further, the NeRF-SLAM neural radiance field unit obtains original images from the external unmanned aerial vehicle monocular camera module, and continuous-frame image input I is obtained through the image acquisition subunit. Starting from a series of images img 1 through img n, the improved NeRF-SLAM model first calculates the dense optical flow between pairs of frames; the GRU convolution module is used to calculate a new dense optical flow, given the correlation between the frame pair and a guess of the current dense optical flow, together with a weight for each optical flow measurement.
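A conceptual sketch of this recurrent flow refinement is shown below; correlation_volume and gru_update are hypothetical callables standing in for the correlation lookup and the GRU convolution module, and the iteration count is arbitrary.

```python
# Conceptual sketch of the recurrent flow update described above; `correlation_volume`
# and `gru_update` are hypothetical stand-ins, injected as arguments so the loop is self-contained.
def refine_flow(feat_i, feat_j, flow, hidden, gru_update, correlation_volume, iters=8):
    for _ in range(iters):
        corr = correlation_volume(feat_i, feat_j, flow)         # correlation between the frame pair
        hidden, dflow, weight = gru_update(hidden, corr, flow)  # GRU predicts a flow update and weights
        flow = flow + dflow                                     # new guess of the dense optical flow
    return flow, weight
```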
Further, using these flows and weights as measurements, the improved NeRF-SLAM model adopted by the invention solves a dense bundle adjustment (BA) problem, in which the 3D geometry is parameterized as a set of inverse depth maps, one per keyframe. This parameterization leads to a very efficient way of solving the dense BA problem: the system of equations is linearized into the familiar camera/depth arrow-shaped block-sparse Hessian matrix H \in \mathbb{R}^{(c+p)\times(c+p)} and expressed as a linear least-squares problem, where c and p are the dimensions of the cameras and the points.
Further, to solve this linear least-squares problem, the Schur complement of the Hessian matrix is used to compute a reduced camera matrix, which does not depend on depth and has a much smaller dimension.
Further, the camera poses are solved by Cholesky decomposition of the reduced camera matrix, with forward and back substitution yielding the poses T. Given these poses T, the per-pixel depths can be solved. Furthermore, given the poses T and the depths D, the improved NeRF-SLAM model computes the induced optical flow and provides it again as an initial guess to the GRU convolution module.
Further, the improved NeRF-SLAM model is optimized with the structure of the Hessian matrix, and the block division is performed as follows:

$$H\,\Delta x = b, \qquad H = \begin{bmatrix} C & E \\ E^{T} & P \end{bmatrix}, \quad \Delta x = \begin{bmatrix} \Delta\xi \\ \Delta d \end{bmatrix}, \quad b = \begin{bmatrix} v \\ w \end{bmatrix}$$

wherein H is the Hessian matrix, b is the residual vector, C is the block camera matrix, and P is the diagonal matrix corresponding to the per-pixel depths of the key frames. \Delta\xi denotes the incremental update on the Lie algebra of the camera poses in the SE(3) group, and \Delta d is the incremental update of the per-pixel inverse depths. E is the camera/depth off-diagonal Hessian block, and v and w are the pose and depth residuals, respectively.
Further, from the block division of the Hessian matrix, the marginal covariances of the dense depth \Sigma_{d} and of the poses \Sigma_{T} can be computed efficiently:

$$\Sigma_{d} = P^{-1} + P^{-T} E^{T}\,\Sigma_{T}\,E\,P^{-1}$$

$$\Sigma_{T} = \left(L L^{T}\right)^{-1}$$

wherein P^{-1} is the inverse of the diagonal matrix corresponding to the per-pixel inverse depths of the key frames and P^{-T} is its inverse transpose; E is the camera/depth off-diagonal Hessian block and E^{T} is its transpose; L is the lower-triangular Cholesky factor of the reduced camera matrix and L^{T} is its transpose; and (\cdot)^{-1} denotes taking the inverse matrix.
Further, given all the information computed by the tracking module (the poses, the depths and their respective marginal covariances) together with the input RGB images, the radiance field parameters can be optimized while the camera poses are simultaneously refined.
Furthermore, the MLP multi-layer perceptron in the classical NeRF network is replaced by the first single hidden layer feedforward neural network SLFNN1 and the second single hidden layer feedforward neural network SLFNN2, which are realized on the FPGA programmable logic submodule, so that an ELM (extreme learning machine) model can be conveniently constructed and the solution accelerated.
Further, the IMU inertial navigation module collects the camera pose, a second pose Q is obtained through the pose solving subunit, and the second pose Q is sent to the calculation and position encoding module. The previously generated depth covariance \Sigma_{D} together with the depth D, and the pose covariance \Sigma_{T} together with the first pose T, are also connected to the calculation and position encoding module. The module calculates the scene position (x, y, z) of a point in the scene and the observation direction (\theta, \varphi). The scene position (x, y, z) is encoded and then sent into the first single hidden layer feedforward neural network SLFNN1; after passing through (1) the random neurons, (2) the activation function lookup table and (3) the output neurons, the maximum output selection module of SLFNN1 outputs the voxel density \sigma and a 256-dimensional feature vector. The 256-dimensional feature vector is then Concat-merged with the observation direction (\theta, \varphi) and processed by the second single hidden layer feedforward neural network SLFNN2; after (1) the random neurons, (2) the activation function lookup table and (3) the output neurons, the maximum output selection module of SLFNN2 outputs the color RGB.
Further, the MicroBlaze processor unit trains the first single hidden layer feedforward neural network SLFNN1 and the second single hidden layer feedforward neural network SLFNN2 through the fast simplex link FSL of the FPGA programmable logic submodule. SLFNN1 and SLFNN2 each consist of (1) random neurons, (2) an activation function lookup table, (3) output neurons and a maximum output selection module.
Further, in the calculation and position encoding module, the improved NeRF-SLAM model needs to reconstruct a high-definition scene from the high-frequency details of the scene. The high-frequency encoding function adopted in the encoding process is:

$$\gamma(p) = \left(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\right)$$

wherein p is the position vector, \sin(\cdot) denotes the sine, \cos(\cdot) denotes the cosine, and \pi is 3.1415926.
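A small NumPy sketch of this high-frequency encoding follows, assuming L frequency bands; the value of num_freqs is a design choice not fixed by the text.

```python
import numpy as np

def positional_encoding(p, num_freqs=10):
    """High-frequency encoding gamma(p): interleaved sin/cos terms at frequencies 2^k * pi."""
    p = np.atleast_1d(p)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # 2^0*pi ... 2^(L-1)*pi
    angles = np.outer(freqs, p)                          # shape (L, dim(p))
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=0).ravel()
```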
The first single hidden layer feedforward neural network SLFNN1 and the second single hidden layer feedforward neural network SLFNN2 adopt the commonly used FPGA programmable logic processing method combining parallelism and pipelining, giving the basic framework of the hardware unit design shown in fig. 3. During execution of the ELM algorithm, a series of values must be generated, such as the random weights w, the biases b, and the solved output weights \beta. The M+1 input data are digitally encoded with the random weights in the input module and fed through a multiplexer into the neuron module, which outputs the calculation result in time. The input module and the multiplexer are processed in parallel, while the neuron module and the output module are processed serially.
Further, the voxel density \sigma and the color RGB, i.e. c = (r, g, b), are connected to the volume rendering module, and the volume rendering module obtains the rendered voxel output:

$$C(r) = \int_{t_{n}}^{t_{f}} T(t)\,\sigma(r(t))\,c(r(t), d)\,dt, \qquad T(t) = \exp\!\left(-\int_{t_{n}}^{t}\sigma(r(s))\,ds\right)$$

wherein C(r) denotes the voxel output; T(t) denotes the accumulated transmittance of the ray from t_{n} to t, i.e. the probability that the ray travels from t_{n} to t without touching any particle; \sigma(\cdot) is the voxel density function; c(\cdot,\cdot) is the color function; r is the corresponding ray; and d is the direction vector.
In the HLS high-level synthesis adopted for the actual FPGA programmable logic submodule, the rendering is implemented in a discretized form:

$$\hat{C}(r) = \sum_{i=1}^{N} T_{i}\left(1 - \exp(-\sigma_{i}\delta_{i})\right) c_{i}, \qquad T_{i} = \exp\!\left(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\right)$$

wherein \hat{C}(r) is the discretized voxel output, T_{i} is the discretized transmittance, \sigma_{i} and c_{i} are the density and color of sample i, and \delta_{i} is the distance between consecutive samples.
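A NumPy sketch of this discretized rendering follows; padding the last interval with a large constant is a common convention and an assumption here, and the function also returns the expected depth used later for depth supervision.

```python
import numpy as np

def render_ray(sigmas, colors, t_samples):
    """Discretized volume rendering: alpha-composite per-sample densities and colors along one ray,
    returning the rendered color, the expected ray-termination depth, and the per-sample weights."""
    deltas = np.diff(t_samples, append=t_samples[-1] + 1e10)        # delta_i between consecutive samples
    alphas = 1.0 - np.exp(-sigmas * deltas)                         # opacity of each ray segment
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])  # T_i = exp(-sum_{j<i} sigma_j delta_j)
    weights = trans * alphas
    color = (weights[:, None] * colors).sum(axis=0)                 # rendered pixel color
    depth = (weights * t_samples).sum()                             # expected depth along the ray
    return color, depth, weights
```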
In this embodiment, HLS high-level synthesis refers to a process of automatically converting a logic structure described in a high-level language into a circuit model described in a low-level language.
Further, the voxel calculation result and the output of the preceding GRU convolution module are combined and output to the NeRF-SLAM mapping subunit for mapping. After the NeRF-SLAM mapping subunit performs the three steps of local mapping, obstacle region judgment and occupancy grid mapping, the mapping result is output to the MicroBlaze processor unit through the fast simplex link FSL of the FPGA programmable logic submodule.
Further, taking the uncertainty-aware loss into account, the mapping loss function is expressed as:

$$\mathcal{L}(T, \Theta) = \mathcal{L}_{rgb}(T, \Theta) + \lambda_{D}\,\mathcal{L}_{D}(T, \Theta)$$

Given the hyperparameter \lambda_{D} balancing the depth and color supervision (we set \lambda_{D} to 1.0), the loss is minimized with respect to both the poses T and the neural parameters \Theta.
Further, the depth loss function may be expressed as:

$$\mathcal{L}_{D}(T, \Theta) = \left\lVert D - \hat{D}(T, \Theta) \right\rVert_{\Sigma_{D}}$$

wherein \hat{D} is the rendered depth, and D and \Sigma_{D} are the dense depth and uncertainty estimated by the tracking module. The depth is rendered as the expected ray termination distance.
Further, the depth of each pixel is obtained by sampling three-dimensional positions along the ray of the pixel, evaluating the density \sigma_{i} of each sample i, and alpha-compositing the resulting densities, similarly to standard volume rendering. The expression for the pixel depth is:

$$\hat{D} = \sum_{i=1}^{N} T_{i}\left(1 - \exp(-\sigma_{i}\delta_{i})\right) d_{i}$$

wherein d_{i} is the depth of sample i along the ray, and \delta_{i} is the distance between consecutive samples, serving as an input variable for the pixel depth. \sigma_{i} is the volume density of sample i, generated by evaluating the density network (the MLP in the original NeRF, here SLFNN1) at the three-dimensional world coordinates of the sample.
Further, the transmittance T_{i} along the ray up to sample i is defined as:

$$T_{i} = \exp\!\left(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\right)$$
further, the color loss function is defined as in the original NeRF:
Figure SMS_129
wherein
Figure SMS_130
Is a rendered color image, synthesized similarly to a depth image, by using volume rendering. Each color of each pixel is also calculated by sampling along the rays of the pixel and alpha synthesizing the resulting densities and colors: />
Figure SMS_131
, wherein />
Figure SMS_132
Also->
Figure SMS_133
Transmittance of (2) and->
Figure SMS_134
Is the color estimated by ELM (extreme learning machine). For a given sampleiAt the same time, density ∈>
Figure SMS_135
And color->
Figure SMS_136
Further, the training flow of the first single hidden layer feedforward neural network SLFNN1 and the second single hidden layer feedforward neural network SLFNN2 used in the NeRF-SLAM model of the invention is shown in fig. 4. It mainly comprises an external software unit design and an internal hardware unit design: first, the training flow is started; second, the preprocessed data are input; third, the training labels are processed; fourth, the hardware unit design is entered and random weights are assigned; fifth, the hidden layer output is calculated; sixth, the generalized matrix inverse is calculated; seventh, the network output is calculated; eighth, the data accuracy is output; ninth, it is judged whether retraining is needed; if yes, the fourth to eighth steps are executed again, and if no, the training flow ends.
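The fourth to seventh steps correspond to standard extreme learning machine training; a minimal NumPy sketch follows, in which the hidden-layer width, activation function and random seed are arbitrary choices.

```python
import numpy as np

def elm_train(X, T, n_hidden=256, act=np.tanh, seed=0):
    """Extreme-learning-machine style training of a single hidden layer network:
    random input weights/biases stay fixed, output weights come from a pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_hidden, X.shape[1]))   # step 4: assign random input weights
    b = rng.standard_normal(n_hidden)                 # random biases
    H = act(X @ W.T + b)                              # step 5: hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                      # step 6: generalized (Moore-Penrose) inverse
    return W, b, beta

def elm_predict(X, W, b, beta, act=np.tanh):
    return act(X @ W.T + b) @ beta                    # step 7: network output
```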
In summary, during autonomous flight of the unmanned aerial vehicle, images are acquired through the unmanned aerial vehicle monocular camera module, an image acquisition subunit is arranged in the FPGA programmable logic submodule of the heterogeneous processor core module, and the acquired images are transmitted to the NeRF-SLAM mapping subunit. The NeRF-SLAM mapping subunit, in combination with the IMU inertial navigation module and the pose solving subunit, models the scene as a continuous 5D radiance field, and SLAM mapping can be performed implicitly through a three-dimensional scene stored in a neural network. The mapping result is uploaded through the RISC-based MicroBlaze processor unit to the APU application processor submodule to update the global map of the obstacle avoidance system; combined with the laser radar module and the laser radar SLAM unit, flight path planning is realized, and the path planning result is sent to the unmanned aerial vehicle flight control module to control the unmanned aerial vehicle.

Claims (7)

1. The heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on the NeRF neural network is characterized by comprising a core module, and a laser radar module, an unmanned aerial vehicle flight control module, an unmanned aerial vehicle monocular camera module and an IMU inertial navigation module which are respectively connected with the core module;
the unmanned aerial vehicle monocular camera module is used for acquiring images to obtain original images;
the IMU inertial navigation module is used for acquiring the camera pose;
the core module is used for obtaining a global map according to the original image and the camera pose; the core module comprises an APU application processor submodule and an FPGA programmable logic submodule connected with the APU application processor submodule;
the APU application processor submodule comprises a laser radar SLAM unit, a global map unit and a flight path planning unit which are sequentially connected; the laser radar SLAM unit is connected with the laser radar module; the global map unit is connected with the FPGA programmable logic submodule; the flight path planning unit is connected with the unmanned aerial vehicle flight control module;
the FPGA programmable logic submodule comprises a MicroBlaze processor unit and a NeRF-SLAM neural radiance field unit which are connected in sequence; the NeRF-SLAM neural radiance field unit comprises a NeRF-SLAM mapping subunit, and an image acquisition subunit and a pose solving subunit which are respectively connected with the NeRF-SLAM mapping subunit; the MicroBlaze processor unit is respectively connected with the NeRF-SLAM mapping subunit and the global map unit; the image acquisition subunit is connected with the unmanned aerial vehicle monocular camera module; the pose solving subunit is connected with the IMU inertial navigation module;
the laser radar module is used for combining the core module according to the global map to obtain a path planning result;
and the unmanned aerial vehicle flight control module is used for controlling the unmanned aerial vehicle according to the path planning result to realize the obstacle avoidance of the unmanned aerial vehicle.
2. The obstacle avoidance method of the heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on the NeRF neural network according to claim 1, characterized in that the heterogeneous unmanned aerial vehicle vision obstacle avoidance method based on the NeRF neural network comprises the following steps:
S1, obtaining an original image through the unmanned aerial vehicle monocular camera module;
S2, modeling the original image into a continuous 5D radiance field by utilizing the NeRF-SLAM neural radiance field unit to obtain a mapping result;
S3, transmitting the mapping result to the global map unit through the MicroBlaze processor unit to obtain a global map;
S4, obtaining a path planning result by utilizing the laser radar SLAM unit and the laser radar module according to the global map;
S5, transmitting the path planning result to the unmanned aerial vehicle flight control module by using the flight path planning unit, thereby realizing unmanned aerial vehicle vision obstacle avoidance.
3. The obstacle avoidance method of the heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on the NeRF neural network according to claim 2, characterized in that the step S2 specifically comprises:
S201, obtaining continuous-frame RGB images from the original image by utilizing the image acquisition subunit of the NeRF-SLAM neural radiance field unit;
S202, optimizing the NeRF-SLAM model by utilizing the structure of the Hessian matrix according to the continuous-frame RGB images to obtain an improved NeRF-SLAM model;
S203, obtaining a mapping result according to the improved NeRF-SLAM model.
4. The obstacle avoidance method of the heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on the NeRF neural network according to claim 3, characterized in that the improved NeRF-SLAM model in step S202 comprises a first single hidden layer feedforward neural network SLFNN1, a second single hidden layer feedforward neural network SLFNN2, a calculation and position encoding module, a volume rendering module and a GRU convolution module;
the calculation and position encoding module is used for acquiring the scene position and the observation direction;
the first single hidden layer feedforward neural network SLFNN1 is used for obtaining voxel density and feature vectors according to the scene position;
the second single hidden layer feedforward neural network SLFNN2 is used for obtaining color according to the observation direction and the feature vector;
the volume rendering module is used for obtaining voxel output according to the voxel density and the color;
and the GRU convolution module is used for acquiring the output of the GRU convolution module.
5. The obstacle avoidance method of the heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on the NeRF neural network according to claim 4, characterized in that the step S202 specifically comprises:
S2021, obtaining dense optical flow, optical flow weights and the GRU convolution module output through the GRU convolution module according to the continuous-frame RGB images;
S2022, obtaining a reduced camera matrix by utilizing the Schur complement of the depth arrow-shaped block-sparse Hessian matrix according to the dense optical flow and the optical flow weights;
S2023, obtaining a first pose according to the Cholesky decomposition of the camera matrix;
S2024, obtaining depth according to the first pose;
S2025, performing block division on the Hessian matrix to obtain a block division result; the expression of the block division result is:

$$H\,\Delta x = b, \qquad H = \begin{bmatrix} C & E \\ E^{T} & P \end{bmatrix}, \quad \Delta x = \begin{bmatrix} \Delta\xi \\ \Delta d \end{bmatrix}, \quad b = \begin{bmatrix} v \\ w \end{bmatrix}$$

wherein H is the Hessian matrix; C is the block camera matrix; E is the camera/depth off-diagonal Hessian block and E^{T} is its transpose; P is the diagonal matrix corresponding to the per-pixel depths of the key frames; \Delta\xi is the incremental update on the Lie algebra of the camera poses in the SE(3) group; \Delta d is the incremental update of the per-pixel inverse depths; v is the pose residual; w is the depth residual; and b is the residual vector;
S2026, obtaining the depth marginal covariance and the pose marginal covariance according to the block division result; the expressions of the depth marginal covariance and the pose marginal covariance are as follows:

$$\Sigma_{d} = P^{-1} + P^{-T} E^{T}\,\Sigma_{T}\,E\,P^{-1}$$

$$\Sigma_{T} = \left(L L^{T}\right)^{-1}$$

wherein \Sigma_{d} is the depth marginal covariance; P^{-1} is the inverse of the diagonal matrix corresponding to the per-pixel inverse depths of the key frames and P^{-T} is its inverse transpose; \Sigma_{T} is the pose marginal covariance; L is the lower-triangular Cholesky factor and L^{T} is its transpose;
S2027, obtaining the improved NeRF-SLAM model according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the continuous-frame RGB images.
6. The obstacle avoidance method of the heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on the NeRF neural network according to claim 5, characterized in that the step S203 specifically comprises:
S2031, acquiring the camera pose through the IMU inertial navigation module, and obtaining a second pose by utilizing the pose solving subunit of the NeRF-SLAM neural radiance field unit;
S2032, performing high-frequency encoding through the calculation and position encoding module according to the first pose, the depth, the pose marginal covariance, the depth marginal covariance and the second pose to obtain the scene position and the observation direction; the expression of the high-frequency encoding is:

$$\gamma(p) = \left(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\right)$$

wherein \gamma(\cdot) is the high-frequency encoding function; p is the position vector; \sin(\cdot) is the sine function; and \cos(\cdot) is the cosine function;
S2033, obtaining the voxel density and the feature vector by using the first single hidden layer feedforward neural network SLFNN1 according to the scene position;
S2034, obtaining the color by using the second single hidden layer feedforward neural network SLFNN2 according to the observation direction and the feature vector;
S2035, obtaining the voxel output through the volume rendering module according to the voxel density and the color; the expression of the voxel output is:

$$C(r) = \int_{t_{n}}^{t_{f}} T(t)\,\sigma(r(t))\,c(r(t), d)\,dt, \qquad T(t) = \exp\!\left(-\int_{t_{n}}^{t}\sigma(r(s))\,ds\right)$$

wherein C(r) is the voxel output; T(t) is the accumulated transmittance of the ray from t_{n} to t, i.e. the probability that the ray travels from t_{n} to t without hitting any particle; \sigma(\cdot) is the voxel density function; c(\cdot,\cdot) is the color function; r is the corresponding ray; d is the direction vector; t_{n} is the near boundary; t_{f} is the far boundary; t is the distance from a sample point on the camera ray to the camera optical center; and s is the position along the ray;
S2036, combining the voxel output and the GRU convolution module output, and obtaining the mapping result by utilizing the NeRF-SLAM mapping subunit of the NeRF-SLAM neural radiance field unit through local mapping, obstacle region judgment and occupancy grid mapping.
7. The obstacle avoidance method of the heterogeneous unmanned aerial vehicle vision obstacle avoidance system based on the NeRF neural network according to claim 6, characterized in that the loss function of the mapping result is:

$$\mathcal{L}(T, \Theta) = \mathcal{L}_{rgb}(T, \Theta) + \lambda_{D}\,\mathcal{L}_{D}(T, \Theta)$$

$$\mathcal{L}_{rgb}(T, \Theta) = \left\lVert I - \hat{I}(T, \Theta) \right\rVert^{2}$$

$$\mathcal{L}_{D}(T, \Theta) = \left\lVert D - \hat{D}(T, \Theta) \right\rVert_{\Sigma_{D}}$$

wherein \mathcal{L}(T, \Theta) is the loss function of the mapping result; T is the first pose; \Theta are the neural parameters; \mathcal{L}_{rgb} is the color loss function; I is the color image to be rendered; \hat{I} is the rendered color image; \lambda_{D} is a hyperparameter for balancing depth and color supervision; \mathcal{L}_{D} is the depth loss function; D is the depth; \Sigma_{D} is the uncertainty marginal covariance; and \hat{D} is the rendered depth.
CN202310147930.9A 2023-02-22 2023-02-22 Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network Active CN115826628B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310147930.9A CN115826628B (en) 2023-02-22 2023-02-22 Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310147930.9A CN115826628B (en) 2023-02-22 2023-02-22 Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network

Publications (2)

Publication Number Publication Date
CN115826628A (en) 2023-03-21
CN115826628B (en) 2023-05-09

Family

ID=85522084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310147930.9A Active CN115826628B (en) 2023-02-22 2023-02-22 Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network

Country Status (1)

Country Link
CN (1) CN115826628B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10618673B2 (en) * 2016-04-15 2020-04-14 Massachusetts Institute Of Technology Systems and methods for dynamic planning and operation of autonomous systems using image observation and information theory
US11827352B2 (en) * 2020-05-07 2023-11-28 Skydio, Inc. Visual observer for unmanned aerial vehicles

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105059533A (en) * 2015-08-14 2015-11-18 深圳市多翼创新科技有限公司 Aircraft and landing method thereof
CN105759836A (en) * 2016-03-14 2016-07-13 武汉卓拔科技有限公司 Unmanned aerial vehicle obstacle avoidance method and device based on 3D camera
CN106168808A (en) * 2016-08-25 2016-11-30 南京邮电大学 A kind of rotor wing unmanned aerial vehicle automatic cruising method based on degree of depth study and system thereof
CN207157509U (en) * 2017-01-24 2018-03-30 刘畅 A kind of unmanned plane for gunnery training and amusement
CN107553490A (en) * 2017-09-08 2018-01-09 深圳市唯特视科技有限公司 A kind of monocular vision barrier-avoiding method based on deep learning
CN108875912A (en) * 2018-05-29 2018-11-23 天津科技大学 A kind of neural network model for image recognition
CN111540011A (en) * 2019-02-06 2020-08-14 福特全球技术公司 Hybrid metric-topology camera based positioning
CN110456805A (en) * 2019-06-24 2019-11-15 深圳慈航无人智能系统技术有限公司 A kind of UAV Intelligent tracking flight system and method
CN110873565A (en) * 2019-11-21 2020-03-10 北京航空航天大学 Unmanned aerial vehicle real-time path planning method for urban scene reconstruction
CN111831010A (en) * 2020-07-15 2020-10-27 武汉大学 Unmanned aerial vehicle obstacle avoidance flight method based on digital space slice
CN112435325A (en) * 2020-09-29 2021-03-02 北京航空航天大学 VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN113192182A (en) * 2021-04-29 2021-07-30 山东产研信息与人工智能融合研究院有限公司 Multi-sensor-based live-action reconstruction method and system
CN113405547A (en) * 2021-05-21 2021-09-17 杭州电子科技大学 Unmanned aerial vehicle navigation method based on semantic VSLAM
CN114170459A (en) * 2021-11-23 2022-03-11 杭州弥拓科技有限公司 Dynamic building identification guiding method and system based on Internet of things and electronic equipment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Chen-Hsuan Lin; Wei-Chiu Ma; Antonio Torralba; Simon Lucey. BARF: Bundle-Adjusting Neural Radiance Fields. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 5721-5731. *
Xu Fengfan. 3D Modeling Method Based on Neural Radiance Field Optimization. China Master's Theses Full-text Database, Information Science and Technology, 2023, pp. I138-1494. *
Nie Xinyu. Research on Visual SLAM Algorithms Based on Deep Learning. China Master's Theses Full-text Database, Information Science and Technology, 2023, pp. I138-1790. *

Also Published As

Publication number Publication date
CN115826628A (en) 2023-03-21

Similar Documents

Publication Publication Date Title
Ming et al. Deep learning for monocular depth estimation: A review
JP6745328B2 (en) Method and apparatus for recovering point cloud data
US10929654B2 (en) Three-dimensional (3D) pose estimation from a monocular camera
CN110827415B (en) All-weather unknown environment unmanned autonomous working platform
CN109087329B (en) Human body three-dimensional joint point estimation framework based on depth network and positioning method thereof
CN111898635A (en) Neural network training method, data acquisition method and device
CN111902826A (en) Positioning, mapping and network training
CN111311685A (en) Motion scene reconstruction unsupervised method based on IMU/monocular image
Zhang et al. Progressive hard-mining network for monocular depth estimation
CN115855022A (en) Performing autonomous path navigation using deep neural networks
Liu et al. Unsupervised monocular visual odometry based on confidence evaluation
CN114586078A (en) Hand posture estimation method, device, equipment and computer storage medium
CN112258565B (en) Image processing method and device
Xiong et al. THE benchmark: Transferable representation learning for monocular height estimation
Hwang et al. Lidar depth completion using color-embedded information via knowledge distillation
Huang et al. Non-model-based monocular pose estimation network for uncooperative spacecraft using convolutional neural network
Lin et al. Efficient and high-quality monocular depth estimation via gated multi-scale network
Wei et al. Triaxial squeeze attention module and mutual-exclusion loss based unsupervised monocular depth estimation
CN115826628B (en) Heterogeneous unmanned aerial vehicle vision obstacle avoidance system and method based on NeRF neural network
Zhu et al. Autonomous reinforcement control of visual underwater vehicles: Real-time experiments using computer vision
CN116079727A (en) Humanoid robot motion simulation method and device based on 3D human body posture estimation
Tang et al. Encoder-decoder structure with the feature pyramid for depth estimation from a single image
Wu et al. Object detection and localization using stereo cameras
Kim et al. Bayesian Fusion Inspired 3D Reconstruction via LiDAR-Stereo Camera Pair
Cai et al. Semantic reconstruction based on RGB image and sparse depth

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant