CN111536970A - Infrared inertial integrated navigation method for low-visibility large-scale scene - Google Patents

Infrared inertial integrated navigation method for low-visibility large-scale scene

Info

Publication number
CN111536970A
Authority
CN
China
Prior art keywords
image
error
inertial
semantic
infrared
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010381178.0A
Other languages
Chinese (zh)
Other versions
CN111536970B (en)
Inventor
张磊
刘硕
文鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC filed Critical Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202010381178.0A priority Critical patent/CN111536970B/en
Publication of CN111536970A publication Critical patent/CN111536970A/en
Application granted granted Critical
Publication of CN111536970B publication Critical patent/CN111536970B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

To meet the need for simultaneous localization and mapping under complex meteorological conditions and in large-scale scenes, the invention provides an infrared inertial integrated navigation method for low-visibility large-scale scenes. First, a short-wave infrared camera is adopted to enable scene detection under low-visibility meteorological conditions. Second, image super-resolution reconstruction is applied to overcome the low resolution of infrared images. Then, a short-wave infrared image frame grouping local optimization method is adopted to address the short inter-frame baseline, which otherwise prevents positioning in large-scale scenes. Finally, a visual/inertial integrated navigation scheme is designed to obtain a high-update-rate, high-precision motion state. The invention is characterized by strong robustness, low cost and high precision.

Description

Infrared inertial integrated navigation method for low-visibility large-scale scene
Technical Field
The invention belongs to the field of computer vision navigation, and particularly relates to an infrared inertial integrated navigation method for a low-visibility large-scale scene.
Background
In recent years, with the emergence of new technologies and concepts such as big data, high-performance computing and deep learning, artificial intelligence has attracted wide attention and developed rapidly. Unmanned aerial vehicles, with their high maneuverability, high flexibility and strong adaptability to terrain and environment, have become core elements in the construction of future unmanned intelligent strategic systems.
At present, the positioning and navigation of unmanned vehicles mainly rely on methods such as inertial navigation, global satellite navigation, radio navigation positioning, dead reckoning and visual navigation. Inertial navigation is accurate, but its large volume, heavy weight and high price limit its range of application; global satellite navigation depends on satellite signals, has poor autonomy and real-time performance, and easily fails in scenes such as the interior of a building; radio navigation requires ground stations to complete accurate positioning and is likewise limited in large scenes. Visual navigation systems are autonomous, accurate, intuitive and intelligent, depend little on external perception sources, and generalize well to various complex scenes, so they are favored in the field of autonomous navigation for unmanned systems. Meanwhile, with continuous progress in visual sensor technology, machine vision and artificial intelligence, visual navigation has rapidly become a research hotspot in the field.
Existing visual navigation technology mainly addresses navigation requirements in indoor, block and road environments, such as the navigation and positioning of multi-rotor unmanned aerial vehicles or unmanned ground vehicles; there is no simultaneous localization and mapping method for unmanned carriers in low-visibility, large-scale scenes, such as the visual positioning and map construction of a helicopter during wide-area low-altitude flight. The existing monocular visual navigation technology, which is mainly based on visible-light cameras, is limited by the detection capability of the sensor and cannot work normally under low-visibility meteorological conditions. Infrared imaging can detect information in wavebands beyond visible light, is not affected by day, night or weather, and has strong penetrating power. However, infrared images inherently suffer from low resolution, non-uniformity and other drawbacks, which need to be compensated for by algorithms. Meanwhile, mainstream SLAM technology has been developed for micro multi-rotor unmanned aerial vehicles and is limited in principle to visual positioning and map construction in small-scale spaces. To support visual navigation in large-scale scenes, improvements are needed at the level of principle.
At present, the Georgia Institute of Technology has proposed a multi-scale visual odometer that down-samples the image and changes the scale of the input image on the basis of Direct Sparse Odometry (DSO). However, down-sampling reduces the resolution of the image and directly affects the visual positioning accuracy. Therefore, the demand for autonomous navigation of unmanned vehicles in low-visibility, large-scale scenes calls for further research.
Disclosure of Invention
In order to solve the problems, the invention provides an infrared inertial integrated navigation method for a low-visibility large-scale scene, which adopts the technical scheme that:
the infrared inertial integrated navigation method for the low-visibility large-scale scene comprises the following steps of:
acquiring short-wave infrared images of a target scene and carrier inertial data;
2, preprocessing the short wave infrared image acquired in the step 1 to improve the resolution of the short wave infrared image;
3, increasing the distance between image frames of the image preprocessed in the step 2 by adopting an image frame grouping local optimization method;
and 4, fusing the image with the increased inter-frame distance obtained in the step 3 with the inertial data of the carrier obtained in the step 1, estimating the motion state of the carrier and navigating.
Preferably, step 2 is specifically:
2.1, correcting the non-uniformity of the short wave infrared image;
and 2.2, performing super-resolution reconstruction on the corrected short-wave infrared image.
Wherein the preferable scheme of the step 2.1 is specifically as follows:
and correcting the non-uniformity of the short wave infrared image by adopting a calibration-based multipoint correction method, a scene-based Kalman filtering method, a scene-based time domain high-pass filtering method or a scene-based deep learning method.
Wherein the preferable scheme of the step 2.2 is specifically as follows:
based on prior information acquired by the deep learning neural network and taking the relationship between deep learning and traditional sparse coding as a basis, dividing the convolutional neural network into three stages of image block extraction and representation, nonlinear mapping and high-resolution image reconstruction, and then unifying the three stages into a deep convolutional neural network framework to realize end-to-end learning from low-resolution images to high-resolution images, so as to acquire the corresponding high-resolution features and provide a basis for sub-pixel-level matching.
The image block extraction and representation of the convolutional neural network in the step 2.2 are specifically as follows:
extracting features from the initial short-wave infrared image to obtain a feature map F1(Y) of the image,
expressed as F1(Y) = max(0, W1 * Y + B1);
where Y represents the initial high-resolution image, * represents the convolution operation, W1 is a convolution kernel and B1 is a neuron bias vector;
the nonlinear mapping of the convolutional neural network in step 2.2 is specifically:
the nonlinear mapping transforms the feature vectors from a low-resolution space to a high-resolution space, a process expressed as
F2(Y) = max(0, W2 * F1(Y) + B2),
where W2 is a convolution kernel and B2 is a neuron bias vector;
the high-resolution image reconstruction of the convolutional neural network in step 2.2 is specifically:
generating the final high-resolution image from the previously obtained block-based output feature map, a process expressed as
F(Y) = W3 * F2(Y) + B3;
the training process of the whole network is the estimation and optimization of the network parameters, and the optimal network parameters are obtained by minimizing the error between the reconstructed image F(Y) and the real high-resolution image.
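As a purely numerical illustration of the first-stage formula above, the following sketch evaluates F1(Y) = max(0, W1 * Y + B1) for a single convolution kernel using NumPy and SciPy; the kernel size and values are arbitrary placeholders, not parameters fixed by this method.

import numpy as np
from scipy.signal import convolve2d

Y = np.random.rand(64, 64)          # initial (upscaled) short-wave infrared image, values in [0, 1]
W1 = np.random.randn(9, 9) * 0.01   # one 9x9 convolution kernel (placeholder weights)
B1 = 0.0                            # neuron bias

F1 = np.maximum(0.0, convolve2d(Y, W1, mode="same") + B1)  # F1(Y) = max(0, W1 * Y + B1)
print(F1.shape)                     # (64, 64): one channel of the feature map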
Preferably, step 3 is specifically:
3.1, under the geometric constraint, luminosity constraint and semantic constraint between two frames of images, establishing a reprojection error constraint term in a sliding window, a pixel luminosity error constraint term in a neighborhood and a semantic error constraint term;
3.2, defining the sum of a reprojection error constraint term, a pixel luminosity error constraint term in the neighborhood and a semantic error constraint term as an error sum function;
3.3, dividing the key frames in the sliding window into a plurality of groups at fixed intervals, and calculating, by the Gauss-Newton iteration method, the solution that minimizes the error sum function of each group;
on the premise of ensuring accurate triangulation, the object distance should differ from the baseline by no more than one order of magnitude; the key frames in the sliding window are divided into a plurality of groups at fixed intervals, so that the baseline between two key frames in the same group is significantly longer than that between two adjacent key frames; each group is optimized first, and then all groups are optimized jointly;
within each group, the error sum function of ORB feature points within the sliding window is defined as
Evision,g = Σ ( ηR·ER + ηP·EP + ηS·ES )    (5)
where ER, EP and ES denote the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood and the semantic error constraint term, ηR, ηP and ηS are their respective weighting coefficients, determined by analyzing the image semantic information of the current scene, and the sum runs over the ORB feature points within the group;
the solution that minimizes the error sum function (5) of each group is calculated by the Gauss-Newton iteration method;
and 3.4, defining the sum of the error sum functions of all groups as the visual objective function, and calculating the solution that minimizes the visual objective function by the Gauss-Newton iteration method:
the visual objective function is defined as
Evision = Σg Evision,g    (6)
After each group of key frames has been optimized, all groups are optimized simultaneously, that is, the solution that minimizes the visual objective function (6) is calculated by the Gauss-Newton iteration method;
estimating the camera motion from the information of the multiple key frames in the sliding window is equivalent to finding the solution that minimizes formula (7), so the estimation of the camera motion state is converted into a nonlinear optimization problem, as shown in formula (8)
ξ* = arg min Evision    (8)
where ξ denotes the camera motion states to be estimated in the sliding window;
When the Gaussian Newton iteration method is adopted to solve the problem, the following matrix needs to be calculated
Figure BDA0002482043500000058
Wherein the content of the first and second substances,
Figure BDA0002482043500000053
in order to be a weight matrix, the weight matrix,
Figure BDA0002482043500000054
in the form of a residual vector, the vector,
Figure BDA0002482043500000055
is the Jacobian matrix.
Preferably, the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood and the semantic error constraint term established in step 3.1 are specifically:
(a) reprojection error
The reprojection error refers to the position deviation in the image between an observed point (xj, yj)T and the back-projection into the image of its corresponding map point, and is defined as
ER = || (xj, yj)T − π( K · Tj · Ti^−1 · π^−1( (xi, yi)T, d ) ) ||
where Ti denotes the pose of the i-th frame image in the world coordinate system, Tj denotes the pose of the j-th frame image in the world coordinate system, K denotes the camera intrinsic parameter calibration matrix, π denotes the conversion from homogeneous coordinates to Cartesian coordinates, and d denotes the estimated depth of the key point;
(b) pixel luminance error in the neighborhood
For any point p ∈ Ωi on image frame i, the photometric error on another frame image j is defined as
EP = Σ_{p∈Np} ωp · || ( Ij[p′] − bj ) − ( tj·e^aj / ( ti·e^ai ) ) · ( Ii[p] − bi ) ||γ
where Np is the set of pixels in the neighborhood around point p, Ii and Ij are two adjacent frame images, ti and tj are the exposure times of the two adjacent frame images, ai, bi, aj and bj are the coefficients of the corrective affine illumination transform, || · ||γ denotes the Huber norm, ωp is a gradient-dependent weight, and p′ is the projection of p onto image frame Ij;
(c) semantic error
The semantic error represents, for a map point pi with semantic label c, the probability that the semantic label of the corresponding pixel in the image is c. The semantic observation of image Sk is associated with the camera pose Tk and the three-dimensional point pi: this probability should be inversely proportional to the distance between the two-dimensional pixel coordinates π(Tk, pi), obtained by re-projecting the map point into image Sk, and the nearest image region labeled with semantic class c. The semantic observation of map point pi is therefore defined as
p( Sk | Tk, pi, c ) ∝ exp( − DTc( π(Tk, pi) )^2 / ( 2·σ^2 ) )
where DTc is defined as a distance transform function on the binary image of class c and σ represents the uncertainty of the semantic image segmentation. The probability vector of the corresponding map point pi is calculated from all of its observations: if pi is observed by camera Tk and F is the set of keyframes to be optimized, then
wi(c) = ( 1 / |F| ) · Σk∈F p( Sk | Tk, pi, c )
Preferably, step 4 is specifically:
fusing the image with the increased inter-frame distance with the inertial data, and estimating the motion state of the carrier for navigation; the visual measurements and the inertial measurements are combined for navigation, and the solution that minimizes the visual objective function and the inertial error sum function is calculated simultaneously within the sliding window; suppose that the time instants of two key frames correspond to two states si and sj, that the inertial measurement unit provides the inertial measurements ai,j and ωi,j between them with covariance matrix Σi,j, and that the state is predicted as ŝj;
The inertial measurement error is then:
Einertial = ( ŝj − sj )T · Σi,j^−1 · ( ŝj − sj )    (9)
the objective function of the visual-inertial joint optimization is defined as
Etotal = Evision + Einertial    (10)
and the solution that minimizes the objective function (10) of the visual-inertial joint optimization is calculated by a nonlinear optimization method, thereby estimating the motion state of the carrier for navigation.
The invention has the beneficial effects that:
1. aiming at the problem that a visible light camera cannot normally detect a target scene under low-visibility meteorological conditions, the infrared inertial integrated navigation method for the low-visibility large-scale scene provided by the invention adopts a short-wave infrared camera to detect, and can clearly detect the details of the target scene under the low-visibility meteorological conditions such as fog, haze, rain, snow, dust and the like;
2. aiming at the problem of low resolution of short-wave infrared images, the infrared inertial integrated navigation method for the low-visibility large-scale scene adopts an image super-resolution reconstruction technology based on deep learning to improve the resolution of the short-wave infrared images;
3. aiming at the problem that the baseline distance is short and the method cannot be applied to a large-scale scene in the current mainstream SLAM algorithm, the infrared inertial integrated navigation method for the low-visibility large-scale scene provided by the invention adopts a short-wave infrared image frame grouping local optimization method to increase the inter-frame distance;
4. aiming at the problems that the data update rate of a visual sensor is low and the estimation of the motion state of a high maneuvering carrier cannot be met, the infrared inertial integrated navigation method for the low-visibility large-scale scene adopts a visual/inertial integrated navigation strategy to fuse visual measurement and inertial data so as to obtain the motion state with high update rate and high precision;
5. the infrared inertial integrated navigation method for the low-visibility large-scale scene provided by the invention can realize simultaneous positioning and map construction (SLAM) in the low-visibility large-scale scene without depending on the assistance of a Global Navigation Satellite System (GNSS) and prior geographic information, and has the navigation capability of high precision, strong robustness, autonomy and expandability.
Drawings
FIG. 1 is a flow chart of an infrared inertial integrated navigation method for a low visibility large scale scene according to the present invention;
fig. 2 is a schematic diagram of sliding window packet optimization in the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides an infrared inertial integrated navigation method for a low-visibility large-scale scene; its flow diagram is shown in figure 1, and it can be implemented mainly through the following four steps:
step 1, detecting a target scene by adopting a short-wave infrared camera on a carrier to obtain a short-wave infrared image, and obtaining inertial data by adopting an inertial measurement unit on the carrier.
Because infrared imaging can capture information in wavebands beyond visible light, is not affected by day, night or weather, and has a strong ability to penetrate smoke, dust and haze, a short-wave infrared camera is chosen to detect the target scene.
And 2, carrying out image preprocessing on the short wave infrared image to improve the resolution of the short wave infrared image.
Although the short-wave infrared camera can detect the scene under low-visibility conditions, the resolution of short-wave infrared images is low, which degrades the visual positioning accuracy; image preprocessing is therefore required, and it comprises the following two steps:
and 2.1, correcting the non-uniformity of the short wave infrared image.
As the core component of an infrared imaging system, an Infrared Focal Plane Array (IRFPA) is formed by arranging and combining many single-pixel detectors. Under irradiation by a uniform background, every detection unit in the focal plane array receives the same irradiance but produces a different response, so the image shows interference that is essentially constant over time, namely the non-uniformity of infrared imaging. This non-uniformity not only seriously degrades the quality of infrared imaging but also interferes with the accuracy of the visual odometer, so correcting it is necessary.
The non-uniformity of the short-wave infrared image is corrected by an algorithm with good real-time performance, such as a calibration-based multi-point correction method, a scene-based Kalman filtering method, a scene-based time-domain high-pass filtering method or a scene-based deep learning method.
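To make the calibration-based branch of this step concrete, the sketch below applies a simple two-point (gain/offset) correction derived from uniform low- and high-temperature reference frames; the two-point variant and the function names are illustrative assumptions, not the exact correction algorithm prescribed here.

import numpy as np

def calibrate_two_point(low_ref: np.ndarray, high_ref: np.ndarray):
    """Estimate per-pixel gain and offset from two uniform blackbody references.

    low_ref, high_ref: stacks of raw frames (N, H, W) captured while the focal
    plane array views uniform low- and high-temperature scenes.
    """
    low_mean = low_ref.mean(axis=0)    # per-pixel response to the cold reference
    high_mean = high_ref.mean(axis=0)  # per-pixel response to the hot reference
    # Target responses are the spatial means, i.e. what an ideal uniform array would output.
    gain = (high_mean.mean() - low_mean.mean()) / (high_mean - low_mean + 1e-6)
    offset = low_mean.mean() - gain * low_mean
    return gain, offset

def correct_nonuniformity(raw: np.ndarray, gain: np.ndarray, offset: np.ndarray) -> np.ndarray:
    """Apply the per-pixel linear correction to a raw short-wave infrared frame."""
    return gain * raw + offset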
And 2.2, performing super-resolution reconstruction on the corrected short-wave infrared image.
In the image feature extraction and learning process, super-resolution reconstruction can be performed on the processed images and the key frames by a convolutional neural network, based on prior information acquired by the deep learning network. In the invention, a high-resolution reconstruction layer is added on top of the feature extraction layer and the nonlinear mapping layer of the convolutional neural network, and it operates on the key frames of the SLAM system to realize high-resolution reconstruction, so as to obtain the feature representation of the image at high resolution.
In the super-resolution feature extraction process, taking the relationship between deep learning and traditional sparse coding as a basis, the convolutional neural network is divided into three stages of image block extraction and representation, nonlinear mapping and image reconstruction, and the three stages are unified into a deep convolutional neural network framework, so that end-to-end learning from low-resolution images to high-resolution images is realized, the corresponding high-resolution features are acquired, and a basis is provided for sub-pixel-level matching.
In the above framework, feature extraction extracts blocks from the initial short-wave infrared image to obtain a feature map F1(Y) of the image, which can be expressed as F1(Y) = max(0, W1 * Y + B1), where Y represents the initial high-resolution image, * represents the convolution operation, W1 is a convolution kernel and B1 is a neuron bias vector. The feature map obtained by the convolution is processed by a ReLU activation function. The nonlinear mapping transforms the feature vectors from a low-resolution space to a high-resolution space, a process that can be expressed as F2(Y) = max(0, W2 * F1(Y) + B2), where likewise W2 is a convolution kernel and B2 is a neuron bias vector. High-resolution image reconstruction uses the block-based output feature map obtained previously to generate the final high-resolution image, a process that can be expressed as F(Y) = W3 * F2(Y) + B3. The training process of the whole network is the estimation and optimization of the network parameters: the optimal network parameters are obtained by minimizing the error between the reconstructed image F(Y) and the real high-resolution image.
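A minimal sketch of the three-stage network described above (patch extraction and representation, nonlinear mapping, reconstruction), written with PyTorch; the 9-1-5 kernel sizes and channel widths follow the common SRCNN layout and are assumptions, since they are not fixed in the text.

import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-stage super-resolution network: F1 (patch extraction), F2 (nonlinear mapping), F (reconstruction)."""
    def __init__(self, channels: int = 1):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)      # F1(Y) = max(0, W1*Y + B1)
        self.mapping = nn.Conv2d(64, 32, kernel_size=1)                       # F2(Y) = max(0, W2*F1(Y) + B2)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)  # F(Y) = W3*F2(Y) + B3
        self.relu = nn.ReLU(inplace=True)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y is the upscaled ("initial high-resolution") infrared image, shape (N, C, H, W).
        f1 = self.relu(self.extract(y))
        f2 = self.relu(self.mapping(f1))
        return self.reconstruct(f2)

# Training minimizes the error between F(Y) and the true high-resolution image, e.g.:
# loss = nn.MSELoss()(SRCNN()(y_upscaled), x_highres)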
And 3, increasing the distance between image frames of the preprocessed image by adopting an image frame grouping local optimization method.
The method is implemented according to the following steps:
and 3.1, under the geometric constraint, luminosity constraint and semantic constraint between two frames of images, establishing a reprojection error constraint term in a sliding window, a pixel luminosity error constraint term in a neighborhood and a semantic error constraint term.
(a) Reprojection error
The reprojection error refers to the position deviation in the image between an observed point (xj, yj)T and the back-projection into the image of its corresponding map point, and is defined as
ER = || (xj, yj)T − π( K · Tj · Ti^−1 · π^−1( (xi, yi)T, d ) ) ||
where Ti denotes the pose of the i-th frame image in the world coordinate system, Tj denotes the pose of the j-th frame image in the world coordinate system, K denotes the camera intrinsic parameter calibration matrix, π denotes the conversion from homogeneous coordinates to Cartesian coordinates, and d denotes the estimated depth of the key point;
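For illustration, a small NumPy sketch of this reprojection residual under assumed conventions: a pinhole intrinsic matrix K, 4x4 camera-from-world poses Ti and Tj, and a key point parameterized by its pixel location and estimated depth d in frame i. The exact parameterization is not spelled out above, so treat this as one plausible reading rather than the definitive formula.

import numpy as np

def reprojection_error(u_i, u_j, d, K, T_i, T_j):
    """Residual between the observation u_j in frame j and the key point (u_i, depth d) of frame i.

    u_i, u_j : pixel coordinates (2,) in frames i and j
    d        : estimated depth of the key point in frame i
    K        : 3x3 camera intrinsic calibration matrix
    T_i, T_j : 4x4 camera-from-world poses of frames i and j
    """
    # Back-project the pixel of frame i to a 3D point in the camera-i frame, then to world coordinates.
    p_cam_i = d * np.linalg.inv(K) @ np.array([u_i[0], u_i[1], 1.0])
    p_world = np.linalg.inv(T_i) @ np.append(p_cam_i, 1.0)
    # Project the world point into frame j.
    p_cam_j = (T_j @ p_world)[:3]
    uv = K @ p_cam_j
    proj = uv[:2] / uv[2]          # homogeneous -> Cartesian (the pi(.) operation)
    return np.asarray(u_j) - proj  # position deviation in the image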
(b) pixel luminance error in the neighborhood
For any point p ∈ Ωi on image frame i, the photometric error on another frame image j is defined as
EP = Σ_{p∈Np} ωp · || ( Ij[p′] − bj ) − ( tj·e^aj / ( ti·e^ai ) ) · ( Ii[p] − bi ) ||γ
where Np is the set of pixels in the neighborhood around point p, Ii and Ij are two adjacent frame images, ti and tj are the exposure times of the two adjacent frame images, ai, bi, aj and bj are the coefficients of the corrective affine illumination transform, || · ||γ denotes the Huber norm, ωp is a gradient-dependent weight, and p′ is the projection of p onto image frame Ij;
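The sketch below evaluates this photometric residual for one point neighborhood, assuming the DSO-style affine brightness model implied by the listed symbols; the Huber loss threshold and the pre-computed projections p′ are simplifications, and the helper names are illustrative.

import numpy as np

def huber(r: float, gamma: float = 9.0) -> float:
    """Huber loss of a scalar residual r (gamma is an illustrative threshold)."""
    a = abs(r)
    return 0.5 * r * r if a <= gamma else gamma * (a - 0.5 * gamma)

def photometric_error(I_i, I_j, patch_i, patch_j, t_i, t_j, a_i, b_i, a_j, b_j, w_p):
    """Photometric error of one point's neighborhood between frames i and j.

    I_i, I_j : grayscale images as 2-D arrays
    patch_i  : integer pixel coordinates (u, v) in the neighborhood N_p of frame i
    patch_j  : the same pixels projected into frame j (p' for each neighbor)
    t_i, t_j : exposure times; a_*, b_* : affine illumination coefficients
    w_p      : gradient-dependent weights, one per neighborhood pixel
    """
    scale = (t_j * np.exp(a_j)) / (t_i * np.exp(a_i))
    err = 0.0
    for (ui, vi), (uj, vj), w in zip(patch_i, patch_j, w_p):
        r = (I_j[vj, uj] - b_j) - scale * (I_i[vi, ui] - b_i)
        err += w * huber(r)
    return err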
(c) semantic error
The semantic error represents, for a map point pi with semantic label c, the probability that the semantic label of the corresponding pixel in the image is c. The semantic observation of image Sk is associated with the camera pose Tk and the three-dimensional point pi: this probability should be inversely proportional to the distance between the two-dimensional pixel coordinates π(Tk, pi), obtained by re-projecting the map point into image Sk, and the nearest image region labeled with semantic class c. The semantic observation of map point pi is therefore defined as
p( Sk | Tk, pi, c ) ∝ exp( − DTc( π(Tk, pi) )^2 / ( 2·σ^2 ) )
where DTc is defined as a distance transform function on the binary image of class c and σ represents the uncertainty of the semantic image segmentation. The probability vector of the corresponding map point pi is calculated from all of its observations: if pi is observed by camera Tk and F is the set of keyframes to be optimized, then
wi(c) = ( 1 / |F| ) · Σk∈F p( Sk | Tk, pi, c )
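The following sketch evaluates the semantic observation term with SciPy's Euclidean distance transform, under the assumption (not stated verbatim above) that the likelihood is a Gaussian of the distance to the nearest pixel of class c.

import numpy as np
from scipy.ndimage import distance_transform_edt

def semantic_likelihood(seg: np.ndarray, c: int, uv, sigma: float) -> float:
    """Probability-like score that the reprojected point uv belongs to semantic class c.

    seg   : per-pixel semantic label image S_k
    c     : semantic class of the map point
    uv    : 2-D reprojection pi(T_k, p_i) of the map point into S_k, as (col, row)
    sigma : uncertainty of the semantic segmentation
    """
    # Distance transform DT_c: distance of every pixel to the nearest pixel labeled c.
    dt_c = distance_transform_edt(seg != c)
    u, v = int(round(uv[0])), int(round(uv[1]))
    d = dt_c[v, u]
    return float(np.exp(-d * d / (2.0 * sigma * sigma)))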
And 3.2, defining the sum of the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood and the semantic error constraint term as an error sum function.
And 3.3, dividing the key frames in the sliding window into a plurality of groups at fixed intervals, and calculating, by the Gauss-Newton iteration method, the solution that minimizes the error sum function of each group.
On the premise of ensuring accurate triangulation, the object distance should differ from the baseline by no more than one order of magnitude. Because the baseline between key frames is short, the traditional visual odometer is only suitable for small-scale spaces such as indoor scenes and streets; to achieve accurate visual positioning of the carrier in a large-scale space, the baseline between key frames needs to be lengthened. To this end, an image frame grouping local optimization method is adopted: the key frames in the sliding window are divided into a plurality of groups at fixed intervals, so that the baseline between two key frames in the same group is significantly longer than that between two adjacent key frames, as shown in fig. 2. Meanwhile, to preserve the accuracy of the state estimate after filtering, each group is optimized first and then all groups are optimized jointly, which supports visual positioning in a large-scale space while maintaining the accuracy of the state estimation.
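A small sketch of one reading of the fixed-interval grouping, to make the indexing concrete: frames are interleaved so that neighbors within a group lie several frames apart, lengthening the baseline. The window contents and group count are illustrative placeholders.

def group_keyframes(window, num_groups: int = 3):
    """Split the key frames of a sliding window into fixed-interval groups.

    window     : list of key frames, ordered by time
    num_groups : number of interleaved groups; frame k goes to group k % num_groups,
                 so the baseline inside a group spans num_groups consecutive frames.
    """
    return [window[g::num_groups] for g in range(num_groups)]

# Example: 9 key frames, 3 groups -> [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
print(group_keyframes(list(range(9)), 3))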
Within each group, the error sum function of ORB feature points within the sliding window is defined as
Evision,g = Σ ( ηR·ER + ηP·EP + ηS·ES )    (5)
where ER, EP and ES denote the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood and the semantic error constraint term, and ηR, ηP and ηS are their respective weighting coefficients, determined by analyzing the image semantic information of the current scene.
The error sum function (5) of each group is then minimized by the Gauss-Newton iteration method.
And 3.4, defining the sum of the error sum functions of all groups as a visual objective function, and calculating the solution of the visual objective function when the minimum value is obtained by a Gauss-Newton iteration method.
The visual objective function is defined as
Evision = Σg Evision,g    (6)
After the optimization of each group of key frames is completed, all groups are optimized simultaneously, that is, the solution that minimizes the visual objective function (6) is calculated by the Gauss-Newton iteration method.
Estimating the camera motion from the information of the multiple key frames in the sliding window is equivalent to finding the solution that minimizes formula (7), so the estimation of the camera motion state is converted into a nonlinear optimization problem, as shown in formula (8)
ξ* = arg min Evision    (8)
where ξ denotes the camera motion states to be estimated in the sliding window.
When the Gauss-Newton iteration method is adopted to solve this problem, the following normal equation needs to be computed at each iteration:
JT · W · J · Δξ = − JT · W · r
where W is the weight matrix, r is the residual vector and J is the Jacobian matrix.
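A compact sketch of one such Gauss-Newton step on a generic weighted least-squares problem; the residual and Jacobian callables stand in for the visual error terms above and are illustrative placeholders, not an API defined by this method.

import numpy as np

def gauss_newton(residual_fn, jacobian_fn, x0, W, iters: int = 10, tol: float = 1e-8):
    """Minimize r(x)^T W r(x) by Gauss-Newton iterations.

    residual_fn : x -> residual vector r
    jacobian_fn : x -> Jacobian matrix J of r with respect to x
    x0          : initial state estimate (e.g. stacked key-frame poses)
    W           : weight matrix
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual_fn(x)
        J = jacobian_fn(x)
        H = J.T @ W @ J                 # approximate Hessian
        g = J.T @ W @ r
        dx = np.linalg.solve(H, -g)     # normal equation: H dx = -J^T W r
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x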
And 4, fusing the image with the increased inter-frame distance with inertial data, and estimating the motion state of the carrier for navigation.
In order to obtain a high-update-rate, high-precision state estimate, the visual measurements and the inertial measurements are combined for navigation, and the solution that minimizes the visual objective function and the inertial error sum function is calculated simultaneously within the sliding window. Suppose that the time instants of two key frames correspond to two states si and sj, that the inertial measurement unit provides the inertial measurements ai,j and ωi,j between them with covariance matrix Σi,j, and that the state is predicted as ŝj.
The inertial measurement error is then:
Einertial = ( ŝj − sj )T · Σi,j^−1 · ( ŝj − sj )    (9)
The objective function of the visual-inertial joint optimization is defined as
Etotal = Evision + Einertial    (10)
The solution that minimizes the objective function (10) of the visual-inertial joint optimization is then calculated by a nonlinear optimization method, such as the Gauss-Newton iteration method, thereby estimating the motion state of the carrier for navigation.
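As an illustration of objective (10), the sketch below treats the inertial term as a Mahalanobis distance between the IMU-predicted state and the key-frame state; the flat state vectors and the externally supplied E_vision value are assumptions made for the example.

import numpy as np

def inertial_error(s_pred: np.ndarray, s_j: np.ndarray, Sigma_ij: np.ndarray) -> float:
    """E_inertial: squared Mahalanobis distance between the predicted and estimated states."""
    d = s_pred - s_j
    return float(d.T @ np.linalg.inv(Sigma_ij) @ d)

def total_objective(E_vision: float, s_pred: np.ndarray, s_j: np.ndarray, Sigma_ij: np.ndarray) -> float:
    """E_total = E_vision + E_inertial, the visual-inertial joint optimization objective (10)."""
    return E_vision + inertial_error(s_pred, s_j, Sigma_ij)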
In this method, an NVIDIA Jetson TX2 embedded computing module is used as the computing platform; it is an AI single-module supercomputer based on the NVIDIA Pascal architecture, and its technical parameters are detailed in Table 1.
TABLE 1 NVIDIA Jetson TX2 computer module hardware resources (table reproduced as an image in the original publication)
Algorithm testing and verification are performed on a ground experiment platform using four authoritative open data sets and one in-house data set, where the ground experiment platform is an NVIDIA Jetson TX2 development kit. Test data sets are selected according to the typical flight environments of a micro unmanned aerial vehicle, and the verification items include navigation accuracy, computational complexity, robustness to illumination changes, robustness to scene changes, and the like, as detailed in Table 2.
TABLE 2 Test items corresponding to each data set
Data set          | Test environment               | Test characteristics
KITTI data set    | Indoor and city block          | Dynamic scenes
EuRoC data set    | Indoor room, industrial area   | Dynamic scenes
TUM data set      | Indoor corridor and jungle     | Indoor circular path
Oxford data set   | City block (building and road) | Outdoor circular path
ICL-NUIM data set | Indoor room                    | Indoor circular path

Claims (8)

1. An infrared inertial integrated navigation method for a low-visibility large-scale scene is characterized by comprising the following steps:
1, acquiring short-wave infrared images of a target scene and carrier inertial data;
2, preprocessing the short wave infrared image acquired in the step 1 to improve the resolution of the short wave infrared image;
3, increasing the distance between image frames of the image preprocessed in the step 2 by adopting an image frame grouping local optimization method;
and 4, fusing the image with the increased inter-frame distance obtained in the step 3 with the inertial data of the carrier obtained in the step 1, estimating the motion state of the carrier and navigating.
2. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 1, characterized in that:
the step 2 specifically comprises the following steps:
2.1, correcting the non-uniformity of the short wave infrared image;
and 2.2, performing super-resolution reconstruction on the corrected short-wave infrared image.
3. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 2, characterized in that:
the step 2.1 is specifically as follows:
and correcting the non-uniformity of the short wave infrared image by adopting a calibration-based multipoint correction method, a scene-based Kalman filtering method, a scene-based time domain high-pass filtering method or a scene-based deep learning method.
4. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 2, characterized in that:
the step 2.2 is specifically as follows:
based on prior information acquired by the deep learning neural network and taking the relationship between deep learning and traditional sparse coding as a basis, dividing the convolutional neural network into three stages of image block extraction and representation, nonlinear mapping and high-resolution image reconstruction, and then unifying the three stages into a deep convolutional neural network framework to realize end-to-end learning from low-resolution images to high-resolution images, so as to acquire the corresponding high-resolution features and provide a basis for sub-pixel-level matching.
5. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 4, characterized in that:
the image block extraction and representation of the convolutional neural network in the step 2.2 are specifically as follows:
extracting features from the initial short-wave infrared image to obtain a feature map F1(Y) of the image,
expressed as F1(Y) = max(0, W1 * Y + B1);
where Y represents the initial high-resolution image, * represents the convolution operation, W1 is a convolution kernel and B1 is a neuron bias vector;
the nonlinear mapping of the convolutional neural network in step 2.2 is specifically:
the nonlinear mapping transforms the feature vectors from a low-resolution space to a high-resolution space, a process expressed as
F2(Y) = max(0, W2 * F1(Y) + B2),
where W2 is a convolution kernel and B2 is a neuron bias vector;
the high-resolution image reconstruction of the convolutional neural network in step 2.2 is specifically:
generating the final high-resolution image from the previously obtained block-based output feature map, a process expressed as
F(Y) = W3 * F2(Y) + B3;
the training process of the whole network is the estimation and optimization of the network parameters, and the optimal network parameters are obtained by minimizing the error between the reconstructed image F(Y) and the real high-resolution image.
6. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 1, characterized in that:
the step 3 specifically comprises the following steps:
3.1, under the geometric constraint, luminosity constraint and semantic constraint between two frames of images, establishing a reprojection error constraint term in a sliding window, a pixel luminosity error constraint term in a neighborhood and a semantic error constraint term;
3.2, defining the sum of a reprojection error constraint term, a pixel luminosity error constraint term in the neighborhood and a semantic error constraint term as an error sum function;
3.3, dividing the key frames in the sliding window into a plurality of groups at fixed intervals, and calculating, by the Gauss-Newton iteration method, the solution that minimizes the error sum function of each group;
on the premise of ensuring accurate triangulation, the object distance should differ from the baseline by no more than one order of magnitude; the key frames in the sliding window are divided into a plurality of groups at fixed intervals, so that the baseline between two key frames in the same group is significantly longer than that between two adjacent key frames; each group is optimized first, and then all groups are optimized jointly;
within each group, the error sum function of ORB feature points within the sliding window is defined as
Evision,g = Σ ( ηR·ER + ηP·EP + ηS·ES )    (5)
where ER, EP and ES denote the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood and the semantic error constraint term, ηR, ηP and ηS are their respective weighting coefficients, determined by analyzing the image semantic information of the current scene, and the sum runs over the ORB feature points within the group;
the solution that minimizes the error sum function (5) of each group is calculated by the Gauss-Newton iteration method;
3.4, defining the sum of the error sum functions of each group as a visual objective function, and calculating the solution when the visual objective function obtains the minimum value through a Gauss-Newton iteration method;
the visual objective function is defined as
Evision = Σg Evision,g    (6)
After each group of key frames has been optimized, all groups are optimized simultaneously, that is, the solution that minimizes the visual objective function (6) is calculated by the Gauss-Newton iteration method;
estimating the camera motion from the information of the multiple key frames in the sliding window is equivalent to finding the solution that minimizes formula (7), so the estimation of the camera motion state is converted into a nonlinear optimization problem, as shown in formula (8)
ξ* = arg min Evision    (8)
where ξ denotes the camera motion states to be estimated in the sliding window;
when the Gauss-Newton iteration method is adopted to solve this problem, the following normal equation needs to be computed at each iteration:
JT · W · J · Δξ = − JT · W · r
where W is the weight matrix, r is the residual vector and J is the Jacobian matrix.
7. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 6, characterized in that:
the re-projection error constraint term, the pixel luminosity error constraint term and the semantic error constraint term in the neighborhood established in the step 3.1 are specifically as follows:
(a) reprojection error
The reprojection error refers to the position deviation in the image between an observed point (xj, yj)T and the back-projection into the image of its corresponding map point, and is defined as
ER = || (xj, yj)T − π( K · Tj · Ti^−1 · π^−1( (xi, yi)T, d ) ) ||
where Ti denotes the pose of the i-th frame image in the world coordinate system, Tj denotes the pose of the j-th frame image in the world coordinate system, K denotes the camera intrinsic parameter calibration matrix, π denotes the conversion from homogeneous coordinates to Cartesian coordinates, and d denotes the estimated depth of the key point;
(b) pixel luminance error in the neighborhood
For any point p ∈ Ωi on image frame i, the photometric error on another frame image j is defined as
EP = Σ_{p∈Np} ωp · || ( Ij[p′] − bj ) − ( tj·e^aj / ( ti·e^ai ) ) · ( Ii[p] − bi ) ||γ
where Np is the set of pixels in the neighborhood around point p, Ii and Ij are two adjacent frame images, ti and tj are the exposure times of the two adjacent frame images, ai, bi, aj and bj are the coefficients of the corrective affine illumination transform, || · ||γ denotes the Huber norm, ωp is a gradient-dependent weight, and p′ is the projection of p onto image frame Ij;
(c) semantic error
The semantic error represents, for a map point pi with semantic label c, the probability that the semantic label of the corresponding pixel in the image is c. The semantic observation of image Sk is associated with the camera pose Tk and the three-dimensional point pi: this probability should be inversely proportional to the distance between the two-dimensional pixel coordinates π(Tk, pi), obtained by re-projecting the map point into image Sk, and the nearest image region labeled with semantic class c. The semantic observation of map point pi is therefore defined as
p( Sk | Tk, pi, c ) ∝ exp( − DTc( π(Tk, pi) )^2 / ( 2·σ^2 ) )
where DTc is defined as a distance transform function on the binary image of class c and σ represents the uncertainty of the semantic image segmentation. The probability vector of the corresponding map point pi is calculated from all of its observations: if pi is observed by camera Tk and F is the set of keyframes to be optimized, then
wi(c) = ( 1 / |F| ) · Σk∈F p( Sk | Tk, pi, c )
8. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 1, characterized in that:
the step 4 specifically comprises the following steps:
fusing the image with the increased inter-frame distance with the inertial data, and estimating the motion state of the carrier for navigation; the visual measurements and the inertial measurements are combined for navigation, and the solution that minimizes the visual objective function and the inertial error sum function is calculated simultaneously within the sliding window; suppose that the time instants of two key frames correspond to two states si and sj, that the inertial measurement unit provides the inertial measurements ai,j and ωi,j between them with covariance matrix Σi,j, and that the state is predicted as ŝj;
The inertial measurement error is
Einertial = ( ŝj − sj )T · Σi,j^−1 · ( ŝj − sj )    (9)
The objective function of the visual-inertial joint optimization is defined as
Etotal = Evision + Einertial    (10)
and the solution that minimizes the objective function (10) of the visual-inertial joint optimization is calculated by a nonlinear optimization method, thereby estimating the motion state of the carrier for navigation.
CN202010381178.0A 2020-05-08 2020-05-08 Infrared inertial integrated navigation method for low-visibility large-scale scene Active CN111536970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010381178.0A CN111536970B (en) 2020-05-08 2020-05-08 Infrared inertial integrated navigation method for low-visibility large-scale scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010381178.0A CN111536970B (en) 2020-05-08 2020-05-08 Infrared inertial integrated navigation method for low-visibility large-scale scene

Publications (2)

Publication Number Publication Date
CN111536970A true CN111536970A (en) 2020-08-14
CN111536970B CN111536970B (en) 2022-01-25

Family

ID=71973440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010381178.0A Active CN111536970B (en) 2020-05-08 2020-05-08 Infrared inertial integrated navigation method for low-visibility large-scale scene

Country Status (1)

Country Link
CN (1) CN111536970B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489119A (en) * 2020-12-21 2021-03-12 北京航空航天大学 Monocular vision positioning method for enhancing reliability
CN112659128A (en) * 2020-12-28 2021-04-16 智动时代(北京)科技有限公司 Robot brain and human-computer cooperative control brain parallel cooperative control method
CN113239936A (en) * 2021-04-26 2021-08-10 大连理工大学 Unmanned aerial vehicle visual navigation method based on deep learning and feature point extraction
CN117455762A (en) * 2023-12-25 2024-01-26 盯盯拍(深圳)技术股份有限公司 Method and system for improving resolution of recorded picture based on panoramic automobile data recorder

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060293854A1 (en) * 2005-06-23 2006-12-28 Raytheon Company System and method for geo-registration with global positioning and inertial navigation
CN105487557A (en) * 2015-12-07 2016-04-13 浙江大学 Unmanned aerial vehicle autonomous landing guidance system based on solar-blind region ultraviolet imaging
CN109341700A (en) * 2018-12-04 2019-02-15 中国航空工业集团公司西安航空计算技术研究所 Fixed wing aircraft vision assists landing navigation method under a kind of low visibility
CN109341724A (en) * 2018-12-04 2019-02-15 中国航空工业集团公司西安航空计算技术研究所 A kind of Airborne Camera-Inertial Measurement Unit relative pose online calibration method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060293854A1 (en) * 2005-06-23 2006-12-28 Raytheon Company System and method for geo-registration with global positioning and inertial navigation
CN105487557A (en) * 2015-12-07 2016-04-13 浙江大学 Unmanned aerial vehicle autonomous landing guidance system based on solar-blind region ultraviolet imaging
CN109341700A (en) * 2018-12-04 2019-02-15 中国航空工业集团公司西安航空计算技术研究所 Fixed wing aircraft vision assists landing navigation method under a kind of low visibility
CN109341724A (en) * 2018-12-04 2019-02-15 中国航空工业集团公司西安航空计算技术研究所 A kind of Airborne Camera-Inertial Measurement Unit relative pose online calibration method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GUOFENG ZHANG et al., "Efficient Non-Consecutive Feature Tracking for Robust Structure-From-Motion", IEEE TRANSACTIONS ON IMAGE PROCESSING *
LEI ZHANG et al., "Infrared-Inertial Navigation for Commercial Aircraft Precision Landing in Low Visibility and GPS-Denied Environments", SENSORS *
王民 et al., "Image Super-Resolution Reconstruction Based on an Optimized Convolutional Neural Network", Laser & Optoelectronics Progress *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489119A (en) * 2020-12-21 2021-03-12 北京航空航天大学 Monocular vision positioning method for enhancing reliability
CN112489119B (en) * 2020-12-21 2023-01-31 北京航空航天大学 Monocular vision positioning method for enhancing reliability
CN112659128A (en) * 2020-12-28 2021-04-16 智动时代(北京)科技有限公司 Robot brain and human-computer cooperative control brain parallel cooperative control method
CN113239936A (en) * 2021-04-26 2021-08-10 大连理工大学 Unmanned aerial vehicle visual navigation method based on deep learning and feature point extraction
CN113239936B (en) * 2021-04-26 2024-05-28 大连理工大学 Unmanned aerial vehicle visual navigation method based on deep learning and feature point extraction
CN117455762A (en) * 2023-12-25 2024-01-26 盯盯拍(深圳)技术股份有限公司 Method and system for improving resolution of recorded picture based on panoramic automobile data recorder
CN117455762B (en) * 2023-12-25 2024-04-12 盯盯拍(深圳)技术股份有限公司 Method and system for improving resolution of recorded picture based on panoramic automobile data recorder

Also Published As

Publication number Publication date
CN111536970B (en) 2022-01-25

Similar Documents

Publication Publication Date Title
CN111536970B (en) Infrared inertial integrated navigation method for low-visibility large-scale scene
CN111862126B (en) Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm
CN108665496B (en) End-to-end semantic instant positioning and mapping method based on deep learning
CN107564061B (en) Binocular vision mileage calculation method based on image gradient joint optimization
CN108242079B (en) VSLAM method based on multi-feature visual odometer and graph optimization model
CN111693047B (en) Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene
CN106097304B (en) A kind of unmanned plane real-time online ground drawing generating method
CN110009674B (en) Monocular image depth of field real-time calculation method based on unsupervised depth learning
US11430087B2 (en) Using maps comprising covariances in multi-resolution voxels
CN111340939B (en) Indoor three-dimensional semantic map construction method
CN114972748B (en) Infrared semantic segmentation method capable of explaining edge attention and gray scale quantization network
CN115900710A (en) Dynamic environment navigation method based on visual information
CN111860651B (en) Monocular vision-based semi-dense map construction method for mobile robot
CN109443354B (en) Visual-inertial tight coupling combined navigation method based on firefly group optimized PF
Cattaneo et al. Cmrnet++: Map and camera agnostic monocular visual localization in lidar maps
Hu et al. An indoor positioning framework based on panoramic visual odometry for visually impaired people
CN112016478A (en) Complex scene identification method and system based on multispectral image fusion
CN111812978B (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
Alcantarilla et al. Large-scale dense 3D reconstruction from stereo imagery
Liu et al. D-lc-nets: Robust denoising and loop closing networks for lidar slam in complicated circumstances with noisy point clouds
Jin et al. Beyond learning: Back to geometric essence of visual odometry via fusion-based paradigm
Li et al. Vehicle object detection based on rgb-camera and radar sensor fusion
Tao et al. Automated processing of mobile mapping image sequences
CN112927294B (en) Satellite orbit and attitude determination method based on single sensor
CN116188550A (en) Self-supervision depth vision odometer based on geometric constraint

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant