CN111536970B - Infrared inertial integrated navigation method for low-visibility large-scale scene - Google Patents

Infrared inertial integrated navigation method for low-visibility large-scale scene

Info

Publication number
CN111536970B
Authority
CN
China
Prior art keywords
image
error
inertial
resolution
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010381178.0A
Other languages
Chinese (zh)
Other versions
CN111536970A (en)
Inventor
张磊
刘硕
文鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC filed Critical Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202010381178.0A priority Critical patent/CN111536970B/en
Publication of CN111536970A publication Critical patent/CN111536970A/en
Application granted granted Critical
Publication of CN111536970B publication Critical patent/CN111536970B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/10 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration
    • G01C21/12 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning
    • G01C21/16 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation
    • G01C21/165 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 by using measurements of speed or acceleration executed aboard the object being navigated; Dead reckoning by integrating acceleration or speed, i.e. inertial navigation combined with non-inertial navigation instruments
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

To meet the need for simultaneous localization and mapping under complex meteorological conditions and in large-scale scenes, the invention provides an infrared inertial integrated navigation method for low-visibility large-scale scenes. First, a short-wave infrared camera is adopted so that the scene can be sensed under low-visibility meteorological conditions. Second, an image super-resolution reconstruction technique is applied to overcome the low resolution of infrared images. Third, a short-wave infrared image-frame grouping local-optimization method addresses the problem that the inter-frame baseline is too short for positioning in large-scale scenes. Finally, a visual/inertial integrated navigation scheme is designed to obtain a motion state estimate with a high update rate and high accuracy. The invention is characterized by strong robustness, low cost, and high accuracy.

Description

Infrared inertial integrated navigation method for low-visibility large-scale scene
Technical Field
The invention belongs to the field of computer vision navigation, and particularly relates to an infrared inertial integrated navigation method for a low-visibility large-scale scene.
Background
In recent years, with the emergence of new technologies and concepts such as big data, high-performance computing, and deep learning, artificial intelligence has attracted wide attention and developed rapidly. Unmanned aerial vehicles, with their high maneuverability, high flexibility, and strong adaptability to terrain and environment, have become core elements in the construction of future unmanned intelligent systems.
At present, the positioning and navigation of unmanned vehicles mainly rely on inertial navigation, global satellite navigation, radio navigation with dead reckoning, visual navigation, and similar methods. High-precision inertial navigation is accurate but bulky, heavy, and expensive, which limits its range of application. Global satellite navigation depends on satellite signals, offers poor autonomy and real-time performance, and easily fails in scenes such as the interior of a building. Radio navigation requires ground stations for accurate positioning and is likewise constrained in large-scale scenes. Visual navigation systems are autonomous, accurate, intuitive, and intelligent, depend little on external sensing sources, and generalize well to many kinds of complex scenes, so they are favored in the field of autonomous navigation for unmanned systems. Meanwhile, with continuous progress in visual sensors, machine vision, and artificial intelligence, visual navigation has rapidly become a research hotspot in the field.
Existing visual navigation technology mainly addresses navigation in indoor, urban-block, and road environments, such as the positioning of multi-rotor unmanned aerial vehicles or unmanned ground vehicles; there is no simultaneous localization and mapping method for unmanned vehicles in low-visibility, large-scale scenes, such as the visual positioning and mapping of a helicopter during wide-area, low-altitude flight. Existing monocular visual navigation, built mainly on visible-light cameras, is limited by the detection capability of the sensor and cannot work normally under low-visibility meteorological conditions. Infrared imaging can sense information in bands beyond visible light, is not affected by day, night, or weather, and has strong penetrating power. However, infrared images inherently suffer from low resolution, non-uniformity, and other defects, which must be compensated by algorithms. Meanwhile, mainstream SLAM technology was developed for micro and small multi-rotor unmanned aerial vehicles and is, in principle, suitable only for visual positioning and mapping in small-scale spaces. To support visual navigation in large-scale scenes, improvements are needed at the level of principle.
The Georgia Institute of Technology has proposed a multi-scale visual odometer that down-samples the image and changes the scale of the input image on top of Direct Sparse Odometry (DSO). However, down-sampling reduces image resolution and directly degrades visual positioning accuracy. Therefore, the demand for autonomous navigation of unmanned vehicles in low-visibility, large-scale scenes calls for further research.
Disclosure of Invention
In order to solve the problems, the invention provides an infrared inertial integrated navigation method for a low-visibility large-scale scene, which adopts the technical scheme that:
the infrared inertial integrated navigation method for the low-visibility large-scale scene comprises the following steps of:
acquiring short-wave infrared images of a target scene and carrier inertial data;
2, preprocessing the short wave infrared image acquired in the step 1 to improve the resolution of the short wave infrared image;
3, increasing the distance between image frames of the image preprocessed in the step 2 by adopting an image frame grouping local optimization method;
and 4, fusing the image with the increased inter-frame distance obtained in the step 3 with the inertial data of the carrier obtained in the step 1, estimating the motion state of the carrier and navigating.
Preferably, step 2 is specifically:
2.1, correcting the non-uniformity of the short wave infrared image;
and 2.2, performing super-resolution reconstruction on the corrected short-wave infrared image.
A preferred implementation of step 2.1 is as follows:
the non-uniformity of the short-wave infrared image is corrected by a calibration-based multipoint correction method, a scene-based Kalman filtering method, a scene-based time domain high-pass filtering method, or a scene-based deep learning method.
A preferred implementation of step 2.2 is as follows:
based on prior information acquired by a deep neural network, and building on the relationship between deep learning and traditional sparse coding, the convolutional network is divided into three stages: image block extraction and representation, nonlinear mapping, and high-resolution image reconstruction. The three stages are then unified into one deep convolutional neural network framework, realizing end-to-end learning from the low-resolution image to the high-resolution image, so as to obtain the corresponding high-resolution features and provide a basis for sub-pixel-level matching.
The image block extraction and representation stage of the convolutional neural network in step 2.2 is specifically:
features are extracted from the initial short-wave infrared image to obtain the feature map F_1(Y) of the image, expressed as F_1(Y) = max(0, W_1 * Y + B_1),
where Y is the initial low-resolution image, * denotes the convolution operation, W_1 is a convolution kernel, and B_1 is a neuron bias vector.
The nonlinear mapping stage of the convolutional neural network in step 2.2 is specifically:
the nonlinear mapping transforms the feature vectors from the low-resolution space to the high-resolution space, expressed as
F_2(Y) = max(0, W_2 * F_1(Y) + B_2),
where W_2 is a convolution kernel and B_2 is a neuron bias vector.
The high-resolution image reconstruction stage of the convolutional neural network in step 2.2 is specifically:
the final high-resolution image is generated from the block-based output feature map obtained above; training the whole network amounts to estimating and optimizing the network parameters, and the optimal parameters are obtained by minimizing the error between the reconstructed image F(Y) and the true high-resolution image, expressed as
F(Y) = W_3 * F_2(Y) + B_3.
Preferably, step 3 is specifically:
3.1, under the geometric, photometric, and semantic constraints between two image frames, establishing a reprojection error constraint term, a pixel luminosity error constraint term in the neighborhood, and a semantic error constraint term within the sliding window;
3.2, defining the sum of the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood, and the semantic error constraint term as the error sum function;
3.3, dividing the key frames in the sliding window into several groups at fixed intervals, and computing, by the Gauss-Newton iteration method, the solution at which the error sum function of each group attains its minimum;
on the premise of ensuring accurate triangulation, the object distance and the baseline should differ by no more than one order of magnitude; the key frames in the sliding window are therefore divided into several groups at fixed intervals, so that the baseline between two key frames of the same group is significantly larger than that between two adjacent key frames; each group is optimized first, and then all groups are optimized jointly;
within each group, the error sum function of the ORB feature points within the sliding window is defined as
E_group = Σ ( η_R E_R + η_P E_P + η_S E_S )    (5)
where the sum runs over the feature points in the window, and η_R, η_P, and η_S are the weighting coefficients of the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood, and the semantic error constraint term, respectively; these coefficients are determined by analyzing the image semantic information of the current scene;
the error sum function (5) of each group is minimized by the Gauss-Newton iteration method to obtain the solution at its minimum;
3.4, defining the sum of the error sum functions of all groups as the visual objective function, and computing, by the Gauss-Newton iteration method, the solution at which the visual objective function attains its minimum;
the visual objective function is defined as
E_vision = Σ_k E_group,k    (6)
i.e. the sum of the error sum functions of the groups; after each group of key frames has been optimized, all groups are optimized simultaneously, i.e. the solution at which the visual objective function (6) attains its minimum is computed by the Gauss-Newton iteration method;
estimating the camera motion from the key-frame information in the sliding window is equivalent to finding the solution at which formula (6) attains its minimum, so the estimation of the camera motion state is converted into the nonlinear optimization problem
min_{T_i, i ∈ F} E_vision    (7)
where F is the set of key frames to be optimized; solving by the Gauss-Newton iteration method requires computing the matrix of the normal equation
(J^T W J) δ = -J^T W r,
where W is the weight matrix, r is the residual vector, and J is the Jacobian matrix.
Preferably, the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood, and the semantic error constraint term established in step 3.1 are specifically:
(a) reprojection error
the reprojection error is the positional deviation in the image between an observed point (x_j, y_j)^T and the back-projection of its corresponding map point, defined as
E_R = || (x_j, y_j)^T - π( K T_j^{-1} T_i π^{-1}((x_i, y_i)^T, d) ) ||
where T_i is the pose of the camera in the world coordinate system when the i-th frame is taken, T_j is the pose of the camera in the world coordinate system when the j-th frame is taken, K is the camera intrinsic calibration matrix, π denotes the transformation from homogeneous coordinates to Cartesian coordinates, (x_i, y_i)^T is the observation of the map point in frame i, and d is the estimated depth of the key point;
(b) pixel luminosity error in the neighborhood
the photometric error of any point p ∈ Ω_i of image I_i, observed in another frame image I_j, is defined as
E_P = Σ_{p∈N_p} ω_p || (I_j[p'] - b_j) - (t_j e^{a_j}) / (t_i e^{a_i}) (I_i[p] - b_i) ||_γ
where N_p is the set of pixels in the neighborhood around point p, I_i and I_j are two adjacent image frames, t_i and t_j are their exposure times, a_i, b_i, a_j, and b_j are the affine illumination correction coefficients, || · ||_γ denotes the Huber norm, ω_p is a gradient-dependent weight, and p' is the projection of p onto image I_j;
(c) semantic error
the semantic error relates the probability that a map point carries semantic label c to the semantics of the pixel points in the image; the semantics of an observed image S_k associate the camera pose T_k with the map point, and this probability is inversely proportional to the distance between the two-dimensional pixel coordinate at which the point and its semantics are reprojected into image S_k and the nearest region labeled with semantic c; the semantic error E_S of the map point is defined accordingly [formula not reproduced], where the distance term is a distance-transform function defined on the binary image of label c, σ represents the uncertainty of the semantic image segmentation, and the probability vector is computed from all observations of the corresponding map point, conditioned on the observation taken at camera pose T_k [formula not reproduced];
Preferably, step 4 is specifically:
fusing the image with increased inter-frame distance with the inertial data, and estimating the motion state of the carrier for navigation; the visual and inertial measurements are combined by computing, within the sliding window, the solution at which the sum of the visual objective function and the inertial measurement error attains its minimum; suppose the time instants of two key frames correspond to two states s_i and s_j, the measurements of the inertial measurement unit between them are a_{i,j} and ω_{i,j}, and the covariance matrix between the two states is given [formula not reproduced]; the state predicted from the inertial measurements [formula not reproduced] yields the inertial measurement error E_inertial [formula not reproduced];
the objective function of the visual-inertial joint optimization is defined as
E_total = E_vision + E_inertial    (10)
And calculating a solution when the objective function (10) of the visual-inertial joint optimization is minimum through a nonlinear optimization method, thereby estimating the motion state of the carrier for navigation.
The invention has the beneficial effects that:
1. To address the inability of visible-light cameras to detect a target scene normally under low-visibility meteorological conditions, the method uses a short-wave infrared camera, which can clearly resolve the details of the target scene in fog, haze, rain, snow, dust, and similar low-visibility weather;
2. To address the low resolution of short-wave infrared images, the method adopts deep-learning-based image super-resolution reconstruction to improve the resolution of the short-wave infrared images;
3. To address the short baseline of current mainstream SLAM algorithms, which prevents their use in large-scale scenes, the method adopts a short-wave infrared image-frame grouping local-optimization scheme to increase the inter-frame distance;
4. To address the low data update rate of visual sensors, which cannot support motion-state estimation for highly maneuverable carriers, the method adopts a visual/inertial integrated navigation strategy that fuses visual measurements with inertial data to obtain a motion state with a high update rate and high accuracy;
5. The method achieves simultaneous localization and mapping (SLAM) in low-visibility, large-scale scenes without relying on a Global Navigation Satellite System (GNSS) or prior geographic information, and provides high-precision, robust, autonomous, and scalable navigation capability.
Drawings
FIG. 1 is a flow chart of an infrared inertial integrated navigation method for a low visibility large scale scene according to the present invention;
fig. 2 is a schematic diagram of sliding window packet optimization in the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides an infrared inertial integrated navigation method for a low-visibility large-scale scene, a flow diagram of which is shown in figure 1 and can be implemented mainly by the following four steps:
step 1, detecting a target scene by adopting a short-wave infrared camera on a carrier to obtain a short-wave infrared image, and obtaining inertial data by adopting an inertial measurement unit on the carrier.
Because infrared imaging can sense information in bands beyond visible light, is unaffected by day, night, and weather, and penetrates smoke, dust, and haze well, a short-wave infrared camera is chosen to sense the target scene.
And 2, carrying out image preprocessing on the short wave infrared image to improve the resolution of the short wave infrared image.
Although the short-wave infrared camera can detect under low-visibility conditions, the resolution of the short-wave infrared image is low, which degrades visual positioning accuracy; image preprocessing is therefore required and comprises the following two steps:
and 2.1, correcting the non-uniformity of the short wave infrared image.
As the core component of an infrared imaging system, an Infrared Focal Plane Array (IRFPA) is an arrangement of many single-pixel detectors. Under uniform background irradiation, every detection unit in the focal plane array receives the same irradiance but produces a different response, so the image exhibits interference that is essentially constant over time, i.e. the non-uniformity of infrared imaging. Non-uniformity not only seriously degrades infrared image quality but also disturbs the accuracy of the visual odometer, so it must be corrected.
The non-uniformity of the short-wave infrared image is corrected with an algorithm that can run in real time, such as a calibration-based multipoint correction method, a scene-based Kalman filtering method, a scene-based time domain high-pass filtering method, or a scene-based deep learning method.
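As an illustration only, the following NumPy sketch shows the two-point flavor of calibration-based non-uniformity correction; the function name, the use of two uniform-radiance reference frames, and the per-pixel gain/offset model are assumptions made for the sketch, not details taken from the patent.

```python
import numpy as np

def two_point_nuc(raw, low_ref, high_ref):
    """Calibration-based non-uniformity correction (two-point variant).

    raw      : (H, W) raw SWIR frame
    low_ref  : (H, W) mean response to a uniform low-radiance source
    high_ref : (H, W) mean response to a uniform high-radiance source
    """
    # Per-pixel gain normalizes each detector's responsivity to the array mean.
    span = high_ref - low_ref
    gain = span.mean() / np.clip(span, 1e-6, None)
    # Per-pixel offset removes the fixed-pattern component.
    offset = low_ref.mean() - gain * low_ref
    return gain * raw + offset
```

A multipoint variant would split the radiance range into segments and fit a gain/offset pair per segment instead of a single pair.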
And 2.2, performing super-resolution reconstruction on the corrected short-wave infrared image.
In the image feature extraction and learning process, super-resolution reconstruction of the processed images and key frames is performed by a convolutional neural network, using prior information acquired by the deep learning network. In the invention, a high-resolution reconstruction layer is added on top of the feature extraction layer and the nonlinear mapping layer of the convolutional neural network, and the SLAM key frames are processed to realize high-resolution reconstruction, so as to obtain the feature representation of the image at high resolution.
In the super-resolution feature extraction process, building on the relationship between deep learning and traditional sparse coding, the convolutional neural network is divided into three stages: image block extraction and representation, nonlinear mapping, and image reconstruction. The three stages are unified into one deep convolutional neural network framework, realizing end-to-end learning from the low-resolution image to the high-resolution image, obtaining the corresponding high-resolution features, and providing a basis for sub-pixel-level matching.
In the above framework, feature extraction extracts blocks from the initial short-wave infrared image to obtain the feature map F_1(Y) of the image, which can be written as F_1(Y) = max(0, W_1 * Y + B_1), where Y is the initial low-resolution image, * denotes the convolution operation, W_1 is a convolution kernel, and B_1 is a neuron bias vector; the feature map produced by the convolution is passed through a ReLU activation function. The nonlinear mapping transforms the feature vectors from the low-resolution space to the high-resolution space, which can be written as F_2(Y) = max(0, W_2 * F_1(Y) + B_2), where likewise W_2 is a convolution kernel and B_2 is a neuron bias vector. High-resolution image reconstruction uses the block-based output feature map obtained above to generate the final high-resolution image, which can be written as F(Y) = W_3 * F_2(Y) + B_3. Training the whole network amounts to estimating and optimizing the network parameters; the optimal parameters are obtained by minimizing the error between the reconstructed image F(Y) and the true high-resolution image.
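For illustration, a minimal PyTorch sketch of the three-stage network described above is given below; the kernel sizes (9, 1, 5) and channel widths (64, 32) follow the common SRCNN configuration and are assumptions rather than values taken from this patent.

```python
import torch
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-stage super-resolution CNN: F1 (extraction) -> F2 (mapping) -> F(Y) (reconstruction)."""
    def __init__(self, channels: int = 1):
        super().__init__()
        # Stage 1: image block extraction and representation, F1(Y) = max(0, W1 * Y + B1)
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        # Stage 2: nonlinear mapping to the high-resolution feature space, F2(Y) = max(0, W2 * F1(Y) + B2)
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        # Stage 3: high-resolution reconstruction, F(Y) = W3 * F2(Y) + B3 (no ReLU)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        f1 = self.relu(self.extract(y))
        f2 = self.relu(self.map(f1))
        return self.reconstruct(f2)

# Training minimizes the error between F(Y) and the true high-resolution image.
model = SRCNN()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

Here Y would be the corrected short-wave infrared key frame, upsampled to the target resolution before being fed to the network.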
And 3, increasing the distance between image frames of the preprocessed image by adopting an image frame grouping local optimization method.
The method is implemented according to the following steps:
and 3.1, under the geometric constraint, luminosity constraint and semantic constraint between two frames of images, establishing a reprojection error constraint term in a sliding window, a pixel luminosity error constraint term in a neighborhood and a semantic error constraint term.
(a) Reprojection error
The reprojection error is the positional deviation in the image between an observed point (x_j, y_j)^T and the back-projection of its corresponding map point, defined as
E_R = || (x_j, y_j)^T - π( K T_j^{-1} T_i π^{-1}((x_i, y_i)^T, d) ) ||
where T_i is the pose of the camera in the world coordinate system when the i-th frame is taken, T_j is the pose of the camera in the world coordinate system when the j-th frame is taken, K is the camera intrinsic calibration matrix, π denotes the transformation from homogeneous coordinates to Cartesian coordinates, (x_i, y_i)^T is the observation of the map point in frame i, and d is the estimated depth of the key point;
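A small NumPy sketch of this residual is given below for illustration; it assumes the poses are 4x4 camera-to-world matrices and that the map point is parameterized by its pixel observation and depth in the host frame i, which is one common convention and not necessarily the exact parameterization used in the patent.

```python
import numpy as np

def reprojection_error(obs_j, px_i, depth, T_i, T_j, K):
    """Reprojection residual of a map point hosted in frame i, observed in frame j.

    obs_j    : (x_j, y_j) observed pixel in frame j
    px_i     : (x_i, y_i) pixel observation of the map point in frame i
    depth    : estimated depth d of the key point in frame i
    T_i, T_j : 4x4 camera-to-world poses of frames i and j (assumed convention)
    K        : 3x3 camera intrinsic calibration matrix
    """
    # Back-project the key point from frame i into 3D camera coordinates (pi^-1).
    x_cam_i = depth * np.linalg.inv(K) @ np.array([px_i[0], px_i[1], 1.0])
    # Transform into the world frame, then into camera frame j.
    x_world = T_i @ np.append(x_cam_i, 1.0)
    x_cam_j = np.linalg.inv(T_j) @ x_world
    # Project into frame j and convert homogeneous -> Cartesian (pi).
    uvw = K @ x_cam_j[:3]
    proj = uvw[:2] / uvw[2]
    return np.asarray(obs_j, dtype=float) - proj
```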
(b) Pixel luminosity error in the neighborhood
The photometric error of any point p ∈ Ω_i of image I_i, observed in another frame image I_j, is defined as
E_P = Σ_{p∈N_p} ω_p || (I_j[p'] - b_j) - (t_j e^{a_j}) / (t_i e^{a_i}) (I_i[p] - b_i) ||_γ
where N_p is the set of pixels in the neighborhood around point p, I_i and I_j are two adjacent image frames, t_i and t_j are their exposure times, a_i, b_i, a_j, and b_j are the affine illumination correction coefficients, || · ||_γ denotes the Huber norm, ω_p is a gradient-dependent weight, and p' is the projection of p onto image I_j;
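The following NumPy sketch evaluates a neighborhood photometric term of this form; the Huber threshold, the shape of the pixel pattern, and how the projections p' are obtained are assumptions of the sketch.

```python
import numpy as np

def huber(r, delta=9.0):
    """Huber penalty applied element-wise to residuals r (delta is an assumed threshold)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

def photometric_error(I_i, I_j, patch_i, patch_j, t_i, t_j, a_i, b_i, a_j, b_j, w):
    """Neighborhood photometric error between host frame i and target frame j.

    I_i, I_j         : grayscale images as 2-D arrays
    patch_i, patch_j : (N, 2) integer pixel coordinates of the pattern N_p in
                       frame i and of its projections p' in frame j
    t_i, t_j         : exposure times; a_*, b_* : affine illumination coefficients
    w                : (N,) gradient-dependent weights omega_p
    """
    vals_i = I_i[patch_i[:, 1], patch_i[:, 0]].astype(float)
    vals_j = I_j[patch_j[:, 1], patch_j[:, 0]].astype(float)
    # Exposure / affine brightness compensation between the two frames.
    scale = (t_j * np.exp(a_j)) / (t_i * np.exp(a_i))
    r = (vals_j - b_j) - scale * (vals_i - b_i)
    return float(np.sum(w * huber(r)))
```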
(c) Semantic error
The semantic error relates the probability that a map point carries semantic label c to the semantics of the pixel points in the image. The semantics of an observed image S_k associate the camera pose T_k with the map point, and this probability is inversely proportional to the distance between the two-dimensional pixel coordinate at which the point and its semantics are reprojected into image S_k and the nearest region labeled with semantic c. The semantic error E_S of the map point is defined accordingly [formula not reproduced], where the distance term is a distance-transform function defined on the binary image of label c, σ represents the uncertainty of the semantic image segmentation, and the probability vector is computed from all observations of the corresponding map point, conditioned on the observation taken at camera pose T_k [formula not reproduced].
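One plausible reading of this term is sketched below with NumPy/SciPy: the reprojected pixel is scored by the distance transform of the binary mask of label c, scaled by the segmentation uncertainty and weighted by the label probability. The exact functional form in the patent is given only as an image, so this is an assumed interpretation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def semantic_error(seg_labels, class_id, proj_px, prob_c, sigma=10.0):
    """Assumed semantic residual for one map point with semantic label class_id.

    seg_labels : (H, W) per-pixel class labels of the segmented image S_k
    class_id   : semantic label c attached to the map point
    proj_px    : (u, v) reprojection of the map point into S_k
    prob_c     : probability that the map point carries label c
    sigma      : segmentation uncertainty (assumed value)
    """
    # Distance transform: 0 inside regions labeled c, growing with the
    # distance to the nearest such region.
    dist_to_c = distance_transform_edt(seg_labels != class_id)
    u, v = int(round(proj_px[0])), int(round(proj_px[1]))
    d = dist_to_c[v, u]
    # Error grows with the distance, weighted by the label probability.
    return float(prob_c * (d / sigma) ** 2)
```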
And 3.2, defining the sum of the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood and the semantic error constraint term as an error sum function.
And 3.3, dividing the key frames in the sliding window into several groups at fixed intervals, and computing, by the Gauss-Newton iteration method, the solution at which the error sum function of each group attains its minimum.
On the premise of ensuring accurate triangulation, the object distance and the baseline should differ by no more than one order of magnitude. Because the baseline between consecutive key frames is short, traditional visual odometry is only suitable for small-scale spaces such as rooms and streets; to achieve accurate visual positioning of the carrier in a large-scale space, the baseline between key frames must be lengthened. To this end, an image-frame grouping local-optimization method is used: the key frames in the sliding window are divided into several groups at fixed intervals, so that the baseline between two key frames of the same group is significantly larger than that between two adjacent key frames, as shown in Fig. 2. Meanwhile, to preserve the accuracy of the state estimate, each group is optimized first and then all groups are optimized jointly, which satisfies visual positioning in a large-scale space while maintaining state-estimation accuracy.
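A minimal sketch of one way to realize this fixed-interval grouping is shown below; the interleaved (strided) split is an assumption consistent with the description, and the group optimization itself is left abstract.

```python
def group_keyframes(window, interval):
    """Split the keyframes of the sliding window into `interval` groups.

    Keyframes within a group are `interval` frames apart, so their baselines
    are much longer than the baseline between adjacent keyframes.
    """
    return [window[offset::interval] for offset in range(interval)]

# Example: an 8-frame window with interval 4 gives
# [[kf0, kf4], [kf1, kf5], [kf2, kf6], [kf3, kf7]];
# each group is optimized on its own first, then all groups are optimized jointly.
groups = group_keyframes(list(range(8)), interval=4)
```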
Within each group, the error sum function of the ORB feature points within the sliding window is defined as
E_group = Σ ( η_R E_R + η_P E_P + η_S E_S )    (5)
where the sum runs over the feature points in the window, and η_R, η_P, and η_S are the weighting coefficients of the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood, and the semantic error constraint term, respectively; these coefficients are determined by analyzing the image semantic information of the current scene.
The error sum function (5) of each group is minimized by the Gauss-Newton iteration method.
And 3.4, defining the sum of the error sum functions of all groups as the visual objective function, and computing, by the Gauss-Newton iteration method, the solution at which the visual objective function attains its minimum.
The visual objective function is defined as
E_vision = Σ_k E_group,k    (6)
i.e. the sum of the error sum functions of the groups. After each group of key frames has been optimized, all groups are optimized simultaneously, i.e. the solution at which the visual objective function (6) attains its minimum is computed by the Gauss-Newton iteration method.
Estimating the camera motion from the key-frame information in the sliding window is equivalent to finding the solution at which formula (6) attains its minimum, so the estimation of the camera motion state is converted into the nonlinear optimization problem
min_{T_i, i ∈ F} E_vision    (7)
F is the set of key frames to be optimized. Solving by the Gauss-Newton iteration method requires computing the matrix of the normal equation
(J^T W J) δ = -J^T W r,
where W is the weight matrix, r is the residual vector, and J is the Jacobian matrix.
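For illustration, a generic weighted Gauss-Newton loop of this form is sketched below; the state parameterization, residual, and Jacobian are left abstract, and the convergence threshold is an assumption.

```python
import numpy as np

def gauss_newton(x0, residual_fn, jacobian_fn, W, iters=10, tol=1e-8):
    """Weighted Gauss-Newton iteration: repeatedly solve (J^T W J) dx = -J^T W r.

    residual_fn(x) -> r : stacked residual vector of the error sum function
    jacobian_fn(x) -> J : Jacobian of r with respect to the state x
    W                   : weight matrix
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual_fn(x)
        J = jacobian_fn(x)
        H = J.T @ W @ J          # Gauss-Newton approximation of the Hessian
        b = -J.T @ W @ r
        dx = np.linalg.solve(H, b)
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x
```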
And 4, fusing the image with the increased inter-frame distance with inertial data, and estimating the motion state of the carrier for navigation.
To obtain a state estimate with a high update rate and high accuracy, visual and inertial measurements are combined: within the sliding window, the solution at which the sum of the visual objective function and the inertial measurement error attains its minimum is computed. Suppose the time instants of two key frames correspond to two states s_i and s_j, the measurements of the inertial measurement unit between them are a_{i,j} and ω_{i,j}, and the covariance matrix between the two states is given [formula not reproduced]. The state predicted from the inertial measurements [formula not reproduced] then yields the inertial measurement error E_inertial [formula not reproduced].
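As a simplified illustration of this step, the sketch below propagates position and velocity between the two key-frame instants from the accelerometer samples and scores the mismatch against the given covariance; attitude propagation from ω_{i,j}, bias handling, and preintegration details are deliberately omitted, and the gravity vector and state layout are assumptions.

```python
import numpy as np

GRAVITY = np.array([0.0, 0.0, -9.81])  # assumed world-frame gravity vector

def predict_state(p_i, v_i, accel_world, dt):
    """Propagate position/velocity from state s_i using accelerometer samples
    already rotated into the world frame (attitude handling omitted)."""
    p, v = np.asarray(p_i, float), np.asarray(v_i, float)
    for a in accel_world:
        a_w = a + GRAVITY
        p = p + v * dt + 0.5 * a_w * dt ** 2
        v = v + a_w * dt
    return p, v

def inertial_error(state_j, predicted_j, cov):
    """Mahalanobis distance between the estimated state s_j and the IMU-predicted
    state, weighted by the covariance between the two states."""
    r = np.asarray(state_j, float) - np.asarray(predicted_j, float)
    return float(r @ np.linalg.solve(cov, r))
```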
the objective function of the visual inertia joint optimization is defined as
E_total = E_vision + E_inertial    (10)
And calculating a solution when the objective function (10) of the visual inertia joint optimization is minimum through a nonlinear optimization method, such as a Gauss-Newton iteration method, thereby estimating the motion state of the carrier for navigation.
The method uses an NVIDIA Jetson TX2 embedded computing module as the computing platform; it is an AI supercomputer on a module based on the NVIDIA Pascal architecture, and its technical parameters are detailed in Table 1.
Table 1. NVIDIA Jetson TX2 computer module hardware resources
[Table 1 not reproduced: rendered as an image in the original]
Algorithm testing and verification are carried out on a ground experiment platform using four authoritative open data sets and one in-house data set, the ground experiment platform being an NVIDIA Jetson TX2 development kit. Test data sets are selected for the typical flight environments of a micro unmanned aerial vehicle; the test and verification items include navigation accuracy, computational complexity, robustness to illumination changes, and robustness to scene changes, as detailed in Table 2.
Table 2. Test items corresponding to each data set
Data set | Test environment | Test characteristics
KITTI data set | Indoor and city block | Dynamic scenes
EuRoC data set | Indoor room, industrial area | Dynamic scenes
TUM data set | Indoor corridor and jungle | Indoor circular path
Oxford data set | City block (buildings and roads) | Outdoor circular path
ICL-NUIM data set | Indoor room | Indoor circular path

Claims (6)

1. An infrared inertial integrated navigation method for a low-visibility large-scale scene is characterized by comprising the following steps:
1, acquiring short-wave infrared images of the target scene and inertial data of the carrier;
2, preprocessing the short-wave infrared images acquired in step 1 to improve their resolution;
and 3, increasing the inter-frame distance of the images preprocessed in step 2 by an image-frame grouping local-optimization method:
3.1, under the geometric constraint, luminosity constraint and semantic constraint between two frames of images, establishing a reprojection error constraint term in a sliding window, a pixel luminosity error constraint term in a neighborhood and a semantic error constraint term;
(a) reprojection error
the reprojection error is the positional deviation in the image between an observed point (x_j, y_j)^T and the back-projection of its corresponding map point, defined as
E_R = || (x_j, y_j)^T - π( K T_j^{-1} T_i π^{-1}((x_i, y_i)^T, d) ) ||
where T_i is the pose of the camera in the world coordinate system when the i-th frame is taken, T_j is the pose of the camera in the world coordinate system when the j-th frame is taken, K is the camera intrinsic calibration matrix, π denotes the transformation from homogeneous coordinates to Cartesian coordinates, (x_i, y_i)^T is the observation of the map point in frame i, and d is the estimated depth of the key point;
(b) pixel luminosity error in the neighborhood
the photometric error of any point p ∈ Ω_i of image I_i, observed in another frame image I_j, is defined as
E_P = Σ_{p∈N_p} ω_p || (I_j[p'] - b_j) - (t_j e^{a_j}) / (t_i e^{a_i}) (I_i[p] - b_i) ||_γ
where N_p is the set of pixels in the neighborhood around point p, I_i and I_j are two adjacent image frames, t_i and t_j are their exposure times, a_i, b_i, a_j, and b_j are the affine illumination correction coefficients, || · ||_γ denotes the Huber norm, ω_p is a gradient-dependent weight, and p' is the projection of p onto image I_j;
(c) semantic error
the semantic error relates the probability that a map point carries semantic label c to the semantics of the pixel points in the image; the semantics of an observed image S_k associate the camera pose T_k with the map point, and this probability is inversely proportional to the distance between the two-dimensional pixel coordinate at which the point and its semantics are reprojected into image S_k and the nearest region labeled with semantic c; the semantic error E_S of the map point is defined accordingly [formula not reproduced], where the distance term is a distance-transform function defined on the binary image of label c, σ represents the uncertainty of the semantic image segmentation, and the probability vector is computed from all observations of the corresponding map point, conditioned on the observation taken at camera pose T_k [formula not reproduced];
3.2, defining the sum of a reprojection error constraint term, a pixel luminosity error constraint term in the neighborhood and a semantic error constraint term as an error sum function;
3.3 dividing the key frames in the sliding window into a plurality of groups according to fixed intervals, and calculating the error and the function of each group by a Gauss-Newton iteration method to obtain a solution when the error and the function are minimum;
on the premise of ensuring accurate triangulation, the object distance and the baseline should differ by no more than one order of magnitude; the key frames in the sliding window are divided into several groups at fixed intervals, so that the baseline between two key frames of the same group is significantly larger than that between two adjacent key frames; each group is optimized first, and then all groups are optimized jointly;
within each group, the error sum function of the ORB feature points within the sliding window is defined as
E_group = Σ ( η_R E_R + η_P E_P + η_S E_S )    (5)
where the sum runs over the feature points in the window, and η_R, η_P, and η_S are the weighting coefficients of the reprojection error constraint term, the pixel luminosity error constraint term in the neighborhood, and the semantic error constraint term, respectively, determined by analyzing the image semantic information of the current scene;
the error sum function (5) of each group is minimized by the Gauss-Newton iteration method to obtain the solution at its minimum;
3.4, defining the sum of the error sum functions of each group as a visual objective function, and calculating the solution when the visual objective function obtains the minimum value through a Gauss-Newton iteration method;
the visual objective function is defined as
E_vision = Σ_k E_group,k    (6)
i.e. the sum of the error sum functions of the groups; after each group of key frames has been optimized, all groups are optimized simultaneously, i.e. the solution at which the visual objective function (6) attains its minimum is computed by the Gauss-Newton iteration method;
estimating the camera motion from the key-frame information in the sliding window is equivalent to finding the solution at which formula (6) attains its minimum, so the estimation of the camera motion state is converted into the nonlinear optimization problem
min_{T_i, i ∈ F} E_vision    (7)
F is the set of key frames to be optimized; solving by the Gauss-Newton iteration method requires computing the matrix of the normal equation
(J^T W J) δ = -J^T W r,
where W is the weight matrix, r is the residual vector, and J is the Jacobian matrix;
and 4, fusing the image with the increased inter-frame distance obtained in the step 3 with the inertial data of the carrier obtained in the step 1, estimating the motion state of the carrier and navigating.
2. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 1, characterized in that:
the step 2 specifically comprises the following steps:
2.1, correcting the non-uniformity of the short wave infrared image;
and 2.2, performing super-resolution reconstruction on the corrected short-wave infrared image.
3. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 2, characterized in that:
the step 2.1 is specifically as follows:
and correcting the non-uniformity of the short wave infrared image by adopting a calibration-based multipoint correction method, a scene-based Kalman filtering method, a scene-based time domain high-pass filtering method or a scene-based deep learning method.
4. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 2, characterized in that:
the step 2.2 is specifically as follows:
based on prior information acquired by a deep neural network, and building on the relationship between deep learning and traditional sparse coding, the convolutional network is divided into three stages, namely image block extraction and representation, nonlinear mapping, and high-resolution image reconstruction; the three stages are then unified into one deep convolutional neural network framework, realizing end-to-end learning from the low-resolution image to the high-resolution image, so as to obtain the corresponding high-resolution features and provide a basis for sub-pixel-level matching.
5. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 4, characterized in that:
the image block extraction and representation of the convolutional neural network in the step 2.2 are specifically as follows:
extracting features from the initial short-wave infrared image to obtain the feature map F_1(Y) of the image,
expressed as F_1(Y) = max(0, W_1 * Y + B_1);
where Y is the initial low-resolution image, * denotes the convolution operation, W_1 is a convolution kernel, and B_1 is a neuron bias vector;
the nonlinear mapping of the convolutional neural network in step 2.2 is specifically:
the nonlinear mapping transforms the feature vectors from the low-resolution space to the high-resolution space, expressed as
F_2(Y) = max(0, W_2 * F_1(Y) + B_2),
where W_2 is a convolution kernel and B_2 is a neuron bias vector;
the high-resolution image reconstruction of the convolutional neural network in step 2.2 is specifically:
generating the final high-resolution image from the block-based output feature map obtained above, wherein training the whole network amounts to estimating and optimizing the network parameters, and the optimal parameters are obtained by minimizing the error between the reconstructed image F(Y) and the true high-resolution image, expressed as
F(Y) = W_3 * F_2(Y) + B_3.
6. The infrared inertial integrated navigation method for low visibility large scale scenes according to claim 1, characterized in that:
the step 4 specifically comprises the following steps:
fusing the image with increased inter-frame distance with the inertial data, and estimating the motion state of the carrier for navigation; the visual and inertial measurements are combined by computing, within the sliding window, the solution at which the sum of the visual objective function and the inertial measurement error attains its minimum; suppose the time instants of two key frames correspond to two states s_i and s_j, the measurements of the inertial measurement unit between them are a_{i,j} and ω_{i,j}, and the covariance matrix between the two states is given [formula not reproduced]; the state predicted from the inertial measurements [formula not reproduced] yields the inertial measurement error E_inertial [formula not reproduced];
the objective function of the visual-inertial joint optimization is defined as
E_total = E_vision + E_inertial    (10)
And calculating a solution when the objective function (10) of the visual-inertial joint optimization is minimum through a nonlinear optimization method, thereby estimating the motion state of the carrier for navigation.
CN202010381178.0A 2020-05-08 2020-05-08 Infrared inertial integrated navigation method for low-visibility large-scale scene Active CN111536970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010381178.0A CN111536970B (en) 2020-05-08 2020-05-08 Infrared inertial integrated navigation method for low-visibility large-scale scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010381178.0A CN111536970B (en) 2020-05-08 2020-05-08 Infrared inertial integrated navigation method for low-visibility large-scale scene

Publications (2)

Publication Number Publication Date
CN111536970A CN111536970A (en) 2020-08-14
CN111536970B true CN111536970B (en) 2022-01-25

Family

ID=71973440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010381178.0A Active CN111536970B (en) 2020-05-08 2020-05-08 Infrared inertial integrated navigation method for low-visibility large-scale scene

Country Status (1)

Country Link
CN (1) CN111536970B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112489119B (en) * 2020-12-21 2023-01-31 北京航空航天大学 Monocular vision positioning method for enhancing reliability
CN112659128A (en) * 2020-12-28 2021-04-16 智动时代(北京)科技有限公司 Robot brain and human-computer cooperative control brain parallel cooperative control method
CN113239936B (en) * 2021-04-26 2024-05-28 大连理工大学 Unmanned aerial vehicle visual navigation method based on deep learning and feature point extraction
CN117455762B (en) * 2023-12-25 2024-04-12 盯盯拍(深圳)技术股份有限公司 Method and system for improving resolution of recorded picture based on panoramic automobile data recorder

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7395156B2 (en) * 2005-06-23 2008-07-01 Raytheon Company System and method for geo-registration with global positioning and inertial navigation
CN105487557B (en) * 2015-12-07 2018-06-19 浙江大学 A kind of unmanned plane independent landing guiding system based on day-old chick ultraviolet imagery
CN109341700B (en) * 2018-12-04 2023-06-30 中国航空工业集团公司西安航空计算技术研究所 Visual auxiliary landing navigation method for fixed-wing aircraft under low visibility
CN109341724B (en) * 2018-12-04 2023-05-05 中国航空工业集团公司西安航空计算技术研究所 On-line calibration method for relative pose of airborne camera-inertial measurement unit

Also Published As

Publication number Publication date
CN111536970A (en) 2020-08-14

Similar Documents

Publication Publication Date Title
CN111536970B (en) Infrared inertial integrated navigation method for low-visibility large-scale scene
CN111862126B (en) Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm
CN108665496B (en) End-to-end semantic instant positioning and mapping method based on deep learning
CN107564061B (en) Binocular vision mileage calculation method based on image gradient joint optimization
CN111693047B (en) Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene
CN108596101B (en) Remote sensing image multi-target detection method based on convolutional neural network
CN108242079B (en) VSLAM method based on multi-feature visual odometer and graph optimization model
CN106097304B (en) A kind of unmanned plane real-time online ground drawing generating method
CN105865454B (en) A kind of Navigation of Pilotless Aircraft method generated based on real-time online map
CN105809687A (en) Monocular vision ranging method based on edge point information in image
US11430087B2 (en) Using maps comprising covariances in multi-resolution voxels
CN114972748B (en) Infrared semantic segmentation method capable of explaining edge attention and gray scale quantization network
CN109443354B (en) Visual-inertial tight coupling combined navigation method based on firefly group optimized PF
Hu et al. An indoor positioning framework based on panoramic visual odometry for visually impaired people
CN112016478A (en) Complex scene identification method and system based on multispectral image fusion
CN111812978B (en) Cooperative SLAM method and system for multiple unmanned aerial vehicles
CN115900710A (en) Dynamic environment navigation method based on visual information
CN111860651A (en) Monocular vision-based semi-dense map construction method for mobile robot
CN111811502B (en) Motion carrier multi-source information fusion navigation method and system
Lacroix et al. Digital elevation map building from low altitude stereo imagery
Liu et al. D-lc-nets: Robust denoising and loop closing networks for lidar slam in complicated circumstances with noisy point clouds
Jin et al. Beyond learning: Back to geometric essence of visual odometry via fusion-based paradigm
CN116188550A (en) Self-supervision depth vision odometer based on geometric constraint
CN113971697A (en) Air-ground cooperative vehicle positioning and orienting method
Le Besnerais et al. Dense height map estimation from oblique aerial image sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant