CN113989318A - Monocular vision odometer pose optimization and error correction method based on deep learning - Google Patents

Monocular vision odometer pose optimization and error correction method based on deep learning

Info

Publication number: CN113989318A
Application number: CN202111221271.6A
Authority: CN (China)
Prior art keywords: motion, time, data, pose, similarity
Prior art date: 2021-10-20
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113989318B (en)
Inventors: 肖卓凌 (Xiao Zhuoling), 宋濡君 (Song Rujun), 朱然 (Zhu Ran)
Current Assignee: University of Electronic Science and Technology of China
Original Assignee: University of Electronic Science and Technology of China
Priority date / Filing date: 2021-10-20
Application filed by University of Electronic Science and Technology of China
Publication of CN113989318A: 2022-01-28
Application granted; publication of CN113989318B: 2023-04-07

Classifications

    • G06T 7/215: Physics; Computing; Image data processing or generation; Image analysis; Analysis of motion; Motion-based segmentation
    • G01C 25/00: Physics; Measuring; Testing; Measuring distances, levels or bearings; Surveying; Navigation; Manufacturing, calibrating, cleaning, or repairing instruments or devices referred to in the other groups of this subclass
    • G06N 3/044: Computing arrangements based on specific computational models; Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Recurrent networks, e.g. Hopfield networks
    • G06N 3/08: Computing arrangements based on biological models; Neural networks; Learning methods
    • G06T 7/248: Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G06T 2207/10016: Indexing scheme for image analysis or image enhancement; Image acquisition modality; Video; Image sequence
    • G06T 2207/10024: Image acquisition modality; Color image
    • G06T 2207/20081: Special algorithmic details; Training; Learning
    • G06T 2207/20084: Special algorithmic details; Artificial neural networks [ANN]
    • Y02T 10/40: Climate change mitigation technologies related to transportation; Road transport of goods or passengers; Internal combustion engine [ICE] based vehicles; Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Manufacturing & Machinery (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a monocular vision odometer pose optimization and error correction method based on deep learning. The method acquires image data and computes the corresponding optical flow image sequence; segments the optical flow image sequence with a fixed-step sliding window to obtain a plurality of input sequences, and extracts the high-dimensional motion features of each input sequence with an encoder; feeds the high-dimensional motion features into an artificial neural network; models motion similarity over the temporal relation of the motion and its local context information with a pose transformation similarity calculation module, and guides the optimization of the pose features with an attention mechanism to obtain motion features refined by motion similarity; and finally feeds the refined motion features into a pose correction prediction network to realize pose optimization and error correction. The invention fully mines and models the temporal relation and the similarity of continuous motion in the image motion data, improving robustness.

Description

Monocular vision odometer pose optimization and error correction method based on deep learning
Technical Field
The invention relates to the field of computer vision, in particular to a monocular vision odometer pose optimization and error correction method based on deep learning.
Background
In recent years, the rapid development of Internet of Things applications has driven rising demand for location-based services (LBS), making the need for high-precision real-time positioning schemes increasingly urgent. A stable, accurate, real-time positioning system is an essential foundation for Internet of Things applications such as robot control, autonomous driving, virtual reality (VR), and retail.
Although positioning through Global Navigation Satellite Systems (GNSS) such as the Global Positioning System (GPS), the BeiDou Navigation Satellite System (BDS), Galileo, and GLONASS is now ubiquitous, satellite positioning can be inaccurate in heavily occluded outdoor environments (such as tunnels and forests) or in indoor scenes where building structures block and interfere with satellite signals. Visual odometry (VO), which uses a visual sensor, is an effective way to address these problems: its visual input is information-rich, it applies to a wide range of scenes, and it is low-cost, making it a common means of implementing positioning applications.
However, a monocular visual odometer mainly predicts the inter-frame pose transformation of the camera carrier from images at adjacent acquisition times and then accumulates these predictions into the overall motion trajectory, so accumulated error grows with motion distance and causes the trajectory estimate to diverge. Effectively eliminating the accumulated error of visual odometry pose prediction is therefore the key to a high-precision monocular positioning system. At present, common methods for mitigating the accumulated error of a monocular visual odometer and improving pose prediction accuracy include: 1) building a pose graph of the monocular camera carrier's motion with loop closure detection, and applying back-end optimization to the predicted poses; for example, the ORB-SLAM positioning system optimizes its predicted trajectory locally and globally based on the covisibility of landmarks; 2) correcting the visual odometry positioning system with other kinds of information through data fusion; for example, the visual-inertial odometer (VIO) is a high-precision positioning system that eliminates drift of the visual measurement unit by incorporating inertial navigation information; 3) modeling the motion correlation of the image sequence in the time dimension to optimize pose prediction; for example, the deep-learning monocular visual odometry SRNN model guides and optimizes the predicted pose by modeling the correlation of pose transformations at adjacent times. The first approach has clear limitations: it depends heavily on the environment and generalizes poorly. In a previously unvisited scene, environmental landmarks cannot be constructed in advance and the motion trajectory may never close, so the pose graph optimization and loop closure detection modules are likely to fail. The second approach is also limited: when the measurement quality of the other sensors is poor, the prediction accuracy of the original monocular visual odometer degrades noticeably, and the data fusion algorithm itself strongly influences the final prediction.
Disclosure of Invention
To address the above deficiencies of the prior art, the monocular vision odometer pose optimization and error correction method based on deep learning provided herein solves the poor accuracy and robustness of pose optimization and error correction in the prior art.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
the method for optimizing the pose and correcting the error of the monocular vision odometer based on the deep learning comprises the following steps:
s1, acquiring image data and calculating a corresponding optical flow image sequence;
s2, segmenting the optical flow picture sequence by adopting a fixed step length sliding window to obtain a plurality of segmented input sequence data, and obtaining high-dimensional motion characteristics of each input sequence data by utilizing an encoder;
s3, inputting the high-dimensional motion characteristics into an artificial neural network to obtain the time sequence relation of motion and local context information of the motion;
s4, inputting the result of the step S3 into a pose transformation similarity calculation module for motion similarity modeling to obtain motion correlation characteristics of motion time sequence relation and motion correlation characteristics of motion local context information; optimizing the pose characteristics by utilizing an attention mechanism based on the motion correlation characteristics to obtain motion characteristics after motion similarity purification;
and S5, inputting the motion characteristics purified through the motion similarity into a pose correction prediction network for pose optimization and error correction.
Further, the specific method of step S1 is:
S1-1, setting the sampling frequency of the monocular vision sensor and sampling to obtain a three-channel color RGB image sequence;
S1-2, calculating the optical flow image sequence of the three-channel RGB image sequence according to the formula Flo_t = F(I_{t-1}, I_t); where Flo_t is the optical flow image at time t, F(·) is the optical flow computation, I_{t-1} is the three-channel RGB image at time t-1, and I_t is the three-channel RGB image at time t.
Further: the sampling frequency of the monocular vision sensor is set to 20 Hz; the data dimension of each three-channel RGB image is (1226, 370, 3) and that of each optical flow image is (1226, 370, 2); every two consecutive RGB image frames yield one corresponding optical flow image frame.
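As a concrete illustration of step S1, the sketch below computes the optical flow sequence with OpenCV. The Farneback method and its parameter values are assumptions for illustration only, since the patent does not name a particular optical flow algorithm; the (1226, 370) image size above is written here in NumPy's (height, width) order.

```python
# A minimal sketch of step S1, assuming OpenCV's Farneback dense optical flow
# as the optical flow computation F(.); the algorithm choice and parameters
# are illustrative, not taken from the patent.
import cv2
import numpy as np

def optical_flow_sequence(frames):
    """frames: list of RGB images of shape (370, 1226, 3), sampled at 20 Hz.
    Returns the optical flow images Flo_t, each of shape (370, 1226, 2)."""
    flows = []
    for t in range(1, len(frames)):
        prev = cv2.cvtColor(frames[t - 1], cv2.COLOR_RGB2GRAY)
        curr = cv2.cvtColor(frames[t], cv2.COLOR_RGB2GRAY)
        # Flo_t = F(I_{t-1}, I_t): a two-channel (dx, dy) displacement field
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None, pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        flows.append(flow.astype(np.float32))
    return flows
```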
Further, the specific method for obtaining the plurality of segmented input sequence data in step S2 is as follows:
segmenting the optical flow image sequence with a sliding window of length 9 and step 9 to obtain input sequences of length 9; wherein each input sequence is a four-dimensional tensor of dimension (9, 1226, 370, 2), stacking the three-dimensional optical flow frames along the window length.
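A minimal sketch of this fixed-step segmentation follows; the NumPy layout and function name are illustrative.

```python
# Fixed-step sliding window of step S2: length 9 and step 9 give
# non-overlapping input sequences of optical flow frames.
import numpy as np

def segment_flow_sequence(flow_seq, window=9, step=9):
    """flow_seq: array of shape (N, 370, 1226, 2) holding N optical flow frames.
    Returns an array of shape (num_windows, 9, 370, 1226, 2)."""
    starts = range(0, len(flow_seq) - window + 1, step)
    return np.stack([flow_seq[s:s + window] for s in starts])
```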
Further, the specific method in step S3 is:
inputting the high-dimensional motion features into an artificial neural network consisting of two stacked long short-term memory (LSTM) layers, computed according to the formulas:
i_t = σ(ω_ix x_t + ω_ih h_{t-1} + b_i)
g_t = tanh(ω_gx x_t + ω_gh h_{t-1} + b_g)
f_t = σ(ω_fx x_t + ω_fh h_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
o_t = σ(ω_ox x_t + ω_oh h_{t-1} + b_o)
h_t = o_t ⊙ tanh(c_t)
obtaining the local context information of the motion h_t, i.e. the hidden state at time t, and the temporal relation of the motion o_t, i.e. the LSTM output at time t; where i_t is the input gate state of the LSTM at time t, σ(·) is the sigmoid activation function, ω_ix is the weight of the input data, x_t is the input at time t, ω_ih is the corresponding weight of the hidden state, h_{t-1} is the hidden state at time t-1, b_i is the corresponding bias of the input data, g_t is the candidate information of the input data at time t, tanh(·) is the hyperbolic tangent activation function, ω_gx and ω_gh are the candidate-information weights for the input data and the hidden state, b_g is the corresponding bias of the candidate information, f_t is the forget gate state at time t, ω_fx and ω_fh are the forget-gate weights for the input data and the hidden state, b_f is the corresponding bias of the forget gate, c_t and c_{t-1} are the cell states at times t and t-1, o_t is the output gate state at time t, ω_ox and ω_oh are the output-gate weights for the input data and the hidden state, b_o is the corresponding bias of the output gate, and ⊙ denotes the Hadamard (element-wise) product.
The output of the last LSTM layer is the temporal relation of the motion, with dimension (1, 1024); the hidden states of the two LSTM layers store the local context information of the motion, with dimension (2, 1024).
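The following PyTorch sketch mirrors this two-layer stacked LSTM; the 1024-dimensional encoder feature size is taken from the dimensions stated above, and the module name is illustrative.

```python
# A minimal sketch of step S3: two stacked LSTM layers whose last output is
# the temporal relation of the motion and whose hidden states hold the
# local context information of the motion.
import torch
import torch.nn as nn

class MotionLSTM(nn.Module):
    def __init__(self, feat_dim=1024, hidden=1024):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)

    def forward(self, x):
        # x: (batch, 9, feat_dim) encoder features for one sliding window
        out, (h_n, _) = self.lstm(x)
        o_t = out[:, -1]  # temporal relation of the motion, (batch, 1024)
        # h_n: hidden states of both layers, (2, batch, 1024): the local
        # context information of the motion
        return o_t, h_n
```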
Further, the specific method for obtaining the motion features refined by motion similarity in step S4 is:
according to the formula:
[The first two formulas of this step appear only as images in the source; they define the refined motion feature X′_t and the refined local motion context H′_t from exp(·)-weighted cosine similarities of adjacent motion features.]
X″_t = f_{1×1}([X′_t, H′_t])
obtaining the optimized pose feature X″_t output at time t, i.e. the motion feature refined by motion similarity; where X′_t is the refined motion feature at time t obtained under the guidance of the motion-similarity attention mechanism, exp(·) is the exponential function, S(·) is the cosine similarity function, X_{t-1} is the motion feature extracted by the artificial neural network at time t-1, X_t is the motion feature extracted by the artificial neural network at time t, i.e. the motion correlation feature of the local context information of the motion, W is the vector dimension of the motion feature, H′_t is the refined local motion context at time t obtained under the guidance of the motion-similarity attention mechanism, H_n is the local motion context stored in the hidden state of the last LSTM layer of the artificial neural network, i.e. the motion correlation feature of the motion temporal relation, f_{1×1}(·) is a convolutional layer with kernel size 1×1, and [X′_t, H′_t] denotes the concatenation of the refined motion feature and the refined local motion context.
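A hedged sketch of this similarity-guided refinement follows. The patent's two attention formulas survive only as images in the source, so the exp(cosine similarity) gating below is an assumption pieced together from the surrounding definitions of S(·), exp(·) and W; only the fusion X″_t = f_{1×1}([X′_t, H′_t]) is stated explicitly in the text.

```python
# An assumed sketch of the step-S4 pose transformation similarity module:
# cosine similarity of adjacent motion features, exp(.) weighting, and a
# 1x1 convolution fusing the refined feature with the refined context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityRefine(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        # f_1x1: convolution with kernel size 1, as stated in the text
        self.fuse = nn.Conv1d(2 * dim, dim, kernel_size=1)

    def forward(self, x_prev, x_t, h_n):
        # x_prev, x_t: (batch, dim) motion features at times t-1 and t
        # h_n: (num_layers, batch, dim) LSTM hidden states (local context)
        sim = F.cosine_similarity(x_prev, x_t, dim=-1)   # S(X_{t-1}, X_t)
        w = torch.exp(sim).unsqueeze(-1)                 # exp(.) attention weight
        x_ref = w * x_t                                  # X'_t (assumed gating)
        h_ref = w * h_n[-1]                              # H'_t (assumed gating)
        cat = torch.cat([x_ref, h_ref], dim=-1)          # [X'_t, H'_t]
        return self.fuse(cat.unsqueeze(-1)).squeeze(-1)  # X''_t
```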
Further, the pose correction prediction network in step S5 comprises a first long short-term memory network, a second long short-term memory network, a first fully-connected layer, and a second fully-connected layer connected in sequence; the output dimension of both LSTM networks is 1024; the first fully-connected layer has 128 neurons and an activation function; the second fully-connected layer has 6 neurons and no activation function.
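A minimal PyTorch sketch of this pose correction prediction network follows; interpreting the six outputs as three translation plus three rotation components is an assumption, as the text only fixes the layer sizes.

```python
# Step S5 head per the description: two LSTMs with 1024-dimensional outputs,
# a 128-neuron fully-connected layer with activation (ReLU, per the formula
# in the description), and a 6-neuron output layer with no activation.
import torch
import torch.nn as nn

class PoseCorrectionNet(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.lstm1 = nn.LSTM(dim, 1024, batch_first=True)
        self.lstm2 = nn.LSTM(1024, 1024, batch_first=True)
        self.fc1 = nn.Linear(1024, 128)
        self.fc2 = nn.Linear(128, 6)  # assumed: 3 translation + 3 rotation

    def forward(self, x):
        # x: (batch, seq_len, dim) motion features refined in step S4
        y, _ = self.lstm1(x)
        y, _ = self.lstm2(y)
        y = torch.relu(self.fc1(y))
        return self.fc2(y)  # per-step 6-DoF pose correction
```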
The invention has the beneficial effects that:
1. Starting from high-dimensional motion features computed by an encoder from the optical flow data of the image sequence, the deep-learning pose optimization and error correction method mines the temporal relation of continuous motion and its local context information, models the similarity of continuous motion in the sensor data, guides the optimization of the high-dimensional motion features through an attention mechanism, and finally fits the pose change of the camera carrier between adjacent camera sampling points, achieving pose optimization and error correction and ensuring the accuracy of the system.
2. Without requiring the camera parameters of the visual sensor, scene point landmarks, or scene depth information, the method extracts the temporal relation of continuous motion through an artificial neural network from the encoder's high-dimensional motion features, optimizes the feature data under attention guidance using motion similarity features, and finally obtains accurate and robust pose optimization and error correction; that is, it ensures the robustness of the system and autonomously recovers the absolute scale of the motion trajectory from monocular image data alone.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art, but it should be understood that the invention is not limited to the scope of the embodiments. For those of ordinary skill in the art, as long as the variations fall within the spirit and scope of the invention as defined by the appended claims, such variations are obvious, and all inventions and creations making use of the inventive concept are protected.
As shown in FIG. 1, the monocular vision odometer pose optimization and error correction method based on deep learning comprises the following steps:
S1, acquiring image data and calculating a corresponding optical flow image sequence;
S2, segmenting the optical flow image sequence by adopting a fixed-step sliding window to obtain a plurality of segmented input sequences, and obtaining the high-dimensional motion features of each input sequence by utilizing an encoder;
S3, inputting the high-dimensional motion features into an artificial neural network to obtain the temporal relation of the motion and the local context information of the motion;
S4, inputting the result of step S3 into a pose transformation similarity calculation module for motion similarity modeling to obtain the motion correlation features of the motion temporal relation and of the local motion context, and optimizing the pose features by utilizing an attention mechanism based on the motion correlation features to obtain motion features refined by motion similarity;
and S5, inputting the motion features refined by motion similarity into a pose correction prediction network to realize pose optimization and error correction.
The specific method of step S1 is:
S1-1, setting the sampling frequency of the monocular vision sensor and sampling to obtain a three-channel color RGB image sequence;
S1-2, calculating the optical flow image sequence of the three-channel RGB image sequence according to the formula Flo_t = F(I_{t-1}, I_t); where Flo_t is the optical flow image at time t, F(·) is the optical flow computation, I_{t-1} is the three-channel RGB image at time t-1, and I_t is the three-channel RGB image at time t.
The sampling frequency of the monocular vision sensor is set to 20 Hz; the data dimension of each three-channel RGB image is (1226, 370, 3) and that of each optical flow image is (1226, 370, 2); every two consecutive RGB image frames yield one corresponding optical flow image frame.
The specific method for obtaining the plurality of segmented input sequence data in step S2 is as follows:
segmenting the optical flow image sequence with a sliding window of length 9 and step 9 to obtain input sequences of length 9; wherein each input sequence is a four-dimensional tensor of dimension (9, 1226, 370, 2), stacking the three-dimensional optical flow frames along the window length.
The specific method in step S3 is:
inputting the high-dimensional motion features into an artificial neural network consisting of two stacked long short-term memory (LSTM) layers, computed according to the formulas:
i_t = σ(ω_ix x_t + ω_ih h_{t-1} + b_i)
g_t = tanh(ω_gx x_t + ω_gh h_{t-1} + b_g)
f_t = σ(ω_fx x_t + ω_fh h_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
o_t = σ(ω_ox x_t + ω_oh h_{t-1} + b_o)
h_t = o_t ⊙ tanh(c_t)
obtaining the local context information of the motion h_t, i.e. the hidden state at time t, and the temporal relation of the motion o_t, i.e. the LSTM output at time t; where i_t is the input gate state of the LSTM at time t, σ(·) is the sigmoid activation function, ω_ix is the weight of the input data, x_t is the input at time t, ω_ih is the corresponding weight of the hidden state, h_{t-1} is the hidden state at time t-1, b_i is the corresponding bias of the input data, g_t is the candidate information of the input data at time t, tanh(·) is the hyperbolic tangent activation function, ω_gx and ω_gh are the candidate-information weights for the input data and the hidden state, b_g is the corresponding bias of the candidate information, f_t is the forget gate state at time t, ω_fx and ω_fh are the forget-gate weights for the input data and the hidden state, b_f is the corresponding bias of the forget gate, c_t and c_{t-1} are the cell states at times t and t-1, o_t is the output gate state at time t, ω_ox and ω_oh are the output-gate weights for the input data and the hidden state, b_o is the corresponding bias of the output gate, and ⊙ denotes the Hadamard (element-wise) product.
The output of the last LSTM layer is the temporal relation of the motion, with dimension (1, 1024); the hidden states of the two LSTM layers store the local context information of the motion, with dimension (2, 1024).
The specific method for obtaining the motion features refined by motion similarity in step S4 is:
according to the formula:
[The first two formulas of this step appear only as images in the source; they define the refined motion feature X′_t and the refined local motion context H′_t from exp(·)-weighted cosine similarities of adjacent motion features.]
X″_t = f_{1×1}([X′_t, H′_t])
obtaining the optimized pose feature X″_t output at time t, i.e. the motion feature refined by motion similarity; where X′_t is the refined motion feature at time t obtained under the guidance of the motion-similarity attention mechanism, exp(·) is the exponential function, S(·) is the cosine similarity function, X_{t-1} is the motion feature extracted by the artificial neural network at time t-1, X_t is the motion feature extracted by the artificial neural network at time t, i.e. the motion correlation feature of the local context information of the motion, W is the vector dimension of the motion feature, H′_t is the refined local motion context at time t obtained under the guidance of the motion-similarity attention mechanism, H_n is the local motion context stored in the hidden state of the last LSTM layer of the artificial neural network, i.e. the motion correlation feature of the motion temporal relation, f_{1×1}(·) is a convolutional layer with kernel size 1×1, and [X′_t, H′_t] denotes the concatenation of the refined motion feature and the refined local motion context.
The pose correction prediction network in step S5 comprises a first long short-term memory network, a second long short-term memory network, a first fully-connected layer, and a second fully-connected layer connected in sequence; the output dimension of both LSTM networks is 1024; the first fully-connected layer has 128 neurons and an activation function; the second fully-connected layer has 6 neurons and no activation function.
The output of the first fully-connected layer of the pose correction prediction network is
F_1 = Relu(x_{1×i} ω_{j×i}^T + b_{1×j})
where Relu(·) is the activation function providing the non-linear mapping, x_{1×i} is the 1×i input data matrix, ω_{j×i} is the trainable weight matrix of the fully-connected layer with dimension j×i, b_{1×j} is the 1×j bias matrix of the fully-connected layer, and T denotes matrix transposition.
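A small NumPy sketch instantiating this formula with illustrative dimensions i = 1024 and j = 128 (the sizes of the first fully-connected layer described above):

```python
# F1 = Relu(x w^T + b): the first fully-connected layer of the pose
# correction prediction network, with assumed dimensions i=1024, j=128.
import numpy as np

i, j = 1024, 128
x = np.random.randn(1, i)          # x_{1 x i}: input row vector
w = np.random.randn(j, i) * 0.01   # w_{j x i}: trainable weight matrix
b = np.zeros((1, j))               # b_{1 x j}: bias row vector
F1 = np.maximum(0.0, x @ w.T + b)  # Relu(x w^T + b), shape (1, j)
```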
Starting from high-dimensional motion features computed by an encoder from the optical flow data of the image sequence, the method mines the temporal relation of continuous motion and its local context information, models the similarity of continuous motion in the sensor data, guides the optimization of the high-dimensional motion features through an attention mechanism, and finally fits the pose change of the camera carrier between adjacent camera sampling points, achieving pose optimization and error correction and ensuring the accuracy of the system.
Without requiring the camera parameters of the visual sensor, scene point landmarks, or scene depth information, the method extracts the temporal relation of continuous motion through an artificial neural network from the encoder's high-dimensional motion features, optimizes the feature data under attention guidance using motion similarity features, and finally obtains accurate and robust pose optimization and error correction; that is, it ensures the robustness of the system and autonomously recovers the absolute scale of the motion trajectory from monocular image data alone.

Claims (7)

1. A monocular vision odometer pose optimization and error correction method based on deep learning, characterized by comprising the following steps:
S1, acquiring image data and calculating a corresponding optical flow image sequence;
S2, segmenting the optical flow image sequence by adopting a fixed-step sliding window to obtain a plurality of segmented input sequences, and obtaining the high-dimensional motion features of each input sequence by utilizing an encoder;
S3, inputting the high-dimensional motion features into an artificial neural network to obtain the temporal relation of the motion and the local context information of the motion;
S4, inputting the result of step S3 into a pose transformation similarity calculation module for motion similarity modeling to obtain the motion correlation features of the motion temporal relation and of the local motion context, and optimizing the pose features by utilizing an attention mechanism based on the motion correlation features to obtain motion features refined by motion similarity;
and S5, inputting the motion features refined by motion similarity into a pose correction prediction network for pose optimization and error correction.
2. The deep learning-based monocular vision odometer pose optimization and error correction method according to claim 1, wherein the specific method of step S1 is:
S1-1, setting the sampling frequency of the monocular vision sensor and sampling to obtain a three-channel color RGB image sequence;
S1-2, calculating the optical flow image sequence of the three-channel RGB image sequence according to the formula Flo_t = F(I_{t-1}, I_t); where Flo_t is the optical flow image at time t, F(·) is the optical flow computation, I_{t-1} is the three-channel RGB image at time t-1, and I_t is the three-channel RGB image at time t.
3. The deep learning-based monocular vision odometer pose optimization and error correction method of claim 2, characterized in that: the sampling frequency of the monocular vision sensor is set to 20 Hz; the data dimension of each three-channel RGB image is (1226, 370, 3) and that of each optical flow image is (1226, 370, 2); every two consecutive RGB image frames yield one corresponding optical flow image frame.
4. The deep learning-based monocular vision odometer pose optimization and error correction method according to claim 1, wherein the specific method for obtaining the plurality of segmented input sequence data in step S2 is as follows:
segmenting the optical flow image sequence with a sliding window of length 9 and step 9 to obtain input sequences of length 9; wherein each input sequence is a four-dimensional tensor of dimension (9, 1226, 370, 2), stacking the three-dimensional optical flow frames along the window length.
5. The deep learning-based monocular vision odometer pose optimization and error correction method according to claim 1, wherein the specific method in step S3 is:
inputting the high-dimensional motion features into an artificial neural network consisting of two stacked long short-term memory (LSTM) layers, computed according to the formulas:
i_t = σ(ω_ix x_t + ω_ih h_{t-1} + b_i)
g_t = tanh(ω_gx x_t + ω_gh h_{t-1} + b_g)
f_t = σ(ω_fx x_t + ω_fh h_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
o_t = σ(ω_ox x_t + ω_oh h_{t-1} + b_o)
h_t = o_t ⊙ tanh(c_t)
obtaining the local context information of the motion h_t, i.e. the hidden state at time t, and the temporal relation of the motion o_t, i.e. the LSTM output at time t; where i_t is the input gate state of the LSTM at time t, σ(·) is the sigmoid activation function, ω_ix is the weight of the input data, x_t is the input at time t, ω_ih is the corresponding weight of the hidden state, h_{t-1} is the hidden state at time t-1, b_i is the corresponding bias of the input data, g_t is the candidate information of the input data at time t, tanh(·) is the hyperbolic tangent activation function, ω_gx and ω_gh are the candidate-information weights for the input data and the hidden state, b_g is the corresponding bias of the candidate information, f_t is the forget gate state at time t, ω_fx and ω_fh are the forget-gate weights for the input data and the hidden state, b_f is the corresponding bias of the forget gate, c_t and c_{t-1} are the cell states at times t and t-1, o_t is the output gate state at time t, ω_ox and ω_oh are the output-gate weights for the input data and the hidden state, b_o is the corresponding bias of the output gate, and ⊙ denotes the Hadamard (element-wise) product.
The output of the last LSTM layer is the temporal relation of the motion, with dimension (1, 1024); the hidden states of the two LSTM layers store the local context information of the motion, with dimension (2, 1024).
6. The deep learning-based monocular vision odometer pose optimization and error correction method of claim 1, wherein the specific method for obtaining the motion features refined by motion similarity in step S4 is:
according to the formula:
[The first two formulas of this step appear only as images in the source; they define the refined motion feature X′_t and the refined local motion context H′_t from exp(·)-weighted cosine similarities of adjacent motion features.]
X″_t = f_{1×1}([X′_t, H′_t])
obtaining the optimized pose feature X″_t output at time t, i.e. the motion feature refined by motion similarity; where X′_t is the refined motion feature at time t obtained under the guidance of the motion-similarity attention mechanism, exp(·) is the exponential function, S(·) is the cosine similarity function, X_{t-1} is the motion feature extracted by the artificial neural network at time t-1, X_t is the motion feature extracted by the artificial neural network at time t, i.e. the motion correlation feature of the local context information of the motion, W is the vector dimension of the motion feature, H′_t is the refined local motion context at time t obtained under the guidance of the motion-similarity attention mechanism, H_n is the local motion context stored in the hidden state of the last LSTM layer of the artificial neural network, i.e. the motion correlation feature of the motion temporal relation, f_{1×1}(·) is a convolutional layer with kernel size 1×1, and [X′_t, H′_t] denotes the concatenation of the refined motion feature and the refined local motion context.
7. The deep learning-based monocular vision odometer pose optimization and error correction method of claim 1, wherein the pose correction prediction network in step S5 comprises a first long short-term memory network, a second long short-term memory network, a first fully-connected layer, and a second fully-connected layer connected in sequence; the output dimension of both LSTM networks is 1024; the first fully-connected layer has 128 neurons and an activation function; the second fully-connected layer has 6 neurons and no activation function.
CN202111221271.6A (priority date 2021-10-20; filing date 2021-10-20): Monocular vision odometer pose optimization and error correction method based on deep learning; Active; granted as CN113989318B (en)

Priority Applications (1)

Application Number: CN202111221271.6A (granted as CN113989318B)
Priority date / Filing date: 2021-10-20 / 2021-10-20
Title: Monocular vision odometer pose optimization and error correction method based on deep learning


Publications (2)

Publication Number: Publication Date
CN113989318A (en): 2022-01-28
CN113989318B (en): 2023-04-07

Family

ID=79739627

Family Applications (1)

Application Number: CN202111221271.6A; Priority date / Filing date: 2021-10-20; Status: Active; Title: Monocular vision odometer pose optimization and error correction method based on deep learning

Country Status (1)

Country: CN; CN113989318B (en)


Patent Citations (15)

* Cited by examiner, † Cited by third party
CN106485750A * (priority 2016-09-13, published 2017-03-08): A human pose estimation method based on a supervised local subspace
CN107403430A * (priority 2017-06-15, published 2017-11-28): An RGB-D image semantic segmentation method
CN108830220A * (priority 2018-06-15, published 2018-11-16): Construction of a visual semantic library and a global localization method based on deep learning
US20210042937A1 * (priority 2019-08-08, published 2021-02-11): Self-supervised visual odometry framework using long-term modeling and incremental learning
CN111080699A * (priority 2019-12-11, published 2020-04-28): Monocular visual odometry method and system based on deep learning
CN111127557A * (priority 2019-12-13, published 2020-05-08): Visual SLAM front-end pose estimation method based on deep learning
US10911775B1 * (priority 2020-03-11, published 2021-02-02): System and method for vision-based joint action and pose motion forecasting
CN111623797A * (priority 2020-06-10, published 2020-09-04): Step counting method based on deep learning
CN112115786A * (priority 2020-08-13, published 2020-12-22): Monocular visual odometry method based on an attention U-Net
CN112233179A * (priority 2020-10-20, published 2021-01-15): Visual odometry measurement method
CN112634438A * (priority 2020-12-24, published 2021-04-09): Single-frame depth image three-dimensional model reconstruction method and device based on an adversarial network
CN113065546A * (priority 2021-02-25, published 2021-07-02): Target pose estimation method and system based on an attention mechanism and Hough voting
CN112991447A * (priority 2021-03-16, published 2021-06-18): Visual positioning and static map construction method and system in dynamic environments
CN113159043A * (priority 2021-04-01, published 2021-07-23): Feature point matching method and system based on semantic information
CN113221647A * (priority 2021-04-08, published 2021-08-06): 6D pose estimation method fusing local point cloud features

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
RAN ZHU et al., "DeepAVO: Efficient pose refining with feature distilling for deep Visual Odometry" *
XIANGYU LI et al., "Transformer guided geometry model for flow-based unsupervised visual odometry" *
YULIANG ZOU et al., "Learning monocular visual odometry via self-supervised long-term modeling" *
ZIBIN GUO et al., "LightVO: Lightweight Inertial-Assisted Monocular Visual Odometry with Dense Neural Networks" *
KONG Delei et al., "A survey of event-based vision sensors and their applications" *
LIANG Shuibo et al., "Image local feature detection and description based on attention and multiple feature fusion" *

Also Published As

CN113989318B (en): published 2023-04-07

Similar Documents

Publication: Title
CN112639502B (en) Robot pose estimation
CN110595466B (en) Lightweight inertial-assisted visual odometer implementation method based on deep learning
CN110660083A (en) Multi-target tracking method combined with video scene feature perception
CN112113566B (en) Inertial navigation data correction method based on neural network
CN115285143B (en) Automatic driving vehicle navigation method based on scene classification
US20230243658A1 (en) Systems, Methods and Devices for Map-Based Object's Localization Deep Learning and Object's Motion Trajectories on Geospatial Maps Using Neural Network
CN114719848B (en) Unmanned aerial vehicle height estimation method based on vision and inertial navigation information fusion neural network
CN114612556A (en) Training method of visual inertial odometer model, pose estimation method and pose estimation device
CN113739795A (en) Underwater synchronous positioning and mapping method based on polarized light/inertia/vision combined navigation
Xian et al. A bionic autonomous navigation system by using polarization navigation sensor and stereo camera
CN111739066B (en) Visual positioning method, system and storage medium based on Gaussian process
Azam et al. N 2 C: neural network controller design using behavioral cloning
CN116416277A (en) Multi-target tracking method and device based on motion equation track prediction
CN114067142A (en) Method for realizing scene structure prediction, target detection and lane level positioning
CN117685953A (en) UWB and vision fusion positioning method and system for multi-unmanned aerial vehicle co-positioning
CN114047766B (en) Mobile robot data acquisition system and method for long-term application of indoor and outdoor scenes
CN115147576A (en) Underwater robot docking monocular vision guiding method based on key characteristics
Xu et al. Vision-aided intelligent and adaptive vehicle pose estimation during GNSS outages
CN112945233B (en) Global drift-free autonomous robot simultaneous positioning and map construction method
Guo et al. Model-based deep learning for low-cost IMU dead reckoning of wheeled mobile robot
CN113989318B (en) Monocular vision odometer pose optimization and error correction method based on deep learning
Jo et al. Mixture density-PoseNet and its application to monocular camera-based global localization
CN117830879B (en) Indoor-oriented distributed unmanned aerial vehicle cluster positioning and mapping method
He et al. NINT: Neural Inertial Navigation Based on Time Interval Information in Underwater Environments
CN114894191B (en) Unmanned aerial vehicle navigation method suitable for dynamic complex environment

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant