CN113989318A - Monocular vision odometer pose optimization and error correction method based on deep learning - Google Patents
- Publication number
- CN113989318A (application CN202111221271.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/215—Motion-based segmentation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C25/00—Manufacturing, calibrating, cleaning, or repairing instruments or devices referred to in the other groups of this subclass
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/248—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a monocular vision odometer pose optimization and error correction method based on deep learning. The method obtains image data and computes the corresponding optical flow image sequence; segments the optical flow image sequence with a fixed-step sliding window to obtain several segmented input sequences, and extracts high-dimensional motion features from each input sequence with an encoder; feeds the high-dimensional motion features into an artificial neural network; models motion similarity over the temporal relation of the motion and its local context information with a pose-transformation similarity calculation module, and guides and optimizes the pose features with an attention mechanism to obtain motion features refined by motion similarity; and feeds the refined motion features into a pose correction prediction network to realize pose optimization and error correction. The invention fully mines and models the temporal relation and similarity of continuous motion in the image motion data, improving robustness.
Description
Technical Field
The invention relates to the field of computer vision, in particular to a monocular vision odometer pose optimization and error correction method based on deep learning.
Background
In recent years, the rapid development of Internet of Things applications has driven rising demand for Location Based Services (LBS), making the need for high-precision real-time positioning schemes increasingly urgent. A stable, accurate, and real-time positioning system is an important guarantee for Internet of Things applications such as robot control, autonomous driving, Virtual Reality (VR), and retail.
Although positioning via Global Navigation Satellite Systems (GNSS) such as the Global Positioning System (GPS), the BeiDou Navigation Satellite System (BDS), Galileo, and GLONASS is now widespread, satellite positioning can be inaccurate in heavily occluded outdoor environments (such as tunnels and forests) or in indoor scenes where building structures block and interfere with satellite signals. The visual odometer (VO), which uses a visual sensor, is an effective way to address these problems: its visual input is information-rich, it applies to a wide range of scenes, and its cost is low, making it a common means of implementing positioning applications.
However, a monocular visual odometer mainly predicts the inter-frame pose transformation of the camera carrier from images at adjacent acquisition moments and then accumulates these predictions into the overall motion trajectory; the accumulated error grows with motion distance and causes the trajectory estimate to diverge. Effectively eliminating this accumulated pose-prediction error is therefore the key to a high-precision monocular visual odometer positioning system. At present, common methods for mitigating the accumulated error of the monocular visual odometer and improving pose-prediction precision include: 1) constructing a pose graph of the monocular camera carrier's motion with loop-closure detection and back-end optimizing the predicted poses; for example, the ORB-SLAM positioning system locally and globally optimizes its predicted trajectory based on landmark co-visibility. 2) Correcting the visual odometer positioning system with other kinds of information through data fusion; for example, the visual-inertial odometer (VIO) is a high-precision positioning system that eliminates drift of the visual measurement unit by incorporating inertial navigation information. 3) Modeling the motion correlation of the image sequence in the time dimension to optimize pose prediction; for example, the deep-learning monocular visual odometer SRNN model guides and optimizes the predicted pose by modeling the correlation of pose transformations at adjacent moments.
However, the first scheme has clear limitations: it depends heavily on the environment and generalizes poorly. For a motion scene that has not been visited before, environmental landmarks cannot be constructed in advance, and the motion trajectory may never close, so the pose-graph optimization and loop-closure detection module is likely to fail. The second scheme is also limited: when the measurement quality of the other sensors is poor, the prediction precision of the original monocular visual odometer is significantly degraded, and the data fusion algorithm itself strongly affects the final prediction.
Disclosure of Invention
Aiming at the above defects in the prior art, the monocular vision odometer pose optimization and error correction method based on deep learning provided by the invention addresses the poor accuracy and robustness of pose optimization and error correction in the prior art.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
the method for optimizing the pose and correcting the error of the monocular vision odometer based on the deep learning comprises the following steps:
s1, acquiring image data and calculating a corresponding optical flow image sequence;
s2, segmenting the optical flow image sequence by adopting a fixed step length sliding window to obtain a plurality of segmented input sequence data, and obtaining high-dimensional motion characteristics of each input sequence data by utilizing an encoder;
s3, inputting the high-dimensional motion characteristics into an artificial neural network to obtain the time sequence relation of motion and local context information of the motion;
s4, inputting the result of the step S3 into a pose transformation similarity calculation module for motion similarity modeling to obtain motion correlation characteristics of motion time sequence relation and motion correlation characteristics of motion local context information; optimizing the pose characteristics by utilizing an attention mechanism based on the motion correlation characteristics to obtain motion characteristics after motion similarity purification;
and S5, inputting the motion characteristics purified through the motion similarity into a pose correction prediction network for pose optimization and error correction.
Further, the specific method of step S1 is:
s1-1, setting the sampling frequency of the monocular vision sensor, and sampling to obtain a three-channel color RGB image sequence;
s1-2, calculating the optical flow image sequence of the three-channel color RGB image sequence according to the formula Flo_t = F(I_{t-1}, I_t); wherein Flo_t is the optical flow image at time t, F(·) is the optical flow computation, I_{t-1} is the three-channel color RGB image at time t-1, and I_t is the three-channel color RGB image at time t.
Further: the sampling frequency of the monocular vision sensor is set to 20 Hz; the data dimension of each three-channel color RGB image is (1226, 370, 3), and the data dimension of each optical flow image is (1226, 370, 2); each pair of adjacent three-channel color RGB image frames is used to compute one corresponding optical flow image frame.
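As an illustrative sketch of step S1, the following Python code mimics the shape of the computation Flo_t = F(I_{t-1}, I_t). The gradient-based `flow_placeholder` is purely hypothetical (the patent does not name an optical flow estimator; a real system would use, e.g., Farneback's method or a FlowNet-style network); only the output shape (H, W, 2) matches the description, and reduced image dimensions are used here:

```python
import numpy as np

def flow_placeholder(prev, cur):
    """Stand-in for F(I_{t-1}, I_t): a real system would use a dense
    optical-flow estimator. Here we only mimic the two-channel output
    shape (H, W, 2) from two grayscale-averaged RGB frames."""
    diff = cur.mean(axis=2) - prev.mean(axis=2)   # temporal difference
    gy, gx = np.gradient(diff)                    # spatial gradients (rows, cols)
    return np.stack([gx, gy], axis=2)             # (H, W, 2)

def flow_sequence(frames):
    """Flo_t = F(I_{t-1}, I_t) for every adjacent pair of RGB frames."""
    return [flow_placeholder(frames[t - 1], frames[t])
            for t in range(1, len(frames))]

# 5 RGB frames; the patent uses (1226, 370, 3) images, scaled down here
frames = [np.random.rand(37, 122, 3) for _ in range(5)]
flows = flow_sequence(frames)
print(len(flows), flows[0].shape)   # 4 flow frames, each (37, 122, 2)
```

As the description states, N RGB frames yield N-1 optical flow frames, one per adjacent pair.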
Further, the specific method for obtaining the plurality of segmented input sequence data in step S2 is as follows:
segmenting the optical flow image sequence with a sliding window of length 9 and step 9 to obtain input sequences of length 9; wherein each input sequence is four-dimensional tensor data of dimension (9, 1226, 370, 2), i.e., the three data dimensions of the optical flow images over the length of the sliding window.
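The fixed-step sliding-window segmentation of step S2 can be sketched as follows; `segment` is an illustrative helper (not from the patent), and reduced image dimensions are used in place of (1226, 370):

```python
import numpy as np

def segment(flow_seq, window=9, step=9):
    """Cut an optical-flow sequence of shape (N, H, W, 2) into fixed-step
    sliding windows, each a four-dimensional tensor (window, H, W, 2)."""
    return [flow_seq[s:s + window]
            for s in range(0, len(flow_seq) - window + 1, step)]

# 20 synthetic flow frames (patent resolution is (1226, 370, 2); smaller here)
flow_seq = np.random.rand(20, 37, 122, 2)
windows = segment(flow_seq)
print(len(windows), windows[0].shape)   # 2 windows of shape (9, 37, 122, 2)
```

With window = step = 9 the windows do not overlap; 20 frames yield two full windows and the trailing remainder is dropped.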
Further, the specific method in step S3 is:
inputting the high-dimensional motion characteristics into an artificial neural network comprising two layers of long-time and short-time memory networks connected in series, and according to a formula:
i_t = σ(ω_ix x_t + ω_ih h_{t-1} + b_i)
g_t = tanh(ω_gx x_t + ω_gh h_{t-1} + b_g)
f_t = σ(ω_fx x_t + ω_fh h_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
o_t = σ(ω_ox x_t + ω_oh h_{t-1} + b_o)
h_t = o_t ⊙ tanh(c_t)
obtaining the local context information of the motion h_t, i.e., the hidden unit state at time t, and the temporal relation of the motion o_t, i.e., the output of the long short-term memory network at time t; wherein i_t is the input gate state of the network at time t, σ(·) is the sigmoid activation function, ω_ix is the weight of the input data, x_t is the input state at time t, ω_ih is the hidden-unit weight corresponding to the input data, h_{t-1} is the hidden unit state at time t-1, and b_i is the bias of the input data; g_t is the candidate information of the input data at time t, tanh(·) is the activation function, ω_gx is the weight of the candidate information, ω_gh is the hidden-unit weight corresponding to the candidate information, and b_g is the bias of the candidate information; f_t is the forget gate state at time t, ω_fx is the weight of the forget gate state, ω_fh is the hidden-unit weight corresponding to the forget gate state, and b_f is the bias of the forget gate state; c_t is the neuron state at time t and c_{t-1} the neuron state at time t-1; ω_ox is the weight of the output gate state, ω_oh is the hidden-unit weight corresponding to the output gate state, and b_o is the bias of the output gate state; ⊙ denotes the Hadamard product of vectors;
the output of the last layer of long and short time memory network is the time sequence relation of motion, the dimension is (1,1024), the hidden unit states of the two layers of long and short time memory networks store the local context information of the motion, and the dimension is (2,1024).
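The six equations above can be sketched directly in NumPy. This is a generic long short-term memory step consistent with the formulas (including h_t = o_t ⊙ tanh(c_t)), not the patent's trained network; parameter names and the reduced hidden size (the patent uses 1024) are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One step of the long short-term memory equations; p holds the
    weights ω and biases b keyed by gate name."""
    i_t = sigmoid(p["w_ix"] @ x_t + p["w_ih"] @ h_prev + p["b_i"])  # input gate
    g_t = np.tanh(p["w_gx"] @ x_t + p["w_gh"] @ h_prev + p["b_g"])  # candidate
    f_t = sigmoid(p["w_fx"] @ x_t + p["w_fh"] @ h_prev + p["b_f"])  # forget gate
    c_t = f_t * c_prev + i_t * g_t                                  # cell state
    o_t = sigmoid(p["w_ox"] @ x_t + p["w_oh"] @ h_prev + p["b_o"])  # output gate
    h_t = o_t * np.tanh(c_t)                                        # hidden state
    return o_t, h_t, c_t

rng = np.random.default_rng(0)
d_in, d_h = 8, 16   # reduced sizes; the patent uses 1024 hidden units
p = {f"w_{g}x": rng.normal(size=(d_h, d_in)) * 0.1 for g in "igfo"}
p.update({f"w_{g}h": rng.normal(size=(d_h, d_h)) * 0.1 for g in "igfo"})
p.update({f"b_{g}": np.zeros(d_h) for g in "igfo"})
o_t, h_t, c_t = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), p)
print(o_t.shape, h_t.shape)   # (16,) (16,)
```

Stacking two such layers in series, as the description specifies, gives the (1, 1024) output and the (2, 1024) hidden-state context.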
Further, the specific method for obtaining the motion characteristics of the motion similarity purification in step S4 is as follows:
according to the formula:
X″_t = f_{1×1}([X′_t, H′_t])
obtaining the optimized pose feature X″_t output at time t, i.e., the motion feature refined by motion similarity; wherein X′_t is the refined motion feature at time t obtained under the guidance of the attention mechanism based on motion similarity, exp(·) is the exponential function with the natural constant as its base, S(·) is the cosine similarity function, X_{t-1} is the motion feature extracted by the artificial neural network at time t-1, X_t is the motion feature extracted by the artificial neural network at time t, i.e., the motion-correlation feature of the local context information of the motion, W is the vector dimension of the motion feature, H′_t is the refined local motion-context information at time t under the guidance of the attention mechanism based on motion similarity, H_n is the local motion-context information stored in the hidden unit state of the last long short-term memory layer of the artificial neural network, i.e., the motion-correlation feature of the temporal relation of the motion, f_{1×1}(·) is a convolution layer with kernel size 1×1, and [X′_t, H′_t] denotes the concatenation of the refined motion feature and the refined local motion-context information.
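The exact attention formulas for X′_t and H′_t are not reproduced in this text, so the sketch below is only one plausible reading: similarity weights exp(S(X_{t-1}, X_t)) normalized over the window refine X, a stand-in transform plays the role of the H′_t refinement, and a linear map stands in for the 1×1 convolution f_{1×1}. All helper names and the normalization scheme are assumptions, not the patent's method:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity S(a, b) between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def refine(X, H_n, W_fuse):
    """Hypothetical similarity attention: weight each X_t by
    exp(S(X_{t-1}, X_t)) normalized over the window, then fuse the last
    refined feature with the context through a 1x1-conv-equivalent map."""
    scores = np.array([np.exp(cos_sim(X[t - 1], X[t]))
                       for t in range(1, len(X))])
    alpha = scores / scores.sum()                        # attention weights
    X_ref = alpha[:, None] * X[1:]                       # refined X'_t per step
    H_ref = np.tanh(H_n)                                 # stand-in for H'_t
    fused = np.concatenate([X_ref[-1], H_ref]) @ W_fuse  # X''_t = f_1x1([X', H'])
    return alpha, fused

rng = np.random.default_rng(1)
d = 16                              # feature width (1024 in the patent)
X = rng.normal(size=(9, d))         # one window of motion features
H_n = rng.normal(size=d)            # last-layer hidden context
W_fuse = rng.normal(size=(2 * d, d)) * 0.1
alpha, fused = refine(X, H_n, W_fuse)
print(round(alpha.sum(), 6), fused.shape)   # 1.0 (16,)
```

The concatenate-then-linear step is mathematically equivalent to a 1×1 convolution applied to a stacked two-channel feature, which is why a plain matrix product can model f_{1×1} here.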
Further, the pose correction prediction network in step S5 comprises a first long short-term memory network, a second long short-term memory network, a first fully-connected layer, and a second fully-connected layer, connected in sequence; the output dimension of both long short-term memory networks is 1024; the first fully-connected layer has 128 neurons and an activation function; the second fully-connected layer has 6 neurons and no activation function.
The invention has the beneficial effects that:
1. Based on high-dimensional motion features computed by an encoder from the optical flow data of the image sequence, the designed deep-learning pose optimization and error correction method mines the temporal relation of continuous motion and its local context information, models the similarity of continuous motion in the sensor data, guides and optimizes the high-dimensional motion features through an attention mechanism, and finally fits the pose-change prediction of the camera carrier between adjacent camera sampling points, thereby achieving pose optimization and error correction and ensuring the accuracy of the system.
2. Without requiring the camera parameters of the visual sensor, scene-point landmarks, or scene-point depth information, the method extracts the temporal relation of continuous motion with an artificial neural network from the encoder's high-dimensional motion features, optimizes the feature data under the guidance of an attention mechanism using motion-similarity features, and finally obtains accurate and robust pose optimization and error correction; that is, it ensures the robustness of the system and fully and autonomously recovers the absolute scale of the motion trajectory from monocular image data alone.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The following description of embodiments of the invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of the embodiments; to those skilled in the art, various changes within the spirit and scope of the invention as defined by the appended claims are apparent, and all inventions making use of the inventive concept are protected.
As shown in fig. 1, the method for optimizing the pose and correcting the error of the monocular vision odometer based on the deep learning comprises the following steps:
s1, acquiring image data and calculating a corresponding optical flow image sequence;
s2, segmenting the optical flow image sequence by adopting a fixed step length sliding window to obtain a plurality of segmented input sequence data, and obtaining high-dimensional motion characteristics of each input sequence data by utilizing an encoder;
s3, inputting the high-dimensional motion characteristics into an artificial neural network to obtain the time sequence relation of motion and local context information of the motion;
s4, inputting the result of the step S3 into a pose transformation similarity calculation module for motion similarity modeling to obtain motion correlation characteristics of motion time sequence relation and motion correlation characteristics of motion local context information; optimizing the pose characteristics by utilizing an attention mechanism based on the motion correlation characteristics to obtain motion characteristics after motion similarity purification;
and S5, inputting the motion characteristics purified through the motion similarity into a pose correction prediction network for pose optimization and error correction.
The specific method of step S1 is:
s1-1, setting the sampling frequency of the monocular vision sensor, and sampling to obtain a three-channel color RGB image sequence;
s1-2, calculating the optical flow image sequence of the three-channel color RGB image sequence according to the formula Flo_t = F(I_{t-1}, I_t); wherein Flo_t is the optical flow image at time t, F(·) is the optical flow computation, I_{t-1} is the three-channel color RGB image at time t-1, and I_t is the three-channel color RGB image at time t.
The sampling frequency of the monocular vision sensor is set to 20 Hz; the data dimension of each three-channel color RGB image is (1226, 370, 3), and the data dimension of each optical flow image is (1226, 370, 2); each pair of adjacent three-channel color RGB image frames is used to compute one corresponding optical flow image frame.
The specific method for obtaining the plurality of segmented input sequence data in step S2 is as follows:
segmenting the optical flow image sequence with a sliding window of length 9 and step 9 to obtain input sequences of length 9; wherein each input sequence is four-dimensional tensor data of dimension (9, 1226, 370, 2), i.e., the three data dimensions of the optical flow images over the length of the sliding window.
The specific method in step S3 is:
inputting the high-dimensional motion characteristics into an artificial neural network comprising two layers of long-time and short-time memory networks connected in series, and according to a formula:
i_t = σ(ω_ix x_t + ω_ih h_{t-1} + b_i)
g_t = tanh(ω_gx x_t + ω_gh h_{t-1} + b_g)
f_t = σ(ω_fx x_t + ω_fh h_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
o_t = σ(ω_ox x_t + ω_oh h_{t-1} + b_o)
h_t = o_t ⊙ tanh(c_t)
obtaining the local context information of the motion h_t, i.e., the hidden unit state at time t, and the temporal relation of the motion o_t, i.e., the output of the long short-term memory network at time t; wherein i_t is the input gate state of the network at time t, σ(·) is the sigmoid activation function, ω_ix is the weight of the input data, x_t is the input state at time t, ω_ih is the hidden-unit weight corresponding to the input data, h_{t-1} is the hidden unit state at time t-1, and b_i is the bias of the input data; g_t is the candidate information of the input data at time t, tanh(·) is the activation function, ω_gx is the weight of the candidate information, ω_gh is the hidden-unit weight corresponding to the candidate information, and b_g is the bias of the candidate information; f_t is the forget gate state at time t, ω_fx is the weight of the forget gate state, ω_fh is the hidden-unit weight corresponding to the forget gate state, and b_f is the bias of the forget gate state; c_t is the neuron state at time t and c_{t-1} the neuron state at time t-1; ω_ox is the weight of the output gate state, ω_oh is the hidden-unit weight corresponding to the output gate state, and b_o is the bias of the output gate state; ⊙ denotes the Hadamard product of vectors;
the output of the last layer of long and short time memory network is the time sequence relation of motion, the dimension is (1,1024), the hidden unit states of the two layers of long and short time memory networks store the local context information of the motion, and the dimension is (2,1024).
The specific method for obtaining the motion characteristics of the motion similarity purification in the step S4 is as follows:
according to the formula:
X″_t = f_{1×1}([X′_t, H′_t])
obtaining the optimized pose feature X″_t output at time t, i.e., the motion feature refined by motion similarity; wherein X′_t is the refined motion feature at time t obtained under the guidance of the attention mechanism based on motion similarity, exp(·) is the exponential function with the natural constant as its base, S(·) is the cosine similarity function, X_{t-1} is the motion feature extracted by the artificial neural network at time t-1, X_t is the motion feature extracted by the artificial neural network at time t, i.e., the motion-correlation feature of the local context information of the motion, W is the vector dimension of the motion feature, H′_t is the refined local motion-context information at time t under the guidance of the attention mechanism based on motion similarity, H_n is the local motion-context information stored in the hidden unit state of the last long short-term memory layer of the artificial neural network, i.e., the motion-correlation feature of the temporal relation of the motion, f_{1×1}(·) is a convolution layer with kernel size 1×1, and [X′_t, H′_t] denotes the concatenation of the refined motion feature and the refined local motion-context information.
The pose correction prediction network in the step S5 includes a first long-short term memory network, a second long-short term memory network, a first fully-connected layer and a second fully-connected layer, which are connected in sequence; the output dimensionalities of the two long and short time memory networks are 1024; the number of neurons of the first fully-connected layer is 128, and the activation function is included; the number of neurons in the second fully-connected layer was 6, with no activation function.
The output of the first fully-connected layer of the pose correction prediction network is F1:
F1 = Relu(x_{1×i} · ω_{j×i}^T + b_{1×j})
where Relu(·) is the activation function performing the non-linear mapping, x_{1×i} is the 1×i input data matrix, ω_{j×i} is the trainable j×i weight matrix of the fully-connected layer, b_{1×j} is the 1×j bias matrix of the fully-connected layer, and T denotes the matrix transpose.
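The two fully-connected layers described above (128 units with Relu, then 6 linear outputs) can be sketched as follows. The weights are random placeholders, and the reading of the 6 outputs as translation plus rotation components of the pose increment is an assumption not stated here:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pose_head(x, w1, b1, w2, b2):
    """The two fully-connected layers of the pose correction prediction
    network: F1 = Relu(x @ w1.T + b1) with 128 units, then a linear
    layer with 6 outputs and no activation."""
    f1 = relu(x @ w1.T + b1)   # (1, 128)
    return f1 @ w2.T + b2      # (1, 6) pose correction output

rng = np.random.default_rng(2)
x = rng.normal(size=(1, 1024))                # feature from the LSTM stack
w1, b1 = rng.normal(size=(128, 1024)) * 0.03, np.zeros((1, 128))
w2, b2 = rng.normal(size=(6, 128)) * 0.03, np.zeros((1, 6))
pose = pose_head(x, w1, b1, w2, b2)
print(pose.shape)   # (1, 6)
```

Leaving the final layer without an activation lets the 6-dimensional output take any real value, which a bounded activation would prevent for a pose regression target.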
The method is based on high-dimensional motion features computed by an encoder from the optical flow data of the image sequence: it mines the temporal relation of continuous motion and its local context information, models the similarity of continuous motion in the sensor data, guides and optimizes the high-dimensional motion features through an attention mechanism, and finally fits the pose-change prediction of the camera carrier between adjacent camera sampling points, thereby achieving pose optimization and error correction and ensuring the accuracy of the system.
Without requiring the camera parameters of the visual sensor, scene-point landmarks, or scene-point depth information, the method extracts the temporal relation of continuous motion with an artificial neural network, optimizes the feature data under the guidance of an attention mechanism using motion-similarity features, and finally obtains accurate and robust pose optimization and error correction; that is, it ensures the robustness of the system and fully and autonomously recovers the absolute scale of the motion trajectory from monocular image data alone.
Claims (7)
1. A monocular vision odometer pose optimization and error correction method based on deep learning is characterized by comprising the following steps:
s1, acquiring image data and calculating a corresponding optical flow image sequence;
s2, segmenting the optical flow image sequence by adopting a fixed step length sliding window to obtain a plurality of segmented input sequence data, and obtaining high-dimensional motion characteristics of each input sequence data by utilizing an encoder;
s3, inputting the high-dimensional motion characteristics into an artificial neural network to obtain the time sequence relation of motion and local context information of the motion;
s4, inputting the result of the step S3 into a pose transformation similarity calculation module for motion similarity modeling to obtain motion correlation characteristics of motion time sequence relation and motion correlation characteristics of motion local context information; optimizing the pose characteristics by utilizing an attention mechanism based on the motion correlation characteristics to obtain motion characteristics after motion similarity purification;
and S5, inputting the motion characteristics purified through the motion similarity into a pose correction prediction network for pose optimization and error correction.
2. The deep learning-based monocular vision odometer pose optimization and error correction method according to claim 1, wherein the specific method of step S1 is:
s1-1, setting the sampling frequency of the monocular vision sensor, and sampling to obtain a three-channel color RGB image sequence;
s1-2, calculating the optical flow image sequence of the three-channel color RGB image sequence according to the formula Flo_t = F(I_{t-1}, I_t); wherein Flo_t is the optical flow image at time t, F(·) is the optical flow computation, I_{t-1} is the three-channel color RGB image at time t-1, and I_t is the three-channel color RGB image at time t.
3. The deep learning-based monocular vision odometer pose optimization and error correction method of claim 2, characterized in that: the sampling frequency of the monocular vision sensor is set to 20 Hz; the data dimension of each three-channel color RGB image is (1226, 370, 3), and the data dimension of each optical flow image is (1226, 370, 2); each pair of adjacent three-channel color RGB image frames is used to compute one corresponding optical flow image frame.
4. The deep learning-based monocular vision odometer pose optimization and error correction method according to claim 1, wherein the specific method for obtaining the plurality of segmented input sequence data in step S2 is as follows:
segmenting the optical flow image sequence with a sliding window of length 9 and step 9 to obtain input sequences of length 9; wherein each input sequence is four-dimensional tensor data of dimension (9, 1226, 370, 2), i.e., the three data dimensions of the optical flow images over the length of the sliding window.
5. The deep learning-based monocular vision odometer pose optimization and error correction method according to claim 1, wherein the specific method in step S3 is:
inputting the high-dimensional motion characteristics into an artificial neural network comprising two layers of long-time and short-time memory networks connected in series, and according to a formula:
i_t = σ(ω_ix x_t + ω_ih h_{t-1} + b_i)
g_t = tanh(ω_gx x_t + ω_gh h_{t-1} + b_g)
f_t = σ(ω_fx x_t + ω_fh h_{t-1} + b_f)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
o_t = σ(ω_ox x_t + ω_oh h_{t-1} + b_o)
h_t = o_t ⊙ tanh(c_t)
obtaining the local context information of the motion h_t, i.e., the hidden unit state at time t, and the temporal relation of the motion o_t, i.e., the output of the long short-term memory network at time t; wherein i_t is the input gate state of the network at time t, σ(·) is the sigmoid activation function, ω_ix is the weight of the input data, x_t is the input state at time t, ω_ih is the hidden-unit weight corresponding to the input data, h_{t-1} is the hidden unit state at time t-1, and b_i is the bias of the input data; g_t is the candidate information of the input data at time t, tanh(·) is the activation function, ω_gx is the weight of the candidate information, ω_gh is the hidden-unit weight corresponding to the candidate information, and b_g is the bias of the candidate information; f_t is the forget gate state at time t, ω_fx is the weight of the forget gate state, ω_fh is the hidden-unit weight corresponding to the forget gate state, and b_f is the bias of the forget gate state; c_t is the neuron state at time t and c_{t-1} the neuron state at time t-1; ω_ox is the weight of the output gate state, ω_oh is the hidden-unit weight corresponding to the output gate state, and b_o is the bias of the output gate state; ⊙ denotes the Hadamard product of vectors;
the output of the last long short-term memory layer is the temporal relation of motion, with dimension (1, 1024); the hidden-unit states of the two long short-term memory layers store the local context information of motion, with dimension (2, 1024).
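A single step of the recurrence in claim 5 (a standard LSTM cell) can be written directly from the six formulas. Toy dimensions are used here (the claimed hidden size is 1024); the weight initialization and naming are illustrative, not the patent's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 3  # toy input and hidden sizes (the claim uses hidden size 1024)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One (input->gate, hidden->gate) weight pair and one bias per gate i, g, f, o.
W = {k: rng.standard_normal((H, D if k.endswith("x") else H))
     for k in ["ix", "ih", "gx", "gh", "fx", "fh", "ox", "oh"]}
b = {k: np.zeros(H) for k in ["i", "g", "f", "o"]}

def lstm_step(x_t, h_prev, c_prev):
    """One time step of the claim-5 recurrence."""
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])  # input gate
    g_t = np.tanh(W["gx"] @ x_t + W["gh"] @ h_prev + b["g"])  # candidate info
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])  # forget gate
    c_t = f_t * c_prev + i_t * g_t                            # cell state
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])  # output gate
    h_t = o_t * np.tanh(c_t)                                  # hidden state
    return h_t, c_t

h, c = np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):  # run five toy time steps
    h, c = lstm_step(x, h, c)
```

Because h_t = o_t ⊙ tanh(c_t) with o_t in (0, 1) and |tanh| < 1, every component of the hidden state stays strictly inside (-1, 1).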
6. The deep learning-based monocular vision odometer pose optimization and error correction method of claim 1, wherein the specific method for obtaining the motion features with refined motion similarity in step S4 is as follows:
according to the formula:
X_t″ = f_{1×1}([X_t′, H_t′])
obtaining the optimized pose feature X_t″ output at time t, i.e., the motion feature refined by motion similarity; wherein X_t′ is the purified motion feature at time t, obtained under attention-mechanism guidance based on motion similarity; exp(·) is the exponential function with the natural constant e as its base; S(·) is the cosine similarity function; X_{t-1} is the motion feature extracted by the artificial neural network at time t-1; X_t is the motion feature extracted by the artificial neural network at time t, i.e., the motion-correlation feature of the local context information of motion; W is the vector dimension of the motion features; H_t′ is the purified motion local context information at time t, obtained under attention-mechanism guidance based on motion similarity; H_n is the local context information of motion stored in the hidden-unit state of the last long short-term memory layer of the artificial neural network, i.e., the motion-correlation feature of the motion temporal relation; f_{1×1}(·) is a convolutional layer with a 1×1 convolution kernel; and [X_t′, H_t′] is the concatenation of the purified motion feature and the purified motion local context information.
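The refinement step of claim 6 can be sketched as below. The exact purification formulas for X_t′ and H_t′ are not fully reproduced in this excerpt, so the attention weight exp(S(X_{t-1}, X_t)) used here is a hypothetical stand-in consistent with the symbols defined above; the function names and toy dimensions are illustrative only.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity S(.) between two feature vectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def refine(x_prev, x_t, h_n, w_1x1):
    """Similarity-guided refinement ending in X_t'' = f_1x1([X_t', H_t']).

    The weighting below is an assumed form built from exp(.) and S(.);
    the patent's full purification expression is not shown in this excerpt.
    """
    w = np.exp(cosine_sim(x_prev, x_t))       # similarity-based attention weight
    x_ref = w * x_t                           # purified motion feature X_t'
    h_ref = w * h_n                           # purified context H_t'
    concat = np.concatenate([x_ref, h_ref])   # concatenation [X_t', H_t']
    return w_1x1 @ concat                     # a 1x1 conv on vectors = linear map

rng = np.random.default_rng(1)
W_dim = 8                                     # toy feature size (real: 1024)
x_prev, x_t, h_n = rng.standard_normal((3, W_dim))
w_1x1 = rng.standard_normal((W_dim, 2 * W_dim))
x_out = refine(x_prev, x_t, h_n, w_1x1)       # optimized pose feature X_t''
```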
7. The deep learning-based monocular visual odometry pose optimization and error correction method of claim 1, wherein the pose correction prediction network in step S5 comprises a first long short-term memory network, a second long short-term memory network, a first fully-connected layer and a second fully-connected layer connected in sequence; the output dimensionality of each of the two long short-term memory networks is 1024; the first fully-connected layer has 128 neurons and an activation function; the second fully-connected layer has 6 neurons and no activation function.
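The claim-7 architecture maps directly onto a small PyTorch module: two stacked 1024-unit LSTMs followed by a 128-neuron fully-connected layer with an activation and a 6-neuron output layer without one (a 6-DoF pose correction). This is a sketch under stated assumptions: the claim does not name the activation function (ReLU is assumed) or the LSTM input size (1024 is assumed to match the feature dimensions above).

```python
import torch
import torch.nn as nn

class PoseCorrector(nn.Module):
    """Sketch of the claim-7 pose correction prediction network."""
    def __init__(self, in_dim=1024):
        super().__init__()
        # Two serially connected LSTM layers, each with output dimension 1024.
        self.lstm = nn.LSTM(in_dim, 1024, num_layers=2, batch_first=True)
        self.fc1 = nn.Linear(1024, 128)   # first FC layer: 128 neurons
        self.act = nn.ReLU()              # activation type assumed, not claimed
        self.fc2 = nn.Linear(128, 6)      # second FC layer: 6 neurons, no activation

    def forward(self, x):
        out, _ = self.lstm(x)             # (batch, time, 1024)
        return self.fc2(self.act(self.fc1(out)))  # (batch, time, 6)

net = PoseCorrector()
y = net(torch.zeros(1, 9, 1024))          # one window of 9 time steps
print(y.shape)  # torch.Size([1, 9, 6])
```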
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111221271.6A CN113989318B (en) | 2021-10-20 | 2021-10-20 | Monocular vision odometer pose optimization and error correction method based on deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113989318A true CN113989318A (en) | 2022-01-28 |
CN113989318B CN113989318B (en) | 2023-04-07 |
Family
ID=79739627
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111221271.6A Active CN113989318B (en) | 2021-10-20 | 2021-10-20 | Monocular vision odometer pose optimization and error correction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113989318B (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106485750A (en) * | 2016-09-13 | 2017-03-08 | 电子科技大学 | A kind of estimation method of human posture based on supervision Local Subspace |
CN107403430A (en) * | 2017-06-15 | 2017-11-28 | 中山大学 | A kind of RGBD image, semantics dividing method |
CN108830220A (en) * | 2018-06-15 | 2018-11-16 | 山东大学 | The building of vision semantic base and global localization method based on deep learning |
CN111080699A (en) * | 2019-12-11 | 2020-04-28 | 中国科学院自动化研究所 | Monocular vision odometer method and system based on deep learning |
CN111127557A (en) * | 2019-12-13 | 2020-05-08 | 中国电子科技集团公司第二十研究所 | Visual SLAM front-end attitude estimation method based on deep learning |
CN111623797A (en) * | 2020-06-10 | 2020-09-04 | 电子科技大学 | Step number measuring method based on deep learning |
CN112115786A (en) * | 2020-08-13 | 2020-12-22 | 北京工商大学 | Monocular vision odometer method based on attention U-net |
CN112233179A (en) * | 2020-10-20 | 2021-01-15 | 湘潭大学 | Visual odometer measuring method |
US10911775B1 (en) * | 2020-03-11 | 2021-02-02 | Fuji Xerox Co., Ltd. | System and method for vision-based joint action and pose motion forecasting |
US20210042937A1 (en) * | 2019-08-08 | 2021-02-11 | Nec Laboratories America, Inc. | Self-supervised visual odometry framework using long-term modeling and incremental learning |
CN112634438A (en) * | 2020-12-24 | 2021-04-09 | 北京工业大学 | Single-frame depth image three-dimensional model reconstruction method and device based on countermeasure network |
CN112991447A (en) * | 2021-03-16 | 2021-06-18 | 华东理工大学 | Visual positioning and static map construction method and system in dynamic environment |
CN113065546A (en) * | 2021-02-25 | 2021-07-02 | 湖南大学 | Target pose estimation method and system based on attention mechanism and Hough voting |
CN113159043A (en) * | 2021-04-01 | 2021-07-23 | 北京大学 | Feature point matching method and system based on semantic information |
CN113221647A (en) * | 2021-04-08 | 2021-08-06 | 湖南大学 | 6D pose estimation method fusing point cloud local features |
Non-Patent Citations (6)
Title |
---|
RAN ZHU et al.: "DeepAVO: Efficient pose refining with feature distilling for deep Visual Odometry" *
XIANGYU LI et al.: "Transformer guided geometry model for flow-based unsupervised visual odometry" *
YULIANG ZOU et al.: "Learning monocular visual odometry via self-supervised long-term modeling" *
ZIBIN GUO et al.: "LightVO: Lightweight Inertial-Assisted Monocular Visual Odometry with Dense Neural Networks" *
KONG Delei et al.: "A survey of event-based vision sensors and their applications" *
LIANG Shuibo et al.: "Image local feature detection and description with attention and multi-feature fusion" *
Also Published As
Publication number | Publication date |
---|---|
CN113989318B (en) | 2023-04-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||