CN112344922A - Monocular vision odometer positioning method and system - Google Patents

Monocular vision odometer positioning method and system

Info

Publication number
CN112344922A
CN112344922A (application CN202011153385.7A)
Authority
CN
China
Prior art keywords: loss value, video sequence, image, monocular, odometer positioning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011153385.7A
Other languages
Chinese (zh)
Other versions
CN112344922B (en)
Inventor
Gao Wei (高伟)
Wan Yiming (万一鸣)
Wu Yihong (吴毅红)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202011153385.7A priority Critical patent/CN112344922B/en
Publication of CN112344922A publication Critical patent/CN112344922A/en
Application granted granted Critical
Publication of CN112344922B publication Critical patent/CN112344922B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/005 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 with correlation of navigation data from several sources, e.g. map or contour matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

The invention relates to a monocular visual odometer positioning method and system. The method comprises: acquiring a training data set, wherein the training data set comprises a plurality of video sequences and each video sequence comprises multiple frames of continuous images; and establishing a monocular visual odometer positioning model according to each video sequence, specifically by stacking each pair of adjacent frame images to obtain a corresponding stacked image, extracting high-dimensional features from each stacked image through a FlowNet encoder, sequentially extracting local information and global information from the high-dimensional features through an LCGR module, and obtaining a relative pose through full-connection regression processing according to the local information and the global information. Based on the monocular visual odometer positioning model, the relative pose can be accurately determined from a video sequence to be measured, improving positioning accuracy.

Description

Monocular vision odometer positioning method and system
Technical Field
The invention relates to the technical field of computer vision, and in particular to a monocular visual odometer positioning method and system for SLAM (simultaneous localization and mapping) based on local-global information fusion and dynamic-object perception.
Background
The visual odometer is an important component of mobile robots, autonomous navigation and augmented reality. Visual odometers can be classified into monocular visual odometers (Monocular VO) and binocular visual odometers (Stereo VO) according to the number of cameras used. Monocular VO is generally more challenging than binocular VO, but is widely studied because it requires only one camera and is lighter and cheaper. A classical visual odometry pipeline comprises camera calibration, feature detection, feature matching, outlier rejection, motion estimation, scale estimation and back-end optimization. Such methods achieve good results in most conditions, but still fail in scenes with occlusion, large illumination changes or little texture.
In recent years, deep learning techniques have been successfully applied to face recognition, target tracking, speech recognition, machine translation and more. Deep learning methods, represented by convolutional neural networks, play a very important role in computer vision: compared with traditional methods, deep networks are remarkably effective at extracting image features and discovering latent patterns. Many researchers have therefore applied deep learning to pose estimation and related fields, letting deep networks learn the geometric relationship between images directly and realize end-to-end pose estimation. The end-to-end approach completely abandons the feature extraction, feature matching, camera calibration and graph optimization steps of traditional methods, and obtains the camera pose directly from the input images. Although convolutional networks can cope with some extreme situations, their overall accuracy is lower than that of traditional methods, and the limited generalization ability of such networks is another important obstacle to their practical application. In addition, most deep learning methods do not consider the influence of dynamic objects in the scene, which also lowers positioning accuracy.
Disclosure of Invention
In order to solve the above problems in the prior art, i.e. to improve the positioning accuracy, the present invention aims to provide a monocular vision odometer positioning method and system.
In order to solve the technical problems, the invention provides the following scheme:
a monocular visual odometer positioning method, the monocular visual odometer positioning method comprising:
acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises a plurality of frames of continuous images;
establishing a monocular vision odometer positioning model according to each video sequence;
wherein the establishing a monocular vision odometer positioning model according to each video sequence specifically comprises:
stacking each adjacent frame image to obtain a corresponding stacked image;
extracting high-dimensional features from each stacked image through a FlowNet encoder;
sequentially extracting local information and global information from the high-dimensional features through an LCGR module;
obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
Optionally, the extracting, by the LCGR module, local information and global information from the high-dimensional features sequentially includes:
performing a convolution operation on each high-dimensional feature with K groups of 3D convolution kernels to obtain local information of different temporal lengths, wherein the size of the k-th group of convolution kernels is k × 3 × 3, and the output channel numbers satisfy
$$\sum_{k=1}^{K} C_k < C,$$
where $C$ is the number of input channels;
based on Bi-ConvLSTM, global information of the video sequence is extracted from each high-dimensional feature.
Optionally, the relative pose comprises a displacement and an attitude;
the establishing of the monocular visual odometer positioning model according to each video sequence further comprises:
calculating the displacement loss value $L_{trans}$ and the attitude loss value $L_{rot}$ according to the following formulas:
$$L_{trans} = \frac{1}{T}\sum_{t=1}^{T}\left\|\hat{p}_t - p_t\right\|_2^2;$$
$$L_{rot} = \frac{1}{T}\sum_{t=1}^{T}\left\|\hat{\phi}_t - \phi_t\right\|_2^2;$$
wherein $\hat{p}_t$ is the predicted displacement, $\hat{\phi}_t$ is the predicted angle, $p_t$ and $\phi_t$ are the corresponding true values, $t$ denotes the image number, $t = 1, 2, \dots, T$, where $T$ is the number of images, and $\|\cdot\|_2$ denotes the two-norm;
according to the displacement loss value $L_{trans}$ and the attitude loss value $L_{rot}$, determining a total loss value;
and adjusting the monocular vision odometer positioning model according to the total loss value.
Optionally, the total loss value further comprises an optical flow loss value, a constraint loss value and an epipolar loss value;
the establishing of the monocular visual odometer positioning model according to each video sequence further comprises:
determining, through an optical flow and mask estimation module, an optical flow loss value $L_{photometric}$, a constraint loss value $L_{reg}$ and an epipolar loss value $L_e$ according to each pair of adjacent frame images and the optical flow output by the FlowNet encoder; and
determining the total loss value according to the displacement loss value $L_{trans}$, the attitude loss value $L_{rot}$, the optical flow loss value $L_{photometric}$, the constraint loss value $L_{reg}$ and the epipolar loss value $L_e$.
Optionally, the optical flow loss value $L_{photometric}$, the constraint loss value $L_{reg}$, the epipolar loss value $L_e$ and the total loss value $L_{total}$ are determined according to the following formulas:
$$L_{photometric} = \frac{1}{T-1}\sum_{t=1}^{T-1}\sum_{i,j} C(i,j)\left|I(i,j,t+1) - I'(i,j,t+1)\right|;$$
$$L_{reg} = -\sum_{i,j}\log C(i,j);$$
$$L_e = \left|q^{T} K^{-T} [t]_{\times} R K^{-1} p\right|;$$
$$L_{total} = L_{trans} + 100\,L_{rot} + L_{photometric} + L_e + L_{reg};$$
wherein $(i,j)$ denotes a pixel coordinate position; $I_t$ denotes the image of the $t$-th frame; $I'(i,j,t+1)$ denotes the image synthesized from the two consecutive frames $I(i,j,t)$ and $I(i,j,t+1)$ together with the optical flow output by the FlowNet encoder; $C(i,j)$ is the mask value at location $(i,j)$, indicating the confidence that the pixel can be successfully synthesized; for two consecutive frames $I(i,j,t)$ and $I(i,j,t+1)$, the estimated optical flow provides the pixel correspondence between the source image and the target image, $q$ being a pixel position in the target image and $p$ its corresponding pixel position in the source image; $K$ is the camera intrinsic matrix; and $R$ and $t$ are the rotation and translation of the relative pose between the source and target images.
Optionally, the monocular visual odometer positioning method further comprises:
the size of each image is adjusted to a uniform size.
In order to solve the technical problems, the invention also provides the following scheme:
a monocular visual odometer positioning system, the monocular visual odometer positioning system comprising:
an obtaining unit, configured to obtain a training data set, where the training data set includes a plurality of video sequences, and each video sequence includes multiple frames of continuous images;
the modeling unit is used for establishing a monocular vision odometer positioning model according to each video sequence;
wherein the modeling unit includes:
the stacking module is used for stacking each adjacent frame image to obtain a corresponding stacked image;
the characteristic extraction module is used for extracting high-dimensional characteristics from each stacked image through a FlowNet encoder;
the information extraction module is used for sequentially extracting local information and global information from the high-dimensional features through the LCGR module;
the pose determining module is used for obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and the positioning unit is used for obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
Optionally, the modeling unit further comprises:
a preprocessing module, connected to the acquisition unit and the stacking module respectively, and configured to adjust each image to a uniform size and send the resized images to the stacking module.
In order to solve the technical problems, the invention also provides the following scheme:
a monocular visual odometer positioning system comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises a plurality of frames of continuous images;
establishing a monocular vision odometer positioning model according to each video sequence;
wherein the establishing a monocular vision odometer positioning model according to each video sequence specifically comprises:
stacking each adjacent frame image to obtain a corresponding stacked image;
extracting high-dimensional features from each stacked image through a FlowNet encoder;
sequentially extracting local information and global information from the high-dimensional features through an LCGR module;
obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
In order to solve the technical problems, the invention also provides the following scheme:
a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises a plurality of frames of continuous images;
establishing a monocular vision odometer positioning model according to each video sequence;
wherein the establishing a monocular vision odometer positioning model according to each video sequence specifically comprises:
stacking each adjacent frame image to obtain a corresponding stacked image;
extracting high-dimensional features from each stacked image through a FlowNet encoder;
sequentially extracting local information and global information from the high-dimensional features through an LCGR module;
obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
Embodiments of the invention provide the following technical effects:
According to the invention, high-dimensional features are extracted from each stacked image through the FlowNet encoder, local information and global information are sequentially extracted from the high-dimensional features through the LCGR module, and the relative pose is then obtained through full-connection regression processing, thereby establishing a monocular vision odometer positioning model. A video sequence to be measured is positioned through this model, so the relative pose can be accurately determined and the positioning precision is improved.
Drawings
FIG. 1 is a flow chart of a monocular visual odometer positioning method of the present invention;
FIG. 2 is a schematic block diagram of a monocular visual odometer positioning system of the present invention.
Description of the symbols:
the system comprises an acquisition unit-1, a modeling unit-2, a preprocessing module-20, a stacking module-21, a feature extraction module-22, an information extraction module-23, a pose determination module-24 and a positioning unit-3.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
The invention aims to provide a monocular vision odometer positioning method in which high-dimensional features are extracted from each stacked image through a FlowNet encoder, local information and global information are sequentially extracted from the high-dimensional features through an LCGR module, and a relative pose is then obtained through full-connection regression processing, thereby establishing a monocular vision odometer positioning model. A video sequence to be measured is positioned through this model, the relative pose can be accurately determined, and the positioning precision is improved.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in FIG. 1, the monocular visual odometer positioning method of the present invention comprises:
Step 10: acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises multiple frames of continuous images;
Step 20: establishing a monocular vision odometer positioning model according to each video sequence;
Step 30: obtaining the relative pose to be detected according to the video sequence to be detected, based on the monocular vision odometer positioning model.
In step 20, the establishing a monocular vision odometer positioning model according to each video sequence specifically includes:
Step 210: stacking each adjacent frame image to obtain a corresponding stacked image;
Step 220: extracting high-dimensional features from each stacked image through a FlowNet (optical flow network) encoder;
Step 230: sequentially extracting local information and global information from the high-dimensional features through an LCGR (local constraint and global recovery network) module;
Step 240: obtaining the relative pose through full-connection regression processing according to the local information and the global information.
Preferably, the establishing a monocular visual odometry positioning model according to each video sequence further includes: step 200: the size of each image is adjusted to a uniform size. In this embodiment, the size is 384 x 1280 pixels.
In step 220, given N+1 consecutive frames of images from time t to t+N, N stacked images are obtained, from which the FlowNet encoder extracts N sets of 6 × 20 × 1024 high-dimensional features.
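As a minimal sketch of steps 210 and 220 (assuming a PyTorch setting; `flownet_encoder` is a hypothetical stand-in for the FlowNet encoder, whose layers the patent does not specify), adjacent frames are stacked along the channel axis and then encoded:

```python
import torch

def stack_adjacent(frames: torch.Tensor) -> torch.Tensor:
    """frames: (B, N+1, 3, 384, 1280) -> (B, N, 6, 384, 1280).

    Each adjacent pair (frame t, frame t+1) is concatenated along the
    channel axis, giving N stacked 6-channel images.
    """
    return torch.cat([frames[:, :-1], frames[:, 1:]], dim=2)

def encode(flownet_encoder, stacked: torch.Tensor) -> torch.Tensor:
    # A FlowNet-style encoder downsamples 384 x 1280 by a factor of 64
    # to 6 x 20 with 1024 channels, matching the 6 x 20 x 1024 features
    # described in the text.
    B, N = stacked.shape[:2]
    feats = flownet_encoder(stacked.flatten(0, 1))   # (B*N, 1024, 6, 20)
    return feats.view(B, N, 1024, 6, 20)
```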
In step 230, the extracting, by the LCGR module (as shown in Table 1), local information and global information from the high-dimensional features sequentially includes:
Step 231: performing a convolution operation on each high-dimensional feature with K groups of 3D convolution kernels to obtain local information over different temporal lengths, wherein the k-th group of convolution kernels has size k × 3 × 3 and the output channel numbers satisfy
$$\sum_{k=1}^{K} C_k < C,$$
where $C$ is the number of input channels, so that a more compact high-dimensional feature is obtained after convolution. In this embodiment, K is 2, the convolution stride is 1, and both $C_k$ are 128.
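A hedged sketch of this local branch follows; the spatial padding, the concatenation of the K branch outputs, and the truncation used to align their temporal lengths are assumptions, since the text fixes only the kernel sizes, the stride and $C_k$:

```python
import torch
import torch.nn as nn

class LocalBranch(nn.Module):
    """K groups of 3D convolutions; group k has temporal extent k."""
    def __init__(self, in_ch: int = 1024, out_ch: int = 128, K: int = 2):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv3d(in_ch, out_ch, kernel_size=(k, 3, 3),
                      stride=1, padding=(k // 2, 1, 1))
            for k in range(1, K + 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, 1024, N, 6, 20); each branch sees a different temporal
        # window k, yielding local information of different lengths.
        outs = [branch(x) for branch in self.branches]
        n = min(o.shape[2] for o in outs)            # align temporal dims
        return torch.cat([o[:, :, :n] for o in outs], dim=1)
```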
Table 1: structure of the LCGR module (rendered as an image in the source; not reproduced here).
Step 232: extracting global information of the video sequence from the high-dimensional features based on Bi-ConvLSTM (bidirectional convolutional long short-term memory).
Further, the relative pose includes displacement and attitude.
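Returning to step 232: the patent does not detail the Bi-ConvLSTM layer, so the following is one conventional formulation offered as a sketch, with a ConvLSTM cell per direction and the per-step hidden states concatenated:

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four LSTM gates at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], 1)), 4, dim=1)
        c = f.sigmoid() * c + i.sigmoid() * g.tanh()
        return o.sigmoid() * c.tanh(), c

class BiConvLSTM(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int):
        super().__init__()
        self.fwd = ConvLSTMCell(in_ch, hid_ch)
        self.bwd = ConvLSTMCell(in_ch, hid_ch)

    def forward(self, seq):
        # seq: list of N feature maps, each of shape (B, C, H, W).
        B, _, H, W = seq[0].shape
        h = seq[0].new_zeros(B, self.fwd.hid_ch, H, W)
        state_f, state_b = (h, h.clone()), (h.clone(), h.clone())
        out_f, out_b = [], []
        for x in seq:                          # forward direction
            hf, cf = self.fwd(x, state_f); state_f = (hf, cf); out_f.append(hf)
        for x in reversed(seq):                # backward direction
            hb, cb = self.bwd(x, state_b); state_b = (hb, cb); out_b.append(hb)
        out_b.reverse()
        return [torch.cat(p, dim=1) for p in zip(out_f, out_b)]
```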
Further to step 240, the establishing a monocular vision odometer positioning model according to each video sequence also comprises:
Step 241: calculating the displacement loss value $L_{trans}$ and the attitude loss value $L_{rot}$ according to the following formulas:
$$L_{trans} = \frac{1}{T}\sum_{t=1}^{T}\left\|\hat{p}_t - p_t\right\|_2^2;$$
$$L_{rot} = \frac{1}{T}\sum_{t=1}^{T}\left\|\hat{\phi}_t - \phi_t\right\|_2^2;$$
wherein $\hat{p}_t$ is the predicted displacement, $\hat{\phi}_t$ is the predicted angle, $p_t$ and $\phi_t$ are the corresponding true values, $t$ denotes the image number, $t = 1, 2, \dots, T$, where $T$ is the number of images, and $\|\cdot\|_2$ denotes the two-norm;
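As a minimal sketch (framework and tensor shapes are assumptions), the two supervised losses above translate directly into code:

```python
import torch

def pose_losses(p_hat, p_gt, phi_hat, phi_gt):
    """All inputs (T, 3): predicted/true displacements and angles.

    Returns the mean squared two-norm losses L_trans and L_rot.
    """
    l_trans = (p_hat - p_gt).norm(dim=1).pow(2).mean()
    l_rot = (phi_hat - phi_gt).norm(dim=1).pow(2).mean()
    return l_trans, l_rot
```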
step 242: according to the displacement loss value LtransAnd attitude loss value LrotDetermining a total loss value;
step 243: and adjusting the monocular vision odometer positioning model according to the total loss value.
Further, the total loss value also includes an optical flow loss value, a constraint loss value and an epipolar loss value.
Correspondingly, the establishing of the monocular visual odometer positioning model according to each video sequence further comprises:
determining, through an optical flow and mask estimation module, an optical flow loss value $L_{photometric}$, a constraint loss value $L_{reg}$ and an epipolar loss value $L_e$ according to each pair of adjacent frame images and the optical flow output by the FlowNet encoder; and
determining the total loss value according to the displacement loss value $L_{trans}$, the attitude loss value $L_{rot}$, the optical flow loss value $L_{photometric}$, the constraint loss value $L_{reg}$ and the epipolar loss value $L_e$.
The optical flow loss value $L_{photometric}$, the constraint loss value $L_{reg}$, the epipolar loss value $L_e$ and the total loss value $L_{total}$ are determined according to the following formulas:
$$L_{photometric} = \frac{1}{T-1}\sum_{t=1}^{T-1}\sum_{i,j} C(i,j)\left|I(i,j,t+1) - I'(i,j,t+1)\right|;$$
$$L_{reg} = -\sum_{i,j}\log C(i,j);$$
$$L_e = \left|q^{T} K^{-T} [t]_{\times} R K^{-1} p\right|;$$
$$L_{total} = L_{trans} + 100\,L_{rot} + L_{photometric} + L_e + L_{reg};$$
wherein $(i,j)$ denotes a pixel coordinate position; $I_t$ denotes the image of the $t$-th frame; $I'(i,j,t+1)$ denotes the image synthesized from the two consecutive frames $I(i,j,t)$ and $I(i,j,t+1)$ together with the optical flow output by the FlowNet encoder; $C(i,j)$ is the mask value at location $(i,j)$, indicating the confidence that the pixel can be successfully synthesized; for two consecutive frames $I(i,j,t)$ and $I(i,j,t+1)$, the estimated optical flow provides the pixel correspondence between the source image and the target image, $q$ being a pixel position in the target image and $p$ its corresponding pixel position in the source image; $K$ is the camera intrinsic matrix; and $R$ and $t$ are the rotation and translation of the relative pose between the source and target images.
The optical flow and mask estimation module works as follows. Given two consecutive images $I(i,j,t)$ and $I(i,j,t+1)$ and the optical flow output by the FlowNet encoder, one can synthesize
$$I'(i,j,t+1) = I(i + u_{i,j},\; j + v_{i,j},\; t),$$
where $u$ and $v$ are the horizontal and vertical components of the optical flow. The photometric error is obtained by comparing the synthesized $I'(i,j,t+1)$ with the original $I(i,j,t+1)$, i.e. $\sum_{i,j}\left|I(i,j,t+1) - I'(i,j,t+1)\right|$. This process can be implemented by differentiable bilinear interpolation. Given a sequence of images $I_1, I_2, \dots, I_T$, the overall loss function is
$$L_{photometric} = \frac{1}{T-1}\sum_{t=1}^{T-1}\sum_{i,j}\left|I(i,j,t+1) - I'(i,j,t+1)\right|.$$
To mitigate the effect of photometrically inconsistent regions of the scene on gradient propagation, the optical flow prediction component simultaneously estimates a mask whose value represents the probability that each pixel can be successfully synthesized. The mask is estimated by adding a branch convolutional layer before the last layer of FlowNet, with a sigmoid activation function. The final loss function becomes
$$L_{photometric} = \frac{1}{T-1}\sum_{t=1}^{T-1}\sum_{i,j} C(i,j)\left|I(i,j,t+1) - I'(i,j,t+1)\right|.$$
Furthermore, to prevent $C(i,j)$ from being optimized to 0, it is necessary to pull $C(i,j)$ toward 1; the penalty of this constraint is
$$L_{reg} = -\sum_{i,j}\log C(i,j).$$
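A hedged sketch of this synthesis-and-mask scheme, assuming bilinear warping via `grid_sample` and the cross-entropy form of $L_{reg}$ adopted above:

```python
import torch
import torch.nn.functional as F

def warp(src: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """src: (B, 3, H, W); flow: (B, 2, H, W) in pixels. Bilinear synthesis."""
    B, _, H, W = src.shape
    ys, xs = torch.meshgrid(torch.arange(H, device=src.device),
                            torch.arange(W, device=src.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float() + flow.permute(0, 2, 3, 1)
    gx = 2 * grid[..., 0] / (W - 1) - 1          # normalize x to [-1, 1]
    gy = 2 * grid[..., 1] / (H - 1) - 1          # normalize y to [-1, 1]
    return F.grid_sample(src, torch.stack((gx, gy), dim=-1),
                         align_corners=True)

def photometric_and_reg_losses(tgt, src, flow, mask, eps=1e-6):
    syn = warp(src, flow)                        # I'(i, j, t+1)
    l_photo = (mask * (tgt - syn).abs()).mean()  # masked photometric L1
    l_reg = -mask.clamp_min(eps).log().mean()    # pulls C(i, j) toward 1
    return l_photo, l_reg
```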
in order to solve the problems brought by dynamic objects in a scene, the invention utilizes pose true value construction antipodal constraints to explicitly prompt a network to output a lower mask value to a moving object region to relieve the influence on gradient propagation. Given two images of adjacent frames, the estimated optical flow provides a pixel correspondence between the source image and the target image. Assuming that G is the pixel position in the target image, and its corresponding pixel position in the source image is p, the epipolar constraint can be expressed as:
qTK-T[t]×RK-1p=0。
the final antipodal losses are:
Le=|qTA-T[r]×RK-1p|。
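Under the stated notation ($q$, $p$ as homogeneous pixel coordinates, $K$ the intrinsics, $R$ and $t$ the relative pose), a sketch of the epipolar term; batching over matched pixel pairs is an illustrative addition:

```python
import torch

def skew(t: torch.Tensor) -> torch.Tensor:
    """3-vector t -> 3x3 skew-symmetric matrix [t]x."""
    tx, ty, tz = t
    zero = t.new_zeros(())
    return torch.stack([
        torch.stack([zero, -tz, ty]),
        torch.stack([tz, zero, -tx]),
        torch.stack([-ty, tx, zero]),
    ])

def epipolar_loss(q, p, K, R, t):
    """q, p: (M, 3) homogeneous pixel coords. Mean |q^T K^-T [t]x R K^-1 p|."""
    K_inv = torch.inverse(K)
    F_mat = K_inv.t() @ skew(t) @ R @ K_inv      # fundamental matrix
    return ((q @ F_mat) * p).sum(dim=1).abs().mean()
```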
the monocular visual odometer localization model of the present invention was trained on KITTIVO/SLAM. The data set comprises 22 video sequences, wherein 00-10 provide pose truth values, and 11-21 only provide original video sequences. Many dynamic objects are contained in these 22 video sequences, which is very challenging for monocular VOs. The pictures in the training are all adjusted to 384 x 1280 pixels, the initial learning rate is 0.0001, the batch size is 2, and the learning rate is halved every 10 epochs.
In addition, the invention also provides a monocular vision odometer positioning system which can improve the positioning precision.
As shown in fig. 2, the monocular visual odometer positioning system of the present invention includes an obtaining unit 1, a modeling unit 2, and a positioning unit 3.
The acquiring unit 1 is configured to acquire a training data set, where the training data set includes a plurality of video sequences, and each video sequence includes multiple frames of continuous images;
the modeling unit 2 is used for establishing a monocular vision odometer positioning model according to each video sequence;
the positioning unit 3 is used for obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
Preferably, the modeling unit 2 includes a stacking module 21, a feature extraction module 22, an information extraction module 23 and a pose determination module 24. Specifically:
the stacking module 21 is configured to perform stacking processing on each adjacent frame image to obtain a corresponding stacked image;
the feature extraction module 22 is configured to extract high-dimensional features from each stacked image through a FlowNet encoder;
the information extraction module 23 is configured to sequentially extract local information and global information from the high-dimensional features through the LCGR module;
the pose determining module 24 is configured to obtain a relative pose through full-connected regression processing according to the local information and the global information.
Further, the modeling unit 2 also includes a preprocessing module 20, which is connected to the obtaining unit 1 and the stacking module 21 respectively, and which is configured to adjust each image to a uniform size and send the resized images to the stacking module 21.
In addition, the invention also provides the following scheme: a monocular visual odometer positioning system comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises a plurality of frames of continuous images;
establishing a monocular vision odometer positioning model according to each video sequence;
wherein the establishing a monocular vision odometer positioning model according to each video sequence specifically comprises:
stacking each adjacent frame image to obtain a corresponding stacked image;
extracting high-dimensional features from each stacked image through a FlowNet encoder;
sequentially extracting local information and global information from the high-dimensional features through an LCGR module;
obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
Further, the present invention also provides a computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to perform operations of:
acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises a plurality of frames of continuous images;
establishing a monocular vision odometer positioning model according to each video sequence;
wherein the establishing a monocular vision odometer positioning model according to each video sequence specifically comprises:
stacking each adjacent frame image to obtain a corresponding stacked image;
extracting high-dimensional features from each stacked image through a FlowNet encoder;
sequentially extracting local information and global information from the high-dimensional features through an LCGR module;
obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
Compared with the prior art, the monocular visual odometer positioning system and the computer readable storage medium have the same beneficial effects as the monocular visual odometer positioning method, and are not repeated herein.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A monocular visual odometer positioning method, comprising:
acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises a plurality of frames of continuous images;
establishing a monocular vision odometer positioning model according to each video sequence;
wherein the establishing a monocular vision odometer positioning model according to each video sequence specifically comprises:
stacking each adjacent frame image to obtain a corresponding stacked image;
extracting high-dimensional features from each stacked image through a FlowNet encoder;
sequentially extracting local information and global information from the high-dimensional features through an LCGR module;
obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
2. The monocular visual odometer positioning method according to claim 1, wherein the extracting, by the LCGR module, the local information and the global information from the high-dimensional features in sequence specifically includes:
performing a convolution operation on each high-dimensional feature with K groups of 3D convolution kernels to obtain local information of different temporal lengths, wherein the size of the k-th group of convolution kernels is k × 3 × 3, and the output channel numbers satisfy
$$\sum_{k=1}^{K} C_k < C,$$
where $C$ is the number of input channels;
based on Bi-ConvLSTM, global information of the video sequence is extracted from each high-dimensional feature.
3. The monocular visual odometer positioning method of claim 1, wherein the relative pose comprises a displacement and an attitude;
the establishing of the monocular visual odometer positioning model according to each video sequence further comprises:
calculating the displacement loss value $L_{trans}$ and the attitude loss value $L_{rot}$ according to the following formulas:
$$L_{trans} = \frac{1}{T}\sum_{t=1}^{T}\left\|\hat{p}_t - p_t\right\|_2^2;$$
$$L_{rot} = \frac{1}{T}\sum_{t=1}^{T}\left\|\hat{\phi}_t - \phi_t\right\|_2^2;$$
wherein $\hat{p}_t$ is the predicted displacement, $\hat{\phi}_t$ is the predicted angle, $p_t$ and $\phi_t$ are the corresponding true values, $t$ denotes the image number, $t = 1, 2, \dots, T$, where $T$ is the number of images, and $\|\cdot\|_2$ denotes the two-norm;
determining a total loss value according to the displacement loss value $L_{trans}$ and the attitude loss value $L_{rot}$; and
adjusting the monocular vision odometer positioning model according to the total loss value.
4. The monocular visual odometer positioning method of claim 3, wherein the total loss value further comprises an optical flow loss value, a constraint loss value and an epipolar loss value;
the establishing of the monocular visual odometer positioning model according to each video sequence further comprises:
determining, through an optical flow and mask estimation module, an optical flow loss value $L_{photometric}$, a constraint loss value $L_{reg}$ and an epipolar loss value $L_e$ according to each pair of adjacent frame images and the optical flow output by the FlowNet encoder; and
determining the total loss value according to the displacement loss value $L_{trans}$, the attitude loss value $L_{rot}$, the optical flow loss value $L_{photometric}$, the constraint loss value $L_{reg}$ and the epipolar loss value $L_e$.
5. The monocular visual odometer positioning method of claim 4, wherein the optical flow loss value $L_{photometric}$, the constraint loss value $L_{reg}$, the epipolar loss value $L_e$ and the total loss value $L_{total}$ are determined according to the following formulas:
$$L_{photometric} = \frac{1}{T-1}\sum_{t=1}^{T-1}\sum_{i,j} C(i,j)\left|I(i,j,t+1) - I'(i,j,t+1)\right|;$$
$$L_{reg} = -\sum_{i,j}\log C(i,j);$$
$$L_e = \left|q^{T} K^{-T} [t]_{\times} R K^{-1} p\right|;$$
$$L_{total} = L_{trans} + 100\,L_{rot} + L_{photometric} + L_e + L_{reg};$$
wherein $(i,j)$ denotes a pixel coordinate position; $I_t$ denotes the image of the $t$-th frame; $I'(i,j,t+1)$ denotes the image synthesized from the two consecutive frames $I(i,j,t)$ and $I(i,j,t+1)$ together with the optical flow output by the FlowNet encoder; $C(i,j)$ is the mask value at location $(i,j)$, indicating the confidence that the pixel can be successfully synthesized; for two consecutive frames $I(i,j,t)$ and $I(i,j,t+1)$, the estimated optical flow provides the pixel correspondence between the source image and the target image, $q$ being a pixel position in the target image and $p$ its corresponding pixel position in the source image; $K$ is the camera intrinsic matrix; and $R$ and $t$ are the rotation and translation of the relative pose between the source and target images.
6. The monocular visual odometer positioning method of any one of claims 1-5, further comprising:
the size of each image is adjusted to a uniform size.
7. A monocular visual odometer positioning system, comprising:
an obtaining unit, configured to obtain a training data set, where the training data set includes a plurality of video sequences, and each video sequence includes multiple frames of continuous images;
the modeling unit is used for establishing a monocular vision odometer positioning model according to each video sequence;
wherein the modeling unit includes:
the stacking module is used for stacking each adjacent frame image to obtain a corresponding stacked image;
the characteristic extraction module is used for extracting high-dimensional characteristics from each stacked image through a FlowNet encoder;
the information extraction module is used for sequentially extracting local information and global information from the high-dimensional features through the LCGR module;
the pose determining module is used for obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and the positioning unit is used for obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
8. The monocular visual odometer positioning system of claim 7, wherein the modeling unit further comprises:
a preprocessing module, respectively connected with the acquisition unit and the stacking module, and configured to adjust the size of each image to be uniform and send the images to the stacking module.
9. A monocular visual odometer positioning system comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises a plurality of frames of continuous images;
establishing a monocular vision odometer positioning model according to each video sequence;
wherein the establishing a monocular vision odometer positioning model according to each video sequence specifically comprises:
stacking each adjacent frame image to obtain a corresponding stacked image;
extracting high-dimensional features from each stacked image through a FlowNet encoder;
sequentially extracting local information and global information from the high-dimensional features through an LCGR module;
obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
10. A computer-readable storage medium storing one or more programs that, when executed by an electronic device including a plurality of application programs, cause the electronic device to:
acquiring a training data set, wherein the training data set comprises a plurality of video sequences, and each video sequence comprises a plurality of frames of continuous images;
establishing a monocular vision odometer positioning model according to each video sequence;
wherein the establishing a monocular vision odometer positioning model according to each video sequence specifically comprises:
stacking each adjacent frame image to obtain a corresponding stacked image;
extracting high-dimensional features from each stacked image through a FlowNet encoder;
sequentially extracting local information and global information from the high-dimensional features through an LCGR module;
obtaining a relative pose through full-connection regression processing according to the local information and the global information;
and obtaining the relative pose to be detected according to the video sequence to be detected based on the monocular vision odometer positioning model.
CN202011153385.7A 2020-10-26 2020-10-26 Monocular vision odometer positioning method and system Active CN112344922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011153385.7A CN112344922B (en) 2020-10-26 2020-10-26 Monocular vision odometer positioning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011153385.7A CN112344922B (en) 2020-10-26 2020-10-26 Monocular vision odometer positioning method and system

Publications (2)

Publication Number Publication Date
CN112344922A (en) 2021-02-09
CN112344922B CN112344922B (en) 2022-10-21

Family

ID=74360257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011153385.7A Active CN112344922B (en) 2020-10-26 2020-10-26 Monocular vision odometer positioning method and system

Country Status (1)

Country Link
CN (1) CN112344922B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106658023A (en) * 2016-12-21 2017-05-10 山东大学 End-to-end visual odometer and method based on deep learning
US20180365579A1 (en) * 2017-06-15 2018-12-20 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for evaluating a matching degree of multi-domain information based on artificial intelligence, device and medium
US20190080167A1 (en) * 2017-09-13 2019-03-14 TuSimple Data acquistion and input of neural network system for deep odometry assisted by static scene optical flow
CN110782490A (en) * 2019-09-24 2020-02-11 武汉大学 Video depth map estimation method and device with space-time consistency
CN110910447A (en) * 2019-10-31 2020-03-24 北京工业大学 Visual odometer method based on dynamic and static scene separation
CN111080699A (en) * 2019-12-11 2020-04-28 中国科学院自动化研究所 Monocular vision odometer method and system based on deep learning
CN111103577A (en) * 2020-01-07 2020-05-05 湖南大学 End-to-end laser radar calibration method based on cyclic neural network
CN111127557A (en) * 2019-12-13 2020-05-08 中国电子科技集团公司第二十研究所 Visual SLAM front-end attitude estimation method based on deep learning
CN111311685A (en) * 2020-05-12 2020-06-19 中国人民解放军国防科技大学 Motion scene reconstruction unsupervised method based on IMU/monocular image
CN111462135A (en) * 2020-03-31 2020-07-28 华东理工大学 Semantic mapping method based on visual S L AM and two-dimensional semantic segmentation
CN111489372A (en) * 2020-03-11 2020-08-04 天津大学 Video foreground and background separation method based on cascade convolution neural network
WO2020186943A1 (en) * 2019-03-15 2020-09-24 京东方科技集团股份有限公司 Mobile device posture determination apparatus and method, and visual odometer
CN111783582A (en) * 2020-06-22 2020-10-16 东南大学 Unsupervised monocular depth estimation algorithm based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Eddy Ilg et al., "FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition *
Qu Hao et al., "Research on a Visual/Inertial Integrated Odometry Algorithm Based on an Attention Model," Navigation Positioning and Timing (《导航定位与授时》) *

Also Published As

Publication number Publication date
CN112344922B (en) 2022-10-21

Similar Documents

Publication Publication Date Title
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
Park et al. High-precision depth estimation using uncalibrated LiDAR and stereo fusion
CN110766024B (en) Deep learning-based visual odometer feature point extraction method and visual odometer
CN113108771B (en) Movement pose estimation method based on closed-loop direct sparse visual odometer
CN110009675B (en) Method, apparatus, medium, and device for generating disparity map
CN108648216B (en) Visual odometer implementation method and system based on optical flow and deep learning
CN111105432A (en) Unsupervised end-to-end driving environment perception method based on deep learning
CN111798373A (en) Rapid unmanned aerial vehicle image stitching method based on local plane hypothesis and six-degree-of-freedom pose optimization
CN111882602A (en) Visual odometer implementation method based on ORB feature points and GMS matching filter
CN111127522A (en) Monocular camera-based depth optical flow prediction method, device, equipment and medium
CN114037762A (en) Real-time high-precision positioning method based on image and high-precision map registration
CN112767480A (en) Monocular vision SLAM positioning method based on deep learning
CN116402876A (en) Binocular depth estimation method, binocular depth estimation device, embedded equipment and readable storage medium
CN114170304B (en) Camera positioning method based on multi-head self-attention and replacement attention
CN113345032B (en) Initialization map building method and system based on wide-angle camera large distortion map
CN114266823A (en) Monocular SLAM method combining SuperPoint network characteristic extraction
CN112344922B (en) Monocular vision odometer positioning method and system
CN117115271A (en) Binocular camera external parameter self-calibration method and system in unmanned aerial vehicle flight process
CN116824433A (en) Visual-inertial navigation-radar fusion self-positioning method based on self-supervision neural network
CN116740488A (en) Training method and device for feature extraction model for visual positioning
CN116128966A (en) Semantic positioning method based on environmental object
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
CN112419411B (en) Realization method of vision odometer based on convolutional neural network and optical flow characteristics
US20240153120A1 (en) Method to determine the depth from images by self-adaptive learning of a neural network and system thereof
CN113112547A (en) Robot, repositioning method thereof, positioning device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant