CN112465021B - Pose track estimation method based on image frame interpolation method - Google Patents

Pose track estimation method based on image frame interpolation method

Info

Publication number
CN112465021B
CN112465021B (application CN202011352019.4A)
Authority
CN
China
Prior art keywords
image
interpolation
semantic
frame
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011352019.4A
Other languages
Chinese (zh)
Other versions
CN112465021A (en)
Inventor
梁志伟
郭强
周鼎宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202011352019.4A priority Critical patent/CN112465021B/en
Publication of CN112465021A publication Critical patent/CN112465021A/en
Application granted granted Critical
Publication of CN112465021B publication Critical patent/CN112465021B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention provides a pose track estimation method based on an image frame interpolation method, in which a new frame is inserted between two image frames and semantics are used as a representation of the invariant scene that, together with the feature points, constrains the pose and trajectory estimation. Tracking loss is reduced by increasing the number of feature-point matches between frames, and fusing semantic information reduces the influence of dynamic-object feature points and constrains the matching of feature points, improving the accuracy of pose estimation and trajectory estimation. Experiments on public data sets show that the method maintains high accuracy, is robust to moving objects and sparse textures, and achieves good results in improving the recognition accuracy of the visual odometer.

Description

Pose track estimation method based on image frame interpolation method
Technical Field
The invention relates to the technical field of computer vision, in particular to a pose track estimation method based on an image frame interpolation method.
Background
The goal of visual odometry is to estimate the motion of the camera from the captured images. Two methods are commonly used at present: the feature point method and the direct method. The feature point method is currently the mainstream; it obtains good results when the camera moves fast, the illumination change is not obvious and the environment is varied, but feature points are easily lost in places with little visual change, such as a tunnel, leading to poor results. The direct method does not need to extract features, but it is not suitable for environments where the camera motion is fast. The core of visual odometry is the data association problem, since it establishes pixel-level correspondences between images. These associated pixels are used to construct a three-dimensional map of the scene and to track the pose of the current camera. Such local tracking and mapping introduces small errors in each frame; the errors become larger if the two images are taken too far apart, and distant objects may show significant changes in appearance when viewed up close. This mainly occurs when the frame rate of the camera is too low and there is no invariant representation of the features.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a pose track estimation method based on an image frame interpolation method, which solves the problem in the prior art that the recognition accuracy of visual-odometry pose and trajectory estimation from captured images is low.
The technical scheme adopted by the invention is as follows: a pose track estimation method based on an image frame interpolation method, as shown in FIG. 1, comprises the following steps:
S01 feature detection: the system acquires a series of images and then performs ORB feature detection on them;
S02 semantic recognition: if the number of feature-point matches obtained between the previous and current frames meets the threshold requirement, semantic recognition is performed on the image; otherwise, S021 image frame interpolation is performed;
S021 image frame interpolation: image frame interpolation is performed between two adjacent frames, and detection and matching of the feature points are then performed again. When images are acquired, too fast camera motion or too low a camera frame rate causes the common-view region between two adjacent frames to be too small; applying a video frame interpolation technique solves this problem. When few feature points of the two image frames are matched, image frame interpolation increases the commonly recognizable region between two adjacent frames, thereby increasing the number of matched feature points;
S03 image information fusion: after semantic recognition, the semantic image information and the feature-point image information are fused, and the feature points detected on dynamic objects are removed; if the number of feature points does not meet the threshold requirement after the dynamic-object feature points are removed, S021 image frame interpolation is performed;
S04 pose estimation: after the threshold requirement is met, pose estimation is finally carried out (the overall pipeline is sketched below).
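The flow of steps S01-S04 can be summarized with the following Python sketch. It assumes OpenCV's ORB detector and brute-force matcher; interpolate_frame, detect_dynamic_mask and estimate_pose are hypothetical placeholders for the frame-interpolation network, the YOLO-based dynamic-object detector and the pose optimizer described later, and the threshold MIN_MATCHES is illustrative.

```python
import cv2

MIN_MATCHES = 100  # illustrative threshold on the number of feature-point matches

orb = cv2.ORB_create(nfeatures=2000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def match_features(img_a, img_b):
    """S01: ORB feature detection and brute-force matching between two frames."""
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return kp_a, kp_b, []
    return kp_a, kp_b, matcher.match(des_a, des_b)

def track_pair(prev_img, curr_img, interpolate_frame, detect_dynamic_mask, estimate_pose):
    """One tracking step: S01 -> (S021 if needed) -> S02/S03 -> S04."""
    # S01: detect and match ORB features between the two frames.
    kp_a, kp_b, matches = match_features(prev_img, curr_img)
    target = curr_img
    if len(matches) < MIN_MATCHES:
        # S021: interpolate an intermediate frame to enlarge the co-visible
        # region and match against it instead.
        target = interpolate_frame(prev_img, curr_img)
        kp_a, kp_b, matches = match_features(prev_img, target)
    # S02/S03: semantic recognition on the matched frame, then drop matches
    # whose keypoints fall on dynamic objects (people, vehicles, animals, ...).
    mask = detect_dynamic_mask(target)  # boolean H x W array, True on dynamic objects
    matches = [m for m in matches
               if not mask[int(kp_b[m.trainIdx].pt[1]), int(kp_b[m.trainIdx].pt[0])]]
    if len(matches) < MIN_MATCHES:
        # Still below the threshold after removing dynamic points:
        # interpolate again as in the flow chart (omitted here for brevity).
        pass
    # S04: pose estimation from the remaining static-object matches.
    return estimate_pose(kp_a, kp_b, matches)
```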
Further, in the S021 image interpolation process, a deep convolutional neural network method is used to estimate an appropriate convolution kernel to synthesize each output pixel in the interpolated image.
Further preferably, the output coefficients of the convolution kernel are non-negative and sum to 1.
Furthermore, in the S021 image frame interpolation process, a color loss function between the interpolated pixel color and the ground-truth color is fused with a gradient loss function, so that the generated image is sharper.
Further, in the S02 semantic recognition process, the YOLO algorithm is used to extract semantic information and divide objects into static objects and dynamic objects; the feature points detected on dynamic objects are removed, the feature points of static objects are retained, and a new loss function is constructed, which adds a semantic loss term as a constraint to the classical feature-point loss function.
Further, in the S03 image information fusion process, after the feature points detected on dynamic objects are removed using the semantic image information, the feature points of the static objects are used to fuse the semantic error and the reprojection error, thereby improving the accuracy of pose estimation.
Furthermore, in the S04 pose estimation process, different weights are applied to the semantic error and the reprojection error in the joint optimization function, improving the robustness of the system.
Further, in the S04 pose estimation process, an expectation-maximization algorithm is used to minimize the error function, ensuring the accuracy of the pose estimation.
Compared with the prior art, the invention has the beneficial effects that:
according to the pose track estimation method based on the image frame interpolation method, a new frame is inserted between two image frames, semantics are used as representation of an invariant scene, pose track estimation is constrained together with feature points, tracking loss is reduced by increasing the number of feature point matching between the frames, influence of dynamic object feature points and matching of constraint feature points are reduced by fusing semantic information, and accuracy of pose estimation and track estimation is improved. Experiments on the public data set show that the method keeps higher precision, has strong robustness on the conditions of moving objects and sparse textures, and obtains good results in the aspect of improving the identification precision of the visual odometer.
Drawings
FIG. 1 is a flow chart of a pose trajectory estimation method based on an image frame interpolation method according to the present invention;
FIG. 2 illustrates the convolution pixel interpolation process according to an embodiment of the present invention;
FIG. 3 illustrates the convolution interpolation process according to an embodiment of the present invention;
FIG. 4 is a visual comparison of interpolated images with the added gradient loss in an embodiment of the present invention;
FIG. 5 illustrates the semantic segmentation and semantic probabilities in an embodiment of the present invention, where (a) is the semantic segmentation image, (b) is a binary image, and (c) and (d) are the semantic probabilities for σ = 10 and σ = 40, respectively; red indicates 1 and blue indicates 0;
FIG. 6 shows the absolute trajectory errors on the KITTI 05 and KITTI 07 sequences in the detailed description;
FIG. 7 shows the absolute trajectory error curves of the TUM data set sequences fr1_xyz, fr1_floor and fr1_long_office_household under ORB-SLAM and the present algorithm, in accordance with an embodiment.
Detailed Description
Reference will now be made in detail to the present embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present invention and should not be construed as limiting the present invention.
1 image interpolation
When images are acquired, too fast camera motion or too low a camera frame rate leads to too little overlap (common view) between two adjacent frames; applying a video frame interpolation technique can effectively solve this problem.
As a preferred approach, a robust video frame interpolation method is used that employs a deep convolutional neural network to achieve frame interpolation without explicitly dividing it into multiple steps. The method treats pixel interpolation as a convolution over corresponding image patches in the two input frames and uses a deep fully convolutional neural network to estimate a spatially adaptive convolution kernel. Specifically, for a pixel (x, y) in the interpolated frame, the deep neural network takes two receptive field blocks R_1 and R_2 centered on that pixel as input and estimates a convolution kernel K. The convolution kernel is convolved with the input patches P_1 and P_2 to synthesize the output pixel, as shown in FIG. 2.
1.1 principle of the Algorithm
Given two image frames I_1 and I_2, the goal is to temporally interpolate a frame Î in the middle of the two input frames. Motion estimation and pixel synthesis are combined into one step, and pixel interpolation is performed as a local convolution over patches in the input images I_1 and I_2. As shown in FIG. 3, by convolving an appropriate kernel K with the input patches P_1(x, y) and P_2(x, y), both centered at (x, y) in their respective input images, the color of the pixel (x, y) in the target image to be interpolated is obtained. The convolution kernel K captures the motion and resampling coefficients required for pixel synthesis.
Estimating an appropriate convolution kernel is crucial; a deep convolutional neural network is used to estimate the kernel that synthesizes each output pixel of the interpolated image. The convolution kernel of each pixel varies with the local motion and image structure, which yields high-quality interpolation results. The deep neural network used for kernel estimation is described below.
1.2 convolution kernel estimation
As a preferred approach, a fully convolutional neural network is used to estimate the convolution kernel of a single output pixel; its structure is detailed in Table 1. Specifically, to estimate the convolution kernel K of the output pixel (x, y), the neural network takes the receptive field blocks R_1(x, y) and R_2(x, y) as input; R_1(x, y) and R_2(x, y) are centered at (x, y) in the respective input images. The patches P_1 and P_2 convolved with the output kernel to generate the color of the output pixel (x, y) are centered at the same location as the receptive fields but are smaller in size, as shown in FIG. 2. The larger receptive field blocks are used to better handle the aperture problem in motion estimation. In the implementation, the default receptive field size is 79 × 79 pixels, the convolution patch size is 41 × 41, and the kernel size is 41 × 82 for the two-patch convolution. The same convolution kernel is applied to each of the three color channels.
TABLE 1 convolutional neural network architecture
As shown in Table 1, the convolutional neural network consists of several convolutional layers and down-convolution layers used in place of max-pooling layers. Rectified linear units are used as activation functions, and batch normalization is used for regularization. The network can be trained end-to-end using widely available video data, which provides a sufficiently large training set. Data augmentation is also used extensively, by flipping the training samples horizontally and vertically and by reversing their order. Because the network is fully convolutional, it is not limited to a fixed-size input, and the shift-and-stitch technique can be used to generate the kernels of multiple pixels simultaneously, which speeds up computation.
One key constraint is that the coefficients of the output convolution kernel should be non-negative and sum to 1. The final convolutional layer is therefore followed by a spatial softmax layer to output the convolution kernels.
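As a rough numerical sketch of the per-pixel synthesis just described (not the patent's trained network), the raw 41 × 82 kernel predicted for one output pixel is passed through a softmax so its coefficients are non-negative and sum to 1, split into two 41 × 41 halves, and applied to the two input patches; the same kernel is shared by the three color channels. The variable names and random inputs are illustrative.

```python
import numpy as np

def synthesize_pixel(patch1, patch2, raw_kernel):
    """Synthesize one output pixel from two 41x41x3 input patches and the
    raw 41x82 kernel predicted by the network for that pixel."""
    # Spatial softmax: coefficients become non-negative and sum to 1.
    k = np.exp(raw_kernel - raw_kernel.max())
    k /= k.sum()
    k1, k2 = k[:, :41], k[:, 41:]          # one 41x41 kernel per input patch
    # The same kernel is applied to each of the three color channels.
    return np.stack([
        (patch1[:, :, c] * k1).sum() + (patch2[:, :, c] * k2).sum()
        for c in range(3)
    ])

# usage: one pixel of the interpolated frame
p1 = np.random.rand(41, 41, 3)
p2 = np.random.rand(41, 41, 3)
raw = np.random.randn(41, 82)
print(synthesize_pixel(p1, p2, raw))       # RGB color of the interpolated pixel
```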
1.3 loss function
For clarity, the notation is first defined. The i-th training example includes two input receptive field blocks R_{i,1} and R_{i,2} centered at (x_i, y_i); the corresponding input patches P_{i,1} and P_{i,2}, which are smaller than the receptive field blocks and centered at the same position; and the ground-truth color C̃_i and ground-truth gradient G̃_i at (x_i, y_i) in the interpolated frame. For simplicity, (x_i, y_i) is omitted from the definitions of the loss functions.
One possible loss function for the deep convolutional neural network is the difference between the interpolated pixel color and the ground-truth color:
E_c = Σ_i ‖ [P_{i,1}  P_{i,2}] * K_i − C̃_i ‖_1   (1)
where the index i denotes the i-th training example and K_i is the convolution kernel output by the neural network. Experiments show that such a color loss, even with an ℓ1 norm, leads to blurred results, as shown in FIG. 4. Since differentiation is also a convolution, the associative property of convolution is used to address this problem, under the assumption that the kernel varies slowly within a local region: the gradients of the input patches are first computed and then convolved with the estimated kernel, which yields the gradient of the interpolated image at the pixel of interest. Since a pixel (x, y) has eight neighboring pixels, eight different gradients are computed by finite differences and all of them are incorporated into a gradient loss function:
E_g = Σ_i Σ_{k=1}^{8} ‖ [G^k_{i,1}  G^k_{i,2}] * K_i − G̃^k_i ‖_1   (2)
where k denotes one of the eight directions in which the gradient is computed, G^k_{i,1} and G^k_{i,2} are the gradients of the input patches P_{i,1} and P_{i,2}, and G̃^k_i is the ground-truth gradient. The above color and gradient losses are combined as the final loss E_c + λ·E_g; λ = 1 was found to work well and is used. This color-plus-gradient loss produces sharper interpolation results, as shown in FIG. 4.
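The color-plus-gradient training loss can be sketched for a single pixel and a single color channel as follows; the eight finite-difference directions, the ℓ1 distances and λ = 1 follow the description above, while the array shapes and the use of np.roll for the finite differences are illustrative simplifications.

```python
import numpy as np

# Offsets of the eight neighbour directions used for the finite-difference gradients.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def finite_diff(patch, dy, dx):
    """Finite difference of a 2D patch along one neighbour direction
    (np.roll is a simplification that wraps around at the borders)."""
    return patch - np.roll(patch, shift=(dy, dx), axis=(0, 1))

def apply_kernel(p1, p2, k):
    """Convolve the two patches with the two halves of the estimated kernel K_i."""
    k1, k2 = k[:, :p1.shape[1]], k[:, p1.shape[1]:]
    return float((p1 * k1).sum() + (p2 * k2).sum())

def color_plus_gradient_loss(p1, p2, k, gt_patch, lam=1.0):
    """E_c + lambda * E_g for a single training pixel; gt_patch is the
    ground-truth patch around (x_i, y_i), its centre pixel the ground-truth colour."""
    cy, cx = gt_patch.shape[0] // 2, gt_patch.shape[1] // 2
    e_c = abs(apply_kernel(p1, p2, k) - gt_patch[cy, cx])
    e_g = 0.0
    for dy, dx in OFFSETS:
        # Gradient of the interpolated pixel: convolve the patch gradients with
        # the same kernel (associativity of convolution).
        g_hat = apply_kernel(finite_diff(p1, dy, dx), finite_diff(p2, dy, dx), k)
        e_g += abs(g_hat - finite_diff(gt_patch, dy, dx)[cy, cx])
    return e_c + lam * e_g

# usage with illustrative shapes: 41x41 patches and a 41x82 kernel
p1, p2 = np.random.rand(41, 41), np.random.rand(41, 41)
k = np.random.rand(41, 82); k /= k.sum()
print(color_plus_gradient_loss(p1, p2, k, np.random.rand(41, 41)))
```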
2 semantic fusion
In a visual odometer based on feature points or optical flow, moving objects in the image can strongly affect the whole system, and changes in illumination and viewpoint affect feature-point extraction and optical-flow estimation. Semantic information, however, can serve as an invariant scene representation: although changes in viewpoint, illumination and scale affect the low-level appearance of an object, they do not affect its semantic representation. Moreover, the semantic information of an image can identify movable objects (people, vehicles, animals and the like) and helps remove the influence of dynamic objects, so the semantic information of the image is integrated into the system.
2.1 extracting semantic information
Semantic information is acquired using YOLO, which integrates the individual steps of target detection into a single neural network, so that the network predicts all bounding boxes of all classes from the features of the whole image (attending to the whole image and all targets in it), achieving end-to-end training and real-time detection. As shown in FIG. 5(a), different categories can be represented by different colors.
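A possible way to fuse the YOLO detections with the ORB feature points is sketched below. The list of dynamic classes and the (class_name, x1, y1, x2, y2) detection format are assumptions; any YOLO implementation that returns class labels and bounding boxes could feed these functions.

```python
import numpy as np

# Classes treated as dynamic objects when fusing semantic and feature information
# (the description names people, vehicles and animals; this list is illustrative).
DYNAMIC_CLASSES = {"person", "car", "truck", "bus", "bicycle", "motorbike",
                   "dog", "cat", "horse"}

def dynamic_mask_from_detections(image_shape, detections):
    """Build a boolean mask of dynamic-object pixels from YOLO detections.

    `detections` is assumed to be a list of (class_name, x1, y1, x2, y2)
    bounding boxes in pixel coordinates, however the detector is invoked.
    """
    h, w = image_shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for cls, x1, y1, x2, y2 in detections:
        if cls in DYNAMIC_CLASSES:
            mask[max(0, int(y1)):min(h, int(y2)),
                 max(0, int(x1)):min(w, int(x2))] = True
    return mask

def keep_static_keypoints(keypoints, mask):
    """Discard ORB keypoints that fall inside dynamic-object regions."""
    return [kp for kp in keypoints
            if not mask[int(kp.pt[1]), int(kp.pt[0])]]
```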
2.2 semantic visual odometer framework
First, consider a standard window-based visual odometry system. Given a set of input images I = {I_k}, the visual odometer, using a given set of corresponding observations Z_{i,k}, jointly optimizes a set of camera poses T = {T_k}, T_k ∈ SE(3), and map points X = {X_i}. An observation can be defined as a keypoint location in the image. The base odometry objective function is therefore
E_base = Σ_{k,i} e_base(k, i)   (3)
For each input image I_k, a dense pixel-wise semantic segmentation S_k is required, in which each pixel is labeled with one of the |C| classes of the set C; each map point is therefore also associated with a category variable Z_i ∈ C. p(Z_i = c | X_i) is the probability that the point P_i located at position X_i belongs to class c. Each point P_i is assigned a label probability vector w_i, where w_i^(c) is the probability that point P_i belongs to class c. To incorporate the semantic constraints into the odometry optimization function, a semantic cost function is defined:
E_sem = Σ_{k,i} e_sem(k, i)   (4)
where each term relates the camera pose T_k and the point P_i (represented by its label Z_i and position X_i) to the semantic image observation S_k. The base and semantic costs are optimized in a joint function
E_joint = E_base + λ·E_sem   (5)
where λ weights the different terms, as described in the following sections.
2.3 semantic cost function
Following a probabilistic approach, an observation likelihood model p(S_k | T_k, X_i, Z_i = c) is first defined, which links the semantic observation S_k to the camera pose T_k and the point P_i. The intuition behind the observation model is that if the pixel corresponding to the projection π(T_k, X_i) of X_i into S_k is labeled c, then the semantic observation likelihood p(S_k | T_k, X_i, Z_i = c) should be high. This probability should decrease with the distance of π(T_k, X_i) to the nearest region labeled c, which is encoded with a distance transform DT_B(p), where p ∈ ℝ² is a pixel position and B is the binary image on which the distance transform is defined, as shown in FIG. 5. More precisely, a binary image B_k^(c) is computed for each semantic class c, such that the pixels labeled c in S_k have value 1 and all other pixels have value 0 (FIG. 5(b)). A distance transform DT_k^(c)(p) = DT_{B_k^(c)}(p) is then defined on this binary image (FIG. 5(c)). Using DT_k^(c), the observation likelihood is defined as
p(S_k | T_k, X_i, Z_i = c) ∝ exp(−DT_k^(c)(π(T_k, X_i))² / (2σ²))   (6)
where π is the projection operator from world coordinates to image space and σ represents the uncertainty of the semantic image classification. For brevity, the normalization factor that makes the probabilities sum to 1 is omitted. For a point labeled c, the likelihood decreases with the distance to the nearest image region labeled c. Intuitively, maximizing the likelihood corresponds to adjusting the camera pose and point position so that the point projection moves toward the correctly labeled image region.
Using the observation likelihood (equation (6)), the semantic cost term is defined as
e_sem(k, i) = Σ_{c∈C} w_i^(c) · DT_k^(c)(π(T_k, X_i))²   (7)
where w_i^(c) is the probability that P_i belongs to class c ∈ C. Intuitively, given a semantic image S_k and a point P_i, the semantic cost e_sem(k, i) is a weighted average of 2D distances: each distance DT_k^(c) from the point projection π(T_k, X_i) to the nearest region of class c is weighted by the probability w_i^(c) that P_i belongs to class c. For example, if P_i carries a highly certain car label, the cost is the distance from the point projection to the nearest region of S_k labeled car. If P_i has equal probabilities for the sidewalk and road labels, the cost is lowest on the boundary between the two classes.
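The class-wise distance transforms and the semantic cost of equation (7) can be sketched with OpenCV as follows; proj_uv is the already-computed projection π(T_k, X_i) of a point into the image, weights holds its label probabilities w_i^(c), and the squared-distance form follows the reconstruction of equations (6) and (7) above. Names are illustrative.

```python
import cv2
import numpy as np

def class_distance_transforms(seg, classes):
    """Distance transform DT_k^(c): distance of every pixel to the nearest
    region labelled c in the semantic segmentation `seg` (H x W class ids)."""
    dts = {}
    for c in classes:
        # Pixels of class c are set to 0 so their distance is 0; all other
        # pixels receive the distance to the nearest class-c pixel.
        src = (seg != c).astype(np.uint8)
        dts[c] = cv2.distanceTransform(src, cv2.DIST_L2, 3)
    return dts

def semantic_cost(proj_uv, weights, dts):
    """e_sem(k, i): weighted sum of squared distances for one point whose
    projection into the image is `proj_uv` and whose label probabilities
    are given by `weights` (dict: class -> probability)."""
    u, v = int(round(proj_uv[0])), int(round(proj_uv[1]))
    return sum(w * dts[c][v, u] ** 2 for c, w in weights.items())
```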
The label probability vector w_i of a point P_i is computed by taking all of its observations into account. Specifically, if P_i is observed by a set of cameras T^i, then
w_i^(c) = α · Π_{T_k ∈ T^i} p(S_k | T_k, X_i, Z_i = c)   (8)
where the constant α ensures Σ_{c∈C} w_i^(c) = 1. This rule allows the label vector w_i to be incrementally refined as semantic observations accumulate. If the observations share the same mode, i.e. their maximum lies in the same class, the element-wise multiplication and normalization make w_i converge to a single mode corresponding to the true label.
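The incremental label update of equation (8) then reduces to an element-wise product followed by renormalization. The sketch below reuses the class-wise distance transforms from the previous example and assumes a Gaussian likelihood as in equation (6); σ = 10 or 40 matches the FIG. 5 examples, and all names are illustrative.

```python
import numpy as np

def observation_likelihoods(proj_uv, dts, classes, sigma=10.0):
    """p(S_k | T_k, X_i, Z_i = c) for all classes, up to a constant factor,
    computed from the class-wise distance transforms as in equation (6)."""
    u, v = int(round(proj_uv[0])), int(round(proj_uv[1]))
    d2 = np.array([dts[c][v, u] ** 2 for c in classes])
    return np.exp(-d2 / (2.0 * sigma ** 2))   # normalisation handled in the update

def update_label_probabilities(w, likelihoods):
    """Incrementally refine the label vector w_i (equation (8)): multiply
    element-wise by the per-class likelihoods of a new semantic observation
    and renormalise so the entries sum to 1."""
    w = np.asarray(w, dtype=float) * np.asarray(likelihoods, dtype=float)
    return w / w.sum()
```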
2.4 optimizing the objective function
The visual semantic odometer uses EM (expectation maximization) to minimize the error function E_joint. The optimization proceeds as follows:
(1) In the E-step, P_i and T_k are held constant and w_i^(c) is computed by equation (8);
(2) In the M-step, w_i^(c) is held constant and the three-dimensional points P_i and camera poses T_k are optimized. Owing to the sparsity of E_sem, the optimization in the M-step can be carried out quickly. It should be noted that if only semantic information is used to optimize the three-dimensional points and camera poses, the constraints it provides are weak, because the probability distribution inside an object boundary is uniform. To avoid this, E_sem is optimized as follows:
(1) semantic constraints are optimized together with the basic visual odometer;
(2) optimizing a camera pose using a plurality of points and semantic constraints;
(3) because the semantic cost constraint is weak, the three-dimensional points are not optimized in the basic system (i.e. the original bundle adjustment cost function); only the camera poses are optimized to reduce drift;
(4) with frequent semantic optimization, the visual semantic odometer reduces the probability that a three-dimensional point is re-projected onto the wrong object. A sketch of this E-step/M-step alternation is given below.
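In the sketch, semantic_likelihood and optimize_poses are hypothetical placeholders for the accumulated per-class likelihoods of equation (8) and for the pose-only optimization of the joint cost, respectively; the fixed iteration count is illustrative.

```python
import numpy as np

def em_semantic_optimization(points, poses, semantic_likelihood, optimize_poses,
                             n_iters=5):
    """Alternating minimisation of E_joint = E_base + lambda * E_sem.

    points  : list of dicts {"X": 3D position, "w": label probability vector}
    poses   : camera poses in whatever representation the base odometer uses
    semantic_likelihood(point, poses) : per-class likelihoods of the point
        accumulated over the frames observing it (equation (8)) -- placeholder
    optimize_poses(points, poses)     : M-step, refines only the camera poses
        under the joint cost -- placeholder
    """
    for _ in range(n_iters):
        # E-step: hold points and poses fixed, update the label vectors w_i.
        for p in points:
            w = np.asarray(p["w"], dtype=float) * semantic_likelihood(p, poses)
            p["w"] = w / w.sum()
        # M-step: hold the label vectors fixed and refine only the camera poses
        # (semantic constraints alone are too weak to move the 3D points).
        poses = optimize_poses(points, poses)
    return points, poses
```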
3 Results and analysis of the experiments
The algorithm platform is configured with an Intel i7-4720HQ CPU with a clock frequency of 2.6 GHz and 16 GB of memory; no GPU acceleration is used, and the operating system is Ubuntu 18.04. The KITTI and TUM data sets are used, and the results are compared with ORB-SLAM.
Table 2 reports the root mean square error (RMSE) of the algorithm presented here and of ORB-SLAM on the KITTI and TUM data sets, as well as the mean processing time per frame. As can be seen from Table 2, compared with ORB-SLAM the accuracy is significantly improved, owing to the image frame interpolation and semantic constraints introduced here, and the difference in processing time is small even though the amount of computation increases.
TABLE 2 RMSE and mean processing time
FIG. 6 shows the absolute trajectory errors of the KITTI 05 and 07 sequences under ORB-SLAM and the proposed algorithm; it can be seen intuitively that the error between the pose estimate of the proposed algorithm and the real trajectory is small.
FIG. 7 shows the absolute trajectory error curves of the TUM sequences fr1_xyz, fr1_floor and fr1_long_office_household under ORB-SLAM and the proposed algorithm. On sequences with complex environments and many feature points, such as fr1_xyz and fr1_long_office_household, ORB-SLAM and the proposed algorithm perform essentially the same, but on the low-texture sequence fr1_floor the proposed algorithm clearly outperforms ORB-SLAM.

Claims (8)

1. The pose track estimation method based on the image frame interpolation method is characterized by comprising the following steps of:
s01 feature detection: the system acquires a series of pictures and then carries out ORB feature detection on the pictures;
s02 semantic recognition: acquiring the number of feature-point matches between the previous and current frames, and performing semantic recognition on the image if the threshold requirement is met; otherwise, performing S021 image frame interpolation;
the S021 image frame inserting process: performing image frame interpolation between two adjacent frames, and then performing detection matching of the feature points again;
and S03 image information fusion: after semantic recognition, fusing semantic image information and feature point image information, and removing feature points detected on the dynamic object; if the number of the characteristic points does not meet the threshold requirement after the characteristic points of the dynamic object are removed, carrying out S021 image frame interpolation;
s04, after the threshold requirement is met, finally carrying out pose estimation;
the S021 image frame interpolation is performed between two adjacent frames, and the detection and matching of the feature points are then performed again, the operation steps comprising: treating pixel interpolation as a convolution over corresponding image patches in the two input image frames, and using a deep fully convolutional neural network to estimate a spatially adaptive convolution kernel; specifically, for a pixel (x, y) in the interpolated frame, the deep neural network takes two receptive field blocks R_1 and R_2 centered on the pixel as input and estimates a convolution kernel K; the convolution kernel is convolved with the input patches P_1 and P_2 to synthesize the output pixel;
given two image frames I_1 and I_2, a frame Î is temporally interpolated in the middle of the two input frames; motion estimation and pixel synthesis are combined into one step, and pixel interpolation is performed as a local convolution over patches in the input images I_1 and I_2; by convolving an appropriate kernel K with the input patches P_1(x, y) and P_2(x, y), the color of the pixel (x, y) in the target image to be interpolated is obtained, the input patches being centered at (x, y) in the respective input images; the convolution kernel K captures the motion and resampling coefficients required for pixel synthesis.
2. The pose track estimation method based on the image frame interpolation method according to claim 1, wherein in the S021 image frame interpolation process, a deep convolutional neural network is used to estimate a suitable convolution kernel to synthesize each output pixel of the interpolated image.
3. The pose track estimation method based on the image frame interpolation method according to claim 1, wherein the output coefficients of the convolution kernel are non-negative and sum to 1.
4. The pose track estimation method based on the image frame interpolation method according to claim 1, wherein in the S021 image frame interpolation process, a color loss function between the interpolated pixel color and the ground-truth color is fused with a gradient loss function.
5. The pose track estimation method based on the image frame interpolation method according to claim 1, wherein in the S02 semantic recognition process, the YOLO algorithm is used to extract semantic information and divide objects into static objects and dynamic objects; the feature points detected on the dynamic objects are removed, the feature points of the static objects are retained, and a new loss function is constructed.
6. The pose track estimation method based on the image frame interpolation method according to claim 1, wherein in the S03 image information fusion process, after the feature points detected on the dynamic objects are removed using the semantic image information, the feature points of the static objects are used to fuse the semantic error and the reprojection error.
7. The pose track estimation method based on the image frame interpolation method according to claim 1, wherein in the S04 pose estimation process, different weights are applied to the semantic error and the reprojection error in the joint optimization function.
8. The pose track estimation method based on the image frame interpolation method according to claim 1, wherein in the S04 pose estimation process, an expectation-maximization algorithm is used to minimize the error function.
CN202011352019.4A 2020-11-27 2020-11-27 Pose track estimation method based on image frame interpolation method Active CN112465021B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011352019.4A CN112465021B (en) 2020-11-27 2020-11-27 Pose track estimation method based on image frame interpolation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011352019.4A CN112465021B (en) 2020-11-27 2020-11-27 Pose track estimation method based on image frame interpolation method

Publications (2)

Publication Number Publication Date
CN112465021A CN112465021A (en) 2021-03-09
CN112465021B true CN112465021B (en) 2022-08-05

Family

ID=74808007

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011352019.4A Active CN112465021B (en) 2020-11-27 2020-11-27 Pose track estimation method based on image frame interpolation method

Country Status (1)

Country Link
CN (1) CN112465021B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113077516B (en) * 2021-04-28 2024-02-23 深圳市人工智能与机器人研究院 Pose determining method and related equipment
CN113671522B (en) * 2021-07-07 2023-06-27 中国人民解放军战略支援部队信息工程大学 Dynamic environment laser SLAM method based on semantic constraint
CN113705431B (en) * 2021-08-26 2023-08-08 山东大学 Track instance level segmentation and multi-motion visual mileage measurement method and system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104374395A (en) * 2014-03-31 2015-02-25 南京邮电大学 Graph-based vision SLAM (simultaneous localization and mapping) method
CN111462135A (en) * 2020-03-31 2020-07-28 华东理工大学 Semantic mapping method based on visual S L AM and two-dimensional semantic segmentation
CN111582232A (en) * 2020-05-21 2020-08-25 南京晓庄学院 SLAM method based on pixel-level semantic information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Visual SLAM algorithm with minimized photometric-error prior; Han Jianying et al.; Journal of Chinese Computer Systems; 2020-10-15 (No. 10); full text *

Also Published As

Publication number Publication date
CN112465021A (en) 2021-03-09

Similar Documents

Publication Publication Date Title
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
CN111325794B (en) Visual simultaneous localization and map construction method based on depth convolution self-encoder
CN108665496B (en) End-to-end semantic instant positioning and mapping method based on deep learning
CN108986136B (en) Binocular scene flow determination method and system based on semantic segmentation
CN109387204B (en) Mobile robot synchronous positioning and composition method facing indoor dynamic environment
Zou et al. Df-net: Unsupervised joint learning of depth and flow using cross-task consistency
CN112465021B (en) Pose track estimation method based on image frame interpolation method
CN110381268B (en) Method, device, storage medium and electronic equipment for generating video
CN109685045B (en) Moving target video tracking method and system
CN111815665B (en) Single image crowd counting method based on depth information and scale perception information
CN101120382A (en) Method for tracking moving object in video acquired of scene with camera
CN104766065B (en) Robustness foreground detection method based on various visual angles study
CN102156995A (en) Video movement foreground dividing method in moving camera
CN105809716B (en) Foreground extraction method integrating superpixel and three-dimensional self-organizing background subtraction method
CN104182968B (en) The fuzzy moving-target dividing method of many array optical detection systems of wide baseline
CN107403451B (en) Self-adaptive binary characteristic monocular vision odometer method, computer and robot
EP3252713A1 (en) Apparatus and method for performing 3d estimation based on locally determined 3d information hypotheses
WO2016165064A1 (en) Robust foreground detection method based on multi-view learning
CN110910421A (en) Weak and small moving object detection method based on block characterization and variable neighborhood clustering
CN114782628A (en) Indoor real-time three-dimensional reconstruction method based on depth camera
Djelouah et al. N-tuple color segmentation for multi-view silhouette extraction
CN113436251A (en) Pose estimation system and method based on improved YOLO6D algorithm
CN113421210A (en) Surface point cloud reconstruction method based on binocular stereo vision
Knorr et al. A modular scheme for 2D/3D conversion of TV broadcast
CN114612545A (en) Image analysis method and training method, device, equipment and medium of related model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant