CN111445526A - Estimation method and estimation device for pose between image frames and storage medium - Google Patents


Info

Publication number
CN111445526A
CN111445526A CN202010321620.0A
Authority
CN
China
Prior art keywords
frame
current frame
pose
image
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010321620.0A
Other languages
Chinese (zh)
Other versions
CN111445526B (en)
Inventor
张涛
李少朋
杨新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Huayun Intelligent Technology Co ltd
Tsinghua University
Original Assignee
Ningbo Huayun Intelligent Technology Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Huayun Intelligent Technology Co ltd, Tsinghua University filed Critical Ningbo Huayun Intelligent Technology Co ltd
Priority to CN202010321620.0A priority Critical patent/CN111445526B/en
Publication of CN111445526A publication Critical patent/CN111445526A/en
Application granted granted Critical
Publication of CN111445526B publication Critical patent/CN111445526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30244Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an estimation method, an estimation device and a storage medium for the pose between image frames. Image frames are first received; each image frame is then taken as a current frame and its previous frame as a reference frame, and the current frame is tracked sequentially in the reference frame and in a local map generated based on the reference frame. Finally, in response to successful tracking, a current frame satisfying preset conditions is determined to be a key frame, image feature points are extracted from the key frame, and the optimal pose between key frames is calculated based on the image feature points. By distinguishing key frames from non-key frames and extracting image feature points only from key frames, the method improves both the efficiency and the precision of pose optimization.

Description

Estimation method and estimation device for pose between image frames and storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an estimation method, an estimation apparatus, and a storage medium for the pose between image frames.
Background
SLAM (Simultaneous Localization and Mapping) solves the problem of reconstructing the three-dimensional structure of an unknown environment in real time while positioning the robot itself, and can present information more efficiently and intuitively than traditional text, images, video and other media.
Visual SLAM techniques can be divided into feature-point-based methods and direct methods. The feature point method extracts salient image features in each image, matches feature points across consecutive frames using invariant feature descriptors, robustly recovers the camera pose and scene structure using epipolar geometry, and completes bundle adjustment and pose optimization based on minimizing reprojection errors using the associated features. The extracted salient features can also be aggregated to describe the whole image for loop detection. However, the extraction of image feature points and their association and matching are relatively cumbersome and time-consuming.
Disclosure of Invention
The embodiments of the present application provide an estimation method for the pose between image frames, to solve the problems of inaccurate and inefficient pose estimation between image frames.
The method comprises the following steps:
receiving an image frame;
taking each image frame as a current frame, taking the previous frame of the current frame as a reference frame, and sequentially tracking the current frame in the reference frame and in a local map generated based on the reference frame;
and responding to the successful tracking, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
Optionally, when tracking the current frame in the reference frame is successful, acquiring initial poses of the current frame and the reference frame;
projecting the map points tracked in the reference frame to the current frame according to the initial pose, and calculating pixel errors between image blocks where the map points are projected in the current frame and image blocks where the matching points corresponding to the current frame are located;
and taking the relative pose after the pixel error is minimized as the optimal pose between the current frame and the reference frame.
Optionally, the three-dimensional map points of the local map corresponding to at least one image frame before the current frame are all projected onto the current frame, and a search is performed near each of the resulting projection points:
selecting the projection point which is closest to the gray value of the matching point in the current frame from the searched projection points, and calculating the pixel error between the selected projection point and the matching point corresponding to the current frame;
and taking the relative pose after the pixel error is minimized as the optimal pose between the current frame and the local map.
Optionally, when tracking the current frame in the reference frame fails, image feature points matching at least one map point in the previous key frame are extracted from the current frame, the relative pose between the current frame and the previous key frame is calculated, and the error is minimized to complete pose tracking.
Optionally, when tracking the current frame in the local map fails, image feature points matching at least one map point in the local map are extracted from the current frame, the relative pose between the current frame and the local map is calculated, and the error is minimized to complete pose tracking.
Optionally, the preset conditions include: no key frame has been selected within a preset number of consecutive image frames, and/or the number of map points tracked in the reference frame is less than a preset threshold.
Optionally, image feature points matching at least one three-dimensional map point in the local map are extracted from at least one key frame, and the coordinates of the matched three-dimensional map points are optimized based on the optimal pose obtained after the pixel error between the image feature points and the three-dimensional map points is minimized;
the image feature points of the current key frame are compared with those of at least one previously determined key frame; a candidate loop frame is determined from matches whose similarity exceeds a similarity threshold, and the candidate loop frame is determined to be a loop frame when the candidate loop frame and its adjacent image frames remain similar to the previously determined key frame and its adjacent image frames for more than a preset number of consecutive frames.
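The loop-frame confirmation described in this option can be sketched as a simple run-length check; the similarity values, the threshold, and the function name are illustrative assumptions, not the patent's actual implementation.

```python
# Hypothetical sketch: a candidate only becomes a loop frame when it and its
# neighbouring frames stay similar to an earlier key frame and its neighbours
# for at least a preset number of consecutive frames.

def confirm_loop(similarities, sim_threshold, min_consecutive):
    """similarities[i]: similarity between candidate neighbour i and the
    matched earlier key frame's neighbour i."""
    run = 0
    for s in similarities:
        run = run + 1 if s > sim_threshold else 0  # reset on any mismatch
        if run >= min_consecutive:
            return True
    return False
```

A single dissimilar neighbour resets the run, which is what makes the check robust against one-off appearance matches.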
In another embodiment of the present invention, there is provided an apparatus for estimating a pose between image frames, the apparatus including:
a receiving module for receiving the image frame;
the tracking module is used for taking the image frame of each frame as a current frame, taking the previous frame of the current frame as a reference frame, and sequentially tracking the current frame in the reference frame and a local map generated based on the reference frame according to the current frame;
and the construction module is used for responding to the successful tracking, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
In another embodiment of the present invention, there is provided a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the above-described method of estimating the pose between image frames.
In another embodiment of the present invention, there is provided a terminal device including a processor for executing each step in the above-described method of estimating a pose between image frames.
Based on the embodiment, firstly, image frames are received, secondly, each image frame is used as a current frame, a previous frame of the current frame is used as a reference frame, the current frame is sequentially tracked in the reference frame and a local map generated based on the reference frame according to the current frame, and finally, in response to successful tracking, the current frame meeting the preset conditions is determined to be a key frame, image feature points are extracted from the key frame, and the optimal pose between the key frames is calculated based on the image feature points. According to the method and the device, the key frames and the non-key frames are distinguished, and the image feature points are extracted only from the key frames, so that the optimization efficiency and the optimization precision of pose optimization are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic flowchart illustrating a method for estimating a pose between image frames according to an embodiment 100 of the present application;
FIG. 2 is a schematic diagram illustrating a flow of tracking non-key frames by the direct method in the SLAM algorithm, and optimizing and detecting closed loops in key frames by the feature point extraction method in the SLAM algorithm, according to an embodiment 200 of the present application;
fig. 3 is a schematic diagram illustrating a specific flow of a method for estimating a pose between image frames according to an embodiment 300 of the present application;
fig. 4 shows a schematic diagram of an apparatus for estimating a pose between image frames according to an embodiment 400 of the present application;
fig. 5 shows a schematic diagram of a terminal device provided in embodiment 500 of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
A complete SLAM framework generally comprises a tracking front end, an optimization back end, loop detection and map reconstruction. The tracking front end, namely the visual odometer, is responsible for preliminarily estimating the motion between image frames and the positions of landmarks; the back end is responsible for receiving the pose information measured by the visual odometer at different moments and computing a maximum a posteriori estimate; loop detection is responsible for judging whether the robot has returned to a previous position and performing loop closure to correct the accumulated estimation error; and map reconstruction is responsible for constructing a map adapted to the task requirements from the camera trajectory and images. Based on the problems in the prior art, the embodiments of the present application provide an estimation method for the pose between image frames, mainly applicable to the field of computer vision. By distinguishing key frames from non-key frames, using the feature extraction method of the SLAM algorithm on key frames and the direct method on non-key frames, the method improves the construction precision of the three-dimensional map and saves construction time. The detailed technical solution of the present application is explained below with reference to specific embodiments. Fig. 1 is a schematic flowchart of a method for estimating the pose between image frames provided in embodiment 100 of the present application, which comprises the following steps:
s11, receiving the image frame.
In this step, the received image frames are acquired by an image acquisition device. In particular, the image acquisition device may be a camera, a video camera or a Virtual Reality (VR) device.
And S12, taking each frame of image as a current frame, taking a previous frame of the current frame as a reference frame, and tracking the current frame in the reference frame and the local map generated based on the reference frame according to the current frame.
In this step, for ease of presentation, the embodiments of the present application take each received frame as the current frame and the frame preceding it as the reference frame. Upon receipt of each current frame, the current frame is first tracked in the reference frame using the direct method of the SLAM algorithm, and after this tracking succeeds, it continues to be tracked in a local map generated based on the reference frame.
Further, the relative pose of the current frame obtained only from the reference frame may not be sufficiently accurate. The tracking thread therefore tracks the current frame in the local map to obtain more matched map points that should be tracked and optimized. The local map is formed by initialization from a plurality of reference frames. Specifically, using the relative pose determined by tracking the current frame in the reference frame, pixels in the current frame are matched with three-dimensional map points in the local map: the three-dimensional map points of the local map corresponding to the previous image frames are all projected onto the current frame, and within each projection area the projection point closest in gray value to the matching point in the current frame is selected. By minimizing the photometric error between the projection points and the current matching points, the corresponding feature positions in the current frame are obtained, so that the relative pose of the image acquisition device corresponding to the current frame can be further optimized.
And S13, responding to the tracking success, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
In this step, when the current frame has been successfully tracked in the reference frame and in the local map generated based on the reference frame in sequence, it is further determined whether the current frame is a key frame satisfying preset conditions. The preset conditions include, but are not limited to, four types: no key frame has been selected for more than a preset number (for example, 20) of consecutive frames; the local optimization thread is in an idle state; the number of tracked map points is smaller than a preset value, such as 50; or the number of image feature points the current frame tracks from the previous key frame is smaller than a preset proportion, such as 90%, of their total number. The feature point extraction method of the SLAM algorithm is then used, only on key frames, to extract Oriented FAST and Rotated BRIEF (ORB) features as image feature points, and the optimal pose between key frames is calculated based on these image feature points.
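The key-frame test just described can be sketched as a simple predicate. The thresholds (20 frames, 50 map points, 90%) are the examples given in the text; the parameter names are assumptions for illustration.

```python
# Hypothetical sketch of the key-frame decision. A frame becomes a key frame
# if ANY of the four preset conditions from the text holds.

def is_keyframe(frames_since_last_kf, local_opt_idle,
                tracked_map_points, tracked_vs_last_kf_ratio,
                max_gap=20, min_points=50, min_ratio=0.9):
    if frames_since_last_kf > max_gap:        # no key frame for 20+ frames
        return True
    if local_opt_idle:                        # local optimization thread idle
        return True
    if tracked_map_points < min_points:       # too few tracked map points
        return True
    if tracked_vs_last_kf_ratio < min_ratio:  # <90% of last key frame's features
        return True
    return False
```

Because the conditions are combined with "or", a quiet local-optimization thread alone is enough to admit a new key frame in this sketch.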
As described above, based on the above-described embodiment, image frames are received first, each frame of image is taken as a current frame, and a previous frame of the current frame is taken as a reference frame, and the current frame is tracked in the reference frame and the local map generated based on the reference frame in sequence according to the current frame, and finally, in response to the success of the tracking, the current frame satisfying the preset condition is determined as a key frame, image feature points are extracted from the key frame, and the optimal pose between the key frames is calculated based on the image feature points. According to the method and the device, the key frames and the non-key frames are distinguished, and the image feature points are extracted only from the key frames, so that the optimization efficiency and the optimization precision of pose optimization are improved.
As shown in FIG. 2, a schematic diagram of the flow of tracking non-key frames by the direct method in the SLAM algorithm, and of optimization and loop detection on key frames by the feature point extraction method in the SLAM algorithm, is shown for the present embodiment 200. Whether the current frame is a key frame is determined according to the scene change; image point features are extracted in key frames, and the relative pose calculation between key frames is completed by the feature point extraction method, while fast positioning is completed by the direct method in non-key frames. The extracted features are Oriented FAST and Rotated BRIEF (ORB) features: oriented FAST corners are selected as image feature point coordinates and described by a 256-dimensional rotated BRIEF descriptor. Loop detection is handled by a separate loop detection thread running in parallel, which is responsible for detecting loops among the key frames and correcting the accumulated error of the image frames once a loop is detected.
Fig. 3 is a schematic diagram illustrating a specific flow of a method for estimating a pose between image frames according to an embodiment 300 of the present application. The method disclosed by the embodiment of the application mainly comprises four parts: monocular initialization, initial positioning according to the reference frame, positioning according to the local map, and image frame management. FIG. 3 depicts the whole process of the tracking thread after initialization is completed. When an image frame is received, the reference frame is first tracked by the direct method to determine the pose T_i of the current frame (i is an integer greater than or equal to 1 and denotes the i-th current frame). If the tracking succeeds, the direct method continues to be used to track the local map and fine-tune the relative pose of the current frame. If the tracking fails, the image feature points of the current frame are extracted, and the feature point extraction method is used to track the last key frame T_{k-1} (k is an integer greater than or equal to 1 and denotes the index of the current key frame), after which the local map is tracked. If tracking the local map fails, feature points are extracted and the local map is tracked by the feature point extraction method. Variables such as the velocity and the local map are then updated, it is determined whether the current frame is a key frame, and if it is, image features are extracted and passed to the local optimization thread. In summary, the detailed process of the specific flow is as follows:
s301, receiving an image frame.
Here, each frame image frame is taken as a current frame, and a previous frame of the current frame is taken as a reference frame.
S302, initializing and acquiring initial poses of the current frame and the reference frame, and determining a local map.
Here, the goal of monocular initialization is to compute the relative pose between two image frames and triangulate an initial set of three-dimensional map points for tracking the subsequent image frames. During initialization, ORB features are extracted from every frame, and two geometric models are calculated in parallel: a homography matrix H under the planar-scene assumption and a fundamental matrix F under the non-planar-scene assumption. One model is then selected by a heuristic, and the initial pose is solved by the method corresponding to that model. The specific steps are as follows:
(1) Selecting the initialization frames: two consecutive image frames whose number of matched ORB feature points is larger than 100 are selected from the received image frames as initialization frames, which avoids initializing under low-texture or poorly illuminated conditions.
(2) Two models are computed in parallel: the homography matrix H is solved by direct linear transformation and the fundamental matrix F by the eight-point method, both within Random Sample Consensus (RANSAC) schemes with the same number of iterations.
(3) Model selection: if the scene is planar, nearly planar, or has low parallax, it can be explained by a homography H. The fundamental matrix F can also be solved in this case, but the model is not well constrained and the motion estimated from it has a large error. In contrast, in a non-planar scene the fundamental matrix F should be selected to estimate the motion. The criterion is as follows:
R_H = S_H / (S_H + S_F)   (Equation 1)

wherein S_H and S_F are the scores of the homography model and the fundamental matrix model, respectively.
If R_H > 0.45, the homography matrix is selected; otherwise, the fundamental matrix is selected.
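The model-selection heuristic above can be sketched in a few lines. Following the ORB-SLAM convention, the inputs are assumed to be the RANSAC support scores of the two models; the function name and the return format are illustrative assumptions, while the 0.45 threshold is the one stated in the text.

```python
# Hypothetical sketch of the heuristic model choice between homography (H)
# and fundamental matrix (F), given their RANSAC scores S_H and S_F.

def select_model(score_H, score_F, threshold=0.45):
    r_h = score_H / (score_H + score_F)   # R_H ratio
    model = "homography" if r_h > threshold else "fundamental"
    return model, r_h
```

A high R_H means most of the matching support is explained by a plane, so the homography initialization path is taken.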
(4) Solving the relative pose and structure: during the solution of the relative pose, if neither model is clearly better than the other, initialization is abandoned and the process returns to step (1); this operation ensures the robustness of the scene at initialization. When the fundamental matrix model is selected, the essential matrix E is further solved from the camera intrinsic matrix K, as shown in Equation 2:
E = KᵀFK   (Equation 2)
Since E = t^∧R (that is, the essential matrix equals the skew-symmetric matrix of the translation t multiplied by the rotation R), four candidate pose solutions can be recovered from the essential matrix E, and the correct solution is then selected according to whether the depths of the map points in front of the camera are positive.
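The recovery of the four candidate pose pairs from E = t^∧R mentioned above can be sketched with a standard SVD-based decomposition; the function name is an assumption, and the cheirality (positive-depth) test that picks the correct pair is only noted in a comment.

```python
import numpy as np

# Sketch of recovering the four candidate (R, t) pairs from an essential
# matrix. The correct pair would then be chosen by triangulating points and
# keeping the solution with positive depth in both cameras.

def decompose_essential(E):
    U, _, Vt = np.linalg.svd(E)
    # enforce proper rotations (determinant +1)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0., -1., 0.],
                  [1.,  0., 0.],
                  [0.,  0., 1.]])
    R1 = U @ W @ Vt
    R2 = U @ W.T @ Vt
    t = U[:, 2]                      # translation known only up to scale
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]
```

Note that t is recoverable only up to scale, which is why monocular initialization also fixes the map scale arbitrarily.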
(5) Global Bundle Adjustment: full BA (global bundle adjustment, in which the poses of the image acquisition device and the map structure are optimized simultaneously) is then executed to fine-tune the camera poses and the map structure.
The above is the step of monocular initialization, after the initialization is completed, the relative poses of the initial two frames are determined, and a local map which can be used for tracking the subsequent frame is generated by initializing a part of the image frames (such as ten image frames from the initialization).
S303, the current frame is tracked in the reference frame.
Here, the tracking thread sequentially tracks the current frame in the reference frame and the local map generated based on the reference frame according to the current frame.
And S304, calculating the optimal poses of the current frame and the reference frame.
Here, when tracking of the current frame in the reference frame succeeds, the initial poses of the current frame and the reference frame are acquired. Specifically, after successful initialization or successful reference frame tracking, the pose T_{i,i-1} of the current frame relative to the reference frame is given the initial value T_{i,i-1} = T_{i-1,i-2} according to a constant-velocity motion model. Because no image feature points are extracted from the current frame, the pose between adjacent frames is optimized from this initial value using the photometric error:
T_{i,i-1} = argmin_{T_{i,i-1}} Σ_{u∈R} ½ ‖δI(T_{i,i-1}, u)‖²   (Equation 3)

wherein the photometric residual is

δI(T_{i,i-1}, u) = I_i( π( T_{i,i-1} · π⁻¹(u, d_u) ) ) − I_{i−1}(u),

u is a corner pixel position in the (i−1)-th frame image, R is the image region from which corners were collected, T_{i,i-1} ∈ SE(3) is the 6-degree-of-freedom pose represented as a Lie group element, δI is the photometric error, π is the camera projection function, and d_u is the depth value corresponding to pixel u in frame i−1.
Further, the map points tracked in the reference frame are projected to the current frame according to the initial pose, and the pixel error between the image block around each projected map point in the current frame and the image block around the corresponding matching point in the current frame is calculated. The relative pose that minimizes this pixel error is taken as the optimal pose between the current frame and the reference frame. The patch corresponding to each pixel in the optimization may contain 8 pixels, with the bottom-right pixel omitted; using 8 pixels facilitates SSE-accelerated computation on the processor, and this patch model achieves a good balance between speed and precision.
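The 8-pixel patch cost described above can be sketched as follows. The patch is a 3×3 block with the bottom-right pixel omitted, as in the text; integer pixel positions and direct array indexing are simplifying assumptions (a real implementation would interpolate sub-pixel positions).

```python
import numpy as np

# Illustrative sketch of the 8-pixel patch photometric cost used for
# direct image alignment.

# offsets of the 8 patch pixels: a 3x3 block minus the bottom-right corner
PATCH = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
         if not (dy == 1 and dx == 1)]

def patch_photometric_error(img_ref, img_cur, u_ref, u_cur):
    """Sum of squared intensity differences over the 8-pixel patch."""
    err = 0.0
    for dy, dx in PATCH:
        r = float(img_ref[u_ref[0] + dy, u_ref[1] + dx])
        c = float(img_cur[u_cur[0] + dy, u_cur[1] + dx])
        err += (r - c) ** 2
    return err
```

Dropping one corner keeps the patch at exactly 8 values, a width that maps cleanly onto SIMD registers.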
S305, when the tracking fails, extracting image feature points in the current frame to complete the pose tracking of the current frame in the reference frame.
Here, when tracking of the current frame in the reference frame fails, image feature points matching at least one map point in the previous key frame are extracted from the current frame, the relative pose between the current frame and the previous key frame is calculated, and the error is minimized to complete pose tracking. Specifically, after matching is completed, pose tracking is completed by minimizing the reprojection error relative to the previous key frame k−1, as shown in Equation 4:
T_{i,k−1} = argmin_{T_{i,k−1}} Σ_j ‖u′_j − π(T_{i,k−1} · p_j)‖²   (Equation 4)

wherein p_j is the three-dimensional coordinate, in that frame's camera coordinate system, of the j-th image feature point extracted from key frame k−1, and u′_j is the point among the image feature points extracted from the current frame that matches the j-th image feature point of the key frame.
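A minimal sketch of evaluating the reprojection cost of formula 4 for a candidate pose follows, assuming a standard pinhole projection π with intrinsic matrix K; all names and numeric values are illustrative assumptions.

```python
import numpy as np

# Sketch: project the 3-D points p_j through a candidate pose (R, t) and a
# pinhole intrinsic matrix K, and sum squared distances to the matched
# pixels u'_j. Minimizing this cost over (R, t) gives the relative pose.

def reprojection_error(points_3d, matches_2d, R, t, K):
    total = 0.0
    for p, u in zip(points_3d, matches_2d):
        q = K @ (R @ p + t)       # transform into the current frame, project
        proj = q[:2] / q[2]       # pinhole projection pi(.)
        total += float(np.sum((u - proj) ** 2))
    return total
```

In a full system this cost would be fed to a nonlinear least-squares solver over SE(3), typically with a robust kernel on each residual.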
And S306, performing pose optimization on the current frame.
S307, tracking the current frame in the local map.
In the step, after the pose estimation of the current frame based on the reference frame is completed, the local map is continuously tracked, three-dimensional map points corresponding to a plurality of key frames are maintained in the local map, and the constraint can be further increased and the pose estimation precision can be improved by tracking the three-dimensional map points of more frames.
And S308, calculating the optimal poses of the current frame and the local map.
Here, the three-dimensional map points of the local map corresponding to at least one image frame before the current frame are all projected to the current frame, and a search is performed near each projection point: the projection point closest in gray value to the matching point in the current frame is selected among the searched candidates, and the pixel error between the selected point and the corresponding matching point in the current frame is calculated; the relative pose that minimizes this pixel error is taken as the optimal pose between the current frame and the local map. Specifically, the local map points are projected to the current frame, and the pose is fine-tuned again through the photometric error, as shown in Equation 5:
Figure BDA0002461650990000082
wherein p isj,k-1Is a three-dimensional map point u under a world coordinate systemjTo extract pjPixel of (b), k is ujThe key frame where it is located. Furthermore, the pose accuracy needs to be further improved through pixel block matching and projection error optimization. Since the image feature points are not extracted in the current frame, the matching of the pixel patch needs to be completed through the pixel error of the patch. And projecting the local map points to the current frame by taking the fine-tuned pose as an initial value, and respectively searching matched patches in the local map points near the projection points. After matching is completed, the pose is further optimized by using a reprojection error model, as shown in formula 6:
Figure BDA0002461650990000091
wherein u'jFor searched pjThe matching point of (2).
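As an illustration of formula 6, the sketch below performs Gauss-Newton minimization of the reprojection error for a pose simplified to pure translation (rotation held at identity), with a numerical Jacobian. The pinhole model, the function names, and the translation-only restriction are illustrative assumptions, not details of the patent itself.

```python
import numpy as np

def project(K, t, pts_w):
    # Pinhole projection of world points after a translation-only pose
    # (rotation assumed to be identity for this sketch).
    pts_c = pts_w + t
    uv = (K @ pts_c.T).T
    return uv[:, :2] / uv[:, 2:3]

def refine_pose_translation(K, pts_w, matches, t0, iters=20):
    """Gauss-Newton minimization of the reprojection error of formula 6,
    restricted to the 3 translation parameters for clarity."""
    t = np.asarray(t0, dtype=float).copy()
    for _ in range(iters):
        r = (project(K, t, pts_w) - matches).ravel()   # stacked residuals
        # Numerical Jacobian of the residuals w.r.t. the translation.
        J = np.zeros((r.size, 3))
        eps = 1e-6
        for i in range(3):
            dt = np.zeros(3)
            dt[i] = eps
            J[:, i] = ((project(K, t + dt, pts_w) - matches).ravel() - r) / eps
        t -= np.linalg.solve(J.T @ J, J.T @ r)         # normal-equations step
    return t
```

In a full implementation the pose would live in SE(3) and be updated on the manifold (e.g. via Lie-algebra increments), and a robust kernel would down-weight bad patch matches.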
S309, when the tracking fails, extracting image feature points in the current frame to complete the pose tracking of the current frame in the local map.
Here, when tracking of the current frame in the local map fails, image feature points matching at least one map point in the local map are extracted from the current frame, the relative pose between the current frame and the local map is calculated, and the error is minimized over the relative pose to complete pose tracking. Specifically, if the local map is not tracked, ORB features are extracted from the current frame to complete matching with the local map, and pose optimization is then carried out according to formula 6. A cache mechanism is adopted in the management of the local map: map points that tracked well in the previous frame are cached and preferentially projected to the current frame, and if the cached map points are insufficient, other map points are supplemented. This makes tracking of the local map more efficient.
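The caching strategy described above might be sketched as follows; the class and method names, the capacity, and the eviction policy are assumptions for illustration only.

```python
from collections import OrderedDict

class LocalMapCache:
    """Cache of local-map points that tracked well in the previous frame;
    they are projected to the current frame first, and other map points
    are supplemented only when the cache does not provide enough."""

    def __init__(self, capacity=200):
        self.capacity = capacity
        self.cache = OrderedDict()            # point_id -> map point

    def note_success(self, pid, point):
        # Remember a point that tracked well, most recent at the end.
        self.cache[pid] = point
        self.cache.move_to_end(pid)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the stalest entry

    def candidates(self, all_points, needed):
        # Cached points first, then other local-map points as a supplement.
        picked = list(self.cache.values())[:needed]
        if len(picked) < needed:
            extras = [p for pid, p in all_points.items()
                      if pid not in self.cache]
            picked += extras[:needed - len(picked)]
        return picked
```

A real system would key the cache on observation quality (e.g. photometric residual) rather than recency alone, but the supplement-on-shortfall logic is the same.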
S310, judging whether the current frame meets the preset condition of the key frame.
Here, whether the current frame is selected as a key frame is determined mainly according to the motion amplitude and the scene at the time; the main criteria refer to ORB-SLAM. The current frame is selected as a key frame if one of the following conditions is satisfied: no key frame has been selected within more than a preset number of consecutive image frames, the preset number being, for example, 20 frames; and/or the proportion of map points tracked from the reference frame is less than a preset threshold, for example, 90%.
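The two criteria above reduce to a simple predicate. The thresholds of 20 frames and 90% are the examples given in the text; the function name and signature are assumed for illustration.

```python
def is_keyframe(frames_since_last_kf, tracked_ratio,
                max_gap=20, min_tracked_ratio=0.9):
    # Condition 1: no key frame has been selected for more than
    # max_gap consecutive image frames.
    if frames_since_last_kf > max_gap:
        return True
    # Condition 2: the proportion of the reference frame's map points
    # still tracked has fallen below the threshold.
    if tracked_ratio < min_tracked_ratio:
        return True
    return False
```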
S311, local optimization of the three-dimensional map is carried out based on the key frame and the local map.
Specifically, the main function of the local optimization thread is to perform local Bundle Adjustment (Local BA) on the nearest m key frames and the corresponding local map. The poses of the key frames and the three-dimensional coordinates of the map points are taken as optimization variables, which improves the pose and map accuracy simultaneously. Unstable map points and key frames are eliminated according to the tracking quality and visibility of the map points in the key frames, and new map points are triangulated.
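A sketch of how the Local BA problem might be assembled from the description above; the data layout, the observation-count culling rule, and the threshold `min_obs` are illustrative assumptions rather than details given in the patent.

```python
def select_local_ba_problem(keyframes, map_points, m=7, min_obs=3):
    """Pick the newest m key frames and the map points they observe as
    optimization variables; flag rarely observed points for culling."""
    active_kfs = keyframes[-m:]
    active_ids = {kf["id"] for kf in active_kfs}
    variables, cull = [], []
    for pid, obs_kf_ids in map_points.items():
        if not any(k in active_ids for k in obs_kf_ids):
            continue                      # point not seen in the BA window
        if len(obs_kf_ids) < min_obs:
            cull.append(pid)              # unstable: observed too rarely
        else:
            variables.append(pid)         # optimized jointly with the poses
    return active_kfs, variables, cull
```

The returned `variables` and the m key-frame poses would then be passed to a nonlinear least-squares solver minimizing the reprojection error of formula 6 over all observations in the window.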
S312, carrying out loop detection on the three-dimensional map.
Here, the image feature points of the current key frame are compared for similarity with the image feature points of at least one previously determined key frame; a key frame whose similarity is greater than the similarity threshold is determined as a candidate loop frame, and when the number of consecutive similar matches between the candidate loop frame and its adjacent image frames and the at least one previously determined key frame and its adjacent image frames is greater than a preset number, the candidate loop frame is determined as a loop frame.
Specifically, the loop detection thread mainly completes similar key frame determination, similarity transformation solving, loop fusion, and essential graph optimization, and specifically includes the following steps:
(1) Loop detection: the similarity between images is calculated by the Bag-of-Visual-Words method. First, the similarity between the current key frame and its co-visible key frames (those sharing more than 30 co-visible map points) is calculated and the minimum value is stored; any frame whose similarity exceeds this value is determined as a candidate loop frame. The current frame and its co-visible key frames are then matched with the candidate loop frame and its adjacent frames, and if the number of consecutive similar frames is greater than a preset value, for example 3, the candidate frame is determined as a loop frame.
(2) Calculating a similarity transformation: since ORB matching can be performed between the current key frame and the loop frame, a matching relationship is also established between their respective map points; thus, the similarity transformation between the two key frames can be optimized.
(3) Loop fusion: after the similarity pose transformation is solved, the positions and attitudes of the current key frame and the surrounding key frames are adjusted so that the two ends of the loop are basically aligned, and then the matched map points between the current key frame and the loop frame are fused.
(4) Loop closing: the poses of all key frames in the loop are optimized according to an essential graph (a graph in which edges are established between key frames sharing more than 100 co-visible map points), and the loop error is distributed evenly over the corresponding key frames.
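Step (1) can be sketched as follows, using cosine similarity between bag-of-visual-words vectors. The final confirmation over adjacent frames (requiring, e.g., 3 consecutive similar frames) is omitted for brevity, and all names here are illustrative assumptions.

```python
import numpy as np

def bow_similarity(v1, v2):
    # Cosine similarity between two bag-of-visual-words vectors.
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def loop_candidates(current_vec, covis_vecs, earlier_kf_vecs):
    """The minimum similarity between the current key frame and its
    co-visible key frames sets the threshold; any earlier key frame
    scoring above it becomes a candidate loop frame."""
    s_min = min(bow_similarity(current_vec, v) for v in covis_vecs)
    return [kf_id for kf_id, v in earlier_kf_vecs.items()
            if bow_similarity(current_vec, v) > s_min]
```

Systems such as ORB-SLAM use a hierarchical vocabulary score rather than raw cosine similarity, but the thresholding logic is the same.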
The above are the main steps of the loop detection thread; since ORB feature points are extracted from the key frames, SVL has the loop detection capability.
The method for estimating the pose between image frames is implemented based on the above steps. Image point features are extracted from the key frames, and feature matching between key frames is completed according to the feature descriptors; non-key frames no longer extract or match image point features, and pose tracking and positioning are completed through sparse image alignment. The time consumed by feature extraction and matching is thus avoided in non-key frames, while the ORB features at the key frames give the scheme high accuracy and loop detection capability.
Based on the same inventive concept, the embodiment 400 of the present application further provides an apparatus for constructing a three-dimensional map, where, as shown in fig. 4, the apparatus includes:
a receiving module 41, configured to receive an image frame;
the tracking module 42 is configured to use each frame of image as a current frame, use a previous frame of the current frame as a reference frame, and track the current frame in the reference frame and the local map generated based on the reference frame in sequence according to the current frame;
and the constructing module 43 is configured to, in response to the tracking success, determine a current frame meeting a preset condition as a key frame, extract image feature points from the key frame, and calculate an optimal pose between the key frames based on the image feature points.
In this embodiment, specific functions and interaction manners of the receiving module 41, the tracking module 42 and the constructing module 43 may refer to the record of the embodiment corresponding to fig. 1, and are not described herein again.
As shown in fig. 5, another embodiment 500 of the present application further provides a terminal device including a processor 501, where the processor 501 is configured to execute the steps of the above method for estimating the pose between image frames. As can also be seen from fig. 5, the terminal device provided by the above embodiment further includes a non-transitory computer-readable storage medium 502 having stored thereon a computer program which, when executed by the processor 501, performs the steps of the above method for estimating the pose between image frames. In practice, the terminal device may be one or more computers, as long as it includes the computer-readable medium and the processor.
In practical applications, the computer readable medium may be included in the apparatus/device/system described in the above embodiments, or may exist separately without being assembled into the apparatus/device/system.
According to embodiments disclosed herein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example and without limitation: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the present disclosure. In the embodiments disclosed herein, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments disclosed herein. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or couplings of the features recited in the various embodiments and/or claims of the present application can be made, even if such combinations or couplings are not explicitly recited in the present application. In particular, the features recited in the various embodiments and/or claims of the present application may be combined and/or coupled in various ways without departing from the spirit and teachings of the present application; all such combinations fall within the scope of the present disclosure.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can still change or easily conceive of the technical solutions described in the foregoing embodiments or equivalent replacement of some technical features thereof within the technical scope disclosed in the present application; such changes, variations and substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application and are intended to be covered by the appended claims. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for estimating a pose between image frames, comprising:
receiving an image frame;
taking each frame of the image frame as a current frame, taking a previous frame of the current frame as a reference frame, and sequentially tracking the current frame in the reference frame and a local map generated based on the reference frame according to the current frame;
and responding to the successful tracking, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
2. The estimation method according to claim 1, wherein between the step of tracking the current frame in the reference frame and the local map generated based on the reference frame in turn and the step of determining that the current frame satisfying a preset condition is a key frame, the method further comprises:
when the current frame is successfully tracked in the reference frame, acquiring initial poses of the current frame and the reference frame;
projecting the map points tracked in the reference frame to the current frame according to the initial pose, and calculating pixel errors between image blocks where the map points are projected in the current frame and image blocks where the matching points corresponding to the current frame are located;
and taking the relative pose after the pixel error is minimized as the optimal pose between the current frame and the reference frame.
3. The estimation method according to claim 2, wherein after the step of taking the relative pose after minimizing the pixel error as the optimal pose between the current frame and the reference frame, the method further comprises:
projecting three-dimensional map points, corresponding to a local map, of at least one image frame before the current frame to the current frame, and searching for the projection points corresponding to the three-dimensional map points in the local map near the at least one projection point respectively:
selecting the projection point which is closest to the gray value of the matching point in the current frame from the searched projection points, and calculating the pixel error between the selected projection point and the matching point corresponding to the current frame;
and taking the relative pose after the pixel error is minimized as the optimal pose between the current frame and the local map.
4. The estimation method according to claim 2, wherein between the step of tracking the current frame in the reference frame and the local map generated based on the reference frame in turn and the step of determining that the current frame satisfying a preset condition is a key frame, the method further comprises:
when tracking the current frame in the reference frame fails, extracting the image feature point matched with at least one map point in the key frame of the previous frame in the current frame, calculating the relative pose between the current frame and the key frame of the previous frame, and minimizing the relative pose to complete pose tracking.
5. The estimation method according to claim 3, characterized in that after the step of taking the relative pose after minimizing the pixel error as the optimal pose between the current frame and the reference frame, the method further comprises:
when tracking of the current frame in the local map fails, extracting the image feature point matched with at least one map point in the local map from the current frame, calculating a relative pose between the current frame and the local map, and minimizing the relative pose to complete pose tracking.
6. The estimation method according to claim 1, wherein the step of determining the current frame satisfying a preset condition as a key frame comprises:
no key frame has been selected within more than a preset number of consecutive image frames, and/or the number of the map points tracked in the reference frame is less than a preset threshold value.
7. The estimation method according to claim 3, wherein after the step of calculating optimal poses between the keyframes based on the image feature points, the method further comprises:
extracting the image feature points matched with at least one three-dimensional map point in the local map from at least one key frame, and performing pose optimization on the coordinates of the matched three-dimensional map points on the basis of the optimal pose between the image feature points and the three-dimensional map points after the pixel error is minimized;
comparing the image feature points of the current key frame with the image feature points of at least one key frame which is determined before, determining a candidate loop-back frame from the current key frame which is greater than a similarity threshold, and determining the candidate loop-back frame as a loop-back frame when the number of the candidate loop-back frame and the adjacent image frame thereof which are continuously similar to the at least one key frame and the adjacent image frame thereof which are determined before is greater than a preset number.
8. An apparatus for constructing a three-dimensional map, characterized in that the apparatus comprises:
a receiving module for receiving the image frame;
the tracking module is used for taking the image frame of each frame as a current frame, taking the previous frame of the current frame as a reference frame, and sequentially tracking the current frame in the reference frame and a local map generated based on the reference frame according to the current frame;
and the construction module is used for responding to the successful tracking, determining the current frame meeting the preset conditions as a key frame, extracting image feature points from the key frame, and calculating the optimal pose between the key frames based on the image feature points.
9. A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of a method of estimating pose between image frames as claimed in any one of claims 1 to 7.
10. A terminal device characterized by comprising a processor for executing each step in a method of estimating a pose between image frames according to any one of claims 1 to 7.
CN202010321620.0A 2020-04-22 2020-04-22 Method, device and storage medium for estimating pose of image frame Active CN111445526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010321620.0A CN111445526B (en) 2020-04-22 2020-04-22 Method, device and storage medium for estimating pose of image frame

Publications (2)

Publication Number Publication Date
CN111445526A true CN111445526A (en) 2020-07-24
CN111445526B CN111445526B (en) 2023-08-04

Family

ID=71653532



Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108648240A (en) * 2018-05-11 2018-10-12 东南大学 Based on a non-overlapping visual field camera posture scaling method for cloud characteristics map registration
US20190114777A1 (en) * 2017-10-18 2019-04-18 Tata Consultancy Services Limited Systems and methods for edge points based monocular visual slam
US10346949B1 (en) * 2016-05-27 2019-07-09 Augmented Pixels, Inc. Image registration
CN110310326A (en) * 2019-06-28 2019-10-08 北京百度网讯科技有限公司 A kind of pose data processing method, device, terminal and computer readable storage medium
CN110866496A (en) * 2019-11-14 2020-03-06 合肥工业大学 Robot positioning and mapping method and device based on depth image


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SHAOPENG LI et al.: "Metric Learning for Patch-Based 3-D Image Registration", IEEE Transactions on Automation Science and Engineering *
SHAO-PENG LI et al.: "Semi-direct monocular visual and visual-inertial SLAM with loop closure detection", Robotics and Autonomous Systems *
ZHANG Guoliang et al.: "Binocular visual odometry considering multiple pose estimation constraints", Control and Decision *
机器人事业: "ORB-SLAM: a Versatile and Accurate Monocular SLAM System, paper notes", HTTP://ZHEHANGT.GITHUB.IO/2017/04/20/SLAM/ORBSLAM/ORBSLAMPAPER *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115980A (en) * 2020-08-25 2020-12-22 西北工业大学 Binocular vision odometer design method based on optical flow tracking and point line feature matching
CN112025709A (en) * 2020-08-31 2020-12-04 东南大学 Mobile robot positioning system and method based on vehicle-mounted camera vision
CN112270710A (en) * 2020-11-16 2021-01-26 Oppo广东移动通信有限公司 Pose determination method, pose determination device, storage medium, and electronic apparatus
CN112288816A (en) * 2020-11-16 2021-01-29 Oppo广东移动通信有限公司 Pose optimization method, pose optimization device, storage medium and electronic equipment
CN112288816B (en) * 2020-11-16 2024-05-17 Oppo广东移动通信有限公司 Pose optimization method, pose optimization device, storage medium and electronic equipment
CN112270710B (en) * 2020-11-16 2023-12-19 Oppo广东移动通信有限公司 Pose determining method, pose determining device, storage medium and electronic equipment
CN112651997A (en) * 2020-12-29 2021-04-13 咪咕文化科技有限公司 Map construction method, electronic device, and storage medium
CN112651997B (en) * 2020-12-29 2024-04-12 咪咕文化科技有限公司 Map construction method, electronic device and storage medium
CN112734839A (en) * 2020-12-31 2021-04-30 浙江大学 Monocular vision SLAM initialization method for improving robustness
CN112884838B (en) * 2021-03-16 2022-11-15 重庆大学 Robot autonomous positioning method
CN112884838A (en) * 2021-03-16 2021-06-01 重庆大学 Robot autonomous positioning method
CN113034582A (en) * 2021-03-25 2021-06-25 浙江商汤科技开发有限公司 Pose optimization device and method, electronic device and computer readable storage medium
CN113361400A (en) * 2021-06-04 2021-09-07 清远华奥光电仪器有限公司 Head posture estimation method and device and storage medium
CN113393505B (en) * 2021-06-25 2023-11-03 浙江商汤科技开发有限公司 Image registration method, visual positioning method, related device and equipment
CN113393505A (en) * 2021-06-25 2021-09-14 浙江商汤科技开发有限公司 Image registration method, visual positioning method, related device and equipment
CN113624222A (en) * 2021-07-30 2021-11-09 深圳市优必选科技股份有限公司 Map updating method, robot and readable storage medium
CN114399532A (en) * 2022-01-06 2022-04-26 广东汇天航空航天科技有限公司 Camera position and posture determining method and device
CN114549612A (en) * 2022-02-25 2022-05-27 北京百度网讯科技有限公司 Model training and image processing method, device, equipment and storage medium
CN115371661A (en) * 2022-08-12 2022-11-22 深圳市优必选科技股份有限公司 Robot, and method, device and storage medium for establishing image of robot

Also Published As

Publication number Publication date
CN111445526B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
CN111445526B (en) Method, device and storage medium for estimating pose of image frame
CN110555901B (en) Method, device, equipment and storage medium for positioning and mapping dynamic and static scenes
CN110322500B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
Mur-Artal et al. ORB-SLAM: a versatile and accurate monocular SLAM system
CN111862296B (en) Three-dimensional reconstruction method, three-dimensional reconstruction device, three-dimensional reconstruction system, model training method and storage medium
CN111707281B (en) SLAM system based on luminosity information and ORB characteristics
CN110533587A (en) A kind of SLAM method of view-based access control model prior information and map recovery
CN109544636A (en) A kind of quick monocular vision odometer navigation locating method of fusion feature point method and direct method
CN111462207A (en) RGB-D simultaneous positioning and map creation method integrating direct method and feature method
US20130335529A1 (en) Camera pose estimation apparatus and method for augmented reality imaging
Prankl et al. RGB-D object modelling for object recognition and tracking
CN108776976B (en) Method, system and storage medium for simultaneously positioning and establishing image
CN112001859B (en) Face image restoration method and system
JP2011008687A (en) Image processor
JP2007183256A (en) Image processing device and method therefor
CN110349212B (en) Optimization method and device for instant positioning and map construction, medium and electronic equipment
US20150098607A1 (en) Deformable Surface Tracking in Augmented Reality Applications
CN111928842B (en) Monocular vision based SLAM positioning method and related device
CN112418288A (en) GMS and motion detection-based dynamic vision SLAM method
CN113763466B (en) Loop detection method and device, electronic equipment and storage medium
CN111829522B (en) Instant positioning and map construction method, computer equipment and device
Xue et al. Fisheye distortion rectification from deep straight lines
CN111951158B (en) Unmanned aerial vehicle aerial image splicing interruption recovery method, device and storage medium
CN113610967B (en) Three-dimensional point detection method, three-dimensional point detection device, electronic equipment and storage medium
CN112270748B (en) Three-dimensional reconstruction method and device based on image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant