CN112651997A - Map construction method, electronic device, and storage medium - Google Patents


Info

Publication number
CN112651997A
CN112651997A (application CN202011598547.8A)
Authority
CN
China
Prior art keywords
image frame
feature points
sliding window
points
map
Prior art date
Legal status
Granted
Application number
CN202011598547.8A
Other languages
Chinese (zh)
Other versions
CN112651997B (en)
Inventor
陈明扬
吴一飞
周巍
刘军
张祥通
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, MIGU Culture Technology Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN202011598547.8A
Publication of CN112651997A
Application granted
Publication of CN112651997B
Legal status: Active

Classifications

    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 17/05: Three-dimensional [3D] modelling; geographic models
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V 10/757: Matching configurations of points or features
    • G06T 2200/08: Indexing scheme involving all processing steps from image acquisition to 3D model generation
    • G06T 2207/10016: Video; image sequence


Abstract

The invention provides a map construction method, an electronic device, and a storage medium. The method comprises: determining a current image frame; determining the matching relationship between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, where the sliding window image frames are image frames acquired consecutively with the current image frame in time sequence; performing image frame tracking on the current image frame based on the matching relationship; and, based on the current pose information obtained by image frame tracking, projecting the feature points of the current image frame into a three-dimensional map to obtain newly created map points of the three-dimensional map. The method, the electronic device, and the storage medium effectively avoid missed feature-point matches, improve the utilization rate of feature points, and help reduce the number of feature points that must be extracted, thereby improving computational efficiency. Moreover, using the matching relationships among more feature points as constraints for image frame tracking improves the success rate of image frame tracking and further optimizes the map construction quality.

Description

Map construction method, electronic device, and storage medium
Technical Field
The present invention relates to the field of augmented reality technologies, and in particular, to a map construction method, an electronic device, and a storage medium.
Background
Visual SLAM (Simultaneous Localization and Mapping) is a technique that performs environmental perception using a camera.
Current visual SLAM mainly comprises four stages: 3D map initialization, image frame tracking, map point creation, and back-end optimization. Both 3D map initialization and image frame tracking rely on the matching relationship between feature points in adjacent frames. However, when only the feature-point matches between two adjacent frames are used to initialize the 3D map and perform image frame tracking, the feature constraints may be too few, which can cause tracking failure and degrade the map construction result.
Disclosure of Invention
The invention provides a map construction method, an electronic device, and a storage medium, to solve the problem of poor map construction caused by too few feature constraints in existing map construction methods.
The invention provides a map construction method, which comprises the following steps:
determining a current image frame;
determining a matching relationship between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively with the current image frame in time sequence;
and based on the matching relationship, carrying out image frame tracking on the current image frame, and based on current pose information obtained by image frame tracking, projecting the feature points of the current image frame into a three-dimensional map to obtain newly-built map points of the three-dimensional map.
According to a map construction method provided by the present invention, the image frame tracking of the current image frame based on the matching relationship includes:
determining estimated pose information of the current image frame;
based on the estimated pose information, dividing the feature points of the current image frame into associated feature points and unassociated feature points, wherein the associated feature points are feature points associated with map points of the three-dimensional map;
determining feature points matched with the feature points in the plurality of sliding window image frames in the unassociated feature points as matched feature points based on the matching relationship;
and if the number of the associated feature points and the number of the matched feature points meet a preset condition, taking the estimated pose information as the current pose information.
According to the map construction method provided by the invention, the preset condition is that the number of the associated feature points is greater than a first preset number threshold, or the number of the associated feature points is less than or equal to the first preset number threshold and the number of the matched feature points is greater than a second preset number threshold.
According to the map construction method provided by the invention, the dividing of the feature points of the current image frame into associated feature points and unassociated feature points based on the estimated pose information comprises the following steps:
based on the estimated pose information, projecting candidate map points of the three-dimensional map into the current image frame to obtain two-dimensional position information of the candidate map points; the candidate map points are map points matched with the feature points of the current image frame;
and dividing the feature points of the current image frame into associated feature points and unassociated feature points based on the two-dimensional position information of the candidate map points and the two-dimensional position information of the feature points of the current image frame matched with the candidate map points.
According to the map construction method provided by the invention, the three-dimensional map is determined based on the following steps:
determining at least three initial image frames which are continuously acquired;
and projecting the characteristic points of each initial image frame into a three-dimensional space based on the matching relationship among the characteristic points of the at least three initial image frames and the relative pose relationship among the at least three initial image frames to obtain an initialized three-dimensional map.
According to a map construction method provided by the present invention, the projecting the feature points of each initial image frame into a three-dimensional space based on the matching relationship between the feature points of the at least three initial image frames and the relative pose relationship between the at least three initial image frames to obtain an initialized three-dimensional map includes:
based on the matching relationship between the feature points of the first and last initial image frames among the at least three initial image frames, and the relative pose relationship between these two frames, projecting the feature points of the first and last initial image frames into a three-dimensional space;
based on the matching relationships between the feature points of the remaining initial image frames and the feature points of the first and last initial image frames, and the relative pose relationships between the remaining initial image frames and the first and last initial image frames, projecting the feature points of the remaining initial image frames into the three-dimensional space to obtain an initialized three-dimensional map.
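The projection into three-dimensional space in the two steps above is, in practice, a triangulation from two calibrated views. A minimal Python sketch under assumed conventions (pinhole projection matrices P = K[R|t]; the function name and the linear DLT formulation are illustrative choices, not specified by the patent):

```python
import numpy as np

def triangulate_point(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one matched feature point from two
    views, e.g. the first and last initial image frames. P1 and P2 are
    3x4 projection matrices; uv1 and uv2 are the matched pixel
    coordinates. The homogeneous 3D point is the null vector of A,
    taken as the last right singular vector of its SVD."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # Euclidean coordinates of the new map point
```

For the remaining initial frames, the same routine applies using their matches against the first and last frames and the corresponding relative poses.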
According to the map construction method provided by the invention, the projecting the feature points of the current image frame to the three-dimensional map to obtain the newly-built map points of the three-dimensional map further comprises:
and taking the matching relation between the feature points of the current image frame and the feature points of the plurality of sliding window image frames as a common-view constraint relation between the current image frame and the plurality of sliding window image frames, and carrying out global pose optimization on the three-dimensional map.
According to a map construction method provided by the present invention, the determining a matching relationship between the feature points of the current image frame and the feature points of the plurality of sliding window image frames includes:
determining the number of sliding window image frames in the sliding window structure;
if the number of sliding window image frames is smaller than a preset sliding window length, adding the current image frame into the sliding window structure; otherwise, deleting from the sliding window structure the sliding window image frame with the earliest acquisition time, together with the matching relationships between its feature points and the feature points of the other sliding window image frames, and then adding the current image frame into the sliding window structure;
and adding the matching relationships between the feature points of the current image frame and the feature points of the other sliding window image frames into the sliding window structure.
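The bookkeeping in these three steps can be sketched as a small structure. This is an illustrative Python sketch; the class and field names, the match-record layout, and the default window length of 5 are assumptions, not from the patent:

```python
from collections import deque

class SlidingWindow:
    """Fixed-length sliding window of image frames. When the window is
    full, the frame with the earliest acquisition time is evicted
    together with its feature-match records before the new frame is
    appended."""
    def __init__(self, length=5):
        self.length = length
        self.frames = deque()   # frame ids, oldest first
        self.matches = {}       # (older_id, newer_id) -> list of matched pairs

    def add_frame(self, frame_id, matches_to_window):
        if len(self.frames) >= self.length:
            oldest = self.frames.popleft()
            # drop every stored match that involves the evicted frame
            self.matches = {k: v for k, v in self.matches.items()
                            if oldest not in k}
        for other_id, pairs in matches_to_window.items():
            self.matches[(other_id, frame_id)] = pairs
        self.frames.append(frame_id)
```

Evicting the oldest frame together with its match records keeps the window's memory bounded while preserving all matches among the remaining frames.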
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of any one of the map construction methods.
The invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the mapping methods described above.
According to the map construction method, the electronic device, and the storage medium, image frame tracking is performed on the current image frame based on the matching relationship between the feature points of the current image frame and the feature points of a plurality of sliding window image frames. This effectively avoids missed feature-point matches, improves the utilization rate of feature points, and reduces the number of feature points that must be extracted, thereby improving computational efficiency. Moreover, using the matching relationships among more feature points as constraints for image frame tracking improves the success rate of image frame tracking and further optimizes the map construction quality.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or of the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a map construction method provided by the present invention;
FIG. 2 is a schematic flow chart of an image frame tracking method provided by the present invention;
FIG. 3 is a schematic flow chart of a three-dimensional map initialization method provided by the present invention;
FIG. 4 is a schematic flow chart of the feature point matching method provided by the present invention;
FIG. 5 is a second schematic flow chart of the map construction method provided by the present invention;
FIG. 6 is a schematic structural diagram of a map building apparatus provided in the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Current visual SLAM relies, in both the 3D map initialization and image frame tracking stages, on the matching relationship between feature points in two adjacent frames. In such a scheme, however, the constraints between feature points are too few. In particular, when feature points are missed in either of the two adjacent frames, fewer feature-point pairs can be matched successfully, and the large number of feature points obtained by image feature extraction is severely under-utilized. In weak-texture scenes especially, an insufficient number of feature points greatly increases the probability of tracking failure and degrades the mapping result. To increase the number of successfully matched feature-point pairs, more feature points may have to be extracted, and extracting a large number of feature points inevitably reduces the efficiency of image feature extraction, delaying three-dimensional map construction.
In view of the above, the present invention provides a map construction method. Fig. 1 is a schematic flow chart of a map construction method provided by the present invention, and as shown in fig. 1, the method includes:
at step 110, a current image frame is determined.
And step 120, determining the matching relationship between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively with the current image frame in time sequence.
Specifically, the current image frame is the two-dimensional image frame acquired at the current time in the image acquisition stage of the visual SLAM.
After the current image frame is obtained, feature point extraction may be performed on it to obtain a plurality of feature points, specifically the attributes of the feature points in the current image and the feature representation of each feature point. The attributes of a feature point may include its image coordinates, scale, orientation, and the like, which are not specifically limited in the embodiments of the present invention.
After the feature points of the current image frame are obtained, the matching relationship between them and the feature points of the plurality of sliding window image frames can be determined. Here, the sliding window image frames of the current image frame are several image frames acquired before it, and the sliding window image frames and the current image frame are acquired consecutively. For example, if the 5 image frames preceding the current image frame are preset as its sliding window image frames and the current image frame is denoted Img(t), the sliding window image frames may be Img(t-5), Img(t-4), Img(t-3), Img(t-2), and Img(t-1). The number of sliding window image frames is at least 2. Considering that the pose of the image acquisition device changes smoothly during acquisition, the corresponding viewing angle also changes smoothly. Because the sliding window image frames and the current image frame are acquired consecutively in time, their acquisition areas overlap, and feature points in the same area may appear in multiple frames, so the feature points of the current image frame may match the feature points of the plurality of sliding window image frames.
The matching relationship between the feature points of the current image frame and those of the plurality of sliding window image frames indicates whether each feature point of the current image frame matches each feature point of each sliding window image frame. Whether two feature points match can be judged from the distance between their feature representations, which can be computed with common similarity measures such as the Euclidean distance or the Chebyshev distance.
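As a hedged illustration of this matching step (the patent does not prescribe an implementation), a brute-force nearest-neighbor match by Euclidean distance between feature representations might look like this in Python; the descriptor format and the acceptance threshold are assumptions:

```python
import numpy as np

def match_features(desc_a, desc_b, max_dist=0.7):
    """Brute-force matching: for each descriptor in desc_a, find the
    nearest descriptor in desc_b by Euclidean distance and accept the
    pair when the distance is below max_dist. Returns a list of
    (index_a, index_b) pairs."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j = int(np.argmin(dists))
        if dists[j] < max_dist:
            matches.append((i, j))
    return matches
```

To build the full matching relationship, this routine would be run between the current frame and each sliding window frame in turn.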
Compared with the common approach that considers only the matching relationship between feature points in two adjacent frames, matching the feature points of the current image frame against the feature points of each of the plurality of sliding window image frames effectively improves the matching success rate of the feature points in the current image frame, increases the number of matched feature-point pairs, and enriches the constraint relationships among the feature points of the image frames.
And step 130, based on the matching relationship, performing image frame tracking on the current image frame, and based on the current pose information obtained by image frame tracking, projecting the feature points of the current image frame into the three-dimensional map to obtain new map points of the three-dimensional map.
Specifically, after the matching relationship between the feature points of the current image frame and the feature points of the plurality of sliding window image frames is obtained, the image frame tracking can be performed on the current image frame by using the matching relationship and combining the pose information of the plurality of sliding window image frames, so as to determine the pose information of the current image frame, that is, the current pose information. The pose information here is used to reflect the pose of the image frame acquisition device, i.e. the camera, such as the rotational matrix and translation vector of the camera.
After the current pose information is obtained, the feature points of the current image frame can be projected into the three-dimensional map based on it, converting them from the two-dimensional image coordinate system to the three-dimensional world coordinate system. The projected feature points can then serve as newly created map points in the three-dimensional map, updating the three-dimensional map with the current image frame.
According to the method provided by the embodiment of the invention, based on the matching relationship between the feature points of the current image frame and the feature points of the plurality of sliding window image frames, the image frame tracking is carried out on the current image frame, so that the problem of missing matching of the feature points is effectively avoided, the utilization rate of the feature points is improved, the requirement for extracting the feature points is favorably reduced, and the calculation efficiency is improved; and the matching relation among more characteristic points is used as a constraint condition for image frame tracking, so that the success rate of image frame tracking is improved, and the map construction quality is further optimized.
Based on the foregoing embodiment, fig. 2 is a schematic flowchart of an image frame tracking method provided by the present invention, and as shown in fig. 2, in step 130, performing image frame tracking on the current image frame based on the feature point matching relationship includes:
Step 131, determining estimated pose information of the current image frame;
Step 132, based on the estimated pose information, dividing the feature points of the current image frame into associated feature points and unassociated feature points, wherein the associated feature points are feature points associated with map points of the three-dimensional map;
Step 133, determining, based on the matching relationship, the unassociated feature points that match feature points in the plurality of sliding window image frames as matched feature points;
and step 134, if the number of the associated feature points and the number of the matched feature points meet a preset condition, taking the estimated pose information as the current pose information.
Specifically, the estimated pose information of the current image frame may be extrapolated from the pose information of the two image frames preceding it. For example, if the current image frame is Img(t) and the two preceding frames are Img(t-2) and Img(t-1), the relative rotation matrix and relative translation vector between Img(t-2) and Img(t-1) can be taken as the pose velocity and applied to the pose of Img(t-1) to obtain the estimated pose of the current frame.
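This constant-velocity extrapolation can be sketched compactly with homogeneous 4x4 pose matrices. The camera-to-world convention below is an assumption, since the patent does not fix one:

```python
import numpy as np

def predict_pose(T_prev2, T_prev1):
    """Constant-velocity motion model: treat the relative transform
    between the two most recent frames as the 'pose velocity' and apply
    it once more. T_prev2 and T_prev1 are 4x4 homogeneous pose matrices
    of Img(t-2) and Img(t-1); returns the estimated pose of Img(t)."""
    T_rel = T_prev1 @ np.linalg.inv(T_prev2)   # motion from t-2 to t-1
    return T_rel @ T_prev1                     # extrapolated pose for t
```

With pure forward translation of one unit per frame, the prediction simply advances one more unit.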
After the estimated pose information is obtained, it can be determined whether each feature point of the current image frame has an associated map point in the three-dimensional map, thereby dividing the feature points into two classes: associated feature points and unassociated feature points. Associated feature points already have associated map points in the three-dimensional map; unassociated feature points do not. The number of associated feature points is generally used to judge whether image frame tracking succeeded, and the unassociated feature points are generally projected into the three-dimensional map after tracking succeeds to form new map points.
Here, the association between a feature point and a map point can be understood as similarity in both feature representation and position. Similarity in feature representation may be evaluated by a bag-of-words search or another feature matching technique; similarity in position may be evaluated by projecting the map point from the three-dimensional world coordinate system into the two-dimensional image coordinate system of the current image frame using the estimated pose information, and computing the distance between the projected map point and the feature point.
Compared with the conventional method that evaluates whether the image frame tracking of the current image frame succeeded using only the number of associated feature points, the embodiment of the present invention also takes the number of matched feature points into account. Here, matched feature points are unassociated feature points for which a matching feature point exists in a sliding window image frame. Because a matched feature point corresponds to a feature point in a sliding window image frame, its position in the three-dimensional coordinate system can be obtained directly by triangulation based on the relative pose relationship between the current image frame and the sliding window image frame. Therefore, if the number of matched feature points is large, the image frame tracking of the current image frame can also be considered successful.
The method provided by the embodiment of the present invention judges whether image frame tracking succeeded by combining the number of associated feature points and the number of matched feature points. Compared with considering the number of associated feature points alone, this effectively improves the success rate of image frame tracking and avoids the large amount of computation caused by tracking failure; while ensuring map construction quality, it effectively reduces the computation required for map construction and improves its efficiency.
Based on any of the above embodiments, in step 134, the preset condition is that the number of associated feature points is greater than a first preset number threshold, or the number of associated feature points is less than or equal to the first preset number threshold and the number of matched feature points is greater than a second preset number threshold.
Specifically, the first preset number threshold and the second preset number threshold are both preset values. When judging whether image frame tracking succeeded, whether the number of associated feature points exceeds the first preset number threshold is considered first; if it does, tracking is directly determined to be successful without considering the number of matched feature points.
If the number of associated feature points does not exceed the first preset number threshold, whether the number of matched feature points exceeds the second preset number threshold is then considered; if it does, tracking can be determined to be successful.
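The two-tier preset condition can be stated compactly. The threshold values below are illustrative assumptions, not values given in the patent:

```python
def tracking_succeeded(n_associated, n_matched,
                       assoc_threshold=50, match_threshold=30):
    """Preset condition of the method: tracking succeeds when enough
    feature points are associated with existing map points, or, failing
    that, when enough unassociated points match feature points in the
    sliding window image frames."""
    if n_associated > assoc_threshold:
        return True          # first tier: associated points suffice
    return n_matched > match_threshold   # second tier: fall back on matches
```

The second tier is what relaxes the tracking requirement relative to the conventional single-threshold test.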
The method provided by the embodiment of the present invention takes the number of matched feature points as an additional criterion for judging whether image frame tracking succeeded, thereby relaxing the requirement for image frame tracking and improving its success rate.
Based on any of the above embodiments, step 132 includes:
projecting candidate map points of the three-dimensional map into the current image frame based on the estimated pose information to obtain two-dimensional position information of the candidate map points; the candidate map points are map points matched with the feature points of the current image frame;
and dividing the feature points of the current image frame into associated feature points and unassociated feature points based on the two-dimensional position information of the candidate map points and the two-dimensional position information of the feature points of the current image frame matched with the candidate map points.
Specifically, a map point in the three-dimensional map that matches a feature point of the current image frame may be searched for as a candidate map point that may be associated with the feature point through a bag-of-words search technique or other feature matching technique. After the candidate map point is obtained, the candidate map point can be projected into a two-dimensional image coordinate system of the current image frame from a three-dimensional world coordinate system according to the estimated pose information, so that two-dimensional position information of the candidate map point in the current image frame is obtained.
After the two-dimensional position information of a candidate map point is obtained, the deviation between it and the two-dimensional position information of the corresponding feature point in the current image frame, such as their Euclidean distance, is analyzed to judge whether the feature point and its candidate map point are associated. If the deviation is small, they are associated: the feature point has an associated map point in the three-dimensional map and is taken as an associated feature point. If the deviation is large, they are not associated: the feature point has no associated map point in the three-dimensional map and is taken as an unassociated feature point.
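Putting the projection and the deviation test together, a Python sketch of the classification; the pinhole intrinsics K, the world-to-camera pose convention (R, t), and the pixel threshold are assumptions for illustration:

```python
import numpy as np

def classify_feature_points(features_2d, candidate_points_3d, R, t, K,
                            pixel_threshold=5.0):
    """Project each candidate map point (world coordinates) into the
    current frame with the estimated pose (R, t) and intrinsics K, then
    compare against the matched feature's pixel position. Inputs are
    paired by index; indices of points within pixel_threshold go to
    'associated', the rest to 'unassociated'."""
    associated, unassociated = [], []
    for i, (uv, pw) in enumerate(zip(features_2d, candidate_points_3d)):
        pc = R @ pw + t                  # camera-frame coordinates
        proj = K @ pc
        proj_uv = proj[:2] / proj[2]     # perspective division to pixels
        if np.linalg.norm(proj_uv - uv) < pixel_threshold:
            associated.append(i)
        else:
            unassociated.append(i)
    return associated, unassociated
```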
Based on any of the above embodiments, step 130 further includes: if the number of associated feature points and the number of matched feature points do not satisfy the preset condition, relocalizing the current image frame based on historical image frames so as to update the estimated pose information.
Specifically, after the number of associated feature points and the number of matched feature points are determined, the two may be combined to judge whether image frame tracking of the current image frame has succeeded. If the number of associated feature points and the number of matched feature points satisfy the preset condition, tracking is judged successful; otherwise tracking is judged failed, and the current image frame then needs to be relocalized.
Relocalizing the current image frame relies on its historical image frames, i.e., the image frames acquired before the current image frame was acquired. A matching search can be performed with algorithms such as a bag-of-words of features, so that historical image frames close to the current image frame are selected from the large number of historical image frames as candidate image frames. Feature matching is then performed between the current image frame and the candidate image frames, the pose of the current frame is recovered from the matching result to relocalize the current image frame, and the pose recovery result serves as the updated estimated pose information.
Based on any of the above embodiments, fig. 3 is a schematic flow chart of the three-dimensional map initialization method provided by the present invention, and as shown in fig. 3, the three-dimensional map is determined based on the following steps:
step 310, determining at least three initial image frames which are continuously acquired;
and step 320, projecting the feature points of each initial image frame into a three-dimensional space based on the matching relationships among the feature points of the at least three initial image frames and the relative pose relationships among the at least three initial image frames, to obtain an initialized three-dimensional map.
Here, an initial image frame refers to an image frame acquired first in the image acquisition stage of the visual SLAM. Unlike the common approach of initializing the three-dimensional map from two adjacent image frames, the number of initial image frames here is at least three. In an embodiment of the present invention, the number of initial image frames used for initializing the three-dimensional map may be equal to the sum of the number of sliding window image frames and the current image frame in step 120.
Among the at least three initial image frames, matching relationships exist between the feature points of every two initial image frames, and relative pose relationships exist between every two image frames, so rich feature point matching constraints can be obtained. Combining these pairwise matching relationships and relative pose relationships, the feature points of each initial image frame can be triangulated and thereby projected into three-dimensional space, realizing the initialization of the three-dimensional map.
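The triangulation step above can be sketched with the standard linear (DLT) method, which directly accepts more than two views; this is a generic illustration under an assumed pinhole model, not the embodiment's exact procedure:

```python
import numpy as np

def triangulate_dlt(projections, pixels):
    """Triangulate one 3D point from >= 2 views by linear DLT.
    projections: list of 3x4 camera matrices P = K[R|t];
    pixels:      the matched pixel (u, v) of the same point in each view."""
    A = []
    for P, (u, v) in zip(projections, pixels):
        A.append(u * P[2] - P[0])            # each view contributes two rows
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))  # null vector of A is the solution
    X = Vt[-1]
    return X[:3] / X[3]                      # dehomogenize
```

With three or more initial image frames, every frame that observes the point contributes two rows to `A`, which is exactly the "richer matching constraints" argument made above.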
The method provided by the embodiment of the invention is used for initializing the three-dimensional map by combining at least three initial image frames, introduces more feature point matching constraints for the initialization of the three-dimensional map, and is favorable for improving the initialization precision of the three-dimensional map.
Based on any of the above embodiments, step 320 includes:
based on the matching relationship between the feature points of the first and last initial image frames among the at least three initial image frames, and the relative pose relationship between these two frames, projecting the feature points of the first and last initial image frames into a three-dimensional space;
and based on the matching relationships between the feature points of the remaining initial image frames among the at least three initial image frames and the feature points of the first and last initial image frames, and the relative pose relationships between the remaining initial image frames and the first and last initial image frames, projecting the feature points of the remaining initial image frames into the three-dimensional space to obtain an initialized three-dimensional map.
Specifically, the initial image frames may be divided into two types: one type is the head and tail frames of the initial image frames, i.e., the first and last initial image frames; the other type is each remaining initial image frame other than the head and tail frames. For example, if the initial image frames are the first n frames, the head and tail initial image frames are those with fid 0 and fid n-1.
In initializing the three-dimensional map, three-dimensional projection of the feature points can first be performed based on the matching relationship between the feature points of the first and last initial image frames and the relative pose relationship between them. The first and last initial image frames are used because, of all the initial image frames, these two differ most in pose information, and the greater the difference in pose, the more accurate the relative pose relationship obtained from feature point matching. Since the relative pose relationship between the head and tail frames is therefore the most accurate, using these two frames for the first projection helps establish an accurate three-dimensional projection reference.
On this basis, for each remaining initial image frame, its feature points can be projected into the three-dimensional space by combining its relative pose relationship with the first and last initial image frames and the matching relationship between its feature points and those of the first and last initial image frames, so as to fill in map points in the three-dimensional space and complete the initialization of the three-dimensional map.
Further, for the first and last initial image frames, the projection into three-dimensional space proceeds as follows. First, the fundamental matrix F12 is solved from the matching relationship between the feature points of the first and last initial image frames, yielding the relative pose relationship between the two frames. Based on this relative pose relationship, the feature points of the first initial image frame are reprojected onto the last initial image frame, and the reprojection error between each reprojected feature point and its matched feature point on the last initial image frame is calculated. Similarly, the feature points of the last initial image frame are reprojected onto the first initial image frame and the corresponding reprojection errors are calculated. Finally, inliers and outliers under the relative pose relationship are judged from the reprojection errors: if the reprojection error is less than or equal to a preset error threshold, the corresponding feature point is an inlier; if it is greater than the threshold, the feature point is an outlier. If the number of inliers is greater than a preset inlier threshold, the relative pose relationship is judged correct, and three-dimensional coordinates are recovered by triangulation for the inliers but not for the outliers.
If the number of inliers is less than the preset inlier threshold, the first initial image frame is deleted, the second initial image frame is taken as the new first initial image frame, the most recently acquired initial image frame is taken as the new last initial image frame, and the above operation is repeated.
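The head/tail selection loop can be sketched as follows; `solve_pose` and `count_inliers` are hypothetical placeholders for the fundamental-matrix solution and the reprojection-error inlier test described above, and `min_inliers` is an illustrative threshold:

```python
def select_init_pair(frames, solve_pose, count_inliers, min_inliers=50):
    """Iterate head/tail frame pairs until a pose hypothesis has enough inliers.
    frames:        initial image frames ordered by acquisition time;
    solve_pose:    callable(head_frame, tail_frame) -> relative pose (R, t);
    count_inliers: callable(head_frame, tail_frame, R, t) -> inlier count."""
    head, tail = 0, len(frames) - 1          # tail = latest acquired frame
    while head < tail:
        R, t = solve_pose(frames[head], frames[tail])
        if count_inliers(frames[head], frames[tail], R, t) >= min_inliers:
            return head, tail, (R, t)        # pose verified: triangulate inliers
        head += 1                            # drop the earliest frame and retry
    return None                              # initialization not yet possible
```

On success, the returned pair serves as the projection reference for the remaining initial image frames, as described above.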
Based on any of the above embodiments, step 130 further includes:
and taking the matching relation between the feature points of the current image frame and the feature points of the plurality of sliding window image frames as a common-view constraint relation between the current image frame and the plurality of sliding window image frames, and carrying out global pose optimization on the three-dimensional map.
Specifically, global pose optimization means constructing a nonlinear optimization equation, through a general graph optimization technique, over all map points in the three-dimensional map and all image frames used to construct it, so as to perform global bundle adjustment. The nonlinear optimization covers the nodes formed by map points, the pose nodes formed by image frames, and the edges formed by the feature point associations between map points and image frames, and is used to realize error calculation.
Because the embodiment of the present invention performs global pose optimization on the basis of the matching relationships, obtained in step 120, between the feature points of the current image frame and the feature points of the plurality of sliding window image frames, the edge constraints between the pose nodes corresponding to the image frames are increased, which helps improve the back-end optimization effect and reduces the accumulated pose drift during the operation of the visual SLAM.
Based on any of the above embodiments, fig. 4 is a schematic flow chart of a feature point matching method provided in an embodiment of the present invention, and as shown in fig. 4, step 120 includes:
step 121, determining the number of sliding window image frames in the sliding window structure body;
step 122, if the number of sliding window image frames is smaller than the preset sliding window length, adding the current image frame into the sliding window structure body; otherwise, deleting the sliding window image frame with the earliest acquisition time in the sliding window structure body, together with the matching relationships between its feature points and the feature points of the other sliding window image frames, and then adding the current image frame into the sliding window structure body;
and step 123, adding the matching relationship between the feature points of the current image frame and the feature points of other sliding window image frames into the sliding window structure.
Specifically, to realize matching between the feature points of the current image frame and the feature points of the plurality of sliding window image frames, the embodiment of the present invention provides the sliding window structure body. It stores the sliding window image frames used for feature point matching with the current image frame, and stores the matching relationships obtained from that matching so that they can be reused when matching the next image frame.
The sliding window structure body stores each sliding window image frame as well as the matching relationships among the feature points of the sliding window image frames. Therefore, once the feature points of the current image frame have been matched with the feature points of any one sliding window image frame, the matching relationships between that frame's feature points and those of the other sliding window image frames can be combined to quickly obtain the matching relationships between the feature points of the current image frame and the feature points of the other sliding window image frames.
Before feature point matching is performed for the current image frame, it is judged whether the number of sliding window image frames in the sliding window structure body is smaller than a preset sliding window length, i.e., the maximum number of sliding window image frames the structure body can store. If the number is smaller than the preset sliding window length, the structure body is not full, and the current image frame can be stored directly into it as a sliding window image frame. If the number equals the preset sliding window length, the structure body is full: the earliest-acquired sliding window image frame must be deleted, together with the matching relationships between its feature points and those of the other sliding window image frames, before the current image frame is stored into the structure body as a sliding window image frame.
When, or after, the current image frame is stored into the sliding window structure body as a sliding window image frame, its feature points can be matched against the feature points of each sliding window image frame in the structure body, and the resulting matching relationships are stored into the structure body as well.
According to the method provided by the embodiment of the invention, through the arrangement and application of the sliding window structure body, the matching relationships among feature points are enriched while the efficiency of feature point matching is effectively ensured.
Based on any embodiment, the sliding window structure comprises four parts, namely an image frame list, an image frame ID list, a sliding window feature point structure and the number of image frames in the sliding window.
The image frame list stores the information of all sliding window image frames in the current sliding window; the image frame ID list stores the IDs of all sliding window image frames in the window; the in-window image frame count stores the number of sliding window image frames currently in the structure body; and the sliding window feature point structure stores the matching relationships among the feature points of all sliding window image frames in the window.
Further, the sliding window feature point structure may include attribute information for each feature point: a FrameID, coordinates, a descriptor, a map point ID, a common view count, a matching flag, and a common view Frame information list. Here, FrameID is the ID of the image frame to which the feature point belongs; the coordinates are the image coordinates (x, y) of the feature point in that frame; the descriptor is the feature representation of the feature point; the map point ID is the ID of the map point in the three-dimensional map associated with the feature point; the common view count is the number of image frames in the sliding window the feature point matches successfully; the matching flag indicates whether the feature point has successfully matched feature points among the image frames in the window — if so, the flag stores the IDs of the successfully matched feature points, and if not, it stores -1; the common view Frame information list stores information of the sliding window image frames with which the feature point has matched successfully.
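A minimal sketch of such a per-feature-point entry follows; the field names are illustrative assumptions, and only their meanings come from the embodiment (e.g. the embodiment stores -1 in the matching flag when no match exists, modeled here as an empty list):

```python
from dataclasses import dataclass, field

@dataclass
class WindowFeaturePoint:
    """One entry of the sliding window feature point structure."""
    frame_id: int                 # ID of the image frame the feature belongs to
    xy: tuple                     # image coordinates (x, y) in that frame
    descriptor: bytes             # feature representation of the feature point
    map_point_id: int = -1        # associated 3D map point ID, -1 if none
    coview_count: int = 1         # frames in the window it matches (incl. its own)
    match_ids: list = field(default_factory=list)      # IDs of matched features
    coview_frames: list = field(default_factory=list)  # info of matched frames
```

When a new frame's feature matches an existing entry, both entries get their `coview_count` incremented and each other's IDs and frame information appended, mirroring the update procedure described below.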
After a current image frame is obtained, the in-window image frame count of the sliding window structure body is read first. If it equals the preset sliding window length, the earliest-acquired image frame Iold is removed: Iold is moved out of the image frame list, its ID is moved out of the image frame ID list, and each feature point of Iold is removed from the sliding window feature point structure. Meanwhile, for every feature point that has a matching relationship with a feature point of Iold, its common view count is decremented by 1, the feature ID corresponding to Iold is removed from its matching flag, and the information of Iold is removed from its common view Frame information list. The current image frame is then stored into the sliding window structure body.
And if the number of the image frames in the sliding window is less than the preset length of the sliding window, directly storing the current image frame into the sliding window structural body.
The steps of storing the current image frame in the sliding window structure are specifically as follows:
when the sliding window structure body is empty, the image frame information and image frame ID of the current image frame can be stored directly into the structure body, the in-window image frame count is set to 1, the feature points of the current image frame are stored into the sliding window feature point structure, the common view count of every feature point is set to 1, the matching flag is set to 0, and the common view Frame information list contains only the current image frame information;
when the sliding window is not empty, the image Frame information and the image Frame ID of the current image Frame can be stored in the sliding window structure, the number of the image frames in the sliding window is updated to be +1, the feature points of the current image Frame are stored in the sliding window feature point structure, the common view number of the feature points of the current image Frame is initialized to be 1, the matching identification bit is initialized to be 0, and the common view Frame information list is initialized to be the current image Frame information. Matching the feature points of the current image Frame (assuming the feature points Fi of the current image Frame) with the feature points of the rest sliding window image frames in the sliding window feature point structure body, if the matching is successful (assuming the feature points Fj of the matching is successful), updating the common view number +1 of the Fi, adding the ID of the Fj in the matching identification bit, and adding the image Frame information where the Fj is located in the common view Frame information list; and simultaneously updating the total view number +1 of the Fj, matching the identification bits and adding the ID of the Fi, and adding the image Frame information where the Fi is located in the total view Frame information list.
Based on any of the above embodiments, fig. 5 is a second schematic flow chart of the map building method provided by the present invention, as shown in fig. 5, the map building method includes:
firstly, a video stream is acquired through a visual camera, and a current image frame is obtained in the video stream acquisition process.
Feature point detection is performed on the current image frame to obtain the feature points of the current image frame and their feature representations.
The current image frame is placed into the sliding window structure body, feature point matching is performed between the current image frame and each sliding window image frame in the structure body, and the matching relationships among the feature points of the sliding window image frames in the structure body are updated.
It is then judged whether the current image frame is to be applied to the initialization of the three-dimensional map, i.e., whether the initialization of the three-dimensional map has been completed:
if the initialization of the three-dimensional map is not completed, the initialization of the three-dimensional map is executed by combining the matching relation among the characteristic points of each sliding window image frame in the sliding window structure body, and the next image frame is determined after the initialization is completed;
if the initialization of the three-dimensional map is completed, combining the matching relation among the characteristic points of each sliding window image frame in the sliding window structure body, carrying out image frame tracking on the current image frame, and judging whether the image frame tracking is successful:
if the tracking is unsuccessful, entering a repositioning process, updating the estimated pose information of the current image frame through repositioning, and re-tracking the image frame based on the updated estimated pose information;
and if the tracking is successful, projecting the feature points of the current image frame into the three-dimensional map based on the current pose information obtained by image frame tracking to obtain newly-built map points of the three-dimensional map.
After the map points are newly built, the sliding window structure body can be updated according to the association relationships between the newly added map points and the feature points. Back-end optimization is performed on the three-dimensional map with the newly built map points, and after the optimization is completed the flow returns to determine the next image frame.
Based on any of the above embodiments, feature point detection on an image frame can be realized with common ORB features, FREAK features, or a deep-learning-based feature point extraction method such as RF-Net. The specific steps are as follows: first, an image Gaussian pyramid is constructed by downsampling, with different pyramid levels representing different image scales so as to guarantee the scale invariance of the features; feature points are then extracted at the different pyramid scales; finally, feature representations are computed from image patches of a specific size determined by the scale and principal orientation of each extracted feature point, thereby obtaining the feature points of the image frame and their feature representations.
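As a rough illustration of the pyramid-construction step only (nearest-neighbor downsampling stands in for proper Gaussian filtering, and the level count and scale factor are illustrative defaults, not values fixed by the embodiment):

```python
import numpy as np

def build_pyramid(image, n_levels=8, scale=1.2):
    """Build a simple image pyramid so that features can be detected at
    multiple scales; level i is the original image shrunk by scale**i."""
    levels = [image]
    for i in range(1, n_levels):
        f = scale ** i
        h = max(1, int(round(image.shape[0] / f)))
        w = max(1, int(round(image.shape[1] / f)))
        ys = (np.arange(h) * image.shape[0] / h).astype(int)  # row sampling
        xs = (np.arange(w) * image.shape[1] / w).astype(int)  # column sampling
        levels.append(image[np.ix_(ys, xs)])
    return levels
```

A detector run independently on each level then yields feature points whose recorded scale is the level index, which is what makes the subsequent descriptor computation scale-invariant.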
Based on any of the above embodiments, fig. 6 is a schematic structural diagram of a map construction apparatus provided by the present invention, as shown in fig. 6, the apparatus includes an image frame determining unit 610, a matching relationship determining unit 620, and an image frame tracking unit 630;
the image frame determining unit 610 is configured to determine a current image frame;
the matching relationship determining unit 620 is configured to determine matching relationships between feature points of the current image frame and feature points of a plurality of sliding window image frames, where the plurality of sliding window image frames are image frames continuously acquired with the current image frame according to a time sequence;
the image frame tracking unit 630 is configured to perform image frame tracking on the current image frame based on the matching relationship, and project the feature points of the current image frame to a three-dimensional map based on current pose information obtained by image frame tracking to obtain new map points of the three-dimensional map.
According to the apparatus provided by the embodiment of the present invention, image frame tracking is performed on the current image frame based on the matching relationships between the feature points of the current image frame and the feature points of the plurality of sliding window image frames. This effectively avoids missed feature point matches, improves the utilization of feature points, helps reduce the number of feature points that must be extracted, and improves computational efficiency. Moreover, using the matching relationships among more feature points as constraints for image frame tracking improves the success rate of image frame tracking and further optimizes the quality of map construction.
Based on any of the above embodiments, the image frame tracking unit 630 is configured to:
determining estimated pose information of the current image frame;
based on the estimated pose information, dividing the feature points of the current image frame into associated feature points and unassociated feature points, wherein the associated feature points are feature points associated with map points of the three-dimensional map;
determining feature points matched with the feature points in the plurality of sliding window image frames in the unassociated feature points as matched feature points based on the matching relationship;
and if the number of the associated feature points and the number of the matched feature points meet the preset condition, taking the estimated pose information as the current pose information.
Based on any of the above embodiments, the preset condition is that the number of the associated feature points is greater than a first preset number threshold, or that the number of the associated feature points is less than or equal to the first preset number threshold and the number of the matched feature points is greater than a second preset number threshold.
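The preset condition above can be expressed as a simple predicate; the two threshold values used here are illustrative, since the embodiment only requires that such thresholds exist:

```python
def tracking_meets_condition(n_associated, n_matched,
                             assoc_thresh=30, match_thresh=20):
    """Preset condition: enough associated feature points, or (failing that)
    enough matched feature points from the sliding window."""
    return (n_associated > assoc_thresh
            or (n_associated <= assoc_thresh and n_matched > match_thresh))
```

When the predicate is true, tracking is judged successful and the estimated pose information becomes the current pose information; otherwise the relocalization path is taken.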
Based on any of the above embodiments, the image frame tracking unit 630 is configured to:
based on the estimated pose information, projecting candidate map points of the three-dimensional map into the current image frame to obtain two-dimensional position information of the candidate map points; the candidate map points are map points matched with the feature points of the current image frame;
and dividing the feature points of the current image frame into associated feature points and unassociated feature points based on the two-dimensional position information of the candidate map points and the two-dimensional position information of the feature points of the current image frame matched with the candidate map points.
Based on any embodiment above, the apparatus further includes a map initialization unit, where the map initialization unit is configured to:
determining at least three initial image frames which are continuously acquired;
and projecting the characteristic points of each initial image frame into a three-dimensional space based on the matching relationship among the characteristic points of the at least three initial image frames and the relative pose relationship among the at least three initial image frames to obtain an initialized three-dimensional map.
Based on any of the above embodiments, the map initialization unit is configured to:
based on the matching relationship between the feature points of the first and last initial image frames among the at least three initial image frames and the relative pose relationship between these two frames, projecting the feature points of the first and last initial image frames into a three-dimensional space;
based on the matching relationships between the feature points of the remaining initial image frames among the at least three initial image frames and the feature points of the first and last initial image frames, and the relative pose relationships between the remaining initial image frames and the first and last initial image frames, projecting the feature points of the remaining initial image frames into the three-dimensional space to obtain an initialized three-dimensional map.
Based on any of the above embodiments, the apparatus further comprises an optimization unit, the optimization unit is configured to:
and taking the matching relation between the feature points of the current image frame and the feature points of the plurality of sliding window image frames as a common-view constraint relation between the current image frame and the plurality of sliding window image frames, and carrying out global pose optimization on the three-dimensional map.
Based on any of the above embodiments, the matching relation determining unit 620 is configured to:
determining the number of sliding window image frames in the sliding window structure body;
if the number of sliding window image frames is smaller than the preset sliding window length, adding the current image frame into the sliding window structure body; otherwise, deleting the sliding window image frame with the earliest acquisition time in the sliding window structure body, together with the matching relationships between its feature points and the feature points of the other sliding window image frames, and adding the current image frame into the sliding window structure body;
and adding the matching relation between the characteristic points of the current image frame and the characteristic points of other sliding window image frames into the sliding window structure.
Fig. 7 illustrates a physical structure diagram of an electronic device, and as shown in fig. 7, the electronic device may include: a processor (processor)710, a communication Interface (Communications Interface)720, a memory (memory)730, and a communication bus 740, wherein the processor 710, the communication Interface 720, and the memory 730 communicate with each other via the communication bus 740. Processor 710 may invoke logic instructions in memory 730 to perform a mapping method comprising: determining a current image frame; determining a matching relation between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames continuously acquired with the current image frame according to a time sequence; and based on the matching relationship, carrying out image frame tracking on the current image frame, and based on current pose information obtained by image frame tracking, projecting the feature points of the current image frame into a three-dimensional map to obtain newly-built map points of the three-dimensional map.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the map construction method provided above, the method comprising: determining a current image frame; determining a matching relationship between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively with the current image frame in time order; performing image frame tracking on the current image frame based on the matching relationship; and, based on current pose information obtained by the image frame tracking, projecting the feature points of the current image frame into a three-dimensional map to obtain newly created map points of the three-dimensional map.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the map construction method provided above, the method comprising: determining a current image frame; determining a matching relationship between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively with the current image frame in time order; performing image frame tracking on the current image frame based on the matching relationship; and, based on current pose information obtained by the image frame tracking, projecting the feature points of the current image frame into a three-dimensional map to obtain newly created map points of the three-dimensional map.
The above-described apparatus embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement the solution without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or alternatively entirely by hardware. Based on this understanding, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a ROM/RAM, magnetic disk, or optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A map construction method, comprising:
determining a current image frame;
determining a matching relationship between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively with the current image frame in time order;
and performing image frame tracking on the current image frame based on the matching relationship, and, based on current pose information obtained by the image frame tracking, projecting the feature points of the current image frame into a three-dimensional map to obtain newly created map points of the three-dimensional map.
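The final projection step of claim 1 can be pictured as lifting a tracked 2D feature through the camera model into a world-frame map point. The following is a minimal sketch, not the patent's implementation: the intrinsic matrix `K`, the pose convention `X_cam = R @ X_world + t`, and the per-feature depth are all illustrative assumptions.

```python
import numpy as np

def back_project(uv, depth, K, R, t):
    """Lift a 2D feature point (pixel uv) with known depth into world
    coordinates, assuming the camera pose maps X_cam = R @ X_world + t."""
    u, v = uv
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # pixel -> normalized camera ray
    X_cam = depth * ray                             # scale the ray by the depth
    return R.T @ (X_cam - t)                        # camera frame -> world frame

# Illustrative values: 500 px focal length, principal point (320, 240),
# identity pose; a feature at the principal point with depth 2 lifts to (0, 0, 2).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
p = back_project((320.0, 240.0), 2.0, K, np.eye(3), np.zeros(3))
```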
2. The map construction method according to claim 1, wherein performing image frame tracking on the current image frame based on the matching relationship comprises:
determining estimated pose information of the current image frame;
based on the estimated pose information, dividing the feature points of the current image frame into associated feature points and unassociated feature points, wherein the associated feature points are feature points associated with map points of the three-dimensional map;
determining, based on the matching relationship, those unassociated feature points that match feature points in the plurality of sliding window image frames as matched feature points;
and if the number of associated feature points and the number of matched feature points meet a preset condition, taking the estimated pose information as the current pose information.
3. The map construction method according to claim 2, wherein the preset condition is that the number of associated feature points is greater than a first preset number threshold, or that the number of associated feature points is less than or equal to the first preset number threshold and the number of matched feature points is greater than a second preset number threshold.
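The preset condition of claim 3 is a simple disjunction over the two counts. A minimal Python sketch follows; the threshold values `thr1` and `thr2` are illustrative defaults, not values from the patent.

```python
def pose_accepted(n_associated, n_matched, thr1=30, thr2=15):
    """Preset condition of claim 3: accept the estimated pose when enough
    feature points are already associated with map points, or, failing
    that, when enough unassociated points matched the sliding window."""
    return n_associated > thr1 or (n_associated <= thr1 and n_matched > thr2)
```

Since the second clause is only reached when the first fails, the expression simplifies to `n_associated > thr1 or n_matched > thr2`; the explicit form above mirrors the claim wording.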
4. The map construction method according to claim 2, wherein the dividing of the feature points of the current image frame into associated feature points and unassociated feature points based on the estimated pose information comprises:
based on the estimated pose information, projecting candidate map points of the three-dimensional map into the current image frame to obtain two-dimensional position information of the candidate map points; the candidate map points are map points matched with the feature points of the current image frame;
and dividing the feature points of the current image frame into associated feature points and unassociated feature points based on the two-dimensional position information of the candidate map points and the two-dimensional position information of the feature points of the current image frame matched with the candidate map points.
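The division in claim 4 compares, in the image plane, each candidate map point's reprojection against the 2D feature it was matched to. A hedged sketch, with an assumed pixel-distance criterion (`px_thresh`) that the patent does not specify:

```python
import numpy as np

def split_features(features_2d, candidate_points_3d, K, R, t, px_thresh=3.0):
    """Sketch of claim 4: project each candidate map point into the current
    frame with the estimated pose (R, t); the feature matched to that point
    counts as 'associated' when the reprojection lands within px_thresh
    pixels of it, otherwise (or with no candidate) it is 'unassociated'."""
    associated, unassociated = [], []
    for uv, Xw in zip(features_2d, candidate_points_3d):
        uv = np.asarray(uv, dtype=float)
        if Xw is None:  # feature has no matching candidate map point
            unassociated.append(uv)
            continue
        xc = K @ (R @ np.asarray(Xw, dtype=float) + t)  # world -> homogeneous pixel
        proj = xc[:2] / xc[2]                           # de-homogenize to 2D position
        if np.linalg.norm(proj - uv) <= px_thresh:
            associated.append(uv)
        else:
            unassociated.append(uv)
    return associated, unassociated

# Identity pose, map point (0, 0, 2) reprojects to the principal point (320, 240),
# so the first feature is associated; the second has no candidate map point.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
a, u = split_features([(320, 240), (100, 100)], [[0, 0, 2], None],
                      K, np.eye(3), np.zeros(3))
```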
5. The map construction method according to claim 1, wherein the three-dimensional map is determined based on:
determining at least three consecutively acquired initial image frames;
and projecting the characteristic points of each initial image frame into a three-dimensional space based on the matching relationship among the characteristic points of the at least three initial image frames and the relative pose relationship among the at least three initial image frames to obtain an initialized three-dimensional map.
6. The map construction method according to claim 5, wherein projecting the feature points of each initial image frame into a three-dimensional space based on the matching relationship between the feature points of the at least three initial image frames and the relative pose relationship between the at least three initial image frames, to obtain an initialized three-dimensional map, comprises:
projecting the feature points of the first and last initial image frames among the at least three initial image frames into a three-dimensional space, based on the matching relationship between the feature points of these two frames and the relative pose relationship between them;
and projecting the feature points of the remaining initial image frames among the at least three initial image frames into the three-dimensional space, based on the matching relationship between their feature points and the feature points of the first and last initial image frames and on the relative pose relationship between the remaining frames and the first and last frames, to obtain an initialized three-dimensional map.
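The first-and-last-frame step of claim 6 amounts to two-view triangulation. Below is a standard linear (DLT) sketch; the projection matrices and pixel coordinates are illustrative, not values from the patent.

```python
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) two-view triangulation: solve A @ X = 0 for the
    homogeneous 3D point observed at pixel uv1 under 3x4 projection
    matrix P1 and at pixel uv2 under P2."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]           # null vector (smallest singular value)
    return X[:3] / X[3]  # de-homogenize

# Illustrative setup: camera 1 at the origin, camera 2 shifted one unit
# along x (both with identity intrinsics); a point at (0, 0, 2) is seen
# at pixel (0, 0) in camera 1 and (-0.5, 0) in camera 2.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = triangulate(P1, P2, (0.0, 0.0), (-0.5, 0.0))
```

The remaining frames of claim 6 would then be registered against the points reconstructed this way and their own features triangulated likewise.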
7. The map construction method according to any one of claims 1 to 6, wherein, after the feature points of the current image frame are projected into the three-dimensional map to obtain newly created map points of the three-dimensional map, the method further comprises:
taking the matching relationship between the feature points of the current image frame and the feature points of the plurality of sliding window image frames as a co-view constraint relationship between the current image frame and the plurality of sliding window image frames, and performing global pose optimization on the three-dimensional map.
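The co-view constraint of claim 7 can be pictured as the edges of a co-visibility graph over the window frames. A minimal sketch follows; the match-store layout and the `min_shared` threshold are assumptions, and the actual global pose optimization over these edges is omitted.

```python
def covisibility_edges(window_matches, min_shared=10):
    """Turn stored frame-to-frame feature matches into co-view constraint
    edges for global pose optimization: keep a frame pair when it shares
    at least min_shared matched feature points."""
    return [pair for pair, feats in window_matches.items()
            if len(feats) >= min_shared]
```

A pose-graph or bundle-adjustment back end would then add one relative-pose residual per retained edge.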
8. The map construction method according to any one of claims 1 to 6, wherein the determining of the matching relationship between the feature points of the current image frame and the feature points of the plurality of sliding window image frames comprises:
determining the number of sliding window image frames in a sliding window structure;
if the number of sliding window image frames is smaller than a preset sliding window length, adding the current image frame to the sliding window structure; otherwise, deleting from the sliding window structure the sliding window image frame with the earliest acquisition time, together with the matching relationships between its feature points and the feature points of the other sliding window image frames, and then adding the current image frame to the sliding window structure;
and adding the matching relationships between the feature points of the current image frame and the feature points of the other sliding window image frames to the sliding window structure.
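The window maintenance of claim 8 can be sketched as follows. This is a minimal illustration only: a real structure would store image data and per-pair match lists rather than plain frame ids.

```python
from collections import deque

class SlidingWindow:
    """Fixed-length window of image frames plus pairwise feature matches;
    evicting the oldest frame also deletes its stored matches (claim 8)."""

    def __init__(self, length):
        self.length = length
        self.frames = deque()
        self.matches = {}  # (older_frame_id, newer_frame_id) -> match data

    def push(self, frame_id, new_matches):
        if len(self.frames) >= self.length:
            oldest = self.frames.popleft()
            # Drop every stored matching relation involving the evicted frame.
            self.matches = {k: v for k, v in self.matches.items()
                            if oldest not in k}
        self.frames.append(frame_id)
        # Record the current frame's matches against the remaining window frames.
        for other_id, data in new_matches.items():
            self.matches[(other_id, frame_id)] = data

# With a window length of 3, pushing a fourth frame evicts frame 1
# together with its matches (1, 2) and (1, 3).
w = SlidingWindow(3)
w.push(1, {})
w.push(2, {1: "m12"})
w.push(3, {1: "m13", 2: "m23"})
w.push(4, {2: "m24", 3: "m34"})
```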
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the map construction method according to any one of claims 1 to 8.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the steps of the map construction method according to any one of claims 1 to 8.
CN202011598547.8A 2020-12-29 2020-12-29 Map construction method, electronic device and storage medium Active CN112651997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011598547.8A CN112651997B (en) 2020-12-29 2020-12-29 Map construction method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN112651997A true CN112651997A (en) 2021-04-13
CN112651997B CN112651997B (en) 2024-04-12

Family

ID=75363937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011598547.8A Active CN112651997B (en) 2020-12-29 2020-12-29 Map construction method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112651997B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240806A (en) * 2021-05-13 2021-08-10 深圳市慧鲤科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN113990101A (en) * 2021-11-19 2022-01-28 深圳市捷顺科技实业股份有限公司 Method, system and processing device for detecting vehicles in no-parking area
CN114419115A (en) * 2021-12-07 2022-04-29 高德软件有限公司 Image feature matching method, device, equipment and storage medium
WO2023277791A1 (en) * 2021-06-30 2023-01-05 Grabtaxi Holdings Pte. Ltd Server and method for generating road map data
CN115578432A (en) * 2022-09-30 2023-01-06 北京百度网讯科技有限公司 Image processing method, image processing device, electronic equipment and storage medium
CN115578432B (en) * 2022-09-30 2023-07-07 北京百度网讯科技有限公司 Image processing method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446815A (en) * 2016-09-14 2017-02-22 浙江大学 Simultaneous positioning and map building method
CN108615248A (en) * 2018-04-27 2018-10-02 腾讯科技(深圳)有限公司 Method for relocating, device, equipment and the storage medium of camera posture tracing process
US20190206116A1 (en) * 2017-12-28 2019-07-04 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for monocular simultaneous localization and mapping
CN111105467A (en) * 2019-12-16 2020-05-05 北京超图软件股份有限公司 Image calibration method and device and electronic equipment
CN111445526A (en) * 2020-04-22 2020-07-24 清华大学 Estimation method and estimation device for pose between image frames and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN Chang; ZHU Hua; YOU Shaoze: "Research progress on vision-based simultaneous localization and mapping", Application Research of Computers, no. 03, pages 7-13 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant