CN112651997B - Map construction method, electronic device and storage medium - Google Patents
Info

Publication number
CN112651997B (application CN202011598547.8A)
Authority
CN
China
Prior art keywords
image frame
points
sliding window
map
image frames
Prior art date
Legal status
Active
Application number
CN202011598547.8A
Other languages
Chinese (zh)
Other versions
CN112651997A
Inventor
陈明扬
吴一飞
周巍
刘军
张祥通
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202011598547.8A
Publication of CN112651997A
Application granted
Publication of CN112651997B
Status: Active

Classifications

    • G06T7/73 — Determining position or orientation of objects or cameras using feature-based methods
    • G06T17/05 — Three-dimensional [3D] modelling: geographic models
    • G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06V10/757 — Matching configurations of points or features
    • G06T2200/08 — Indexing scheme involving all processing steps from image acquisition to 3D model generation
    • G06T2207/10016 — Image acquisition modality: video; image sequence


Abstract

The invention provides a map construction method, an electronic device and a storage medium. The method comprises the following steps: determining a current image frame; determining matching relationships between the feature points of the current image frame and the feature points of each of a plurality of sliding window image frames, where the sliding window image frames and the current image frame are acquired consecutively in time sequence; performing image frame tracking on the current image frame based on the matching relationships; and, based on the current pose information obtained from the tracking, projecting the feature points of the current image frame into a three-dimensional map to obtain new map points of the three-dimensional map. The method, electronic device and storage medium provided by the invention effectively avoid missed feature point matches, improve the utilization rate of feature points, and reduce the number of feature points that need to be extracted, thereby improving computational efficiency; using the matching relationships among more feature points as constraints for image frame tracking raises the tracking success rate and further improves map construction quality.

Description

Map construction method, electronic device and storage medium
Technical Field
The present invention relates to the field of augmented reality technologies, and in particular, to a map construction method, an electronic device, and a storage medium.
Background
Visual SLAM (Simultaneous Localization and Mapping) is a technique that achieves environmental perception through cameras.
Current visual SLAM mainly comprises four stages: 3D map initialization, image frame tracking, map point creation and back-end optimization. Both 3D map initialization and image frame tracking rely on the matching relationship between feature points in two adjacent frames. However, initializing a 3D map and performing image frame tracking based only on feature point matching between two adjacent frames can cause tracking failure because there are too few feature constraints, degrading the map construction result.
Disclosure of Invention
The invention provides a map construction method, an electronic device and a storage medium, to solve the problem that existing map construction methods yield poor results because they rely on too few feature constraints.
The invention provides a map construction method, which comprises the following steps:
determining a current image frame;
determining matching relationships between the feature points of the current image frame and the feature points of each of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively with the current image frame in time sequence;
And carrying out image frame tracking on the current image frame based on the matching relation, and projecting the characteristic points of the current image frame into a three-dimensional map based on the current pose information obtained by the image frame tracking to obtain new map points of the three-dimensional map.
According to the map construction method provided by the invention, the image frame tracking is carried out on the current image frame based on the matching relation, and the map construction method comprises the following steps:
determining estimated pose information of the current image frame;
based on the estimated pose information, dividing the feature points of the current image frame into associated feature points and unassociated feature points, wherein the associated feature points are feature points associated with map points of the three-dimensional map;
based on the matching relationships, determining, among the unassociated feature points, the feature points that match feature points in the plurality of sliding window image frames, as matching feature points;
and if the number of the associated feature points and the number of the matched feature points meet preset conditions, taking the estimated pose information as the current pose information.
According to the map construction method provided by the invention, the preset condition is that the number of the associated feature points is larger than a first preset number threshold, or the number of the associated feature points is smaller than or equal to the first preset number threshold and the number of the matched feature points is larger than a second preset number threshold.
According to the map construction method provided by the invention, the feature points of the current image frame are divided into associated feature points and unassociated feature points based on the estimated pose information, and the map construction method comprises the following steps:
based on the estimated pose information, projecting candidate map points of the three-dimensional map into the current image frame to obtain two-dimensional position information of the candidate map points; the candidate map points are map points matched with the characteristic points of the current image frame;
and dividing the characteristic points of the current image frame into associated characteristic points and unassociated characteristic points based on the two-dimensional position information of the candidate map points and the two-dimensional position information of the characteristic points of the current image frame matched with the candidate map points.
According to the map construction method provided by the invention, the three-dimensional map is determined based on the following steps:
determining at least three initial image frames acquired in succession;
and projecting the characteristic points of each initial image frame into a three-dimensional space based on the matching relation among the characteristic points of the at least three initial image frames and the relative pose relation among the at least three initial image frames to obtain an initialized three-dimensional map.
According to the map construction method provided by the invention, the feature points of each initial image frame are projected into a three-dimensional space to obtain an initialized three-dimensional map based on the matching relation among the feature points of the at least three initial image frames and the relative pose relation among the at least three initial image frames, and the map construction method comprises the following steps:
based on the matching relationship between the feature points of the first and last initial image frames among the at least three initial image frames and the relative pose relationship between these two frames, projecting the feature points of the first and last initial image frames into three-dimensional space;
and projecting the feature points of the remaining initial image frames into the three-dimensional space, based on the matching relationships between their feature points and those of the first and last initial image frames and the relative pose relationships between the remaining initial image frames and the first and last initial image frames, to obtain an initialized three-dimensional map.
According to the map construction method provided by the invention, after the feature points of the current image frame are projected into the three-dimensional map to obtain new map points of the three-dimensional map, the method further comprises:
taking the matching relationships between the feature points of the current image frame and the feature points of the sliding window image frames as co-visibility constraints between the current image frame and the sliding window image frames, and performing global pose optimization on the three-dimensional map.
According to the map construction method provided by the invention, the determining the matching relationship between the characteristic points of the current image frame and the characteristic points of a plurality of sliding window image frames respectively comprises the following steps:
determining the number of sliding window image frames in the sliding window structure;
if the number of sliding window image frames is smaller than the preset sliding window length, adding the current image frame to the sliding window structure; otherwise, deleting from the sliding window structure the sliding window image frame with the earliest acquisition time, together with the matching relationships between its feature points and the feature points of the other sliding window image frames, and then adding the current image frame to the sliding window structure;
and adding the matching relation between the characteristic points of the current image frame and the characteristic points of other sliding window image frames into the sliding window structure.
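The sliding window maintenance described above can be sketched in Python. This is an illustrative sketch only; the class name, field names and the window length are assumptions for the example, not part of the patent.

```python
from collections import deque

WINDOW_LEN = 5  # assumed sliding-window length; the patent leaves this configurable


class SlidingWindow:
    """Fixed-length window of recent frames plus pairwise feature-match records."""

    def __init__(self, max_len=WINDOW_LEN):
        self.max_len = max_len
        self.frames = deque()   # frame ids, oldest first
        self.matches = {}       # (frame_a, frame_b) -> list of (feat_idx_a, feat_idx_b)

    def add_frame(self, frame_id, matches_to_window):
        # If the window is full, evict the oldest frame together with every
        # match record that references it, as the method describes.
        if len(self.frames) >= self.max_len:
            oldest = self.frames.popleft()
            self.matches = {pair: m for pair, m in self.matches.items()
                            if oldest not in pair}
        self.frames.append(frame_id)
        # Record matches between the new frame and each remaining window frame.
        for other_id, pairs in matches_to_window.items():
            if other_id in self.frames:
                self.matches[(other_id, frame_id)] = pairs
```

For example, with `max_len=3`, adding a fourth frame evicts frame 0 and every match pair involving it, while the matches among the surviving frames are kept.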
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the map construction methods described above when executing the computer program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the map construction method as described in any of the above.
According to the map construction method, the electronic device and the storage medium provided by the invention, image frame tracking is performed on the current image frame based on the matching relationships between the feature points of the current image frame and the feature points of a plurality of sliding window image frames. This effectively avoids missed feature point matches, improves the utilization rate of feature points, and reduces the number of feature points that need to be extracted, thereby improving computational efficiency; using the matching relationships among more feature points as constraints for image frame tracking raises the tracking success rate and further improves map construction quality.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a map construction method according to the present invention;
FIG. 2 is a flow chart of an image frame tracking method provided by the invention;
FIG. 3 is a schematic flow chart of the three-dimensional map initialization method provided by the invention;
FIG. 4 is a flow chart of a feature point matching method according to an embodiment of the present invention;
FIG. 5 is a second flow chart of the map construction method according to the present invention;
FIG. 6 is a schematic diagram of a map construction apparatus according to the present invention;
fig. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
At present, in the 3D map initialization and image frame tracking stages, visual SLAM relies on the matching relationship between feature points in two adjacent frames. However, this approach yields too few constraints among feature points; in particular, when some feature points are not detected in both adjacent frames, the number of successfully matched feature point pairs becomes even smaller, and the utilization rate of the large number of extracted feature points is extremely low. Especially in weak-texture scenes, an insufficient number of feature points greatly increases the probability of tracking failure and degrades the mapping result. To increase the number of successfully matched feature point pairs, more feature points may need to be extracted, but extracting a large number of feature points inevitably reduces image feature extraction efficiency and delays three-dimensional map construction.
In this regard, the present invention provides a map construction method. Fig. 1 is a schematic flow chart of a map construction method provided by the present invention, as shown in fig. 1, the method includes:
at step 110, a current image frame is determined.
Step 120, determining matching relations between the feature points in the current image frame and feature points in a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired continuously with the current image frame according to time sequence.
Specifically, the current image frame is the two-dimensional image frame acquired at the current moment in the image acquisition stage of the visual SLAM.
After the current image frame is obtained, feature point extraction can be performed on it to obtain a plurality of feature points, specifically the attributes of those feature points and the feature representation (descriptor) of each. The attributes may include the image coordinates, scale and orientation of the feature point, which the embodiments of the invention do not specifically limit.
After the feature points of the current image frame are obtained, the matching relationships between them and the feature points of the plurality of sliding window image frames can be determined. Here, the sliding window image frames of the current image frame are image frames acquired before it, and they and the current image frame are acquired consecutively. For example, if the 5 frames preceding the current frame are preset as its sliding window image frames and the current frame is denoted Img(t), the sliding window image frames are Img(t-5), Img(t-4), Img(t-3), Img(t-2) and Img(t-1). The number of sliding window image frames is at least 2. Because the pose of the image acquisition device changes gently during capture, the corresponding viewing angle also changes gently; since the sliding window image frames and the current frame are acquired consecutively in time, their acquisition areas overlap, and feature points in the same acquisition area may appear in multiple frames. The feature points of the current image frame can therefore be matched against the feature points of each of the sliding window image frames.
The matching relationships between the feature points of the current image frame and the feature points of the plurality of sliding window image frames characterize whether each feature point of the current image frame matches each feature point of each sliding window image frame. Whether two feature points match can be judged from the distance between their feature representations, which can be computed with common similarity measures such as the Euclidean or Chebyshev distance.
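As a toy illustration of distance-based matching against every sliding window frame rather than only the previous frame, the following sketch uses Euclidean distance between descriptors; the function names, descriptor format and distance threshold are assumptions for the example.

```python
import math


def euclidean(d1, d2):
    """Euclidean distance between two feature descriptors (equal-length tuples)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d1, d2)))


def match_against_window(curr_descs, window_descs, max_dist=0.5):
    """For each descriptor of the current frame, search EVERY sliding-window
    frame for the nearest descriptor within max_dist.
    Returns (curr_idx, window_frame_idx, feature_idx) triples."""
    matches = []
    for i, d in enumerate(curr_descs):
        best = None
        for f, descs in enumerate(window_descs):
            for j, wd in enumerate(descs):
                dist = euclidean(d, wd)
                if dist < max_dist and (best is None or dist < best[0]):
                    best = (dist, f, j)
        if best is not None:
            matches.append((i, best[1], best[2]))
    return matches
```

A feature missed in the immediately preceding frame can still find its match in an older window frame, which is the effect the method relies on.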
Compared with the common approach that considers only feature point matches between two adjacent frames, matching the feature points of the current image frame against those of a plurality of sliding window image frames effectively raises the matching success rate, increases the number of matched feature point pairs, and thus enriches the constraint relationships among the feature points of the image frames.
And 130, carrying out image frame tracking on the current image frame based on the matching relation, and projecting the characteristic points of the current image frame into the three-dimensional map based on the current pose information obtained by the image frame tracking to obtain a new map point of the three-dimensional map.
Specifically, after obtaining the matching relationship between the feature points of the current image frame and the feature points of the plurality of sliding window image frames, the matching relationship can be utilized to track the current image frame by combining the pose information of the plurality of sliding window image frames, so as to determine the pose information of the current image frame, namely the current pose information. The pose information here is used to reflect the pose of the image frame acquisition device, i.e. the camera, such as the rotation matrix and translation vector of the camera.
After the current pose information is obtained, the characteristic points of the current image frame can be projected into the three-dimensional map based on the current pose information, so that the conversion of the characteristic points of the current image frame from the two-dimensional image coordinate system to the three-dimensional world coordinate system is realized, and the characteristic points projected to the three-dimensional world coordinate system can be used as map points newly built in the three-dimensional map, so that the three-dimensional map updating based on the current image frame is realized.
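Assuming a pinhole camera model and a known depth for the tracked feature (neither is fixed by the patent; depth would in practice come from triangulation), the 2D-to-3D lifting step might look like:

```python
def backproject(u, v, depth, K, R, t):
    """Lift pixel (u, v) with known depth into world coordinates.
    K: 3x3 intrinsics; (R, t): world-to-camera pose; all pure-Python lists."""
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    # Point in the camera frame
    xc = (u - cx) / fx * depth
    yc = (v - cy) / fy * depth
    zc = depth
    # World point: X_w = R^T (X_c - t)
    pc = [xc - t[0], yc - t[1], zc - t[2]]
    return [sum(R[r][c] * pc[r] for r in range(3)) for c in range(3)]
```

With an identity pose and identity intrinsics the world point equals the camera-frame point, which makes the convention easy to sanity-check.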
According to the method provided by the embodiment of the invention, image frame tracking is performed on the current image frame based on the matching relationships between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, which effectively avoids missed feature point matches, improves the utilization rate of feature points, reduces the number of feature points that need to be extracted, and thereby improves computational efficiency; using the matching relationships among more feature points as constraints for image frame tracking raises the tracking success rate and further optimizes map construction quality.
Based on the above embodiment, fig. 2 is a flowchart of the image frame tracking method provided by the present invention, as shown in fig. 2, in step 130, image frame tracking is performed on the current image frame based on the feature point matching relationship, including:
Step 131, determining estimated pose information of the current image frame;
step 132, based on the estimated pose information, dividing the feature points of the current image frame into associated feature points and unassociated feature points, wherein the associated feature points are feature points associated with map points of the three-dimensional map;
step 133, determining feature points matched with feature points in a plurality of sliding window image frames in the unassociated feature points based on the matching relation, and taking the feature points as matched feature points;
and step 134, if the number of the associated feature points and the number of the matched feature points meet the preset conditions, taking the estimated pose information as the current pose information.
Specifically, the estimated pose information of the current image frame may be derived from the pose information of the two image frames preceding it. For example, if the current image frame is Img(t) and the two preceding frames are Img(t-2) and Img(t-1), the relative rotation matrix and relative translation vector between Img(t-2) and Img(t-1) can be treated as a pose velocity and multiplied with the pose information of Img(t-1) to obtain the estimated pose information of the current frame.
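This constant-velocity prediction can be sketched with 4x4 homogeneous transforms (a hedged illustration; the patent does not prescribe this representation or the world-to-camera convention assumed here):

```python
def matmul4(A, B):
    """Multiply two 4x4 matrices represented as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(T):
    """Invert a 4x4 rigid transform [R t; 0 1]: inverse is [R^T, -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # R transposed
    t = [-sum(Rt[i][j] * T[j][3] for j in range(3)) for i in range(3)]
    inv = [[Rt[i][0], Rt[i][1], Rt[i][2], t[i]] for i in range(3)]
    inv.append([0.0, 0.0, 0.0, 1.0])
    return inv


def predict_pose(T_prev2, T_prev1):
    """Constant-velocity motion model: velocity = T(t-1) * T(t-2)^-1,
    estimated pose = velocity * T(t-1)."""
    velocity = matmul4(T_prev1, invert_rigid(T_prev2))
    return matmul4(velocity, T_prev1)
```

For two pure translations of 1 and 2 units along x, the prediction extrapolates to 3 units, matching the intuition of a steady camera motion.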
After the estimated pose information is obtained, it can be determined whether each feature point in the current image frame has an associated map point in the three-dimensional map, dividing the feature points into two classes: associated feature points, which have an associated map point in the three-dimensional map, and unassociated feature points, for which no associated map point is found. The number of associated feature points is generally used to judge whether image frame tracking succeeded, while unassociated feature points are generally projected into the three-dimensional map after tracking succeeds, forming new map points.
Here, association between a feature point and a map point can be understood as similarity in both feature representation and position. Similarity in feature representation can be evaluated with a bag-of-words search or another feature matching technique; similarity in position can be evaluated by projecting the map point from the three-dimensional world coordinate system into the two-dimensional image coordinate system of the current image frame using the estimated pose information, and then computing the distance between the projected map point and the feature point.
Compared with the conventional method of evaluating whether image frame tracking of the current frame succeeded using only the number of associated feature points, the embodiment of the invention also takes the number of matching feature points into account. Here, matching feature points are unassociated feature points that have matched feature points in the sliding window image frames. Since a matching feature point matches feature points in the sliding window image frames, its position in the three-dimensional coordinate system can be obtained directly by triangulation based on the relative pose relationship between the current image frame and the sliding window image frames; therefore, if the number of matching feature points is high, image frame tracking of the current frame can still be considered successful.
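Triangulating such a point from two viewing rays can be illustrated with the midpoint method — one possible triangulation scheme; the patent does not specify which is used. Each ray is given by a camera center and a direction, and the 3D point is the midpoint of the shortest segment between the rays.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation of rays p = c1 + s*d1 and q = c2 + u*d2.
    Solves the 2x2 normal equations minimizing |p - q|^2."""
    w = [c2[i] - c1[i] for i in range(3)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    e1, e2 = dot(w, d1), dot(w, d2)
    s = (e1 * c - e2 * b) / det
    u = (e1 * b - e2 * a) / det
    p = [c1[i] + s * d1[i] for i in range(3)]
    q = [c2[i] + u * d2[i] for i in range(3)]
    return [(p[i] + q[i]) / 2 for i in range(3)]
```

When the two rays actually intersect, the midpoint coincides with the intersection point.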
The method provided by the embodiment of the invention combines the number of associated feature points and the number of matching feature points to judge whether image frame tracking succeeded. Compared with considering only the number of associated feature points, this effectively improves the image frame tracking success rate and avoids the heavy computation caused by tracking failures, reducing the computational load of map construction while ensuring its quality and improving its efficiency.
Based on any of the foregoing embodiments, in step 134, the preset condition is that the number of associated feature points is greater than a first preset number threshold, or the number of associated feature points is less than or equal to the first preset number threshold and the number of matching feature points is greater than a second preset number threshold.
Specifically, the first and second preset number thresholds are preset values. When judging whether image frame tracking succeeded, first consider whether the number of associated feature points exceeds the first preset number threshold; if so, tracking is directly determined to be successful without considering the number of matching feature points.
If the number of associated feature points does not exceed the first preset number threshold, then consider again whether the number of matching feature points exceeds the second preset number threshold, and if so, also determine that the image frame tracking was successful.
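The two-threshold success test described above can be written directly; the threshold values below are placeholders, since the patent does not fix them.

```python
FIRST_THRESHOLD = 50   # assumed value for the first preset number threshold
SECOND_THRESHOLD = 30  # assumed value for the second preset number threshold


def tracking_succeeded(n_associated, n_matched,
                       t1=FIRST_THRESHOLD, t2=SECOND_THRESHOLD):
    """Tracking succeeds if enough features are associated with map points,
    or, failing that, if enough unassociated features matched window frames."""
    if n_associated > t1:
        return True
    return n_matched > t2
```

The second branch is exactly the relaxation the method introduces: a frame with few map-point associations can still pass on the strength of its sliding-window matches.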
The method provided by the embodiment of the invention takes the number of the matched characteristic points as an additional condition for judging whether the image frame tracking is successful, thereby reducing the image frame tracking requirement and improving the image frame tracking success rate.
Based on any of the above embodiments, step 132 includes:
projecting candidate map points of the three-dimensional map into the current image frame based on the estimated pose information to obtain two-dimensional position information of the candidate map points; the candidate map points are map points matched with the feature points of the current image frame;
the feature points of the current image frame are classified into associated feature points and unassociated feature points based on the two-dimensional position information of the candidate map points and the two-dimensional position information of the feature points of the current image frame that are matched with the candidate map points.
Specifically, map points in the three-dimensional map that match the feature points of the current image frame can be found via a bag-of-words search or another feature matching technique and taken as candidate map points that may be associated with those feature points. After the candidate map points are obtained, they can be projected from the three-dimensional world coordinate system into the two-dimensional image coordinate system of the current image frame according to the estimated pose information, yielding the two-dimensional position information of the candidate map points in the current image frame.
After the two-dimensional position information of the candidate map points is obtained, it can be combined with the two-dimensional position information of the corresponding feature points in the current image frame, and the deviation between the two, for example the Euclidean distance between the two positions, can be analyzed to judge whether a feature point and a candidate map point are associated. If the deviation is small, the feature point is associated with the candidate map point, i.e., it is associated with a map point in the three-dimensional map, and it is taken as an associated feature point; if the deviation is large, the feature point is not associated with any map point in the three-dimensional map, and it is taken as an unassociated feature point.
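A sketch of this classification step, assuming the deviation measure is the Euclidean distance and using a hypothetical pixel threshold:

```python
import math

def classify_feature_points(pairs, reproj_threshold=3.0):
    """pairs: list of (feature_xy, projected_candidate_xy) tuples, where the
    second element is the candidate map point projected into the image."""
    associated, unassociated = [], []
    for feat_xy, proj_xy in pairs:
        deviation = math.dist(feat_xy, proj_xy)  # Euclidean distance in pixels
        if deviation <= reproj_threshold:
            associated.append(feat_xy)    # small deviation: associated feature point
        else:
            unassociated.append(feat_xy)  # large deviation: unassociated feature point
    return associated, unassociated
```

The threshold of 3 pixels is purely illustrative; in practice it would be tuned to the camera and feature detector used.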
Based on any of the above embodiments, step 130 further includes: if the number of the associated feature points and the number of the matched feature points do not meet the preset conditions, repositioning the current image frame based on the historical image frame so as to update the estimated pose information.
Specifically, after the number of associated feature points and the number of matching feature points are determined, whether image frame tracking of the current image frame has succeeded can be judged by combining the two counts. If the number of associated feature points and the number of matching feature points meet the preset conditions, tracking is determined to be successful; otherwise tracking is determined to have failed, and repositioning needs to be performed on the current image frame.
When repositioning the current image frame, the historical image frames of the current image frame must be used, where the historical image frames are the image frames acquired before the current image frame. A matching search can be performed through algorithms such as feature bag-of-words, so that historical image frames close to the current image frame are selected from the large number of historical image frames as candidate image frames. Feature matching is then performed between the current image frame and the candidate image frames, the pose of the current frame is recovered according to the matching result, thereby repositioning the current image frame, and the pose recovery result is used as the updated estimated pose information.
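Candidate selection during repositioning can be sketched as a simple bag-of-words similarity search; the histogram representation and cosine similarity used here are illustrative assumptions rather than the patent's specific method:

```python
def best_repositioning_candidate(current_bow, history_bows):
    """current_bow: bag-of-words histogram of the current image frame.
    history_bows: dict mapping historical frame id -> histogram."""
    def cosine(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return num / den if den else 0.0
    # Pick the historical frame whose visual-word histogram is closest.
    return max(history_bows, key=lambda fid: cosine(current_bow, history_bows[fid]))
```

The frame returned here would then go through feature matching and pose recovery against the current image frame.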
Based on any of the above embodiments, fig. 3 is a schematic flow chart of the three-dimensional map initialization method provided by the present invention, and as shown in fig. 3, the three-dimensional map is determined based on the following steps:
step 310, determining at least three initial image frames acquired in succession;
step 320, based on the matching relationship between the feature points of at least three initial image frames and the relative pose relationship between at least three initial image frames, the feature points of each initial image frame are projected into the three-dimensional space, so as to obtain an initialized three-dimensional map.
Here, the initial image frames are the image frames acquired first in the image acquisition stage of the visual SLAM. The number of initial image frames is at least three, which differs from the conventional method of initializing the three-dimensional map from two adjacent image frames. In the embodiment of the present invention, the number of initial image frames used for initializing the three-dimensional map may be consistent with the total number of sliding window image frames plus the current image frame in step 120.
Among the at least three initial image frames, the feature points of every two initial image frames have a matching relationship, and every two image frames have a relative pose relationship, so a rich set of feature point matching constraints can be obtained. By combining the matching relationship between the feature points of every two initial image frames with the relative pose relationship between every two initial image frames, the feature points of each initial image frame can be triangulated and thereby projected into three-dimensional space, realizing the initialization of the three-dimensional map.
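Triangulation of a matched feature point from two frames with known relative pose can be sketched with the midpoint method; the camera centers and ray directions below are assumed to have already been recovered from the relative pose relationship:

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Return the midpoint of the closest points on two viewing rays
    c1 + t1*d1 and c2 + t2*d2 (all arguments are 3-tuples)."""
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    w = tuple(a - b for a, b in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # zero only for parallel rays
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = tuple(ci + t1 * di for ci, di in zip(c1, d1))
    p2 = tuple(ci + t2 * di for ci, di in zip(c2, d2))
    return tuple((x + y) / 2 for x, y in zip(p1, p2))
```

For two rays that actually intersect at a scene point, the midpoint coincides with that point; with noisy matches it returns the point halfway between the two closest ray points.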
According to the method provided by the embodiment of the invention, the three-dimensional map is initialized by combining at least three initial image frames, so that more characteristic point matching constraints are introduced for initializing the three-dimensional map, and the initialization precision of the three-dimensional map is improved.
Based on any of the above embodiments, step 320 includes:
based on the matching relation between the characteristic points of the first and the last two initial image frames in at least three initial image frames and the relative pose relation between the first and the last two initial image frames, the characteristic points of the first and the last two initial image frames are projected into a three-dimensional space;
based on the matching relation between the characteristic points of the rest initial image frames and the characteristic points of the first initial image frame and the last initial image frame in at least three initial image frames, the characteristic points of the rest initial image frames are projected into a three-dimensional space to obtain an initialized three-dimensional map.
Specifically, the initial image frames can be divided into two types: one type is the first frame and the last frame among the initial image frames, i.e., the first and last two initial image frames, and the other type is the remaining initial image frames apart from the first and last frames. For example, if the initial image frames are the first n frames of images, the first and last two initial image frames are those with fid=0 and fid=n-1.
In the process of initializing the three-dimensional map, three-dimensional projection of the feature points can first be performed based on the matching relationship between the feature points of the first and last initial image frames and the relative pose relationship between them. The first and last initial image frames are used first because they are the two image frames with the largest pose difference among all initial image frames, and the larger the pose difference, the more accurate the relative pose relationship obtained from feature point matching. Considering the higher accuracy of the relative pose relationship between the first and last frames, taking these two frames as the first projected image frames establishes an accurate reference for the three-dimensional projection.
On this basis, for each remaining initial image frame, its relative pose relationships with the first and last initial image frames can be combined with the matching relationships between its feature points and the feature points of the first and last initial image frames, and its feature points are projected into the three-dimensional space to fill in map points, thereby completing the initialization flow of the three-dimensional map.
Further, for the first and last two initial image frames, the three-dimensional space projection process is as follows: first, based on the matching relationship between the feature points of the first and last initial image frames, a fundamental matrix F12 is solved, from which the relative pose relationship between the first and last initial image frames is obtained. Based on this relative pose relationship, the feature points on the first initial image frame are re-projected onto the last initial image frame, and the re-projection errors between the re-projected feature points and the matched feature points on the last initial image frame are calculated. Similarly, the feature points on the last initial image frame are re-projected onto the first initial image frame, and the re-projection errors between the re-projected feature points and the matched feature points on the first initial image frame are calculated. Finally, inliers and outliers under the relative pose relationship are judged from the re-projection errors: if a re-projection error is less than or equal to a preset error threshold, the corresponding feature point is determined to be an inlier; if it is greater than the preset error threshold, the corresponding feature point is determined to be an outlier. If the number of inliers is greater than a preset inlier threshold, the relative pose relationship is determined to be correct, three-dimensional coordinate recovery is performed on the inliers through triangulation, and no three-dimensional coordinate recovery is performed on the outliers.
If the number of inliers is smaller than the preset inlier threshold, the first initial image frame is deleted, the second initial image frame is taken as the new first initial image frame, the most recently acquired initial image frame is taken as the new last initial image frame, and the above operation is repeated.
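The inlier test driving the accept-or-retry decision above can be sketched as follows; the error and inlier-count thresholds are hypothetical:

```python
def verify_relative_pose(reproj_errors, error_threshold=2.0, inlier_count_threshold=100):
    """reproj_errors: forward and backward re-projection errors of all
    matched feature points, in pixels. Returns (accepted, inlier_mask)."""
    inlier_mask = [e <= error_threshold for e in reproj_errors]
    accepted = sum(inlier_mask) > inlier_count_threshold
    # Accepted: triangulate the inliers; rejected: drop the first frame and retry.
    return accepted, inlier_mask
```

On acceptance, only the feature points marked True in the mask would be triangulated into three-dimensional coordinates.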
Based on any of the above embodiments, step 130 further includes:
taking the matching relationships between the feature points of the current image frame and the feature points of the plurality of sliding window image frames as covisibility constraint relationships between the current image frame and the sliding window image frames, and performing global pose optimization on the three-dimensional map.
Specifically, global pose optimization refers to constructing a nonlinear optimization equation over all map points in the three-dimensional map and all image frames used for constructing the three-dimensional map by adopting a general graph optimization technique, so as to perform global bundle adjustment. The nonlinear optimization covers the nodes formed by map points, the pose nodes formed by image frames, and the edges formed by the feature point observations associating map points with image frames, and is used to realize the error calculation.
Because the embodiment of the present invention performs global pose optimization on the basis of the matching relationships, obtained in step 120, between the feature points of the current image frame and the feature points of the plurality of sliding window image frames, the edge constraint relationships between the pose nodes corresponding to the image frames are increased, the back-end optimization effect is improved, and the accumulated pose drift during the operation of the visual SLAM is reduced.
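The quantity that the graph optimization minimizes is the sum of squared re-projection residuals over all edges linking pose nodes to map point nodes; a toy version of the error computation (not the full nonlinear solver) is:

```python
def total_reprojection_error(edges):
    """edges: list of (observed_xy, predicted_xy) pairs, one per
    feature point observation linking a pose node to a map point node."""
    return sum((ox - px) ** 2 + (oy - py) ** 2
               for (ox, oy), (px, py) in edges)
```

A solver such as g2o would iteratively adjust the poses and map point coordinates to reduce this total error.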
Based on any of the above embodiments, fig. 4 is a flowchart of a feature point matching method according to an embodiment of the present invention, as shown in fig. 4, step 120 includes:
step 121, determining the number of sliding window image frames in the sliding window structure;
step 122, if the number of the sliding window image frames is smaller than the preset sliding window length, adding the current image frame into the sliding window structure; otherwise, deleting the sliding window image frame with the earliest acquisition time in the sliding window structure and the matching relation between the characteristic points of the sliding window image frame and the characteristic points of other sliding window image frames, and adding the current image frame into the sliding window structure;
and step 123, adding the matching relation between the characteristic points of the current image frame and the characteristic points of other sliding window image frames into the sliding window structure.
Specifically, in order to realize matching between the feature points of the current image frame and the feature points of a plurality of sliding window image frames, in the embodiment of the present invention a sliding window structure is provided for storing the sliding window image frames used for feature point matching with the current image frame, and for storing the matching relationships obtained by feature point matching between the current image frame and each sliding window image frame, so as to facilitate feature point matching of the next image frame.
The sliding window structure can store each sliding window image frame and can also store the matching relationships between the feature points of the sliding window image frames, so that after the feature points of the current image frame are matched with the feature points of any sliding window image frame, the matching relationships between that frame's feature points and the feature points of the other sliding window image frames can be combined to quickly obtain the matching relationships between the feature points of the current image frame and the feature points of the other sliding window image frames.
Before feature point matching is performed on the current image frame, whether the number of sliding window image frames in the sliding window structure is smaller than the preset sliding window length is judged. Here, the preset sliding window length is the maximum number of sliding window image frames the sliding window structure can store. If the number of sliding window image frames is smaller than the preset sliding window length, the sliding window structure is not full, and the current image frame can be directly stored in the sliding window structure as a sliding window image frame. If the number of sliding window image frames is equal to the preset sliding window length, the sliding window structure is full; the earliest acquired sliding window image frame in the sliding window structure, together with the matching relationships between its feature points and the feature points of the remaining sliding window image frames, needs to be deleted, after which the current image frame is stored in the sliding window structure as a sliding window image frame.
When the current image frame is stored in the sliding window structure body as the sliding window image frame, or after the current image frame is stored in the sliding window structure body as the sliding window image frame, the characteristic points of the current image frame can be respectively matched with the characteristic points of each sliding window image frame in the sliding window structure body, so that the matching relation between the characteristic points of the current image frame and the characteristic points of each sliding window image frame is obtained, and the matching relation is stored in the sliding window structure body.
According to the method provided by the embodiment of the invention, the matching relation between the characteristic points is enriched through the arrangement and the application of the sliding window structure, and meanwhile, the characteristic point matching efficiency is effectively ensured.
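Steps 121 to 123 can be sketched as follows; the frame ids, the match function, and the window length of 10 are illustrative assumptions, not values fixed by the patent:

```python
class SlidingWindow:
    def __init__(self, max_len=10):
        self.max_len = max_len          # preset sliding window length
        self.frames = []                # frame ids, earliest-acquired first
        self.matches = {}               # (fid_a, fid_b) -> list of feature index pairs

    def insert(self, frame_id, match_fn):
        if len(self.frames) >= self.max_len:
            oldest = self.frames.pop(0)  # delete the earliest-acquired frame...
            self.matches = {pair: m for pair, m in self.matches.items()
                            if oldest not in pair}  # ...and its match records
        for fid in self.frames:          # match the new frame against the window
            self.matches[(fid, frame_id)] = match_fn(fid, frame_id)
        self.frames.append(frame_id)
```

Here `match_fn` stands in for the actual descriptor matching between two frames' feature points.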
Based on any of the above embodiments, the sliding window structure includes four parts: an image frame list, an image frame ID list, a sliding window feature point structure, and the number of image frames within the sliding window.
The image frame list stores the information of each sliding window image frame in the current sliding window; the image frame ID list stores the ID of each sliding window image frame in the sliding window; the number of image frames within the sliding window records how many sliding window image frames the current sliding window structure holds; and the sliding window feature point structure stores the matching relationships of the feature points of all sliding window image frames in the sliding window.
Further, the sliding window feature point structure may include attribute information of each feature point, where the attribute information includes a frame ID, coordinates, a descriptor, a map point ID, a covisibility count, a matching flag bit, and a covisible frame information list. The frame ID is the ID of the image frame to which the feature point belongs; the coordinates are the image coordinates (x, y) of the feature point within its image frame; the descriptor is the feature representation of the feature point; the map point ID is the ID of the map point of the three-dimensional map associated with the feature point; the covisibility count is the number of image frames in the sliding window with which the feature point can be successfully matched; the matching flag bit marks whether the feature point has a successfully matched feature point among all image frames in the sliding window, storing the ID of the successfully matched feature point if one exists and -1 otherwise; the covisible frame information list stores the information of the sliding window image frames successfully matched with the feature point.
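The per-feature attribute record described above might look like the following; the field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class WindowFeaturePoint:
    frame_id: int                 # ID of the image frame the feature belongs to
    coords: tuple                 # (x, y) image coordinates within that frame
    descriptor: bytes             # feature representation of the feature point
    map_point_id: int = -1        # associated 3D map point ID, -1 if none yet
    covis_count: int = 1          # frames in the window this feature matches
    match_flag: int = -1          # ID of a successfully matched feature, -1 if none
    covis_frames: list = field(default_factory=list)  # matched frames' info
```

A freshly inserted feature starts with a covisibility count of 1 (itself), no map point association, and an empty covisible frame list.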
After the current image frame is obtained, the number of image frames in the sliding window of the sliding window structure is read first. If the number of image frames in the sliding window is equal to the preset sliding window length, the earliest acquired sliding window image frame Iold is moved out of the image frame list of the sliding window structure, the ID of Iold is moved out of the image frame ID list, and all feature points of Iold are removed from the sliding window feature point structure. At the same time, for feature points that have a matching relationship with the feature points of Iold, the covisibility count is decremented by 1, the feature ID corresponding to Iold is removed from the matching flag bit, and the information of Iold is removed from the covisible frame information list. The current image frame is then stored in the sliding window structure.
If the number of the image frames in the sliding window is smaller than the preset sliding window length, the current image frames are directly stored in the sliding window structure.
The step of storing the current image frame in a sliding window structure is specifically as follows:
when the sliding window structure is empty, the image frame information and the image frame ID of the current image frame can be directly stored in the sliding window structure, the number of image frames in the sliding window is updated to 1, the feature points of the current image frame are stored in the sliding window feature point structure, the covisibility count of all feature points is set to 1, the matching flag bit is set to 0, and the covisible frame information list holds the current image frame information;
when the sliding window is not empty, the image frame information and the image frame ID of the current image frame can be stored in the sliding window structure, the number of image frames in the sliding window is incremented by 1, the feature points of the current image frame are stored in the sliding window feature point structure, the covisibility count of the feature points of the current image frame is initialized to 1, the matching flag bit is initialized to 0, and the covisible frame information list is initialized to the current image frame information. The feature points of the current image frame (say a feature point Fi) are then matched with the feature points of the remaining sliding window image frames in the sliding window feature point structure. If the matching succeeds (say with a feature point Fj), the covisibility count of Fi is incremented by 1, the ID of Fj is added to the matching flag bit of Fi, and the information of the image frame where Fj is located is added to the covisible frame information list of Fi; likewise, the covisibility count of Fj is incremented by 1, the ID of Fi is added to the matching flag bit of Fj, and the information of the image frame where Fi is located is added to the covisible frame information list of Fj.
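The symmetric bookkeeping after Fi and Fj match can be sketched with plain dicts; the key names are illustrative:

```python
def record_match(fi, fj):
    """fi, fj: feature records with keys 'id', 'frame_id', 'covis_count',
    'match_ids', 'covis_frames'. Updates both sides symmetrically."""
    for a, b in ((fi, fj), (fj, fi)):
        a['covis_count'] += 1                    # one more covisible frame
        a['match_ids'].append(b['id'])           # record the matched feature ID
        a['covis_frames'].append(b['frame_id'])  # record the matched frame
```

Each successful match thus updates both features' covisibility counts, match ID lists, and covisible frame lists in one pass.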
Based on any one of the above embodiments, fig. 5 is a second schematic flow chart of a map construction method according to the present invention, where, as shown in fig. 5, the map construction method includes:
First, a video stream is acquired through a visual camera, and the current image frame is obtained during the video stream acquisition process.
Feature point detection is performed on the current image frame to obtain its feature points and feature representations.
The current image frame is placed into the sliding window structure, feature point matching is performed with each sliding window image frame in the sliding window structure, and the matching relationships between the feature points of the sliding window image frames in the sliding window structure are updated.
Judge whether the current image frame should be applied to the initialization of the three-dimensional map, i.e., judge whether the initialization of the three-dimensional map has been completed:
if the initialization of the three-dimensional map is not completed, executing the initialization of the three-dimensional map by combining the matching relation between the characteristic points of each sliding window image frame in the sliding window structure, and returning to determine the next image frame after the initialization is completed;
if the initialization of the three-dimensional map is completed, combining the matching relation between the characteristic points of each sliding window image frame in the sliding window structure, carrying out image frame tracking on the current image frame, and judging whether the image frame tracking is successful or not:
if the tracking is unsuccessful, entering a repositioning process, updating the estimated pose information of the current image frame through repositioning, and carrying out image frame tracking again based on the updated estimated pose information;
and if the tracking is successful, based on the current pose information obtained by the image frame tracking, projecting the characteristic points of the current image frame into the three-dimensional map to obtain a new map point of the three-dimensional map.
After the map points are newly built, the sliding window structure body can be updated according to the association relation between the newly added map points and the feature points. And performing back-end optimization on the three-dimensional map after the map points are newly built, and returning to determining the next image frame after the optimization is completed.
Based on any of the above embodiments, feature point detection of the image frames may be achieved by common ORB features, FREAK features, or deep-learning-based feature point extraction methods such as RF-Net. The specific steps are as follows: first, an image Gaussian pyramid is constructed through downsampling, where the Gaussian pyramid layers represent different image scales to ensure the scale invariance of the features; feature point extraction is then performed at the different pyramid scales; finally, feature representation calculation is performed based on fixed-size image blocks at the corresponding scales and the principal direction angles of the extracted feature points, thereby obtaining the feature points in the image frames and their feature representations.
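The pyramid construction step can be sketched as repeated 2x downsampling; box averaging stands in for the Gaussian blur here, purely for illustration:

```python
def build_pyramid(image, levels=4):
    """image: 2D list of grayscale values; returns one 2D list per scale level."""
    pyramid = [image]
    for _ in range(levels - 1):
        img = pyramid[-1]
        h, w = len(img) // 2, len(img[0]) // 2
        down = [[(img[2*r][2*c] + img[2*r][2*c+1] +
                  img[2*r+1][2*c] + img[2*r+1][2*c+1]) / 4.0
                 for c in range(w)] for r in range(h)]
        pyramid.append(down)  # each level halves the image scale
    return pyramid
```

Feature extraction would then run independently on every level, so that the same scene point can be detected regardless of its apparent size.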
Based on any of the above embodiments, fig. 6 is a schematic structural diagram of a map construction apparatus provided by the present invention, and as shown in fig. 6, the apparatus includes an image frame determining unit 610, a matching relationship determining unit 620, and an image frame tracking unit 630;
wherein the image frame determining unit 610 is configured to determine a current image frame;
The matching relationship determining unit 620 is configured to determine matching relationships between feature points of the current image frame and feature points of a plurality of sliding window image frames, where the plurality of sliding window image frames are image frames acquired continuously with the current image frame according to a time sequence;
the image frame tracking unit 630 is configured to perform image frame tracking on the current image frame based on the matching relationship, and project feature points of the current image frame into a three-dimensional map based on current pose information obtained by image frame tracking, so as to obtain a new map point of the three-dimensional map.
According to the apparatus provided by the embodiment of the invention, image frame tracking is performed on the current image frame based on the matching relationships between the feature points of the current image frame and the feature points of the plurality of sliding window image frames, so that the problem of missed feature point matches is effectively avoided, the utilization rate of feature points is improved, the demand for feature point extraction is reduced, and the calculation efficiency is improved. The matching relationships among more feature points are used as constraint conditions for image frame tracking, so that the success rate of image frame tracking is improved and the map construction quality is further optimized.
Based on any of the above embodiments, the image frame tracking unit 630 is configured to:
Determining estimated pose information of the current image frame;
based on the estimated pose information, dividing the feature points of the current image frame into associated feature points and unassociated feature points, wherein the associated feature points are feature points associated with map points of the three-dimensional map;
based on the matching relation, determining feature points matched with feature points in the sliding window image frames in the unassociated feature points as matching feature points;
and if the number of the associated feature points and the number of the matched feature points meet preset conditions, taking the estimated pose information as the current pose information.
Based on any of the foregoing embodiments, the preset condition is that the number of associated feature points is greater than a first preset number threshold, or the number of associated feature points is less than or equal to the first preset number threshold and the number of matching feature points is greater than a second preset number threshold.
Based on any of the above embodiments, the image frame tracking unit 630 is configured to:
based on the estimated pose information, projecting candidate map points of the three-dimensional map into the current image frame to obtain two-dimensional position information of the candidate map points; the candidate map points are map points matched with the characteristic points of the current image frame;
And dividing the characteristic points of the current image frame into associated characteristic points and unassociated characteristic points based on the two-dimensional position information of the candidate map points and the two-dimensional position information of the characteristic points of the current image frame matched with the candidate map points.
Based on any one of the above embodiments, the apparatus further includes a map initialization unit, where the map initialization unit is configured to:
determining at least three initial image frames acquired in succession;
and projecting the characteristic points of each initial image frame into a three-dimensional space based on the matching relation among the characteristic points of the at least three initial image frames and the relative pose relation among the at least three initial image frames to obtain an initialized three-dimensional map.
Based on any of the above embodiments, the map initialization unit is configured to:
based on the matching relation between the characteristic points of the first and the last two initial image frames in the at least three initial image frames and the relative pose relation between the first and the last two initial image frames, the characteristic points of the first and the last two initial image frames are projected into a three-dimensional space;
and projecting the characteristic points of the rest initial image frames into the three-dimensional space based on the matching relation between the characteristic points of the rest initial image frames and the characteristic points of the first and the last initial image frames and the relative pose relation between the rest initial image frames and the first and the last initial image frames, so as to obtain an initialized three-dimensional map.
Based on any of the above embodiments, the apparatus further includes an optimizing unit, where the optimizing unit is configured to:
taking the matching relationships between the feature points of the current image frame and the feature points of the plurality of sliding window image frames as covisibility constraint relationships between the current image frame and the sliding window image frames, and performing global pose optimization on the three-dimensional map.
Based on any of the above embodiments, the matching relation determining unit 620 is configured to:
determining the number of sliding window image frames in the sliding window structure;
if the number of the sliding window image frames is smaller than the preset sliding window length, adding the current image frames into the sliding window structure; otherwise, deleting the sliding window image frame with earliest acquisition time in the sliding window structure and the matching relation between the characteristic points of the sliding window image frame and the characteristic points of other sliding window image frames, and adding the current image frame into the sliding window structure;
and adding the matching relation between the characteristic points of the current image frame and the characteristic points of other sliding window image frames into the sliding window structure.
Fig. 7 illustrates a physical schematic diagram of an electronic device, as shown in fig. 7, which may include: processor 710, communication interface (Communications Interface) 720, memory 730, and communication bus 740, wherein processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. The processor 710 may invoke logic instructions in the memory 730 to perform a mapping method comprising: determining a current image frame; determining matching relations between characteristic points of the current image frame and characteristic points of a plurality of sliding window image frames respectively, wherein the plurality of sliding window image frames are image frames obtained by continuously acquiring the current image frame according to time sequence; and carrying out image frame tracking on the current image frame based on the matching relation, and projecting the characteristic points of the current image frame into a three-dimensional map based on the current pose information obtained by the image frame tracking to obtain new map points of the three-dimensional map.
Further, the logic instructions in the memory 730 may be implemented in the form of software functional units and, when sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the map construction method provided by the methods described above, the method comprising: determining a current image frame; determining matching relations between feature points of the current image frame and feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively in time order before the current image frame; performing image frame tracking on the current image frame based on the matching relations; and projecting the feature points of the current image frame into a three-dimensional map based on current pose information obtained by the image frame tracking, to obtain new map points of the three-dimensional map.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the map construction method provided above, the method comprising: determining a current image frame; determining matching relations between feature points of the current image frame and feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively in time order before the current image frame; performing image frame tracking on the current image frame based on the matching relations; and projecting the feature points of the current image frame into a three-dimensional map based on current pose information obtained by the image frame tracking, to obtain new map points of the three-dimensional map.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or, of course, by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and are not limiting. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A map construction method, comprising:
determining a current image frame;
determining matching relations between feature points of the current image frame and feature points of a plurality of sliding window image frames, wherein the plurality of sliding window image frames are image frames acquired consecutively in time order before the current image frame;
performing image frame tracking on the current image frame based on the matching relations, and projecting feature points of the current image frame into a three-dimensional map based on current pose information obtained by the image frame tracking, to obtain new map points of the three-dimensional map;
the image frame tracking for the current image frame based on the matching relation comprises:
determining estimated pose information of the current image frame;
based on the estimated pose information, dividing the feature points of the current image frame into associated feature points and unassociated feature points, wherein the associated feature points are feature points associated with map points of the three-dimensional map;
based on the matching relation, determining feature points matched with feature points in the sliding window image frames in the unassociated feature points as matching feature points;
and if the number of the associated feature points and the number of the matched feature points meet preset conditions, taking the estimated pose information as the current pose information.
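The acceptance test at the end of claim 1, with the preset condition spelled out in claim 2, can be sketched as follows. The sketch is illustrative only: the claims fix the roles of the two thresholds but not their values, so the numbers below are assumptions.

```python
def accept_pose(n_associated, n_matched,
                first_threshold=30, second_threshold=15):
    """Return True if the estimated pose may be kept as the current pose.

    n_associated: feature points already associated with map points.
    n_matched:    previously unassociated feature points that match
                  feature points in the sliding window image frames.
    Threshold values are illustrative assumptions, not from the patent.
    """
    if n_associated > first_threshold:
        # enough direct map-point associations: accept immediately
        return True
    # fallback branch of claim 2: few associations, but enough fresh
    # matches against the sliding window frames
    return n_matched > second_threshold
```

For example, a frame with 40 associated points is accepted outright, while a frame with only 10 associated points needs more than 15 matching feature points to be accepted.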
2. The map construction method according to claim 1, wherein the preset condition is that the number of associated feature points is greater than a first preset number threshold, or that the number of associated feature points is less than or equal to the first preset number threshold and the number of matching feature points is greater than a second preset number threshold.
3. The map construction method according to claim 1, wherein the dividing the feature points of the current image frame into associated feature points and unassociated feature points based on the estimated pose information comprises:
projecting candidate map points of the three-dimensional map into the current image frame based on the estimated pose information, to obtain two-dimensional position information of the candidate map points, wherein the candidate map points are map points matched with the feature points of the current image frame;
and dividing the feature points of the current image frame into associated feature points and unassociated feature points based on the two-dimensional position information of the candidate map points and the two-dimensional position information of the feature points of the current image frame matched with the candidate map points.
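A minimal sketch of the classification in claim 3, under an assumed pinhole camera model: each candidate map point is projected with the estimated pose, and the matched two-dimensional feature is treated as "associated" when the reprojection error is below a pixel threshold. The camera intrinsics and the error threshold are illustrative assumptions, not values given in the patent.

```python
import math

def project(point_3d, R, t, fx, fy, cx, cy):
    """Project a 3-D map point into the image using pose (R, t) and a
    pinhole model with focal lengths (fx, fy) and principal point (cx, cy)."""
    x = sum(R[0][k] * point_3d[k] for k in range(3)) + t[0]
    y = sum(R[1][k] * point_3d[k] for k in range(3)) + t[1]
    z = sum(R[2][k] * point_3d[k] for k in range(3)) + t[2]
    return (fx * x / z + cx, fy * y / z + cy)

def is_associated(map_point, feature_uv, R, t,
                  fx=500.0, fy=500.0, cx=320.0, cy=240.0, max_err=3.0):
    """Classify one feature point: associated if the candidate map point
    reprojects within max_err pixels of the matched 2-D feature."""
    u, v = project(map_point, R, t, fx, fy, cx, cy)
    return math.hypot(u - feature_uv[0], v - feature_uv[1]) <= max_err
```

With an identity pose, the point (0, 0, 2) reprojects onto the principal point, so a feature detected there is associated, while one 10 pixels away is not.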
4. The map construction method according to claim 1, wherein the three-dimensional map is determined based on the following steps:
determining at least three consecutively acquired initial image frames;
and projecting the feature points of each initial image frame into a three-dimensional space based on the matching relations among the feature points of the at least three initial image frames and the relative pose relations among the at least three initial image frames, to obtain an initialized three-dimensional map.
5. The map construction method according to claim 4, wherein the projecting the feature points of each initial image frame into the three-dimensional space based on the matching relations among the feature points of the at least three initial image frames and the relative pose relations among the at least three initial image frames, to obtain the initialized three-dimensional map, comprises:
projecting the feature points of the first and last initial image frames of the at least three initial image frames into a three-dimensional space, based on the matching relations between the feature points of the first and last initial image frames and the relative pose relation between the first and last initial image frames;
and projecting the feature points of the remaining initial image frames into the three-dimensional space, based on the matching relations between the feature points of the remaining initial image frames and the feature points of the first and last initial image frames and on the relative pose relations between the remaining initial image frames and the first and last initial image frames, to obtain an initialized three-dimensional map.
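The projection into three-dimensional space in claims 4 and 5 amounts to triangulating matched feature points given the relative pose between two frames. A rough sketch using standard linear (DLT) triangulation is shown below; the patent does not name a specific triangulation method, and the normalized-coordinate projection matrices and numbers are illustrative assumptions.

```python
import numpy as np

def triangulate(uv1, uv2, P1, P2):
    """Linear (DLT) triangulation of one feature correspondence.

    uv1, uv2: matched 2-D feature points in normalized image coordinates.
    P1, P2:   3x4 projection matrices of the two initial image frames,
              encoding their relative pose.
    Returns the inhomogeneous 3-D map point.
    """
    # Each observation contributes two linear constraints on the
    # homogeneous 3-D point X (A @ X = 0).
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with smallest
    # singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

For example, with the first frame at the origin and the last frame translated one unit along x, the correspondence (0, 0) and (-0.5, 0) triangulates to the map point (0, 0, 2).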
6. The map construction method according to any one of claims 1 to 5, wherein, after the projecting the feature points of the current image frame into a three-dimensional map to obtain new map points of the three-dimensional map, the method further comprises:
taking the matching relations between the feature points of the current image frame and the feature points of the sliding window image frames as co-view constraint relations between the current image frame and the sliding window image frames, and performing global pose optimization on the three-dimensional map.
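An illustrative sketch of how the co-view constraints in claim 6 might be assembled: the feature matches kept in the sliding window become weighted covisibility edges that a global pose optimizer (for example a bundle-adjustment back end, not shown here) would consume. The edge weighting by match count and the minimum-shared-feature filter are assumptions for demonstration, not details from the patent.

```python
def coview_edges(window_matches, min_shared=1):
    """Build co-view constraint edges from sliding-window feature matches.

    window_matches: {(frame_a, frame_b): [(i, j), ...]} mapping frame pairs
                    to their matched feature-index pairs.
    Returns a sorted list of (frame_a, frame_b, weight) edges, where the
    weight is the number of shared feature matches.
    """
    edges = []
    for (fa, fb), pairs in sorted(window_matches.items()):
        if len(pairs) >= min_shared:
            edges.append((fa, fb, len(pairs)))
    return edges
```

A pair of frames with no surviving matches contributes no constraint, so the optimizer only links frames that actually observe common feature points.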
7. The map construction method according to any one of claims 1 to 5, wherein the determining matching relations between feature points of the current image frame and feature points of a plurality of sliding window image frames comprises:
determining the number of sliding window image frames in a sliding window structure, wherein the sliding window structure comprises an image frame list, an image frame ID list, a sliding window feature point structure, and the number of image frames in the sliding window;
if the number of sliding window image frames is smaller than a preset sliding window length, adding the current image frame to the sliding window structure; otherwise, deleting the sliding window image frame with the earliest acquisition time in the sliding window structure, together with the matching relations between its feature points and the feature points of the other sliding window image frames, and adding the current image frame to the sliding window structure;
and adding the matching relations between the feature points of the current image frame and the feature points of the other sliding window image frames to the sliding window structure.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the map construction method according to any one of claims 1 to 7 when the program is executed.
9. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the map construction method according to any one of claims 1 to 7.
CN202011598547.8A 2020-12-29 2020-12-29 Map construction method, electronic device and storage medium Active CN112651997B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011598547.8A CN112651997B (en) 2020-12-29 2020-12-29 Map construction method, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN112651997A CN112651997A (en) 2021-04-13
CN112651997B (en) 2024-04-12

Family

ID=75363937


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240806B (en) * 2021-05-13 2022-09-30 深圳市慧鲤科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
EP4363800A1 (en) * 2021-06-30 2024-05-08 Grabtaxi Holdings Pte. Ltd. Server and method for generating road map data
CN113990101B (en) * 2021-11-19 2023-04-07 深圳市捷顺科技实业股份有限公司 Method, system and processing device for detecting vehicles in no-parking area
CN114419115A (en) * 2021-12-07 2022-04-29 高德软件有限公司 Image feature matching method, device, equipment and storage medium
CN115578432B (en) * 2022-09-30 2023-07-07 北京百度网讯科技有限公司 Image processing method, device, electronic equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN106446815A (en) * 2016-09-14 2017-02-22 浙江大学 Simultaneous positioning and map building method
CN108615248A (en) * 2018-04-27 2018-10-02 腾讯科技(深圳)有限公司 Method for relocating, device, equipment and the storage medium of camera posture tracing process
CN111105467A (en) * 2019-12-16 2020-05-05 北京超图软件股份有限公司 Image calibration method and device and electronic equipment
CN111445526A (en) * 2020-04-22 2020-07-24 清华大学 Estimation method and estimation device for pose between image frames and storage medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10636198B2 (en) * 2017-12-28 2020-04-28 Beijing Jingdong Shangke Information Technology Co., Ltd. System and method for monocular simultaneous localization and mapping


Non-Patent Citations (1)

Title
Research Progress on Vision-Based Simultaneous Localization and Mapping; Chen Chang; Zhu Hua; You Shaoze; Application Research of Computers (Issue 03); pp. 7-13 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant