CN110322500B

CN110322500B - Optimization method and device for instant positioning and map construction, medium and electronic equipment

Info

Publication number: CN110322500B
Application number: CN201910578527.5A
Authority: CN
Inventors: 王宇鹭
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2019-06-28
Filing date: 2019-06-28
Publication date: 2023-08-15
Anticipated expiration: 2039-06-28
Also published as: CN110322500A

Abstract

The invention discloses an optimization method and device for instant positioning and map construction, a storage medium and electronic equipment, and relates to the technical field of image processing. The optimization method for instant positioning and map construction comprises the following steps: extracting visual feature points of the current frame image, and determining, as monocular feature points, those visual feature points that are not associated with the depth information of the current frame image; if the number of visual feature points matched between the current frame image and the reference frame image is larger than a preset matching threshold, then, when the visual features are aligned with the inertial features, optimizing the pose of the current frame image by using a first constraint condition constructed from monocular feature point information and a second constraint condition constructed from inertial information to obtain intermediate pose parameters; and optimizing the intermediate pose parameters by using the depth information of the current frame image and the depth information of the reference frame image so as to determine the pose parameters of the current frame image. The invention can improve the accuracy and the robustness of SLAM.

Description

Optimization method and device for instant positioning and map construction, medium and electronic equipment

Technical Field

The disclosure relates to the technical field of image processing, in particular to an optimization method for instant positioning and map construction, an optimization device for instant positioning and map construction, a storage medium and electronic equipment.

Background

As one of important technologies in the field of computer vision, SLAM (Simultaneous Localization And Mapping, instant localization and mapping) technology has received extensive attention and has been rapidly developed. The technology can be applied to various fields such as unmanned aerial vehicles, automatic driving, high-precision map construction, virtual reality, augmented reality and the like.

SLAM technology is used to build a map of an unknown environment and locate the position of a sensor in the map in real time. In the monocular SLAM technology based on inertia, on one hand, the speed of initializing a map by using visual information is slow; on the other hand, in the process of generating three-dimensional information by using visual information, the problem of generation failure may exist, so that effective data is omitted, the estimated pose is inaccurate, and finally the accuracy and the robustness of SLAM are affected.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The disclosure aims to provide an optimization method, an optimization device, a storage medium and electronic equipment for instant positioning and map construction, so as to overcome the problem of low accuracy of an inertial-based monocular SLAM scheme caused by the limitations and defects of the related art at least to a certain extent.

According to a first aspect of exemplary embodiments of the present disclosure, there is provided an optimization method of instant localization and mapping, the method comprising: extracting visual feature points of the current frame image, and determining, as monocular feature points, those visual feature points that are not associated with the depth information of the current frame image; if the number of visual feature points matched between the current frame image and the reference frame image is larger than a preset matching threshold, then, when the visual features are aligned with the inertial features, optimizing the pose of the current frame image by using a first constraint condition constructed from monocular feature point information and a second constraint condition constructed from inertial information to obtain intermediate pose parameters; and optimizing the intermediate pose parameters by using the depth information of the current frame image and the depth information of the reference frame image so as to determine the pose parameters of the current frame image.

According to a second aspect of exemplary embodiments of the present disclosure, there is provided an optimization apparatus for instant localization and mapping, the apparatus comprising: a feature determining module, used for extracting visual feature points of the current frame image and determining, as monocular feature points, those visual feature points that are not associated with the depth information of the current frame image; a first optimization module, used for, if the number of visual feature points matched between the current frame image and the reference frame image is larger than a preset matching threshold, optimizing the pose of the current frame image when the visual features are aligned with the inertial features, by using a first constraint condition constructed from monocular feature point information and a second constraint condition constructed from inertial information, so as to obtain intermediate pose parameters; and a second optimization module, used for optimizing the intermediate pose parameters by using the depth information of the current frame image and the depth information of the reference frame image so as to determine the pose parameters of the current frame image.

According to a third aspect of exemplary embodiments of the present disclosure, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described optimization method of instant localization and mapping.

According to a fourth aspect of exemplary embodiments of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the optimization method of instantaneous positioning and map construction via execution of the executable instructions.

In the technical solutions provided in some embodiments of the present disclosure, after the pose of the current frame image is optimized by the first constraint condition constructed by the monocular feature point information and the second constraint condition constructed by the inertia information, the optimization result is further optimized by using the depth information. The pose is optimized by combining the depth information, so that the problem that the feature points are omitted due to the fact that the pose is optimized by only adopting the visual information is avoided, and the accuracy and the robustness of SLAM can be improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:

FIG. 1 schematically illustrates a flow chart of an optimization method of instant localization and mapping in accordance with an exemplary embodiment of the present disclosure;

FIG. 2 shows a schematic diagram of a process of determining monocular feature points and stereoscopic feature points according to an exemplary embodiment of the present disclosure;

FIG. 3 illustrates a schematic diagram of a triangulation process in accordance with an exemplary embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow chart of a relocation process according to an exemplary embodiment of the present disclosure;

FIG. 5 schematically illustrates a block diagram of an on-the-fly positioning and mapping optimization apparatus according to a first exemplary embodiment of the present disclosure;

FIG. 6 schematically illustrates a block diagram of an on-the-fly positioning and mapping optimization apparatus according to a second exemplary embodiment of the present disclosure;

FIG. 7 schematically illustrates a block diagram of a feature determination module according to an exemplary embodiment of the present disclosure;

FIG. 8 schematically illustrates a block diagram of an on-the-fly positioning and mapping optimization apparatus according to a third exemplary embodiment of the present disclosure;

FIG. 9 schematically illustrates a block diagram of an on-the-fly positioning and mapping optimization apparatus according to a fourth exemplary embodiment of the present disclosure;

FIG. 10 schematically illustrates a block diagram of an on-the-fly positioning and mapping optimization apparatus according to a fifth exemplary embodiment of the present disclosure;

FIG. 11 schematically illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.

Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only and not necessarily all steps are included. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations. In addition, the terms "first," "second," "third," etc. are used below for distinguishing purposes only and should not be taken as a limitation of the present disclosure.

The optimization method of the instant positioning and the map construction described below may be implemented by the terminal device, that is, the terminal device may perform the respective steps of the optimization method of the instant positioning and the map construction of the present disclosure. In this case, the optimization apparatus of the instant localization and mapping of the exemplary embodiments of the present disclosure may be configured in the terminal device.

The terminal device may be a device implementing the SLAM scheme, for example, the terminal device may be a mobile phone, a tablet, an intelligent wearable device (smart watch, smart glasses, etc.), an unmanned aerial vehicle, a mobile robot, etc.

Fig. 1 schematically illustrates a flow chart of an optimization method of the instant localization and mapping of an exemplary embodiment of the present disclosure. Referring to fig. 1, the optimization method of the instant localization and mapping may include the steps of:

S12, extracting visual feature points of the current frame image, and determining, as monocular feature points, those visual feature points that are not associated with the depth information of the current frame image.

In an exemplary embodiment of the present disclosure, the visual feature points may be ORB (Oriented FAST and Rotated BRIEF, a fast feature point extraction and description algorithm) feature points. ORB feature points are fast to compute and are therefore suitable for implementation on terminal equipment.

An ORB feature point may include two parts: a FAST corner and a BRIEF descriptor. The FAST corner refers to the position of the ORB feature point in the image. FAST mainly detects areas with obvious local changes in pixel gray level and is fast to compute; its core idea is that if a pixel differs greatly from the pixels in its neighborhood (that is, it is much darker or much brighter), the pixel is likely a corner. The BRIEF descriptor is a binary vector that describes the pixels around the FAST corner according to a manually designed pattern; that is, the vector consists of a number of 0s and 1s, each encoding the intensity relationship between pixels near the FAST corner.
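
As a non-limiting illustration of the binary descriptor idea, the following minimal Python sketch builds a BRIEF-style bit vector on a tiny grayscale image and compares two descriptors by Hamming distance. The sampling pattern `PAIRS` and the function names `brief_descriptor` and `hamming` are hypothetical and chosen only for this example; an actual ORB implementation uses a much larger, orientation-steered pattern on a smoothed image patch.

```python
# Illustrative sketch only: a BRIEF-style binary descriptor on a tiny image.
# PAIRS, brief_descriptor and hamming are hypothetical names for this example.

# Each binary test compares the intensities at two offsets (d_row, d_col)
# around the keypoint.
PAIRS = [((-1, -1), (1, 1)), ((-1, 1), (1, -1)), ((0, -1), (0, 1)), ((-1, 0), (1, 0))]


def brief_descriptor(image, row, col, pairs=PAIRS):
    """Return a list of bits; a bit is 1 if the first sample is darker than the second."""
    bits = []
    for (dr1, dc1), (dr2, dc2) in pairs:
        first = image[row + dr1][col + dc1]
        second = image[row + dr2][col + dc2]
        bits.append(1 if first < second else 0)
    return bits


def hamming(desc_a, desc_b):
    """Number of differing bits; a smaller value means more similar descriptors."""
    return sum(a != b for a, b in zip(desc_a, desc_b))


image = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]
# The same patch rotated by 180 degrees reverses every intensity comparison.
rotated = [r[::-1] for r in image[::-1]]

desc = brief_descriptor(image, 1, 1)
desc_rotated = brief_descriptor(rotated, 1, 1)
distance = hamming(desc, desc_rotated)
```

Because the descriptor is a bit vector, matching reduces to counting differing bits, which is one reason such descriptors are inexpensive on terminal equipment.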

Although ORB feature points are described below as examples, it should be understood that other feature points may also be employed as the visual feature points described in the present disclosure, such as SURF (Speeded Up Robust Features) features, SIFT (Scale-Invariant Feature Transform) features, Harris corner points, and the like; the type of visual feature points is not particularly limited in the present disclosure.

For a current frame image photographed by a camera, an ORB feature point of the current frame image may be extracted using an ORB algorithm as a visual feature point of the current frame image. In addition, depth information (depth map) corresponding to the current frame image may be acquired by means of a depth sensor, in which case it may be determined whether or not the visual feature points are associated with the depth information, that is, the visual feature points corresponding to the depth information are found from the current frame image.

Specifically, the extrinsic parameter matrix between the camera image and the depth map may be calibrated in advance. On this basis, for a given image coordinate, the corresponding depth pixel coordinate can be determined on the depth map through transformation by the extrinsic parameter matrix. Therefore, once the visual feature points and the depth information of the current frame image are obtained, it can be determined whether each visual feature point corresponds to depth information, that is, whether the visual feature point is associated with the depth information.

The exemplary embodiments of the present disclosure take the feature points that are not associated with depth information, among the visual feature points, as monocular feature points (which may also be referred to as mono feature points), wherein the monocular feature points may be understood as visual feature points corresponding to a two-dimensional space. In addition, the feature points associated with depth information among the visual feature points are taken as stereoscopic feature points (which may also be referred to as stereo feature points), wherein the stereoscopic feature points may be understood as visual feature points corresponding to a three-dimensional space.

Fig. 2 shows a schematic diagram of a process of determining monocular feature points and stereoscopic feature points. First, ORB feature points may be extracted from an RGB image photographed by a camera of a terminal device; next, the ORB feature points are divided into monocular feature points and stereoscopic feature points using depth information acquired by the depth sensor.

Furthermore, the detected depth information may be inaccurate for objects in a strong light environment or with a black surface. Some embodiments of the present disclosure may therefore further include a process of evaluating the depth information; for example, a quality threshold that characterizes the availability of the depth information may be set. When the depth information is determined to reach the quality threshold, the depth information is used for dividing the visual feature points.
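
As a non-limiting illustration, the division of visual feature points into monocular and stereoscopic feature points may be sketched in Python as follows. The sketch assumes that the depth map has already been registered to the color image (so that both share one pixel grid) and that a depth value outside a plausible range marks an unusable measurement; the function name `split_feature_points` and the range limits are hypothetical, and the range check is only a simple stand-in for the quality evaluation mentioned above.

```python
# Illustrative sketch: split keypoints into monocular and stereoscopic sets.
# Assumptions (not from the disclosure): the depth map is already registered
# to the color image, and depths outside [min_depth, max_depth] are unusable.

def split_feature_points(keypoints, depth_map, min_depth=0.1, max_depth=10.0):
    """keypoints: list of (u, v) pixel coordinates, u = column, v = row."""
    mono, stereo = [], []
    height, width = len(depth_map), len(depth_map[0])
    for u, v in keypoints:
        col, row = int(round(u)), int(round(v))
        inside = 0 <= row < height and 0 <= col < width
        depth = depth_map[row][col] if inside else 0.0
        if min_depth <= depth <= max_depth:
            stereo.append((u, v, depth))  # usable depth: stereoscopic feature point
        else:
            mono.append((u, v))           # no usable depth: monocular feature point
    return mono, stereo


depth_map = [
    [0.0, 1.5, 0.0],
    [2.0, 0.0, 0.0],
]
mono, stereo = split_feature_points([(1, 0), (0, 1), (2, 1), (5, 5)], depth_map)
```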

S14, if the number of the visual feature points matched between the current frame image and the reference frame image is larger than a preset matching threshold, when the visual feature is aligned with the inertia feature, optimizing the pose of the current frame image by using a first constraint condition constructed by monocular feature point information and a second constraint condition constructed by the inertia information to obtain intermediate pose parameters.

The exemplary embodiments of the present disclosure further include a map initialization scheme before optimizing the current frame image, so as to determine a reference frame that is compared with the current frame image to achieve tracking.

Specifically, firstly, for an input image, extracting a visual feature point of the input image, and determining a monocular feature point and a stereoscopic feature point of the input image, wherein the specific process is similar to the above-mentioned determination of the monocular feature point and the stereoscopic feature point of the current frame image, and is not repeated here; next, the number of visual feature points of the input image may be compared with a first preset threshold, and the number of stereoscopic feature points of the input image may be compared with a second preset threshold, where the first preset threshold and the second preset threshold may be manually configured in advance and may be thresholds related to the resolution of the image, and the disclosure does not specifically limit the specific value.

If the number of the visual feature points of the input image is greater than a first preset threshold value and the number of the stereoscopic feature points of the input image is greater than a second preset threshold value, the map initialization can be considered to be successful, and the spatial three-dimensional points corresponding to the stereoscopic feature points of the input image can be used as an initial point cloud map. In this case, the input image may be regarded as a reference frame image.

If the number of the visual feature points of the input image is not greater than the first preset threshold value, or the number of the stereoscopic feature points of the input image is not greater than the second preset threshold value, the input image does not meet the requirements for a reference frame image and may be discarded.
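
As a non-limiting illustration, the initialization decision above may be sketched in Python as follows. The pinhole intrinsics `(fx, fy, cx, cy)`, the function names, and the numeric thresholds are assumptions made only for this example; the disclosure does not limit their values.

```python
# Illustrative sketch of the map initialization decision.
# back_project and try_initialize_map are hypothetical names; the pinhole
# camera model and all numeric values are assumptions for this example.

def back_project(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with known depth to a 3D point."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def try_initialize_map(num_visual, stereo, first_threshold, second_threshold, intrinsics):
    """Return the initial point cloud, or None if the input image is rejected."""
    if num_visual > first_threshold and len(stereo) > second_threshold:
        return [back_project(u, v, d, *intrinsics) for u, v, d in stereo]
    return None


intrinsics = (500.0, 500.0, 320.0, 240.0)
stereo = [(320.0, 240.0, 2.0), (420.0, 240.0, 1.0)]

cloud = try_initialize_map(3, stereo, first_threshold=2, second_threshold=1,
                           intrinsics=intrinsics)
rejected = try_initialize_map(3, stereo, first_threshold=2, second_threshold=2,
                              intrinsics=intrinsics)
```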

In an exemplary embodiment of the present disclosure, the number of visual feature points in the current frame image that match the visual feature points of the reference frame image may be determined, and if the number is greater than a preset matching threshold, tracking may be considered successful. The preset matching threshold can be set manually, and its specific value is not particularly limited in the disclosure.

In addition, the present disclosure does not specifically limit the relative relationship between the current frame image and the reference frame image in the video stream. For example, the current frame image may be the frame immediately following the reference frame image; as another example, there may be several frames between the current frame image and the reference frame image.

Under the condition of successful tracking, the visual features acquired by the terminal equipment can be aligned with the inertial features, and the initialization process of the visual features and the inertial features is realized.

In particular, the inertial information may be obtained by means of an IMU (Inertial Measurement Unit) device of the terminal device. The IMU may comprise a gyroscope and an accelerometer, which measure the angular velocity and the acceleration of the terminal device, respectively. Since the IMU generally operates at a higher frequency than the camera captures images, the inertial information corresponding to each frame can be estimated by means of IMU pre-integration. IMU pre-integration integrates the measurements over time to obtain inertial information such as the position, velocity and rotation angle between the two corresponding images.
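
As a non-limiting illustration of integrating IMU samples between two image frames, the following Python sketch uses a deliberately simplified planar model: a single yaw rate from the gyroscope and a two-dimensional body-frame acceleration, with no gravity, bias, or noise terms. These simplifications, the function name `preintegrate_planar`, and the simple Euler integration scheme are assumptions for this example; a practical pre-integration works with three-dimensional rotations and bias correction.

```python
import math

# Illustrative sketch: integrate IMU samples taken between two image frames.
# Simplifying assumptions (not from the disclosure): planar motion, no gravity,
# no sensor bias, fixed sample period dt, simple Euler integration.

def preintegrate_planar(samples, dt):
    """samples: list of (yaw_rate, ax_body, ay_body).

    Returns (delta_position, delta_velocity, delta_theta) expressed in the
    body frame of the first image, starting from a zero relative state.
    """
    px = py = vx = vy = theta = 0.0
    for yaw_rate, ax_body, ay_body in samples:
        c, s = math.cos(theta), math.sin(theta)
        # Rotate the body-frame acceleration into the frame of the first image.
        ax = c * ax_body - s * ay_body
        ay = s * ax_body + c * ay_body
        px += vx * dt + 0.5 * ax * dt * dt
        py += vy * dt + 0.5 * ay * dt * dt
        vx += ax * dt
        vy += ay * dt
        theta += yaw_rate * dt
    return (px, py), (vx, vy), theta


# 100 IMU samples at 100 Hz between two frames: constant forward acceleration.
delta_p, delta_v, delta_theta = preintegrate_planar([(0.0, 1.0, 0.0)] * 100, 0.01)
```

In this example the integrated values correspond to one second of constant acceleration of 1 m/s^2, that is, roughly 0.5 m of displacement and 1 m/s of velocity change.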

For the visual features that correspond to the inertial features, SFM (Structure From Motion) technology can be used to process the two images so as to determine the corresponding position, velocity, rotation angle and other information. The SFM technique can recover three-dimensional information from two-dimensional images or video sequences; that is, a series of two-dimensional images or a video sequence is input, and three-dimensional model information of the scene is output.

Ideally, the position, velocity and rotation information determined using the SFM technique should be equal to the inertial information measured by the IMU. In practice, however, the two results deviate from each other because of factors such as the circuit clock and the measurement accuracy of the devices, and therefore alignment is required.

In this case, the state quantities of the parameters in the inertial information need to be calibrated initially to ensure that the results determined by the SFM technique are as close as possible to the results measured by the IMU, where these state quantities may include position, velocity, rotation angle, acceleration bias, gyroscope bias, etc.

If it is determined, based on the calibration process described above, that the deviation between the result of the SFM technique and the result of the IMU measurement is less than a predetermined deviation, the visual features are aligned with the inertial features, that is, the visual features and the inertial features are successfully initialized. In this case, the pose of the current frame image can be optimized.

In an exemplary embodiment of the present disclosure, the optimization function employed in the optimization herein may include two constraints, namely, two cost functions, one of which is a first constraint constructed from monocular feature point information and the other of which is a second constraint constructed from inertial information. The cost function is then optimized by using a nonlinear optimization method, so that the value of the cost function is continuously reduced to determine the pose optimization result of the current frame image, and the pose optimization result is recorded as the intermediate pose parameters. The nonlinear optimization method is not particularly limited, and may be, for example, the Gauss-Newton algorithm, the Levenberg-Marquardt algorithm, or the like.
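
As a non-limiting illustration of minimizing a sum of two cost terms with the Gauss-Newton algorithm, the following Python sketch estimates a planar pose (theta, tx, ty). The visual term aligns transformed points with their observations, and the inertial term penalizes deviation from a pose predicted from inertial information. The planar pose, the point-alignment residual (used here in place of a full re-projection residual), the weight `imu_weight`, and all function names are assumptions made only for this example.

```python
import math

# Illustrative sketch: Gauss-Newton on a cost with a visual and an inertial term.
# The 2D pose and the residual definitions are assumptions for this example.

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(i + 1, 3):
            factor = M[r][i] / M[i][i]
            for c in range(i, 4):
                M[r][c] -= factor * M[i][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        x[i] = (M[i][3] - sum(M[i][c] * x[c] for c in range(i + 1, 3))) / M[i][i]
    return x


def gauss_newton(points, observations, imu_pose, initial, imu_weight=1.0, iters=10):
    """Return (theta, tx, ty) minimizing visual plus inertial squared residuals."""
    theta, tx, ty = initial
    for _ in range(iters):
        c, s = math.cos(theta), math.sin(theta)
        rows = []  # each entry: (residual, Jacobian row w.r.t. theta, tx, ty)
        # First term: transformed points should coincide with the observations.
        for (px, py), (qx, qy) in zip(points, observations):
            rows.append((c * px - s * py + tx - qx, [-s * px - c * py, 1.0, 0.0]))
            rows.append((s * px + c * py + ty - qy, [c * px - s * py, 0.0, 1.0]))
        # Second term: the pose should stay close to the inertial prediction.
        w = math.sqrt(imu_weight)
        for k, (value, prior) in enumerate(zip((theta, tx, ty), imu_pose)):
            jac = [0.0, 0.0, 0.0]
            jac[k] = w
            rows.append((w * (value - prior), jac))
        # Normal equations: (J^T J) delta = -J^T r.
        H = [[sum(j[a] * j[b] for _, j in rows) for b in range(3)] for a in range(3)]
        g = [sum(j[a] * r for r, j in rows) for a in range(3)]
        delta = solve3(H, [-v for v in g])
        theta, tx, ty = theta + delta[0], tx + delta[1], ty + delta[2]
    return theta, tx, ty


truth = (0.3, 1.0, -0.5)
points = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
ct, st = math.cos(truth[0]), math.sin(truth[0])
observations = [(ct * x - st * y + truth[1], st * x + ct * y + truth[2]) for x, y in points]
estimate = gauss_newton(points, observations, imu_pose=truth, initial=(0.0, 0.0, 0.0))
```

The Levenberg-Marquardt algorithm differs mainly in adding a damping term to the diagonal of the normal-equation matrix before solving.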

In the process of optimizing the current frame image by using the first constraint condition, the monocular feature point information of the current frame image and the monocular feature point information of the reference frame image can first be triangulated, so as to determine the spatial three-dimensional information of the current frame image and of the reference frame image based on the monocular feature points.

In particular, triangularization, also known as triangulation, refers to determining the distance to a point by observing the same point from two positions and using the angles of observation. Referring to fig. 3, images are captured at two positions, and the camera optical centers are O1 and O2. The feature point P1 corresponds to the feature point P2, and theoretically the straight lines O1P1 and O2P2 intersect at a point P in the scene, which is the position in the three-dimensional scene corresponding to the two feature points. However, due to the influence of noise, the two straight lines often do not intersect exactly, in which case the position of the point P can be solved using the least squares method. Therefore, the spatial three-dimensional information of the current frame image and the reference frame image based on the monocular feature points can be determined by triangulation.
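
As a non-limiting illustration of the least squares solution, the following Python sketch finds the parameters of the two rays O1 + s*d1 and O2 + t*d2 that minimize the distance between them and returns the midpoint of the closest points. This midpoint formulation is one simple least squares variant chosen for this example, and the function names are hypothetical; other formulations, such as linear triangulation on the projection matrices, are equally possible.

```python
# Illustrative sketch: midpoint triangulation of two viewing rays.
# Ray 1: o1 + s * d1, ray 2: o2 + t * d2 (directions need not be normalized).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(o1, d1, o2, d2):
    """Least squares closest point between two rays, or None if they are parallel."""
    w = [q - p for p, q in zip(o1, o2)]  # vector from o1 to o2
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    e, f = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        return None  # parallel rays carry no depth information
    # Minimizer of |o1 + s*d1 - o2 - t*d2|^2 with respect to s and t.
    s = (c * e - b * f) / denom
    t = (b * e - a * f) / denom
    p1 = [o + s * d for o, d in zip(o1, d1)]
    p2 = [o + t * d for o, d in zip(o2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Two optical centers one unit apart, both observing the point (0.5, 0, 2).
point = triangulate_midpoint((0.0, 0.0, 0.0), (0.5, 0.0, 2.0),
                             (1.0, 0.0, 0.0), (-0.5, 0.0, 2.0))
```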

Next, a re-projection error between the current frame image and the reference frame image may be calculated based on the spatial three-dimensional information of the monocular feature points, so as to optimize the pose of the current frame image. Specifically, a PnP (Perspective-n-Point) calculation method may be used to determine the re-projection error, and the specific calculation process is not particularly limited in this disclosure.
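
As a non-limiting illustration, a re-projection error for a given pose may be computed as in the following Python sketch. The pinhole camera model with intrinsics `(fx, fy, cx, cy)`, the convention that the pose maps world coordinates to camera coordinates, and the function names are assumptions for this example; the sketch only evaluates the error and does not itself solve the PnP problem.

```python
# Illustrative sketch: re-projection error under a pinhole camera model.
# pose = (R, t) is assumed to map world coordinates to camera coordinates.

def project(point_world, pose, intrinsics):
    """Project a 3D point to pixel coordinates (u, v)."""
    R, t = pose
    xc = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i] for i in range(3)]
    fx, fy, cx, cy = intrinsics
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)


def reprojection_error(points_world, pixels, pose, intrinsics):
    """Sum of squared pixel distances between projections and observations."""
    total = 0.0
    for point, (u, v) in zip(points_world, pixels):
        pu, pv = project(point, pose, intrinsics)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


identity_pose = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0])
intrinsics = (500.0, 500.0, 320.0, 240.0)
# The point projects to (570, 240); the observation is 2 pixels away in u.
error = reprojection_error([(1.0, 0.0, 2.0)], [(572.0, 240.0)], identity_pose, intrinsics)
```

A pose optimizer would adjust the rotation and translation so that this error decreases.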

According to other embodiments of the present disclosure, if the deviation between the result of the SFM technique and the result of the IMU measurement is not less than the preset deviation, the visual features are not aligned with the inertial features, that is, the initialization of the visual features and the inertial features has not succeeded.

In this case, the pose of the current frame image can be predicted based on a constant-velocity motion model, in which the camera is considered to be in uniform motion. For example, the visual feature points in the previous frame image that match the current frame image can be searched for, and, for these visual feature points, the predicted pose of the current frame image can be optimized by taking the monocular feature point information and the stereoscopic feature point information as constraint conditions.
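
As a non-limiting illustration of the constant-velocity prediction, the following Python sketch applies the relative motion between the two most recent frames once more to the latest pose. Planar poses `(x, y, theta)` are used only to keep the example short, and the function names are hypothetical; the same composition applies to three-dimensional rigid transforms.

```python
import math

# Illustrative sketch: constant-velocity pose prediction with planar poses.
# A pose (x, y, theta) is an assumption made to keep the example compact.

def compose(a, b):
    """Pose b, expressed in the frame of pose a, converted to the world frame."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(pose):
    """Inverse rigid transform: rotation R^T, translation -R^T t."""
    x, y, t = pose
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), -(-s * x + c * y), -t)


def predict_constant_velocity(prev_pose, last_pose):
    """Assume the motion from prev_pose to last_pose repeats for the next frame."""
    motion = compose(inverse(prev_pose), last_pose)
    return compose(last_pose, motion)


straight = predict_constant_velocity((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
turning = predict_constant_velocity((0.0, 0.0, 0.0), (1.0, 0.0, math.pi / 2))
```

The predicted pose then serves as the starting value for the optimization with monocular and stereoscopic feature point constraints.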

In this way, the pose of the current frame image is optimized by also using the stereoscopic feature point information, which contains depth information, so that the accuracy and the robustness are improved.

According to further embodiments of the present disclosure, upon determining that the number of matched visual feature points between the current frame image and the reference frame image is not greater than a preset matching threshold, a repositioning process as shown in fig. 4 may be performed.

In step S402, a Bag of Words (BoW) vector of the current frame image is calculated. Specifically, the features on the image can be regarded as individual words, and a dictionary containing all feature types can be trained in advance, whereby for the features of the image, a set of corresponding words can be generated from the dictionary, the set being a bag of words.

In step S404, a plurality of candidate images are determined from the key frame database based on the bag-of-words vector of the current frame image. Specifically, the similarity between the bag-of-word vector of the current frame image and the bag-of-word vector of each key frame image in the key frame database can be calculated, and the image with the similarity meeting the preset similarity requirement is determined as the candidate image. The preset similarity requirement may be manually configured, for example, the preset similarity requirement may be that the similarity of the two is greater than 80%, which is not particularly limited in the present disclosure.
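
As a non-limiting illustration of candidate selection, the following Python sketch scores key frames against the current frame with a cosine similarity between sparse bag-of-words vectors and keeps those above 80%. The choice of cosine similarity, the dictionary-based vector layout, and the function names are assumptions for this example; other similarity scores may be used.

```python
import math

# Illustrative sketch: choose relocalization candidates by bag-of-words similarity.
# Vectors are dicts mapping a word id to a weight; cosine similarity is an
# assumption chosen for this example.

def cosine_similarity(a, b):
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def select_candidates(query, keyframe_db, min_similarity=0.8):
    """Return key frame ids whose similarity exceeds the threshold, best first."""
    scored = [(cosine_similarity(query, vec), kf_id) for kf_id, vec in keyframe_db.items()]
    return [kf_id for score, kf_id in sorted(scored, reverse=True) if score > min_similarity]


query = {1: 1.0, 2: 1.0}
keyframe_db = {
    "kf_a": {1: 1.0, 2: 1.0},   # same words, same weights
    "kf_b": {1: 1.0, 3: 1.0},   # shares only one word with the query
    "kf_c": {1: 2.0, 2: 1.5},   # same words, different weights
}
candidates = select_candidates(query, keyframe_db)
```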

In step S406, the visual feature points in each candidate image that match the visual feature points of the current frame image are determined. That is, for each candidate image, the ORB feature points that correspond to the map point cloud of the current frame image are determined.

In step S408, the pose of the current frame image is calculated using the visual feature points matched with the visual feature points of the current frame image. Specifically, each candidate image may be processed in turn using a PnP calculation method, with iterative estimation by a random sample consensus (RANSAC) method, to calculate the pose of the current frame image.

The target image may be determined from a plurality of candidate images based on the calculation result. Specifically, the target image may be an image that most matches the feature points of the current frame image.

In step S410, the pose of the current frame image is optimized based on the visual feature points of the target image and the visual feature points of the current frame image to complete the repositioning process.

S16, optimizing the intermediate pose parameters by utilizing the depth information of the current frame image and the depth information of the reference frame image to determine the pose parameters of the current frame image.

After the pose of the current frame image is optimized by the first constraint condition constructed by the monocular feature point information and the second constraint condition constructed by the inertia information, the depth information can be used for further optimization.

Specifically, after the intermediate pose parameters are determined, a point cloud map corresponding to the depth information of the current frame image is first determined as a first point cloud map, and, based on the intermediate pose parameters, a point cloud map corresponding to the depth information of the reference frame image is determined as a second point cloud map.

Next, geometric feature points of the first point cloud map and the second point cloud map are extracted. The geometric feature points can comprise two types of feature points, wherein the first type of feature is sharp edge feature and is marked as edge point; the second type of feature is a local planar feature, denoted as a planar point.

Subsequently, geometric feature points that match between the first point cloud map and the second point cloud map may be determined. Specifically, two types of feature points in the first point cloud map and the second point cloud map may be clustered, where each feature point corresponds to a nearest point on the other point cloud map. In this case, a distance function of the feature point and its nearest neighbor point may be established, and when the distance between the two points is smaller than a preset distance, the two geometric feature points are considered to match. The specific value of the preset distance is not particularly limited in the present disclosure.
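
As a non-limiting illustration of the nearest-neighbor matching with a distance threshold, the following Python sketch uses a brute-force search over small point lists. The function name `match_geometric_points` and the brute-force strategy are assumptions for this example; for large point clouds a spatial index such as a k-d tree would normally replace the inner loop.

```python
import math

# Illustrative sketch: match geometric feature points between two point clouds.
# Brute-force nearest-neighbour search is an assumption made for brevity.

def match_geometric_points(cloud_a, cloud_b, max_distance):
    """Return (index_a, index_b, distance) for points closer than max_distance."""
    matches = []
    if not cloud_b:
        return matches
    for i, p in enumerate(cloud_a):
        j, d = min(((j, math.dist(p, q)) for j, q in enumerate(cloud_b)),
                   key=lambda item: item[1])
        if d < max_distance:
            matches.append((i, j, d))
    return matches


first_cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
second_cloud = [(0.05, 0.0, 0.0), (1.0, 0.1, 0.0)]
matches = match_geometric_points(first_cloud, second_cloud, max_distance=0.2)
```

The distances of the accepted pairs are the quantities a further optimization step would seek to reduce.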

Then, a third constraint condition can be established by using the matched geometric feature points thus determined, and the pose of the current frame image is further optimized by using the third constraint condition in combination with a nonlinear optimization method.

In addition, some embodiments of the present disclosure may further include a scheme of determining whether the current frame image is a key frame image.

Specifically, it can be determined whether the pose parameters of the current frame image after the optimization process meet a preset key frame judgment condition. For example, the preset key frame judgment condition may be: the number of point cloud points tracked by the current frame image is less than 50, or is less than 90% of the number of point cloud points of the reference frame image. However, the present disclosure does not particularly limit the preset key frame judgment condition.

When the pose parameter of the current frame image meets the preset key frame judgment condition, the current frame image is determined to be a key frame image and is inserted into a key frame database.
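The example key frame judgment condition given above (fewer than 50 tracked points, or fewer than 90% of the reference frame's points) can be written as a small predicate. This is a sketch under the example values only; the function name `is_key_frame` and its parameter names are hypothetical, and the disclosure does not limit the condition to these values.

```python
def is_key_frame(
    tracked_points: int,
    reference_points: int,
    min_tracked: int = 50,
    min_ratio: float = 0.9,
) -> bool:
    """Return True when the current frame should be inserted as a key frame.

    The frame qualifies if it tracks fewer than min_tracked points, or fewer
    than min_ratio times the number of points of the reference frame.
    """
    return (
        tracked_points < min_tracked
        or tracked_points < min_ratio * reference_points
    )
```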

It should be appreciated that the optimization method for instant positioning and map construction described above by way of example is primarily directed to the tracking thread of a SLAM scheme. Based on the above: first, the map is initialized using the depth information, which improves the calculation speed compared with initializing the map from purely visual features; second, the visual feature points of the image are divided into feature points whose three-dimensional information comes from the depth information (namely, the stereoscopic feature points) and feature points to be triangulated (namely, the monocular feature points), and the two kinds use different cost functions, which saves computation time compared with treating every feature point as a point to be triangulated; third, by incorporating the depth information, the inaccurate pose estimation caused by the omission of valid data can be effectively addressed, further improving the accuracy and robustness of SLAM.

In addition, some embodiments of the present disclosure further provide a new local mapping method, the concept of which includes: the depth information of the image is used as one of constraint conditions for determining the local pose, so that the local map is built more accurately.

First, the map point clouds of the current frame image are screened based on the key frame database to ensure that each retained point is observed by at least three other key frames. Then, local pose optimization is performed on the current frame image and the at least three other related key frames by bundle adjustment (BA): the monocular feature point information and the stereoscopic feature point information in the extracted ORB feature points, together with the inertial information (IMU pre-integration information), are used as constraint conditions for the local pose optimization, which is solved with a nonlinear optimization method.
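The screening step can be illustrated with a simple observation table. The sketch below assumes a hypothetical data layout in which each map point id maps to the set of key frame ids that observe it; the name `screen_map_points` and the parameter `min_other_key_frames` are illustrative and not part of the disclosure.

```python
from typing import Dict, Set


def screen_map_points(
    observations: Dict[int, Set[int]],
    current_key_frame: int,
    min_other_key_frames: int = 3,
) -> Set[int]:
    """Keep map points seen by at least min_other_key_frames key frames
    other than the current one.

    observations maps a map-point id to the ids of the key frames observing it.
    """
    kept: Set[int] = set()
    for point_id, key_frame_ids in observations.items():
        others = key_frame_ids - {current_key_frame}
        if len(others) >= min_other_key_frames:
            kept.add(point_id)
    return kept
```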

In addition, the present disclosure may also include schemes to cull redundant images. For example, if 90% of the feature points in one key frame image can be observed by the other three key frame images at the same time, the key frame image is determined to be a redundant image, and the redundant image is removed from the calculation process of SLAM.
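One way to read the redundancy test above is: a key frame is redundant when at least 90% of the map points it observes are also observed by at least three other key frames. The sketch below follows that reading, which is an interpretation rather than a statement of the disclosed implementation; `is_redundant_key_frame` and its parameters are hypothetical names.

```python
from typing import Dict, Set


def is_redundant_key_frame(
    observations: Dict[int, Set[int]],
    key_frame: int,
    min_other_key_frames: int = 3,
    min_ratio: float = 0.9,
) -> bool:
    """Return True if enough of the key frame's points are seen elsewhere.

    observations maps a map-point id to the ids of the key frames observing it.
    """
    own_points = [pid for pid, kfs in observations.items() if key_frame in kfs]
    if not own_points:
        return False  # nothing observed, so there is nothing to judge
    covered = sum(
        1
        for pid in own_points
        if len(observations[pid] - {key_frame}) >= min_other_key_frames
    )
    return covered >= min_ratio * len(own_points)
```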

In addition, some embodiments of the present disclosure also provide a closed loop detection thread that executes in parallel.

First, candidate loop frames are detected. Specifically, the similarity between the current frame image and each key frame image in the key frame database is calculated, and key frames with low similarity are removed to determine the candidate loop frames. A similarity transformation matrix from the current frame image to the loop key frame is then calculated to determine the accumulated error of the loop. Next, the pose of the current frame image is corrected by using the calculated similarity transformation matrix, and the same correction may be applied to all key frames adjacent to the current frame image. Then, all map points observed by the loop key frame and its adjacent key frames are projected into the current frame image and its adjacent images, and corresponding matching points are searched for near the projected positions, so as to fuse all matched map points with the valid data obtained while calculating the similarity transformation. Then, an essential graph optimization that ignores the inertial data is performed; this optimization corrects the scale drift through the similarity transformation, and after the optimization each map point is transformed according to the correction of its key frame. Finally, global BA pose optimization and map updating are performed.
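The candidate selection step (compute a similarity score against each key frame, discard the low-scoring ones) can be sketched as below. The disclosure does not fix the similarity measure; this sketch assumes sparse bag-of-words vectors compared by cosine similarity, and the names `cosine_similarity`, `candidate_loop_frames` and `min_similarity` are hypothetical.

```python
import math
from typing import Dict, List

BowVector = Dict[int, float]  # visual word id -> weight


def cosine_similarity(a: BowVector, b: BowVector) -> float:
    """Cosine similarity of two sparse bag-of-words vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidate_loop_frames(
    current: BowVector,
    database: Dict[int, BowVector],
    min_similarity: float,
) -> List[int]:
    """Return key frame ids whose similarity reaches min_similarity,
    ordered from most to least similar."""
    scored = [(cosine_similarity(current, vec), kf) for kf, vec in database.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [kf for score, kf in scored if score >= min_similarity]
```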

It should be noted that although the steps of the methods in the present disclosure are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order, or that all illustrated steps be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.

Further, in this example embodiment, an optimizing apparatus for instant positioning and map construction is also provided.

Fig. 5 schematically illustrates a block diagram of an on-the-fly positioning and mapping optimization apparatus of an exemplary embodiment of the present disclosure. Referring to fig. 5, the on-the-fly positioning and mapping optimization apparatus 5 according to an exemplary embodiment of the present disclosure may include a feature determination module 51, a first optimization module 53, and a second optimization module 55.

Specifically, the feature determination module 51 may be configured to extract visual feature points of the current frame image, and determine the feature points, among the visual feature points, that are not associated with depth information of the current frame image as monocular feature points; the first optimization module 53 may be configured to, if the number of visual feature points matched between the current frame image and the reference frame image is greater than a preset matching threshold and the visual features are aligned with the inertial features, optimize the pose of the current frame image by using a first constraint condition constructed from monocular feature point information and a second constraint condition constructed from inertial information, so as to obtain an intermediate pose parameter; the second optimization module 55 may be configured to optimize the intermediate pose parameter by using the depth information of the current frame image and the depth information of the reference frame image, so as to determine the pose parameter of the current frame image.

According to the optimizing device for the instant positioning and map construction of the exemplary embodiment of the disclosure, the pose is optimized by combining the depth information, so that the problem that the feature points are omitted due to the fact that the visual information is only adopted for optimization is avoided, and the accuracy and the robustness of SLAM can be improved.

According to an exemplary embodiment of the present disclosure, referring to fig. 6, the on-the-fly positioning and mapping optimization apparatus 6 may further include a repositioning module 61 as compared to the on-the-fly positioning and mapping optimization apparatus 5.

Specifically, the repositioning module 61 may be configured to perform the following repositioning process if the number of matched visual feature points between the current frame image and the reference frame image is not greater than the preset matching threshold: calculating a bag-of-words vector of the current frame image; determining a plurality of candidate images from a key frame database based on the bag-of-words vector of the current frame image; determining, in each candidate image, visual feature points matched with the visual feature points of the current frame image; calculating the pose of the current frame image by using the matched visual feature points, and determining a target image from the plurality of candidate images according to the calculation result; and optimizing the pose of the current frame image based on the visual feature points of the target image and the visual feature points of the current frame image.
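The branching between the modules can be summarized in a few lines. The sketch below only encodes which processing branch applies, based on the match count and on whether the visual features are aligned with the inertial features; the enum `TrackingBranch` and the function `choose_branch` are hypothetical names introduced for illustration.

```python
from enum import Enum


class TrackingBranch(Enum):
    VISUAL_INERTIAL = "monocular and inertial constraints"
    CONSTANT_VELOCITY = "constant-velocity prediction with visual constraints"
    REPOSITIONING = "bag-of-words repositioning"


def choose_branch(
    num_matches: int,
    match_threshold: int,
    visual_inertial_aligned: bool,
) -> TrackingBranch:
    """Select the processing branch for the current frame."""
    # Too few matches against the reference frame: fall back to repositioning.
    if num_matches <= match_threshold:
        return TrackingBranch.REPOSITIONING
    # Enough matches and aligned features: use the first and second constraints.
    if visual_inertial_aligned:
        return TrackingBranch.VISUAL_INERTIAL
    # Enough matches but no alignment yet: predict with a motion model.
    return TrackingBranch.CONSTANT_VELOCITY
```

In the first two branches the resulting pose would then be refined with the depth information; that refinement is outside this sketch.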

According to an exemplary embodiment of the present disclosure, referring to fig. 7, the feature determination module 51 may include a feature division unit 701.

Specifically, the feature division unit 701 may be configured to perform: acquiring depth information of the current frame image; and determining, based on a pre-calibrated extrinsic parameter matrix, whether each visual feature point is associated with the depth information; wherein the feature points associated with the depth information among the visual feature points are determined as stereoscopic feature points.
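A possible form of this division is sketched below. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, depth measurements given as 3D points in the depth sensor frame, and an extrinsic rotation and translation that map the depth sensor frame into the camera frame. A feature counts as associated when a projected depth point lands on the same rounded pixel. All of these choices, and every name in the code, are assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def split_feature_points(
    features: Sequence[Pixel],
    depth_points: Sequence[Vec3],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Tuple[List[Tuple[float, float, float]], List[Pixel]]:
    """Return (stereoscopic, monocular) feature lists.

    Stereoscopic entries are (u, v, depth); monocular entries are (u, v).
    """
    depth_at_pixel: Dict[Tuple[int, int], float] = {}
    for p in depth_points:
        # Transform the depth point into the camera frame with the extrinsics.
        x, y, z = (
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)
        )
        if z <= 0.0:
            continue  # behind the camera, cannot be projected
        key = (round(fx * x / z + cx), round(fy * y / z + cy))
        # Keep the closest surface when several points hit the same pixel.
        if key not in depth_at_pixel or z < depth_at_pixel[key]:
            depth_at_pixel[key] = z

    stereoscopic: List[Tuple[float, float, float]] = []
    monocular: List[Pixel] = []
    for u, v in features:
        depth = depth_at_pixel.get((round(u), round(v)))
        if depth is None:
            monocular.append((u, v))
        else:
            stereoscopic.append((u, v, depth))
    return stereoscopic, monocular
```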

According to an exemplary embodiment of the present disclosure, the first optimization module 53 may be further configured to perform: when the visual features are not aligned with the inertial features, predicting the pose of the current frame image based on a constant-velocity motion model, and optimizing the predicted pose of the current frame image by taking the monocular feature point information and the stereoscopic feature point information as constraint conditions.
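A constant-velocity prediction is commonly computed by assuming that the relative motion between the two most recent frames repeats. The sketch below follows that convention with 4x4 homogeneous world-to-camera poses stored as nested lists; the convention, the helper names and the function `predict_pose` are assumptions, not details taken from the disclosure.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous rigid transform


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def invert_rigid(t: Matrix) -> Matrix:
    """Invert a rigid transform: the inverse of [R | t] is [R^T | -R^T t]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def predict_pose(previous: Matrix, last: Matrix) -> Matrix:
    """Predict the current pose from the two most recent poses."""
    # Relative motion between the two most recent frames.
    velocity = matmul(last, invert_rigid(previous))
    # Constant-velocity assumption: the same motion is applied once more.
    return matmul(velocity, last)
```

The predicted pose serves only as the initial value; it is then refined with the monocular and stereoscopic feature point constraints.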

According to an exemplary embodiment of the present disclosure, when optimizing the pose of the current frame image by using the first constraint condition, the first optimization module 53 may be configured to: triangulate the monocular feature point information of the current frame image and the monocular feature point information of the reference frame image to determine the spatial three-dimensional information of the monocular feature points of the current frame image and of the reference frame image, respectively; and calculate the re-projection errors of the current frame image and the reference frame image based on the spatial three-dimensional information of the monocular feature points, so as to optimize the pose of the current frame image.
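The re-projection error used as the cost for a triangulated point can be illustrated as follows. The sketch assumes a pinhole camera model and a world-to-camera pose given as a rotation matrix and a translation vector; the function name and parameters are hypothetical, and the triangulation itself is not shown.

```python
import math
from typing import Sequence, Tuple


def reprojection_error(
    point_world: Sequence[float],
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    observed: Tuple[float, float],
) -> float:
    """Pixel distance between the observed feature and the projected 3D point."""
    # Bring the triangulated point into the camera frame.
    x, y, z = (
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    if z <= 0.0:
        raise ValueError("point is behind the camera")
    # Pinhole projection onto the image plane.
    u = fx * x / z + cx
    v = fy * y / z + cy
    return math.hypot(u - observed[0], v - observed[1])
```

A nonlinear optimizer would adjust the pose to minimize the sum of such errors (usually squared and robustly weighted) over all matched monocular feature points.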

According to an exemplary embodiment of the present disclosure, referring to fig. 8, the on-the-fly positioning and mapping optimization apparatus 8 may further include a reference frame determination module 81 as compared to the on-the-fly positioning and mapping optimization apparatus 5.

Specifically, the reference frame determination module 81 may be configured to perform: pre-extracting visual feature points of an input image, and determining monocular feature points and stereoscopic feature points of the input image; comparing the number of visual feature points of the input image with a first preset threshold, and comparing the number of stereoscopic feature points of the input image with a second preset threshold; and if the number of visual feature points of the input image is greater than the first preset threshold and the number of stereoscopic feature points of the input image is greater than the second preset threshold, taking the input image as the reference frame image.

According to an exemplary embodiment of the present disclosure, the second optimization module 55 may be configured to perform: based on the intermediate pose parameters, determining a point cloud map corresponding to the depth information of the current frame image as a first point cloud map, and determining a point cloud map corresponding to the depth information of the reference frame image as a second point cloud map; extracting geometric feature points of the first point cloud map and the second point cloud map, and determining geometric feature points matched between the first point cloud map and the second point cloud map; and constructing a third constraint condition by using the information of the matched geometric feature points, and optimizing the intermediate pose parameters by using the third constraint condition.

Referring to fig. 9, the on-the-fly positioning and mapping optimization apparatus 9 may further include a key frame determination module 91, in comparison to the on-the-fly positioning and mapping optimization apparatus 5, according to an exemplary embodiment of the present disclosure.

Specifically, the key frame determination module 91 may be configured to perform: and when the pose parameters of the current frame image meet the preset key frame judging conditions, determining the current frame image as a key frame image, and inserting the key frame image into a key frame database.

Referring to fig. 10, the on-the-fly positioning and mapping optimization apparatus 10 may further include a local mapping optimization module 101, in comparison to the on-the-fly positioning and mapping optimization apparatus 5, according to an exemplary embodiment of the present disclosure.

Specifically, the local mapping optimization module 101 may be configured to perform: when the current frame image is a key frame image, optimizing the local mapping process by using the monocular feature point information, the stereoscopic feature point information and the inertial information of the current frame image as constraint conditions.

Since each functional module of the optimization apparatus for instant positioning and map construction according to the embodiments of the present disclosure operates in the same way as in the method embodiments described above, a detailed description thereof is omitted here.

In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.

The program product for implementing the above-described method according to an embodiment of the present invention may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical disk, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).

In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."

An electronic device 1100 according to this embodiment of the invention is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in fig. 11, the electronic device 1100 is embodied in the form of a general purpose computing device. Components of the electronic device 1100 may include, but are not limited to: at least one processing unit 1110, at least one storage unit 1120, a bus 1130 connecting the different system components (including the storage unit 1120 and the processing unit 1110), and a display unit 1140.

The storage unit 1120 stores program code that is executable by the processing unit 1110, such that the processing unit 1110 performs the steps according to the various exemplary embodiments of the present invention described in the "exemplary methods" section of this specification. For example, the processing unit 1110 may perform steps S12 to S16 as shown in fig. 1.

The storage unit 1120 may include a readable medium in the form of a volatile storage unit, such as a Random Access Memory (RAM) 11201 and/or a cache memory 11202, and may further include a Read Only Memory (ROM) 11203.

The storage unit 1120 may also include a program/utility 11204 having a set (at least one) of program modules 11205, such program modules 11205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The bus 1130 may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a bus using any of a variety of bus architectures.

The electronic device 1100 may also communicate with one or more external devices 1200 (e.g., keyboard, pointing device, bluetooth device, etc.), one or more devices that enable a user to interact with the electronic device 1100, and/or any devices (e.g., routers, modems, etc.) that enable the electronic device 1100 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1150. Also, electronic device 1100 can communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 1160. As shown, network adapter 1160 communicates with other modules of electronic device 1100 via bus 1130. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 1100, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.

Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An optimization method for instant positioning and map construction is characterized by comprising the following steps:

extracting visual feature points of a current frame image, and determining feature points, which are not associated with the depth information of the current frame image, in the visual feature points as monocular feature points;

if the number of the visual feature points matched between the current frame image and the reference frame image is larger than a preset matching threshold, optimizing the pose of the current frame image by using a first constraint condition constructed by monocular feature point information and a second constraint condition constructed by inertial information when the visual feature is aligned with the inertial feature to obtain an intermediate pose parameter;

and optimizing the intermediate pose parameter by utilizing the depth information of the current frame image and the depth information of the reference frame image to determine the pose parameter of the current frame image.

2. The optimization method of instant localization and mapping of claim 1, further comprising:

if the number of the matched visual feature points between the current frame image and the reference frame image is not greater than the preset matching threshold, the following repositioning process is executed:

calculating a bag-of-words vector of the current frame image;

determining a plurality of candidate images from a key frame database based on the bag-of-words vector of the current frame image;

determining visual feature points matched with the visual feature points of the current frame image in each candidate image;

calculating the pose of the current frame image by utilizing visual feature points matched with the visual feature points of the current frame image, and determining a target image from the plurality of candidate images according to a calculation result;

and optimizing the pose of the current frame image based on the visual feature points of the target image and the visual feature points of the current frame image.

3. The optimization method of instant localization and mapping of claim 1, further comprising:

acquiring depth information of the current frame image;

determining whether the visual feature points are associated with the depth information or not based on a pre-calibrated extrinsic matrix;

and determining the feature points associated with the depth information from the visual feature points as stereoscopic feature points.

4. The optimization method of instant localization and mapping of claim 3, further comprising:

and when the visual features are not aligned with the inertial features, predicting the pose of the current frame image based on a constant-velocity motion model, and optimizing the predicted pose of the current frame image by taking the monocular feature point information and the stereoscopic feature point information as constraint conditions.

5. The optimization method of instant localization and mapping of claim 1, wherein optimizing the pose of the current frame image using a first constraint condition constructed from monocular feature point information comprises:

triangulating the monocular feature point information of the current frame image and the monocular feature point information of the reference frame image to respectively determine spatial three-dimensional information of the monocular feature points of the current frame image and of the reference frame image;

and calculating the re-projection errors of the current frame image and the reference frame image based on the spatial three-dimensional information of the monocular feature points so as to optimize the pose of the current frame image.

6. The optimization method of instant localization and mapping of claim 3, further comprising:

pre-extracting visual feature points of an input image, and determining monocular feature points and stereoscopic feature points of the input image;

comparing the number of the visual feature points of the input image with a first preset threshold value, and comparing the number of the stereoscopic feature points of the input image with a second preset threshold value;

and if the number of the visual feature points of the input image is larger than the first preset threshold value and the number of the stereoscopic feature points of the input image is larger than the second preset threshold value, the input image is taken as the reference frame image.

7. The optimization method of instant localization and mapping of claim 1, wherein optimizing the intermediate pose parameters using the depth information of the current frame image and the depth information of the reference frame image comprises:

based on the intermediate pose parameters, determining a point cloud map corresponding to the depth information of the current frame image as a first point cloud map, and determining a point cloud map corresponding to the depth information of the reference frame image as a second point cloud map;

extracting geometric feature points of the first point cloud map and the second point cloud map, and determining geometric feature points matched between the first point cloud map and the second point cloud map;

and constructing a third constraint condition by using the information of the matched geometric feature points, and optimizing the intermediate pose parameters by using the third constraint condition.

8. The optimization method of instant localization and mapping of any one of claims 1 to 7, further comprising:

and when the pose parameters of the current frame image meet preset key frame judging conditions, determining the current frame image as a key frame image, and inserting the key frame image into a key frame database.

9. The optimization method of instant localization and mapping of claim 8, further comprising:

when the current frame image is a key frame image, optimizing the local mapping process by using the monocular feature point information, the stereoscopic feature point information and the inertial information of the current frame image as constraint conditions.

10. An optimization device for instant positioning and map construction, comprising:

the feature determination module is used for extracting visual feature points of the current frame image and determining the feature points, among the visual feature points, that are not associated with the depth information of the current frame image as monocular feature points;

the first optimization module is used for, if the number of visual feature points matched between the current frame image and the reference frame image is larger than a preset matching threshold and the visual features are aligned with the inertial features, optimizing the pose of the current frame image by using a first constraint condition constructed from monocular feature point information and a second constraint condition constructed from inertial information, so as to obtain intermediate pose parameters;

and the second optimization module is used for optimizing the intermediate pose parameters by utilizing the depth information of the current frame image and the depth information of the reference frame image so as to determine the pose parameters of the current frame image.

11. A storage medium having stored thereon a computer program which, when executed by a processor, implements the optimization method of instant localization and mapping of any one of claims 1 to 9.

12. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the optimization method of instant localization and mapping of any one of claims 1 to 9 via execution of the executable instructions.