CN110487274B - SLAM method and system for weak texture scene, navigation vehicle and storage medium - Google Patents

SLAM method and system for weak texture scene, navigation vehicle and storage medium

Info

Publication number
CN110487274B
Authority
CN
China
Prior art keywords
matrix
characteristic point
point
value
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910694313.4A
Other languages
Chinese (zh)
Other versions
CN110487274A (en)
Inventor
万雪
王晨卉
李盛阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Technology and Engineering Center for Space Utilization of CAS
Original Assignee
Technology and Engineering Center for Space Utilization of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Technology and Engineering Center for Space Utilization of CAS filed Critical Technology and Engineering Center for Space Utilization of CAS
Priority to CN201910694313.4A priority Critical patent/CN110487274B/en
Publication of CN110487274A publication Critical patent/CN110487274A/en
Application granted granted Critical
Publication of CN110487274B publication Critical patent/CN110487274B/en

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00 Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20 Instruments for performing navigational calculations

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a SLAM method and system for weak texture scenes, a navigation vehicle and a storage medium, and relates to the field of simultaneous localization and mapping. The method comprises the following steps: reading an image sequence, and extracting and matching feature points in the image sequence according to a preset deep learning model; resolving the camera pose to obtain pose transformation parameters; carrying out map point calculation on the feature point pairs in the matching result; generating a three-dimensional point cloud map at the current moment; and repeating these steps until all images in the image sequence have been read, so that three-dimensional point cloud maps are generated for all moments and the real-time positioning and map construction of the image sequence are completed. The method is suitable for weak texture scenes, realizes the application of SLAM in such scenes, and can improve the precision and accuracy of the map construction.

Description

SLAM method and system for weak texture scene, navigation vehicle and storage medium
Technical Field
The invention relates to the field of simultaneous localization and mapping, and in particular to a SLAM method, system, navigation vehicle and storage medium for weak texture scenes.
Background
The SLAM (Simultaneous Localization And Mapping) problem can be described as follows: given a robot placed at an unknown position in an unknown environment, is there a way for the robot to build up a complete map of that environment step by step while it moves. Existing vision-based monocular SLAM methods mainly comprise steps such as feature point detection and extraction, camera pose estimation, camera pose optimization and local map construction.
At present, research on autonomous navigation and positioning of robots still mainly focuses on mapping structured scenes. Such scenes often contain a large number of man-made objects, such as tables and chairs, floor tiles, railings, and building facades with obvious corner, point and line features, which benefits the detection and extraction of feature points.
However, in the field of extraterrestrial planetary exploration, deep space mapping tasks face weak texture scenes. A weak texture scene usually has no or only few line features and lacks obvious point features, so current three-dimensional mapping methods are not suitable for it: too few feature points are extracted, many of them are mismatched, and the three-dimensional mapping cannot be completed.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a SLAM method, system, navigation vehicle and storage medium that can be used in weak texture scenes.
The technical scheme for solving the technical problems is as follows:
a SLAM method for weak texture scenes, comprising:
s1, acquiring an image sequence acquired by the monocular camera;
s2, reading a first frame image and a second frame image in the image sequence, and extracting and matching feature points in the first frame image and the second frame image according to a preset deep learning model;
s3, camera pose calculation is carried out according to the matching result, and pose transformation parameters of the second frame image relative to the first frame image are obtained;
s4, carrying out map point calculation on the feature point pairs in the matching result according to the pose transformation parameters, and converting the two-dimensional feature point pairs into three-dimensional space points;
s5, generating a three-dimensional point cloud map of the current time according to the three-dimensional space points;
and S6, repeating the steps S2 to S5 until all images in the image sequence are read, generating a three-dimensional point cloud map at all times, and completing real-time positioning and map construction of the image sequence.
The invention has the beneficial effects that: the SLAM method provided by the invention is suitable for weak texture scenes. Feature points are extracted from and matched between images of the weak texture scene by a deep learning model, which can abstract more suitable feature detection and extraction rules from training samples; map points are then calculated and a three-dimensional point cloud map is constructed. This realizes the application of the SLAM method in weak texture scenes: the deep-learning-based features provide feature points with better stability and robustness, real-time positioning and map construction of weak texture scenes are achieved, and the precision and accuracy of the map construction can be improved.
Another technical solution of the present invention for solving the above technical problems is as follows:
a storage medium, wherein instructions are stored, and when the instructions are read by a computer, the instructions cause the computer to execute the SLAM method for the weak texture scene according to the above technical solution.
Another technical solution of the present invention for solving the above technical problems is as follows:
a SLAM navigation vehicle for weak texture scenes comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the SLAM method for the weak texture scene according to the technical scheme.
Another technical solution of the present invention for solving the above technical problems is as follows:
a SLAM system for weak texture scenes comprising:
the acquisition unit is used for acquiring an image sequence acquired by the monocular camera;
the matching unit is used for reading a first frame image and a second frame image in the image sequence, and extracting and matching feature points in the first frame image and the second frame image according to a preset deep learning model;
the pose resolving unit is used for resolving the pose of the camera according to the matching result to obtain pose transformation parameters of the second frame image relative to the first frame image;
the map point resolving unit is used for resolving map points of the feature point pairs in the matching result according to the pose transformation parameters and converting the two-dimensional feature point pairs into three-dimensional space points;
and the modeling unit is used for generating a three-dimensional point cloud map at the current moment according to the three-dimensional space points.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a schematic flowchart of an embodiment of a SLAM method for a weak texture scene according to the present invention;
fig. 2 is a schematic flow chart of the SLAM method for a weak texture scene according to another embodiment of the present invention;
fig. 3 is a schematic flow chart of the SLAM method for a weak texture scene according to another embodiment of the present invention;
fig. 4 is a schematic flowchart of an SLAM method for a weak texture scene according to another embodiment of the present invention;
fig. 5 is a structural framework diagram provided by an embodiment of the SLAM system for weak texture scenes according to the present invention.
Detailed Description
The principles and features of this invention are described below in conjunction with the drawings, which are set forth to illustrate but not to limit the scope of the invention.
As shown in fig. 1, which is a schematic flow diagram provided by an embodiment of the SLAM method for a weak texture scene according to the present invention, the SLAM method is applied to weak texture scenes: while a monocular camera acquires an image sequence, the images are processed and the map is built in real time. The method includes:
s1, acquiring an image sequence acquired by the monocular camera;
It should be understood that the present invention is applicable to weak texture scene recognition, so the image sequence acquired by the monocular camera may be an image sequence containing a weak texture scene, and the monocular camera may be mounted on a navigation vehicle or a robot; for example, it may acquire weak texture scenes such as extraterrestrial planetary images or space images. Here, a weak texture scene refers to a scene that has no or only few line features, point features and corner features.
S2, reading a first frame image and a second frame image in the image sequence, and extracting and matching feature points in the first frame image and the second frame image according to a preset deep learning model;
It should be understood that the deep learning model may be a convolutional neural network that is established and trained in advance. The specific structure of the convolutional neural network and its training process can both be set according to actual requirements; once trained, the network can accurately and efficiently output feature point extraction and matching results for weak texture scenes.
For example, the convolutional neural network may include an input layer followed by one or more convolutional layers, whose number, convolution kernel size, stride and padding can be set according to actual requirements; these may be followed by a pooling layer, whose pooling window size, stride and padding can likewise be set as required, and optionally by a fully connected layer, before the result is produced by the output layer.
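As a concrete illustration only (the patent does not fix any particular architecture), the following Python/PyTorch sketch shows one way such a preset model could be organized: a small convolutional encoder produces dense, L2-normalized descriptors for a grayscale frame, and putative feature point pairs are formed by mutual nearest-neighbour matching of sampled descriptors. The layer sizes and the helper names (TinyDescriptorNet, mutual_nn_match) are assumptions introduced here for illustration, not the patented network.

import torch
import torch.nn as nn

class TinyDescriptorNet(nn.Module):
    # Toy convolutional encoder producing a dense descriptor map; stands in for the
    # "preset deep learning model" of step S2 (architecture assumed, not from the patent).
    def __init__(self, dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 1/2 resolution
            nn.Conv2d(32, dim, kernel_size=3, padding=1),
        )

    def forward(self, gray):                                  # gray: (B, 1, H, W), values in [0, 1]
        desc = self.features(gray)                            # (B, dim, H/2, W/2)
        return nn.functional.normalize(desc, dim=1)           # unit-length descriptors

def mutual_nn_match(desc1, desc2):
    # Mutual nearest-neighbour matching of two (N, dim) descriptor sets sampled
    # at candidate feature points; returns (M, 2) index pairs of matches.
    sim = desc1 @ desc2.t()                                   # cosine similarity matrix
    nn12 = sim.argmax(dim=1)                                  # best match in image 2 for each point in image 1
    nn21 = sim.argmax(dim=0)                                  # best match in image 1 for each point in image 2
    keep = nn21[nn12] == torch.arange(desc1.shape[0])         # keep only mutual agreements
    idx1 = torch.nonzero(keep, as_tuple=False).squeeze(1)
    return torch.stack([idx1, nn12[idx1]], dim=1)

In practice the dense descriptor map would be sampled at detected keypoint locations before matching, and the network would be trained on weak-texture imagery so that the learned responses replace hand-crafted corner or line detectors.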
S3, performing camera pose calculation according to the matching result to obtain pose transformation parameters of the second frame image relative to the first frame image;
It should be understood that the matching result includes a plurality of matched feature point pairs. The essential matrix can be solved from the camera intrinsic matrix and the feature point pairs, and the pose transformation parameters of the second frame image relative to the first frame image can be obtained by decomposing the essential matrix.
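For a concrete reference point, the off-the-shelf OpenCV calls below perform the same step (essential matrix from the intrinsics plus matched points, then decomposition into R and t). This is only an equivalent sketch; the patent's own robust estimation of the essential matrix is the procedure detailed in steps S31 to S34 below, and the function name estimate_pose is introduced here for illustration.

import cv2
import numpy as np

def estimate_pose(pts1, pts2, K):
    # pts1, pts2: (N, 2) arrays of matched pixel coordinates; K: 3x3 camera intrinsic matrix.
    E, inlier_mask = cv2.findEssentialMat(pts1, pts2, cameraMatrix=K,
                                          method=cv2.RANSAC, prob=0.999, threshold=1.0)
    # Decompose E and keep the (R, t) that places the triangulated points in front of both cameras.
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, cameraMatrix=K)
    return R, t, inlier_mask.ravel().astype(bool)

Decomposing an essential matrix yields four (R, t) candidates; recoverPose resolves the ambiguity with the cheirality check, i.e. by keeping the candidate for which the reconstructed points have positive depth in both views.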
S4, carrying out map point calculation on the feature point pairs in the matching result according to the pose transformation parameters, and converting the two-dimensional feature point pairs into three-dimensional space points;
It should be understood that the feature point pairs in the matching result are all two-dimensional points. The depth information of the corresponding space points is obtained by calculating the depth values of the feature point pairs, and the two-dimensional feature point pairs can then be converted into three-dimensional space points by combining the two-dimensional coordinate information with the depth information.
Generally, the pose transformation parameters are R and t, where R is a rotation matrix and t is a translation vector.
S5, generating a three-dimensional point cloud map of the current time according to the three-dimensional space points;
and S6, repeating the steps S2 to S5 until all images in the image sequence are read, generating a three-dimensional point cloud map at all times, and completing real-time positioning and map construction of the image sequence.
It should be understood that each repetition of step S2 reads the next images in the image sequence; for example, when steps S2 to S5 are executed a second time after their first execution, the first and second of the remaining images, i.e. the third and fourth frames of the entire image sequence, are read.
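The loop S2 to S6 can be summarized by the following sketch, which consumes the image sequence two frames at a time as described above. extract_and_match, estimate_pose and triangulate_pair are hypothetical helpers standing in for the steps explained in this document; the sketch illustrates the control flow, not the patented implementation.

import numpy as np

def run_slam(frames, K, model):
    # frames: list of grayscale images; K: 3x3 intrinsics; model: the trained feature network.
    cloud = []                                           # accumulated three-dimensional point cloud map
    relative_poses = []                                  # (R, t) of each second frame w.r.t. its first frame
    for i in range(0, len(frames) - 1, 2):               # S2: read the next two frames
        pts1, pts2 = extract_and_match(model, frames[i], frames[i + 1])
        R, t, inliers = estimate_pose(pts1, pts2, K)     # S3: pose transformation parameters
        pts3d = triangulate_pair(pts1[inliers], pts2[inliers], R, t)   # S4: map points
        cloud.append(pts3d)                              # S5: extend the point cloud map
        relative_poses.append((R, t))
    # S6: after all images are read, the stacked points form the map for all moments.
    # (In a full system each pair's points would first be transformed into a common world frame.)
    return np.vstack(cloud), relative_poses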
The SLAM method provided by this embodiment is suitable for weak texture scenes. Feature points are extracted from and matched between images of the weak texture scene by a deep learning model, which can abstract more suitable feature detection and extraction rules from training samples; map points are then calculated and a three-dimensional point cloud map is constructed. This realizes the application of the SLAM method in weak texture scenes: the deep-learning-based features provide feature points with better stability and robustness, real-time positioning and map construction of weak texture scenes are achieved, and the precision and accuracy of the map construction can be improved.
Optionally, in some embodiments, as shown in fig. 2, in step S3, the method specifically includes:
s31, randomly extracting n characteristic point pairs from the matching result, wherein n is more than or equal to 8;
S32, acquiring the camera intrinsic parameter matrix of the monocular camera, calculating the normalized coordinates of the n characteristic point pairs according to the camera intrinsic parameter matrix, and resolving according to the normalized coordinates of the n characteristic point pairs to obtain an essential matrix;
S33, repeating the steps S31 to S32 m times to obtain m essential matrices, and optimizing all the essential matrices according to a preset optimization algorithm to obtain an optimized target essential matrix, wherein m is more than or equal to 1;
and S34, decomposing the target essential matrix to obtain pose transformation parameters.
Preferably, 8 pairs of feature points may be randomly extracted from the matching result.
It should be understood that the essential matrix is repeatedly estimated and optimized so that its solution covers more inliers among all the matching points; the more inliers there are, the more reliable the essential matrix solution is shown to be.
It should be noted that the size of m affects the robustness and the speed of the calculation: the larger m is, the more reliable the result, but the longer the corresponding computation time. In general, it is expected that the number of outliers does not exceed 50% of the total number of point pairs in the matching point set, which can already be satisfied by setting m to 500; preferably, m may be set to 1000 in this scheme, which meets the requirement on the number of inliers and ensures the reliability of the calculation result.
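A minimal sketch of steps S31 to S33 is given below, assuming a linear 8-point solve on each random sample (the patent only requires n >= 8 pairs and leaves the solver open) and m = 1000 trials as preferred above. The scoring function metric_value is sketched after step S331 below, and the threshold value T used here is an assumption.

import numpy as np

def normalize_points(pts, K):
    # Pixel coordinates (N, 2) -> normalized camera coordinates (N, 2) using the intrinsic matrix K.
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return (np.linalg.inv(K) @ pts_h.T).T[:, :2]

def eight_point_essential(x1, x2):
    # Linear solve of x2^T E x1 = 0 from >= 8 normalized correspondences (S32).
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0],            x1[:, 1],            np.ones(len(x1)),
    ])
    _, _, Vt = np.linalg.svd(A)
    E = Vt[-1].reshape(3, 3)
    # Project onto the essential-matrix manifold: two equal singular values, one zero.
    U, _, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt

def estimate_essential(pts1, pts2, K, m=1000, n=8, T=1e-3):
    x1, x2 = normalize_points(pts1, K), normalize_points(pts2, K)
    best_E, best_cost = None, np.inf
    for _ in range(m):                                       # S33: m random trials
        idx = np.random.choice(len(x1), n, replace=False)    # S31: random n pairs
        E = eight_point_essential(x1[idx], x2[idx])          # S32: essential matrix from the sample
        cost = metric_value(E, x1, x2, T)                    # S331: score against all pairs (see below)
        if cost < best_cost:
            best_E, best_cost = E, cost
    return best_E                                            # optimized target essential matrix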
Optionally, in some embodiments, as shown in fig. 3, in step S33, the method specifically includes:
S331, repeating the steps S31 to S32 m times to obtain m essential matrices, and calculating the metric value of each essential matrix;
S332, comparing the metric values of the essential matrices, and taking the essential matrix with the minimum metric value as the optimized target essential matrix.
Optionally, in some embodiments, in step S331, the metric value of each essential matrix is calculated according to the following formula:
[metric formula reproduced as an image in the original publication]
wherein C is the metric value, T is a preset threshold, and e_i is the maximum likelihood error of the i-th characteristic point pair, i = 1, 2, ..., n.
Here e_i is calculated according to a further formula (also reproduced as an image in the original), in which the hatted quantities denote the matching point positions solved from the essential matrix, and x_i and y_i denote the coordinates of the i-th pair of matching points.
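Because the metric formula itself is reproduced only as an image, the sketch below assumes a standard truncated (RANSAC/MSAC-style) cost: each pair contributes its error e_i when e_i is below the threshold T, and contributes T otherwise, so the essential matrix with the smallest total cost is the one supported by the most inliers. The per-pair error is approximated here by the Sampson (first-order maximum likelihood) error, which is an assumption standing in for the patent's exact maximum likelihood error.

import numpy as np

def metric_value(E, x1, x2, T):
    # x1, x2: (N, 2) normalized coordinates of all matched pairs; returns the truncated cost C.
    x1h = np.hstack([x1, np.ones((len(x1), 1))])       # homogeneous coordinates
    x2h = np.hstack([x2, np.ones((len(x2), 1))])
    Ex1 = (E @ x1h.T).T                                # epipolar lines in image 2
    Etx2 = (E.T @ x2h.T).T                             # epipolar lines in image 1
    x2tEx1 = np.sum(x2h * Ex1, axis=1)                 # algebraic epipolar residual per pair
    e = x2tEx1 ** 2 / (Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 +
                       Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2)   # Sampson error e_i
    return float(np.sum(np.where(e < T, e, T)))        # errors above T are truncated to T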
Optionally, in some embodiments, as shown in fig. 4, in step S4, the method specifically includes:
s41, acquiring any characteristic point pair in the matching result, wherein any characteristic point pair comprises a first characteristic point and a second characteristic point which are matched;
s42, sequentially calculating the depth value of the first characteristic point and the depth value of the second characteristic point according to the pose transformation parameters;
S43, calculating the depth average value of the depth values of the first characteristic point and the second characteristic point;
and S44, obtaining the three-dimensional coordinate of any characteristic point pair according to the two-dimensional coordinate of the first characteristic point, the two-dimensional coordinate of the second characteristic point and the depth average value, and generating a three-dimensional space point according to the three-dimensional coordinate.
It should be noted that the first feature point and the second feature point are corresponding (homonymous) points in the two images. Let P be the spatial point corresponding to this pair of points; in theory, the ray from each camera center through its image point passes through P, but because the observations are disturbed by noise the two rays do not actually intersect in practice, and the position of P is therefore solved by an averaging method.
Optionally, in some embodiments, in step S42, the depth value of the first feature point is calculated according to the following formula:
s1 · x2 × (R x1) + x2 × t = 0
calculating the depth value of the second feature point according to the following formula:
s2 · x2 = s1 · R x1 + t
wherein s1 is the depth value of the first feature point, s2 is the depth value of the second feature point, x1 is the normalized coordinate of the first feature point, x2 is the normalized coordinate of the second feature point, and R and t are the pose transformation parameters.
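The two depth equations translate directly into the following sketch for one matched pair. Here x1 and x2 are the normalized homogeneous coordinates (x, y, 1) of the first and second feature points, the first equation is solved for s1 in the least-squares sense, and forming the final 3D point from the averaged depth is one interpretation of step S44.

import numpy as np

def skew(v):
    # Cross-product matrix so that skew(v) @ u equals the cross product v x u.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def triangulate_point(x1, x2, R, t):
    # s1 * (x2 x R x1) + (x2 x t) = 0  ->  least-squares solution for s1 (S42).
    a = skew(x2) @ R @ x1
    b = skew(x2) @ t
    s1 = -float(a @ b) / float(a @ a)
    s2 = float((s1 * R @ x1 + t)[2])    # from s2 * x2 = s1 * R x1 + t; third component of x2 is 1
    depth = 0.5 * (s1 + s2)             # S43: average of the two depth values
    return depth * x1                   # S44: 3D point in the first camera frame (one interpretation)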
Optionally, in some embodiments, before step S5, the method further includes:
after every g frames of images have been read, performing bundle (beam) adjustment on the h frames of images preceding the current frame image, and optimizing the pose transformation parameters and three-dimensional space points of those h frames, wherein g is more than or equal to 1 and less than or equal to h.
For example, g may be 7 and h may be 15.
Preferably, after the first 15 frames of images have been read, global bundle adjustment may be applied to these first 15 frames to optimize the mapping quality, and thereafter local bundle adjustment may be applied to the most recent 15 frames every 7 frames.
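The schedule described above (a global adjustment once the first 15 frames are available, then a local adjustment over the last h = 15 frames after every g = 7 new frames) can be sketched as follows. bundle_adjust is a hypothetical hook for whatever beam/bundle adjustment backend is used (for example one built on scipy.optimize.least_squares or g2o); neither the hook nor the reprojection helper below is specified by the patent.

import numpy as np

def reprojection_residual(K, R, t, X, uv):
    # Residual that a bundle adjustment backend would minimize for one observation:
    # project the 3D point X with pose (R, t) and intrinsics K, then compare with the measured pixel uv.
    p = K @ (R @ X + t)
    return p[:2] / p[2] - uv

def maybe_adjust(frame_idx, keyframes, g=7, h=15):
    # keyframes: one record per frame read, each holding its pose, points and observations.
    # bundle_adjust is a hypothetical hook (see note above).
    if frame_idx + 1 == h:                               # global adjustment on the first 15 frames
        bundle_adjust(keyframes)
    elif frame_idx + 1 > h and (frame_idx + 1) % g == 0: # local adjustment every g frames thereafter
        bundle_adjust(keyframes[-h:])                    # only the most recent h frames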
It is understood that some or all of the alternative embodiments described above may be included in some embodiments.
In other embodiments of the present invention, a storage medium is further provided, where the storage medium stores instructions that, when read by a computer, cause the computer to execute the SLAM method for the weak texture scene according to the above technical solution.
In other embodiments of the present invention, there is also provided a SLAM navigation car for a weak texture scene, including:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the SLAM method for the weak texture scene according to the technical scheme.
As shown in fig. 5, which is a structural framework diagram provided by an embodiment of the SLAM system for a weak texture scene according to the present invention, the SLAM system is applicable to weak texture scenes and includes:
the acquiring unit 1 is used for acquiring an image sequence acquired by a monocular camera;
the matching unit 2 is used for reading a first frame image and a second frame image in the image sequence, and extracting and matching feature points in the first frame image and the second frame image according to a preset deep learning model;
the pose calculating unit 3 is used for calculating the pose of the camera according to the matching result to obtain pose transformation parameters of the second frame image relative to the first frame image;
the map point resolving unit 4 is used for resolving map points of the feature point pairs in the matching result according to the pose transformation parameters and converting the two-dimensional feature point pairs into three-dimensional space points;
and the modeling unit 5 is used for generating a three-dimensional point cloud map of the current moment according to the three-dimensional space points.
Optionally, in some embodiments, the pose resolving unit specifically includes:
the characteristic point pair selection subunit is used for randomly extracting n characteristic point pairs from the matching result, wherein n is more than or equal to 8;
the essential matrix resolving subunit is used for acquiring the camera internal parameter matrix of the monocular camera, calculating the normalized coordinates of the n characteristic point pairs according to the camera internal parameter matrix, and resolving according to the normalized coordinates of the n characteristic point pairs to obtain an essential matrix;
the essential matrix optimization subunit is used for obtaining the m essential matrices produced by running the essential matrix resolving subunit m times, and optimizing all the essential matrices according to a preset optimization algorithm to obtain an optimized target essential matrix, wherein m is more than or equal to 1;
and the essential matrix decomposition subunit is used for decomposing the target essential matrix to obtain pose transformation parameters.
Optionally, in some embodiments, the essential matrix optimization subunit is specifically configured to obtain the m essential matrices produced by running the essential matrix resolving subunit m times and to calculate the metric value of each essential matrix; and to compare the metric values of the essential matrices and take the essential matrix with the minimum metric value as the optimized target essential matrix.
Optionally, in some embodiments, the essential matrix optimization subunit is specifically configured to calculate the metric value of each essential matrix according to the following formula:
[metric formula reproduced as an image in the original publication]
wherein C is the metric value, T is a preset threshold, and e_i is the maximum likelihood error of the i-th characteristic point pair, i = 1, 2, ..., n.
Optionally, in some embodiments, the map point calculating unit 4 specifically includes:
a characteristic point pair obtaining subunit, configured to obtain any characteristic point pair in the matching result, where any characteristic point pair includes a first characteristic point and a second characteristic point that have been matched;
the depth value calculation subunit is used for calculating the depth value of the first characteristic point and the depth value of the second characteristic point in sequence according to the pose transformation parameters;
the depth average value calculation subunit is used for calculating the average of the depth values of the first characteristic point and the second characteristic point;
and the three-dimensional coordinate calculating subunit is used for obtaining the three-dimensional coordinate of any characteristic point pair according to the two-dimensional coordinate of the first characteristic point, the two-dimensional coordinate of the second characteristic point and the depth average value, and generating a three-dimensional space point according to the three-dimensional coordinate.
Optionally, in some embodiments, the depth value calculation subunit is specifically configured to calculate the depth value of the first feature point according to the following formula:
s1 · x2 × (R x1) + x2 × t = 0
and to calculate the depth value of the second feature point according to the following formula:
s2 · x2 = s1 · R x1 + t
wherein s1 is the depth value of the first feature point, s2 is the depth value of the second feature point, x1 is the normalized coordinate of the first feature point, x2 is the normalized coordinate of the second feature point, and R and t are the pose transformation parameters.
Optionally, in some embodiments, the method further comprises:
and the image optimization unit is used for performing bundle (beam) adjustment on the h frames of images preceding the current frame image after the matching unit 2 has read every g frames of images, optimizing the pose transformation parameters and three-dimensional space points of those h frames, wherein g is more than or equal to 1 and less than or equal to h.
It is understood that some or all of the alternative embodiments described above may be included in some embodiments.
It should be noted that the above embodiments are product embodiments corresponding to the previous method embodiments, and for the description of each optional implementation in the product embodiments, reference may be made to corresponding descriptions in the above method embodiments, and details are not described here again.
The reader should understand that in the description of this specification, reference to the description of the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the method embodiments described above are merely illustrative: the division into steps is only a logical, functional division, and there may be other ways of dividing them in actual implementation; for example, multiple steps may be combined or integrated into another step, or some features may be omitted or not carried out.
The above method, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present invention that in essence contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. A SLAM method for weak texture scenes, comprising:
s1, acquiring an image sequence acquired by the monocular camera;
s2, reading a first frame image and a second frame image in the image sequence, and extracting and matching feature points in the first frame image and the second frame image according to a preset deep learning model;
s3, camera pose calculation is carried out according to the matching result, and pose transformation parameters of the second frame image relative to the first frame image are obtained;
s4, carrying out map point calculation on the feature point pairs in the matching result according to the pose transformation parameters, and converting the two-dimensional feature point pairs into three-dimensional space points;
s5, generating a three-dimensional point cloud map of the current time according to the three-dimensional space points;
s6, repeating the steps S2 to S5 until all images in the image sequence are read, generating a three-dimensional point cloud map at all moments, and completing real-time positioning and map construction of the image sequence;
in step S3, the method specifically includes:
s31, randomly extracting n characteristic point pairs from the matching result, wherein n is more than or equal to 8;
s32, acquiring a camera internal parameter matrix of the monocular camera, calculating the normalized coordinates of the n feature point pairs according to the camera internal parameter matrix, and resolving according to the normalized coordinates of the n feature point pairs to obtain an essential matrix;
S33, repeating the steps S31 to S32 m times to obtain m essential matrices, and optimizing all the essential matrices according to a preset optimization algorithm to obtain an optimized target essential matrix, wherein m is more than or equal to 1;
s34, decomposing the target essential matrix to obtain pose transformation parameters;
in step S33, the method specifically includes:
S331, repeating the steps S31 to S32 m times to obtain m essential matrices, and respectively calculating the metric value of each essential matrix according to the following formula:
[metric formula reproduced as an image in the original publication]
wherein C is the metric value, T is a preset threshold, and e_i is the maximum likelihood error of the i-th characteristic point pair, i = 1, 2, ..., n;
S332, comparing the magnitude of the metric value of each essential matrix, and taking the essential matrix with the minimum metric value as the optimized target essential matrix;
in step S4, the method specifically includes:
S41, acquiring any characteristic point pair in the matching result, wherein said characteristic point pair comprises a first characteristic point and a second characteristic point which are matched;
s42, sequentially calculating the depth value of the first characteristic point and the depth value of the second characteristic point according to the pose transformation parameter;
S43, calculating the depth average value of the depth values of the first characteristic point and the second characteristic point;
s44, obtaining a three-dimensional coordinate of any one characteristic point pair according to the two-dimensional coordinate of the first characteristic point, the two-dimensional coordinate of the second characteristic point and the depth average value, and generating a three-dimensional space point according to the three-dimensional coordinate;
in step S42, the depth value of the first feature point is calculated according to the following formula:
s1 · x2 × (R x1) + x2 × t = 0
calculating the depth value of the second feature point according to the following formula:
s2 · x2 = s1 · R x1 + t
wherein s1 is the depth value of the first feature point, s2 is the depth value of the second feature point, x1 is the normalized coordinate of the first feature point, x2 is the normalized coordinate of the second feature point, and R and t are the pose transformation parameters.
2. The SLAM method for weak texture scenes of claim 1, further comprising, before step S5:
after every g frames of images have been read, performing bundle (beam) adjustment on the h frames of images preceding the current frame image, and optimizing the pose transformation parameters and three-dimensional space points of those h frames, wherein g is more than or equal to 1 and less than or equal to h.
3. A storage medium having stored therein instructions which, when read by a computer, cause the computer to execute the SLAM method for weak texture scenes according to claim 1 or 2.
4. A SLAM navigation vehicle for weak texture scenes comprising:
a memory for storing a computer program;
a processor for executing the computer program for implementing the SLAM method for weak texture scenes as claimed in claim 1 or 2.
5. A SLAM system for weak texture scenes, comprising:
the acquisition unit is used for acquiring an image sequence acquired by the monocular camera;
the matching unit is used for reading a first frame image and a second frame image in the image sequence, and extracting and matching feature points in the first frame image and the second frame image according to a preset deep learning model;
the pose resolving unit is used for resolving the pose of the camera according to the matching result to obtain pose transformation parameters of the second frame image relative to the first frame image;
the map point resolving unit is used for resolving map points of the feature point pairs in the matching result according to the pose transformation parameters and converting the two-dimensional feature point pairs into three-dimensional space points;
the modeling unit is used for generating a three-dimensional point cloud map of the current moment according to the three-dimensional space points;
wherein the pose resolving unit specifically comprises:
the characteristic point pair selection subunit is used for randomly extracting n characteristic point pairs from the matching result, wherein n is more than or equal to 8;
the essential matrix resolving subunit is used for acquiring the camera internal parameter matrix of the monocular camera, calculating the normalized coordinates of the n characteristic point pairs according to the camera internal parameter matrix, and resolving according to the normalized coordinates of the n characteristic point pairs to obtain an essential matrix;
the essential matrix optimization subunit is used for obtaining the m essential matrices produced by running the essential matrix resolving subunit m times, and optimizing all the essential matrices according to a preset optimization algorithm to obtain an optimized target essential matrix, wherein m is more than or equal to 1;
the essential matrix decomposition subunit is used for decomposing the target essential matrix to obtain pose transformation parameters;
the essential matrix optimization subunit is specifically configured to obtain the m essential matrices produced by running the essential matrix resolving subunit m times and to calculate the metric value of each essential matrix, and to compare the metric values of the essential matrices and take the essential matrix with the minimum metric value as the optimized target essential matrix;
the essential matrix optimization subunit is specifically configured to calculate the metric value of each essential matrix according to the following formula:
[metric formula reproduced as an image in the original publication]
wherein C is the metric value, T is a preset threshold, and e_i is the maximum likelihood error of the i-th characteristic point pair, i = 1, 2, ..., n.
The map point calculating unit specifically comprises:
a characteristic point pair obtaining subunit, configured to obtain any characteristic point pair in the matching result, where any characteristic point pair includes a first characteristic point and a second characteristic point that have been matched;
the depth value calculation subunit is used for calculating the depth value of the first characteristic point and the depth value of the second characteristic point in sequence according to the pose transformation parameters;
the depth average value calculation subunit is used for calculating the average of the depth values of the first characteristic point and the second characteristic point;
the three-dimensional coordinate calculating subunit is used for obtaining a three-dimensional coordinate of any characteristic point pair according to the two-dimensional coordinate of the first characteristic point, the two-dimensional coordinate of the second characteristic point and the depth average value, and generating a three-dimensional space point according to the three-dimensional coordinate;
the depth value calculation operator unit is specifically configured to calculate a depth value of the first feature point according to the following formula:
s1x2×Rx1+x2×t=0
calculating the depth value of the second feature point according to the following formula:
s2x2=s1Rx1+t
wherein s is1Is the depth value of the first feature point, s2Is the depth value, x, of the second feature point1Is the normalized coordinate, x, of the first feature point2And R and t are pose transformation parameters.
CN201910694313.4A 2019-07-30 2019-07-30 SLAM method and system for weak texture scene, navigation vehicle and storage medium Active CN110487274B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910694313.4A CN110487274B (en) 2019-07-30 2019-07-30 SLAM method and system for weak texture scene, navigation vehicle and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910694313.4A CN110487274B (en) 2019-07-30 2019-07-30 SLAM method and system for weak texture scene, navigation vehicle and storage medium

Publications (2)

Publication Number Publication Date
CN110487274A CN110487274A (en) 2019-11-22
CN110487274B true CN110487274B (en) 2021-01-29

Family

ID=68547806

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910694313.4A Active CN110487274B (en) 2019-07-30 2019-07-30 SLAM method and system for weak texture scene, navigation vehicle and storage medium

Country Status (1)

Country Link
CN (1) CN110487274B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402332B (en) * 2020-03-10 2023-08-18 兰剑智能科技股份有限公司 AGV composite map building and navigation positioning method and system based on SLAM
CN112258575B (en) * 2020-10-13 2022-12-02 浙江大学 Method for quickly identifying object in synchronous positioning and map construction
CN112288816B (en) * 2020-11-16 2024-05-17 Oppo广东移动通信有限公司 Pose optimization method, pose optimization device, storage medium and electronic equipment
CN112752028B (en) * 2021-01-06 2022-11-11 南方科技大学 Pose determination method, device and equipment of mobile platform and storage medium
CN112907671B (en) * 2021-03-31 2022-08-02 深圳市慧鲤科技有限公司 Point cloud data generation method and device, electronic equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103335634A (en) * 2013-06-24 2013-10-02 南京航空航天大学 Visual target plate device for calibrating component mounting attitude and calibration method
CN104019799A (en) * 2014-05-23 2014-09-03 北京信息科技大学 Relative orientation method by using optimization of local parameter to calculate basis matrix
CN105654492A (en) * 2015-12-30 2016-06-08 哈尔滨工业大学 Robust real-time three-dimensional (3D) reconstruction method based on consumer camera
CN106504321A (en) * 2016-11-07 2017-03-15 达理 Method using the method for photo or video reconstruction three-dimensional tooth mould and using RGBD image reconstructions three-dimensional tooth mould
CN108171249A (en) * 2018-01-29 2018-06-15 北京工业大学 A kind of local description learning method based on RGBD data
CN108369743A (en) * 2015-08-28 2018-08-03 帝国科技及医学学院 Use multi-directional camera map structuring space
CN108537845A (en) * 2018-04-27 2018-09-14 腾讯科技(深圳)有限公司 Pose determines method, apparatus and storage medium
CN108564616A (en) * 2018-03-15 2018-09-21 中国科学院自动化研究所 Method for reconstructing three-dimensional scene in the rooms RGB-D of fast robust
CN108665540A (en) * 2018-03-16 2018-10-16 浙江工业大学 Robot localization based on binocular vision feature and IMU information and map structuring system
CN109341703A (en) * 2018-09-18 2019-02-15 北京航空航天大学 A kind of complete period uses the vision SLAM algorithm of CNNs feature detection
CN109509211A (en) * 2018-09-28 2019-03-22 北京大学 Positioning simultaneously and the feature point extraction and matching process and system built in diagram technology
CN109978911A (en) * 2019-02-22 2019-07-05 青岛小鸟看看科技有限公司 A kind of characteristics of image point-tracking method and camera
CN110047142A (en) * 2019-03-19 2019-07-23 中国科学院深圳先进技术研究院 No-manned plane three-dimensional map constructing method, device, computer equipment and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106489170B (en) * 2013-08-12 2020-03-31 苹果公司 Vision-based inertial navigation
WO2017172778A1 (en) * 2016-03-28 2017-10-05 Sri International Collaborative navigation and mapping
CN108076338B (en) * 2016-11-14 2022-04-08 北京三星通信技术研究有限公司 Image visual processing method, device and equipment
WO2018125938A1 (en) * 2016-12-30 2018-07-05 DeepMap Inc. Enrichment of point cloud data for high-definition maps for autonomous vehicles
CN107747941B (en) * 2017-09-29 2020-05-15 歌尔股份有限公司 Binocular vision positioning method, device and system
US10436590B2 (en) * 2017-11-10 2019-10-08 Ankobot (Shanghai) Smart Technologies Co., Ltd. Localization system and method, and robot using the same
CN108229416B (en) * 2018-01-17 2021-09-10 苏州科技大学 Robot SLAM method based on semantic segmentation technology
CN108776989B (en) * 2018-06-08 2022-06-24 北京航空航天大学 Low-texture planar scene reconstruction method based on sparse SLAM framework
CN109307508B (en) * 2018-08-29 2022-04-08 中国科学院合肥物质科学研究院 Panoramic inertial navigation SLAM method based on multiple key frames
CN109859266B (en) * 2019-01-28 2022-11-25 西安理工大学 Pre-transformation-based visual simultaneous positioning and drawing method under large visual angle change


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Improving monocular visual SLAM in dynamic environments: an optical-flow-based approach";Cheng, Jiyu等;《ADVANCED ROBOTICS》;20190618;第33卷(第12期);576-589 *
"The comparison between FTF-VO and MF-VO for high accuracy mobile robot localization";Qin Man等;《IEEE International Conference on Intelligent Robots and Systems 》;20181231;48-51 *
"基于单目相机的即时定位与稠密地图重建";薛唐立;《万方数据库》;20181218;全文 *
"基于改进HOG特征的建筑物识别方法";杨松等;《计算机工程与应用》;20181231;第54卷(第7期);196-200 *
"视觉 SLAM 中三维点云配准的 Cayley 方法";张鸿燕等;《中国民航大学学报》;20171031;第35卷(第5期);47-51 *

Also Published As

Publication number Publication date
CN110487274A (en) 2019-11-22

Similar Documents

Publication Publication Date Title
CN110487274B (en) SLAM method and system for weak texture scene, navigation vehicle and storage medium
CN107958482B (en) Three-dimensional scene model construction method and device
CN111780764B (en) Visual positioning method and device based on visual map
US10225473B2 (en) Threshold determination in a RANSAC algorithm
TWI569229B (en) Method for registering data
CN106940704A (en) A kind of localization method and device based on grating map
CN106705849B (en) Calibrating Technique For The Light-strip Sensors
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
CN111144349A (en) Indoor visual relocation method and system
EP3185212B1 (en) Dynamic particle filter parameterization
CN114004883A (en) Visual perception method and device for curling ball, computer equipment and storage medium
CN113223078A (en) Matching method and device of mark points, computer equipment and storage medium
CN115512055A (en) Method and device for performing indoor structure three-dimensional reconstruction based on two-dimensional video and computer equipment
CN117237431A (en) Training method and device of depth estimation model, electronic equipment and storage medium
JP2007025863A (en) Photographing system, photographing method, and image processing program
CN111198563B (en) Terrain identification method and system for dynamic motion of foot type robot
CN113670316A (en) Path planning method and system based on double radars, storage medium and electronic equipment
CN113112547A (en) Robot, repositioning method thereof, positioning device and storage medium
JP2006195790A (en) Lens distortion estimation apparatus, lens distortion estimation method, and lens distortion estimation program
JP3614709B2 (en) Landscape image index method, apparatus, and recording medium recording landscape image index program
US20220068018A1 (en) Method for 3d reconstruction of an object
CN115880428A (en) Animal detection data processing method, device and equipment based on three-dimensional technology
AU2017300877B2 (en) Method and device for aiding the navigation of a vehicle
CN111586299B (en) Image processing method and related equipment
CN116109522B (en) Contour correction method, device, medium and equipment based on graph neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant