CN110319776B - SLAM-based three-dimensional space distance measuring method and device - Google Patents
- Publication number
- CN110319776B (application number CN201910596753.6A)
- Authority
- CN
- China
- Prior art keywords
- camera
- real
- point
- calculating
- matching
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01B—MEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
- G01B11/00—Measuring arrangements characterised by the use of optical techniques
- G01B11/02—Measuring arrangements characterised by the use of optical techniques for measuring length, width or thickness
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C3/00—Measuring distances in line of sight; Optical rangefinders
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S11/00—Systems for determining distance or velocity not using reflection or reradiation
- G01S11/12—Systems for determining distance or velocity not using reflection or reradiation using electromagnetic waves other than radio waves
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/55—Depth or shape recovery from multiple images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20228—Disparity calculation for image-based rendering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/751—Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Electromagnetism (AREA)
- Radar, Positioning & Navigation (AREA)
- Remote Sensing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Theoretical Computer Science (AREA)
- Length Measuring Devices By Optical Means (AREA)
- Image Analysis (AREA)
Abstract
The application provides a SLAM-based three-dimensional space distance measuring method and device, where the method includes the following steps: acquiring internal parameters of a camera; performing de-jitter processing on a video to be processed to obtain a processed video; for the processed video, calculating an initial depth value of the real three-dimensional point corresponding to the camera image; obtaining external parameters of the camera according to the internal parameters and the initial depth value; and calculating the spatial distance of the real three-dimensional point according to the external parameters. The SLAM-based three-dimensional space distance measuring method and device can effectively solve the problems of inaccurate depth estimation and unstable features caused by video jitter in existing distance measuring methods.
Description
Technical Field
The application relates to the technical field of computer vision, in particular to a three-dimensional space distance measuring method and device based on SLAM.
Background
Existing distance measuring methods generally rely on a calibration object in the scene to resolve the monocular scale, which requires placing a marker in the scene and is very inconvenient in practice. To eliminate the scale uncertainty of monocular SLAM (Simultaneous Localization and Mapping), the ORB-SLAM method (SLAM based on ORB, i.e. Oriented FAST and Rotated BRIEF, features) must compute ORB features once for every key frame at the front end, which is time-consuming, and its three-thread structure also places a heavy burden on the CPU. Therefore, LSD-SLAM (Large-Scale Direct Monocular SLAM) is mostly used to reconstruct semi-dense scenes and to reduce the time consumed by pose estimation.
When estimating the depth map, the LSD-SLAM method initializes the image depth with random numbers and then continuously updates the reference frame and the depth map by incremental stereo matching, which can make the depth estimate inaccurate. Meanwhile, in practical applications, video jitter makes the features between video frames unstable, which degrades the feature-based optimization at the SLAM back end.
Disclosure of Invention
The application provides a three-dimensional space distance measuring method and device based on SLAM, and aims to solve the problems of inaccurate depth estimation and unstable characteristics caused by video jitter in the conventional distance measuring method.
In a first aspect, the present application provides a method for measuring a three-dimensional spatial distance based on SLAM, where the method includes:
acquiring an internal parameter of a camera, wherein the internal parameter is a mapping relation between a pixel coordinate and a camera coordinate corresponding to a real three-dimensional point on a camera picture;
performing de-jitter processing on a video to be processed to obtain a processed video, wherein the video to be processed is a video obtained based on feature matching;
for the processed video, calculating an initial depth value of the real three-dimensional point corresponding to the camera image;
obtaining an external parameter of the camera according to the internal parameter and the initial depth value, wherein the external parameter is a posture parameter corresponding to the camera;
and calculating the space distance of the real three-dimensional point according to the external parameters, wherein the space distance is the distance from the real three-dimensional point to the optical center of the camera.
Optionally, the acquiring intrinsic parameters of the camera includes:
acquiring 15-20 calibration pictures of different angles shot by a camera on a calibration plate;
and carrying out corner feature detection and feature matching on each calibration picture to obtain the internal parameters of the camera.
Optionally, performing the de-jitter processing on the video to be processed to obtain the processed video includes:
matching feature points between adjacent frames of the camera picture by utilizing an SIFT feature matching method to obtain matching points;
adopting a random sampling consistency method to eliminate error points in the matching points to obtain effective matching points;
calculating the number of points of the average effective matching points of the frame to be processed and the two adjacent frames;
determining a maximum number of points and a minimum number of points in the number of points;
if the ratio of the minimum point number to the maximum point number is smaller than or equal to a preset jitter threshold value, determining the frame to be processed as a jitter frame;
and eliminating all jittering frames from the video to be processed to obtain the processed video.
Optionally, for the processed video, calculating an initial depth value of the real three-dimensional point corresponding to the camera image includes:
determining an initial reference frame and a sequence of key frames from the processed video;
matching the initial reference frame with each feature point in the key frame sequence to obtain a matching result;
calculating the parallax of each feature point in the adjacent key frames according to the matching result;
calculating a depth value corresponding to the feature point according to the parallax, the focal length of the camera and the baseline distance between two adjacent frames, wherein the depth value is the distance from the real three-dimensional point to the optical center of the camera;
and obtaining an initial depth value corresponding to the depth value of each characteristic point by adopting a least square method.
Optionally, the calculating the spatial distance of the real three-dimensional point according to the external parameters includes:
calculating the coordinates of the real three-dimensional point in a world coordinate system according to the following formula:
Z_C · [u, v, 1]^T = K · [R | t] · [X_W, Y_W, Z_W, 1]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]]
where u and v represent the pixel coordinates of the projection of the real three-dimensional point in the camera image, f_x represents the horizontal focal length of the camera in pixels, f_y represents the vertical focal length of the camera in pixels, u_0 and v_0 represent the principal point coordinates of the camera, R and t represent the extrinsic parameters of the camera, and X_W, Y_W, Z_W represent the coordinates of the real three-dimensional point in the world coordinate system;
the spatial distance of the real three-dimensional point is calculated according to the following formula:
D = sqrt(X_C^2 + Y_C^2 + Z_C^2), with [X_C, Y_C, Z_C]^T = R · [X_W, Y_W, Z_W]^T + t,
where D represents the distance from the real three-dimensional point to the camera optical center.
In a second aspect, the present application provides a SLAM-based three-dimensional spatial distance measuring device, the device comprising:
an internal parameter acquisition unit, configured to acquire internal parameters of a camera, where the internal parameters are the mapping relation between the pixel coordinates corresponding to a real three-dimensional point on the camera picture and the camera coordinates;
the de-jittering processing unit is used for performing de-jittering processing on a video to be processed to obtain a processed video, wherein the video to be processed is a video obtained based on feature matching;
an initial depth value calculation unit, configured to calculate an initial depth value of the real three-dimensional point corresponding to the camera image for the processed video;
the external parameter calculating unit is used for obtaining an external parameter of the camera according to the internal parameter and the initial depth value, wherein the external parameter is a posture parameter corresponding to the camera;
and the spatial distance calculation unit is used for calculating the spatial distance of the real three-dimensional point according to the external parameters, wherein the spatial distance is the distance from the real three-dimensional point to the optical center of the camera.
Optionally, the internal parameter obtaining unit includes:
the calibration picture acquisition unit is used for acquiring 15-20 calibration pictures of different angles shot by the camera on the calibration plate;
and the internal parameter determining unit is used for carrying out corner feature detection and feature matching on each calibration picture to obtain the internal parameters of the camera.
Optionally, the debounce processing unit comprises:
a matching point obtaining unit, configured to match feature points between adjacent frames of the camera picture by using an SIFT feature matching method to obtain matching points;
the effective matching point determining unit is used for eliminating error points in the matching points by adopting a random sampling consistency method to obtain effective matching points;
the point number calculating unit is used for calculating the point number of the average effective matching points of the frame to be processed and the two adjacent frames;
a special dot number determination unit for determining a maximum dot number and a minimum dot number among the dot numbers;
a jitter frame determining unit, configured to determine that the frame to be processed is a jitter frame if a ratio of the minimum point number to the maximum point number is less than or equal to a preset jitter threshold;
and the jitter frame eliminating unit is used for eliminating all jitter frames from the video to be processed to obtain the processed video.
Optionally, the initial depth value calculation unit includes:
a special frame determining unit, configured to determine an initial reference frame and a key frame sequence from the processed video;
the matching result calculation unit is used for matching the initial reference frame with each feature point in the key frame sequence to obtain a matching result;
the parallax calculation unit is used for calculating the parallax of each feature point in the adjacent key frames according to the matching result;
the depth value calculating unit is used for calculating the depth value corresponding to the characteristic point according to the parallax, the focal length of the camera and the baseline distance between two adjacent frames, wherein the depth value is the distance from the real three-dimensional point to the optical center of the camera;
and the initial depth value obtaining unit is used for obtaining the initial depth value corresponding to the depth value of each characteristic point by adopting a least square method.
Optionally, the spatial distance calculation unit includes:
a world coordinate calculation unit, configured to calculate the coordinates of the real three-dimensional point in a world coordinate system according to the following formula:
Z_C · [u, v, 1]^T = K · [R | t] · [X_W, Y_W, Z_W, 1]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]]
where u and v represent the pixel coordinates of the projection of the real three-dimensional point in the camera image, f_x represents the horizontal focal length of the camera in pixels, f_y represents the vertical focal length of the camera in pixels, u_0 and v_0 represent the principal point coordinates of the camera, R and t represent the extrinsic parameters of the camera, and X_W, Y_W, Z_W represent the coordinates of the real three-dimensional point in the world coordinate system;
a spatial distance obtaining unit, configured to calculate the spatial distance of the real three-dimensional point according to the following formula:
D = sqrt(X_C^2 + Y_C^2 + Z_C^2), with [X_C, Y_C, Z_C]^T = R · [X_W, Y_W, Z_W]^T + t,
where D represents the distance from the real three-dimensional point to the camera optical center.
as can be seen from the above technologies, the present application provides a method and an apparatus for measuring a three-dimensional spatial distance based on SLAM, where the method includes: acquiring an internal parameter of a camera, wherein the internal parameter is a mapping relation between a pixel coordinate and a camera coordinate corresponding to a real three-dimensional point on a camera picture; the method comprises the steps of carrying out shaking removal processing on a video to be processed to obtain a processed video; aiming at the processed video, calculating an initial depth value of the camera image corresponding to the real three-dimensional point; obtaining an external parameter of the camera according to the internal parameter and the initial depth value; and calculating the space distance of the real three-dimensional point according to the external parameters. When the method is used, the camera is used for shooting an image of a real three-dimensional point, and internal parameters of the camera are determined according to the image of the real three-dimensional point; and then, carrying out shake removal processing on the video formed by the images of the cameras to obtain a processed video, and calculating the initial depth value of the real three-dimensional point according to the internal parameters and the processed video. And finally, calculating to obtain an external parameter of the camera, namely an attitude parameter of the camera by using the internal parameter and the initial depth value, and accurately calculating the space distance of the real three-dimensional point according to the external parameter. The method and the device for measuring the three-dimensional space distance based on the SLAM can effectively solve the problems of inaccurate depth estimation and unstable characteristics caused by video jitter in the conventional distance measuring method.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed in the embodiments are briefly described below; for those of ordinary skill in the art, other drawings can be derived from these drawings without creative effort.
Fig. 1 is a flowchart of a three-dimensional spatial distance measurement method based on SLAM according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for acquiring intrinsic parameters of a camera according to an embodiment of the present disclosure;
fig. 3 is a flowchart of a method for video de-jittering according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for calculating initial depth values according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a SLAM-based three-dimensional spatial distance measuring device according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of a method for measuring a three-dimensional spatial distance based on SLAM according to an embodiment of the present application is shown, where the method includes:
s1, obtaining intrinsic parameters of the camera, wherein the intrinsic parameters are the mapping relation between the corresponding pixel coordinates of the real three-dimensional points on the camera picture and the camera coordinates.
For the problem of measuring the distance from a camera to a three-dimensional space point, the present invention provides a SLAM-based three-dimensional space distance measuring method that eliminates the scale uncertainty of monocular SLAM without requiring markers in the scene, removes the influence of feature point distribution on ranging accuracy, weakens the influence of video jitter on the optimization at the SLAM back end, and estimates the depth map more accurately.
The internal parameters of the camera are acquired with Zhang Zhengyou's calibration method.
Specifically, as shown in fig. 2, a flowchart of a method for acquiring intrinsic parameters of a camera provided in an embodiment of the present application is provided, where the method includes:
s101, acquiring 15-20 calibration pictures of different angles shot by a camera on a calibration plate;
s102, performing corner feature detection and feature matching on each calibration picture to obtain internal parameters of the camera.
After corner feature detection and feature matching are performed on each calibration picture, the mapping relation between the pixel coordinates of the camera picture and the camera coordinates can be obtained by calculation:
Z_C · [u, v, 1]^T = K · [X_C, Y_C, Z_C]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]]
where u and v represent the pixel coordinates of the projection of the real three-dimensional point in the camera image, f_x represents the horizontal focal length of the camera in pixels, f_y represents the vertical focal length of the camera in pixels, u_0 and v_0 represent the principal point coordinates of the camera, and X_C, Y_C, Z_C represent the coordinates of the real three-dimensional point in the camera coordinate system.
It should be noted that this step can be skipped if there are camera intrinsic parameters that have been calibrated in advance.
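As a concrete illustration of this mapping, the intrinsic matrix K assembled from f_x, f_y, u_0 and v_0 projects a camera-frame point onto the pixel plane. The sketch below uses illustrative parameter values (not the patent's calibration result) under the standard pinhole model:

```python
import numpy as np

# Intrinsic matrix K built from fx, fy, u0, v0 (illustrative values only)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_to_pixels(K, points_cam):
    """Map camera-frame points (X_C, Y_C, Z_C) to pixel coordinates (u, v)."""
    pts = np.atleast_2d(points_cam).astype(float)
    uvw = (K @ pts.T).T                 # homogeneous coordinates Z_C * (u, v, 1)
    return uvw[:, :2] / uvw[:, 2:3]     # perspective division by the depth Z_C
```

A point on the optical axis, e.g. (0, 0, 1), projects to the principal point (u_0, v_0) = (320, 240), which is a quick sanity check on a calibrated K.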
And S2, performing shake removal processing on the video to be processed to obtain the processed video, wherein the video to be processed is the video obtained based on feature matching.
Specifically, as shown in fig. 3, a flowchart of a method for video de-jittering provided by an embodiment of the present application is shown, where the method includes:
s201, matching feature points between adjacent frames of a camera picture by utilizing an SIFT feature matching method to obtain matching points;
s202, eliminating error points in the matching points by adopting a random sampling consistency method to obtain effective matching points;
s203, calculating the number of points of the average effective matching points of the frame to be processed and the two adjacent frames;
s204, determining the maximum point number and the minimum point number in the point numbers;
s205, if the ratio of the minimum point number to the maximum point number is less than or equal to a preset jitter threshold, determining that the frame to be processed is a jitter frame;
s206, all the jittering frames are removed from the video to be processed, and the processed video is obtained.
For an input n-frame video sequence, feature points between adjacent frames are matched with the SIFT (Scale-Invariant Feature Transform) feature matching method, and a random sample consensus step removes the erroneous points from all matching points, leaving the valid matching points. Suppose the frame to be processed is the i-th frame; its adjacent frames are the (i-1)-th and (i+1)-th frames (note that the first and last frames each have only one adjacent frame). Denote the computed average number of valid matching points by s_i, the minimum of these counts by s_min and the maximum by s_max; the ratio of the minimum to the maximum is then s_min / s_max. Given a preset jitter threshold s_a: if s_min / s_max <= s_a, the i-th frame is a jitter frame and no longer takes part in the key frame selection and loop closure detection of the LSD-SLAM algorithm; if s_min / s_max > s_a, the i-th frame is not a jitter frame and is retained.
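The ratio test above can be sketched as a small helper. Reading the test as "take the minimum and maximum of the valid-match counts computed for the frame under test and apply the threshold s_a" is one plausible interpretation of the text, and the function names are illustrative, not from the patent:

```python
def is_jitter_frame(match_counts, jitter_threshold):
    """Apply the ratio test s_min / s_max <= s_a to one frame.

    match_counts holds the valid SIFT match counts between the frame under
    test and each adjacent frame (one count at the sequence ends, two inside).
    """
    s_min, s_max = min(match_counts), max(match_counts)
    if s_max == 0:
        return True  # no valid matches at all: treat the frame as jittery
    return s_min / s_max <= jitter_threshold

def keep_stable_frames(counts_per_frame, jitter_threshold=0.5):
    """Indices of frames that pass the test and stay in the processed video."""
    return [i for i, counts in enumerate(counts_per_frame)
            if not is_jitter_frame(counts, jitter_threshold)]
```

Frames flagged by the test are dropped before the sequence reaches key frame selection and loop closure detection.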
And S3, aiming at the processed video, calculating an initial depth value of the camera image corresponding to the real three-dimensional point.
Specifically, as shown in fig. 4, a flowchart of a method for calculating an initial depth value provided in an embodiment of the present application is shown, where the method includes:
s301, determining an initial reference frame and a key frame sequence from the processed video;
s302, matching the initial reference frame with each feature point in the key frame sequence to obtain a matching result;
s303, calculating the parallax of each feature point in the adjacent key frames according to the matching result;
s304, calculating a depth value corresponding to the feature point according to the parallax, the focal length of the camera and the baseline distance between two adjacent frames, wherein the depth value is the distance from the real three-dimensional point to the optical center of the camera;
s305, obtaining an initial depth value corresponding to the depth value of each feature point by adopting a least square method.
An initial reference frame f_r and a key frame sequence F_im are determined from the processed video. With F the set of feature points in the key frames, a pair of initial key frames can be rectified on the basis of the matching result, and the depth of each feature point P_j is calculated from its disparity in the adjacent key frames according to
Z = f · T / x_d
where Z represents the depth value, f represents the focal length of the camera, T represents the baseline distance between two adjacent frames, and x_d represents the disparity.
Finally, the initial depth value of each feature point can be calculated by using the least square method.
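The two steps above can be sketched directly. Treating the "least square method" as fitting a single depth to several noisy per-frame estimates (which for a scalar reduces to their mean) is an assumption, not stated explicitly in the text, and the function names are illustrative:

```python
import numpy as np

def depth_from_disparity(focal_px, baseline, disparity_px):
    """Z = f * T / x_d: depth of one feature point from its stereo disparity."""
    return focal_px * baseline / disparity_px

def initial_depth(depth_estimates):
    """Initial depth value fitted to several noisy estimates of one point.

    Minimising sum_k (Z_k - Z)^2 over the scalar Z yields the mean of the
    observations -- one simple least-squares reading of the step in the text.
    """
    z = np.asarray(depth_estimates, dtype=float)
    return float(z.mean())
```

For example, with a focal length of 800 px, a baseline of 0.1 m and a disparity of 4 px, the relation gives a depth of 20 m.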
And S4, obtaining an external parameter of the camera according to the internal parameter and the initial depth value, wherein the external parameter is a posture parameter corresponding to the camera.
The video sequence is input into LSD-SLAM to complete the tracking, map construction, and loop closure detection processes, yielding the pose estimate of the camera, i.e. the external parameters of the camera, together with a semi-dense scene reconstruction.
And S5, calculating the space distance of the real three-dimensional point according to the external parameters, wherein the space distance is the distance from the real three-dimensional point to the optical center of the camera.
Specifically, the coordinates of the real three-dimensional point in the world coordinate system are calculated according to the following formula:
Z_C · [u, v, 1]^T = K · [R | t] · [X_W, Y_W, Z_W, 1]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]]
where u and v represent the pixel coordinates of the projection of the real three-dimensional point in the camera image, f_x represents the horizontal focal length of the camera in pixels, f_y represents the vertical focal length of the camera in pixels, u_0 and v_0 represent the principal point coordinates of the camera, R and t represent the extrinsic parameters of the camera, and X_W, Y_W, Z_W represent the coordinates of the real three-dimensional point in the world coordinate system;
the spatial distance of the real three-dimensional point is calculated according to the following formula:
D = sqrt(X_C^2 + Y_C^2 + Z_C^2), with [X_C, Y_C, Z_C]^T = R · [X_W, Y_W, Z_W]^T + t,
where D represents the distance from the real three-dimensional point to the camera optical center.
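With the pose (R, t) from the SLAM stage, D is obtained by transforming the world-frame point into the camera frame and taking its Euclidean norm. A minimal sketch under the standard pinhole conventions (illustrative, not the patent's implementation):

```python
import numpy as np

def distance_to_optical_center(R, t, X_world):
    """D = ||R * X_W + t||: distance from a world-frame point to the optical center."""
    X_cam = R @ np.asarray(X_world, dtype=float) + np.asarray(t, dtype=float)
    return float(np.linalg.norm(X_cam))
```

With R the identity and t zero (camera at the world origin), D reduces to the plain norm of the point's world coordinates.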
Referring to fig. 5, a schematic diagram of a SLAM-based three-dimensional spatial distance measuring device according to an embodiment of the present application is shown, where the device includes:
the camera system comprises an internal parameter acquisition unit 1, a parameter analysis unit and a parameter comparison unit, wherein the internal parameter acquisition unit is used for acquiring internal parameters of a camera, and the internal parameters are the mapping relation between corresponding pixel coordinates of a real three-dimensional point on a camera picture and camera coordinates;
the de-jittering processing unit 2 is used for performing de-jittering processing on a video to be processed to obtain a processed video, wherein the video to be processed is a video obtained based on feature matching;
an initial depth value calculating unit 3, configured to calculate, for the processed video, an initial depth value of the camera image corresponding to the real three-dimensional point;
the external parameter calculating unit 4 is used for obtaining an external parameter of the camera according to the internal parameter and the initial depth value, wherein the external parameter is a posture parameter corresponding to the camera;
and the spatial distance calculation unit 5 is configured to calculate a spatial distance of the real three-dimensional point according to the external parameter, where the spatial distance is a distance from the real three-dimensional point to an optical center of a camera.
Optionally, the intrinsic parameter acquiring unit 1 includes: the calibration picture acquisition unit is used for acquiring 15-20 calibration pictures of different angles shot by the camera on the calibration plate; and the internal parameter determining unit is used for carrying out corner feature detection and feature matching on each calibration picture to obtain the internal parameters of the camera.
Optionally, the debounce processing unit 2 comprises: a matching point obtaining unit, configured to match feature points between adjacent frames of the camera picture by using an SIFT feature matching method to obtain matching points; the effective matching point determining unit is used for eliminating error points in the matching points by adopting a random sampling consistency method to obtain effective matching points; the point number calculating unit is used for calculating the point number of the average effective matching points of the frame to be processed and the two adjacent frames; a special dot number determination unit for determining a maximum dot number and a minimum dot number among the dot numbers; a jitter frame determining unit, configured to determine that the frame to be processed is a jitter frame if a ratio of the minimum point number to the maximum point number is less than or equal to a preset jitter threshold; and the jitter frame eliminating unit is used for eliminating all jitter frames from the video to be processed to obtain the processed video.
Optionally, the initial depth value calculation unit 3 includes: a special frame determining unit, configured to determine an initial reference frame and a key frame sequence from the processed video; the matching result calculation unit is used for matching the initial reference frame with each feature point in the key frame sequence to obtain a matching result; the parallax calculation unit is used for calculating the parallax of each feature point in the adjacent key frames according to the matching result; the depth value calculating unit is used for calculating the depth value corresponding to the characteristic point according to the parallax, the focal length of the camera and the baseline distance between two adjacent frames, wherein the depth value is the distance from the real three-dimensional point to the optical center of the camera; and the initial depth value obtaining unit is used for obtaining the initial depth value corresponding to the depth value of each characteristic point by adopting a least square method.
Optionally, the spatial distance calculation unit 5 includes: a world coordinate calculation unit, configured to calculate the coordinates of the real three-dimensional point in a world coordinate system according to the following formula:
Z_C · [u, v, 1]^T = K · [R | t] · [X_W, Y_W, Z_W, 1]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]]
where u and v represent the pixel coordinates of the projection of the real three-dimensional point in the camera image, f_x represents the horizontal focal length of the camera in pixels, f_y represents the vertical focal length of the camera in pixels, u_0 and v_0 represent the principal point coordinates of the camera, R and t represent the extrinsic parameters of the camera, and X_W, Y_W, Z_W represent the coordinates of the real three-dimensional point in the world coordinate system;
a spatial distance obtaining unit for calculating a spatial distance of the real three-dimensional point according to the following formula,
where D represents the distance of the real three-dimensional point to the camera optical center.
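The two formulas above can be sketched in plain Python as follows; `R` is a 3x3 rotation given as a list of rows, `t` a 3-vector, and the function names are illustrative rather than taken from the patent.

```python
import math

def camera_coords(Xw, R, t):
    """World -> camera coordinates: Xc = R @ Xw + t."""
    return [sum(R[i][j] * Xw[j] for j in range(3)) + t[i] for i in range(3)]

def project(Xc, fx, fy, u0, v0):
    """Pinhole projection of a camera-frame point (X, Y, Z) to pixels."""
    X, Y, Z = Xc
    return (fx * X / Z + u0, fy * Y / Z + v0)

def distance_to_optical_center(Xw, R, t):
    """D = ||R @ Xw + t||: the optical center is the camera-frame origin."""
    return math.sqrt(sum(c * c for c in camera_coords(Xw, R, t)))
```

With an identity pose, the distance is simply the norm of the world point, and a point on the optical axis projects to the principal point.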
It should be noted that, in specific implementations, the present invention also provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, the program may perform some or all of the steps of each embodiment of the SLAM-based three-dimensional space distance measuring method provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (10)
1. A three-dimensional space distance measuring method based on SLAM is characterized by comprising the following steps:
acquiring an internal parameter of a camera, wherein the internal parameter is a mapping relation between a pixel coordinate and a camera coordinate corresponding to a real three-dimensional point on a camera picture;
performing de-jitter processing on a video to be processed to obtain a processed video, wherein the video to be processed is a video obtained based on feature matching;
calculating, for the processed video, an initial depth value of the real three-dimensional point corresponding to the camera image;
obtaining an external parameter of the camera according to the internal parameter and the initial depth value, wherein the external parameter is a pose parameter of the camera;
and calculating the space distance of the real three-dimensional point according to the external parameters, wherein the space distance is the distance from the real three-dimensional point to the optical center of the camera.
2. The method of claim 1, wherein the acquiring intrinsic parameters of the camera comprises:
acquiring 15 to 20 calibration pictures of a calibration board captured by the camera from different angles;
and carrying out corner feature detection and feature matching on each calibration picture to obtain the internal parameters of the camera.
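Corner detection and the optimisation itself are usually delegated to a calibration toolbox; what the calibration step ultimately yields is the intrinsic matrix K assembled below. The pixel-unit values in the test are illustrative, not from the patent.

```python
def intrinsic_matrix(fx, fy, u0, v0):
    """Assemble the pinhole intrinsic matrix K from calibration results.

    fx, fy: horizontal/vertical focal lengths in pixels;
    (u0, v0): principal point.  These values would come from
    corner-based calibration over the 15-20 board pictures.
    """
    return [[fx, 0.0, u0],
            [0.0, fy, v0],
            [0.0, 0.0, 1.0]]

def pixel_from_camera(K, Xc):
    """Map a camera-frame point (X, Y, Z) to pixel coordinates via K."""
    X, Y, Z = Xc
    return (K[0][0] * X / Z + K[0][2], K[1][1] * Y / Z + K[1][2])
```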
3. The method of claim 1, wherein the de-jittering the to-be-processed video to obtain the processed video comprises:
matching feature points between adjacent frames of the camera picture using the SIFT feature matching method to obtain matching points;
eliminating erroneous points from the matching points using the random sample consensus (RANSAC) method to obtain valid matching points;
calculating the average number of valid matching points between the frame to be processed and each of its two adjacent frames;
determining the maximum point count and the minimum point count among those counts;
if the ratio of the minimum point count to the maximum point count is less than or equal to a preset jitter threshold, determining the frame to be processed to be a jitter frame;
and eliminating all jitter frames from the video to be processed to obtain the processed video.
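The jitter test in the steps above can be sketched in Python as follows. One plausible reading (the patent leaves the exact averaging underspecified): `match_counts[i]` is the number of valid matches between frames i and i+1, and a frame whose match counts with its two neighbours are badly imbalanced is flagged as a jitter frame.

```python
def find_jitter_frames(match_counts, jitter_threshold=0.5):
    """Return indices of interior frames flagged as jittery.

    A frame is jittery when the ratio of the smaller to the larger of
    its two neighbour match counts drops to `jitter_threshold` or below.
    """
    jitter = []
    for i in range(1, len(match_counts)):          # interior frames
        prev_n, next_n = match_counts[i - 1], match_counts[i]
        lo, hi = min(prev_n, next_n), max(prev_n, next_n)
        if hi == 0 or lo / hi <= jitter_threshold:
            jitter.append(i)
    return jitter

def remove_jitter_frames(frames, match_counts, jitter_threshold=0.5):
    """Eliminate all flagged jitter frames, yielding the processed video."""
    bad = set(find_jitter_frames(match_counts, jitter_threshold))
    return [f for i, f in enumerate(frames) if i not in bad]
```

A sudden drop in matches around one frame flags the frames adjacent to the drop.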
4. The method of claim 1, wherein calculating, for the processed video, initial depth values for real three-dimensional points corresponding to a camera image comprises:
determining an initial reference frame and a sequence of key frames from the processed video;
matching the feature points of the initial reference frame against each key frame in the sequence to obtain a matching result;
calculating the parallax of each feature point between adjacent key frames according to the matching result;
calculating the depth value corresponding to each feature point from the parallax, the focal length of the camera, and the baseline distance between the two adjacent frames, wherein the depth value is the distance from the real three-dimensional point to the optical center of the camera;
and obtaining the initial depth value of each feature point from its depth values by the least squares method.
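The depth step above is plain stereo triangulation: Z = f·B/d. A sketch, where the baseline B would come from the estimated camera translation between the two key frames (all names illustrative):

```python
def depth_from_disparity(disparity_px, focal_px, baseline):
    """Triangulated depth Z = f * B / d.

    focal_px: focal length in pixels; baseline: camera translation
    between the two key frames (same unit as the returned depth);
    disparity_px: the feature's disparity in pixels.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline / disparity_px
```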
5. The method of claim 1, wherein the calculating the spatial distance of the real three-dimensional point according to the extrinsic parameters comprises:
calculating the coordinates of the real three-dimensional point in the world coordinate system according to the following pinhole projection formula,
Z_C · [u, v, 1]^T = K · [R | t] · [X_W, Y_W, Z_W, 1]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]],
where u and v represent the pixel coordinates of the projection of the real three-dimensional point in the camera image, f_x represents the horizontal scale focal length of the camera, f_y represents the vertical scale focal length of the camera, u_0 and v_0 represent the coordinates of the principal point of the camera, Z_C represents the depth of the point in the camera coordinate system, R and t represent the extrinsic parameters of the camera, and X_W, Y_W and Z_W represent the coordinates of the real three-dimensional point in the world coordinate system;
and calculating the spatial distance of the real three-dimensional point according to the following formula,
D = || R · [X_W, Y_W, Z_W]^T + t ||,
where D represents the distance from the real three-dimensional point to the optical center of the camera.
6. A SLAM-based three-dimensional spatial distance measuring apparatus, comprising:
an internal parameter acquisition unit, configured to acquire an internal parameter of a camera, wherein the internal parameter is the mapping relation between the pixel coordinates corresponding to a real three-dimensional point on a camera picture and the camera coordinates;
the de-jittering processing unit is used for performing de-jittering processing on a video to be processed to obtain a processed video, wherein the video to be processed is a video obtained based on feature matching;
an initial depth value calculation unit, configured to calculate an initial depth value of the real three-dimensional point corresponding to the camera image for the processed video;
an external parameter calculating unit, configured to obtain an external parameter of the camera according to the internal parameter and the initial depth value, wherein the external parameter is a pose parameter of the camera;
and the spatial distance calculation unit is used for calculating the spatial distance of the real three-dimensional point according to the external parameters, wherein the spatial distance is the distance from the real three-dimensional point to the optical center of the camera.
7. The apparatus of claim 6, wherein the intrinsic parameter obtaining unit comprises:
a calibration picture acquisition unit, configured to acquire 15 to 20 calibration pictures of a calibration board captured by the camera from different angles;
and an internal parameter determining unit, configured to perform corner feature detection and feature matching on each calibration picture to obtain the internal parameters of the camera.
8. The apparatus of claim 6, wherein the debounce processing unit comprises:
a matching point obtaining unit, configured to match feature points between adjacent frames of the camera picture using the SIFT feature matching method to obtain matching points;
a valid matching point determining unit, configured to eliminate erroneous points from the matching points using the random sample consensus (RANSAC) method to obtain valid matching points;
a point count calculating unit, configured to calculate the average number of valid matching points between the frame to be processed and each of its two adjacent frames;
a point count extremum determining unit, configured to determine the maximum point count and the minimum point count among those counts;
a jitter frame determining unit, configured to determine that the frame to be processed is a jitter frame if the ratio of the minimum point count to the maximum point count is less than or equal to a preset jitter threshold;
and a jitter frame eliminating unit, configured to eliminate all jitter frames from the video to be processed to obtain the processed video.
9. The apparatus of claim 6, wherein the initial depth value calculation unit comprises:
a frame determining unit, configured to determine an initial reference frame and a key frame sequence from the processed video;
a matching result calculation unit, configured to match the feature points of the initial reference frame against each key frame in the sequence to obtain a matching result;
a parallax calculation unit, configured to calculate the parallax of each feature point between adjacent key frames according to the matching result;
a depth value calculating unit, configured to calculate the depth value corresponding to each feature point from the parallax, the focal length of the camera, and the baseline distance between the two adjacent frames, wherein the depth value is the distance from the real three-dimensional point to the optical center of the camera;
and an initial depth value obtaining unit, configured to obtain the initial depth value of each feature point from its depth values by the least squares method.
10. The apparatus according to claim 6, wherein the spatial distance calculating unit comprises:
a world coordinate calculation unit, configured to calculate the coordinates of the real three-dimensional point in the world coordinate system according to the following pinhole projection formula,
Z_C · [u, v, 1]^T = K · [R | t] · [X_W, Y_W, Z_W, 1]^T, with K = [[f_x, 0, u_0], [0, f_y, v_0], [0, 0, 1]],
where u and v represent the pixel coordinates of the projection of the real three-dimensional point in the camera image, f_x represents the horizontal scale focal length of the camera, f_y represents the vertical scale focal length of the camera, u_0 and v_0 represent the coordinates of the principal point of the camera, Z_C represents the depth of the point in the camera coordinate system, R and t represent the extrinsic parameters of the camera, and X_W, Y_W and Z_W represent the coordinates of the real three-dimensional point in the world coordinate system;
and a spatial distance obtaining unit, configured to calculate the spatial distance of the real three-dimensional point according to the following formula,
D = || R · [X_W, Y_W, Z_W]^T + t ||,
where D represents the distance from the real three-dimensional point to the optical center of the camera.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910596753.6A CN110319776B (en) | 2019-07-03 | 2019-07-03 | SLAM-based three-dimensional space distance measuring method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110319776A CN110319776A (en) | 2019-10-11 |
CN110319776B true CN110319776B (en) | 2021-05-07 |
Family
ID=68122500
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106289071B (en) * | 2016-08-18 | 2018-10-19 | 温州大学 | A kind of structure three-dimensional displacement monocular photographing measurement method |
DE102017107336A1 (en) * | 2017-04-05 | 2018-10-11 | Testo SE & Co. KGaA | Measuring device and corresponding measuring method |
CN108648240B (en) * | 2018-05-11 | 2022-09-23 | 东南大学 | Non-overlapping view field camera attitude calibration method based on point cloud feature map registration |
CN109855822B (en) * | 2019-01-14 | 2019-12-06 | 中山大学 | unmanned aerial vehicle-based high-speed rail bridge vertical dynamic disturbance degree measuring method |
CN109737874B (en) * | 2019-01-17 | 2021-12-03 | 广东省智能制造研究所 | Object size measuring method and device based on three-dimensional vision technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||