CN112258618A - Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map


Info

Publication number: CN112258618A (application CN202011213908.2A); granted as CN112258618B
Other languages: Chinese (zh)
Inventors: Li Jing (李京), Gong Jianhua (龚建华)
Original and current assignee: Aerospace Information Research Institute of CAS
Legal status: Granted; Active


Classifications

    • G06T 15/04: Texture mapping (3D image rendering)
    • G06T 7/12: Edge-based segmentation
    • G06T 7/13: Edge detection
    • G06T 7/85: Stereo camera calibration
    • G06T 2207/10012: Stereo images
    • G06T 2207/10032: Satellite or aerial image; remote sensing

Abstract

The invention discloses a semantic mapping and positioning method based on the fusion of a prior laser point cloud and a depth map, which comprises the following steps: s1, collecting prior laser point cloud data; s2, acquiring a depth image and an RGB image, generating an RGB-D point cloud based on the depth image, and carrying out initialization registration on the prior laser point cloud and the RGB-D point cloud; s3, providing camera pose constraint through the registered prior laser point cloud to correct the camera pose; s4, constructing a three-dimensional geometric point cloud map by adopting a front and back window optimization method; s5, performing geometric incremental segmentation on the three-dimensional geometric point cloud map, performing object identification and semantic segmentation on the RGB image, and fusing geometric incremental segmentation and semantic segmentation results to obtain a 3D geometric segmentation map of semantic enhanced geometric segmentation; and S6, performing semantic association and segmentation probability distribution updating on the object to complete the construction of the semantic map. The invention can effectively eliminate the accumulated error of large-scale indoor map building and positioning, and has high precision and real-time performance.

Description

Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
Technical Field
The invention relates to the technical field of positioning and map construction, in particular to a semantic map construction and positioning method based on the fusion of a prior laser point cloud and a depth map.
Background
At present, the main technical difficulties hindering applications such as indoor and outdoor large-scene augmented reality fusion, unmanned driving, and robot navigation and positioning are the dynamic tracking and recording of the camera position and posture in a real scene and the construction of a three-dimensional semantic map. A semantic map is a traditional three-dimensional point cloud map whose elements are endowed with corresponding attribute values (such as the ground, walls or buildings). Because the global satellite positioning system cannot meet the requirements on precision and robustness in many application scenes (such as outdoor high-precision augmented reality and unmanned driving), and the error of an inertial attitude sensor gradually increases as it drifts over time, real-time three-dimensional mapping and camera position and attitude calculation using computer vision and three-dimensional laser radar point clouds are continuously being developed and perfected. Conventional methods include the identification of markers previously placed outdoors from an image, matching based on natural feature points, and algorithms based on a three-dimensional model of a known environment. In recent years, camera positioning has become a very active research field due to the rapid development of mobile computing, computer vision, augmented reality (AR), indoor GIS and simultaneous localization and mapping (SLAM) technologies. The use of mobile terminals and unmanned aerial vehicles (UAVs) is increasing, and the development of these technologies is being promoted to meet practical application requirements in fields such as AR, robotics, indoor navigation and positioning, and automatic driving. In these emerging applications, camera localization and semantic mapping are often key enabling technologies. Recent sensors such as portable laser-ranging LiDAR and RGB-D cameras supplement the RGB image with depth information, which provides new opportunities for developing robust and practical applications, whereas an RGB camera alone is far from adequate.
Camera positioning is a key step in both online SLAM (Simultaneous Localization And Mapping) and offline SFM (Structure From Motion). SLAM incrementally estimates the pose of a mobile platform while simultaneously constructing a map of the surrounding environment; owing to its autonomous positioning and environment perception capabilities, SLAM has become a key prerequisite for the autonomous operation of unmanned systems such as robots in unknown environments. Visual SLAM is an algorithm that uses a camera as the data input sensor and is widely used on indoor mobile platforms. Compared with ranging instruments such as laser radars, the visual sensor has the advantages of low power consumption and small size, and can provide richer environment texture information for the mobile platform. Thus, visual SLAM has attracted increasing attention in the research community; for example, many researchers have attempted to improve the efficiency and robustness of visual SLAM methods. To improve efficiency, feature extraction algorithms such as SURF, BRISK and ORB features have been proposed. Furthermore, some algorithms introduce parallel computation to improve efficiency, such as parallel tracking and mapping for small-scene augmented reality (AR) workspaces, which was the first SLAM algorithm to separate the positioning and mapping functions into two threads and achieve real-time performance. A recent interesting development in real-time visual SLAM is ORB-SLAM3, which incorporates recent techniques and uses real-time ORB features for tracking, mapping, positioning and loop closure in large indoor and outdoor environments. In addition, with the rapid development of new sensors, new applications and new computing methods represented by deep learning, simultaneous localization and mapping (SLAM) algorithms are gradually breaking through the limitations of traditional methods based on filters, small scenes, pixel-by-pixel matching and the like, and are developing in a more intelligent direction. At present, semantic SLAM can acquire the geometric structure information of a scene, identify independent individual objects in the scene, acquire semantic information such as the positions, poses and functional attributes of objects, and provide support for completing intelligent service tasks in complex scenes. Semantic SLAM is a research frontier and hotspot in the current computer vision field and provides underlying algorithmic support for applications such as augmented reality, scene understanding, robotics, navigation and positioning, human-machine interaction and automatic driving, so simultaneous localization and semantic map construction technology with high precision, high robustness and high real-time performance has very important scientific significance and application value. With the recent development of two-dimensional image semantic segmentation and object detection algorithms, many methods effectively combine these algorithms with state-of-the-art SLAM methods to obtain highly accurate semantically segmented three-dimensional maps.
Current mainstream methods have two drawbacks. On the one hand, they require a priori known 3D models; since each instance of a semantic class (such as a chair) can vary significantly even within a single environment, an accurate 3D model would be needed for every variation, which limits practical implementation. On the other hand, they allocate a semantic label to each 3D point using a probabilistic model such as a hierarchical conditional random field, which consumes excessive computing resources and memory space. These methods therefore still compromise the real-time performance of the semantic SLAM algorithm.
Online real-time methods (such as SLAM) inevitably accumulate errors as the camera moves: the position and orientation errors of the camera typically accumulate over time, especially in large scenes, and may eventually destroy the usability of the application. Application scenarios are currently limited to small workspaces in general, and the robustness and real-time performance of unknown-camera positioning in large indoor and outdoor scenes remain challenging, which may prevent the widespread use of positioning and mapping techniques. In addition, because the illumination conditions and texture features of large indoor and outdoor scenes are complex and change dynamically, mapping based on a single RGB video stream or laser radar point cloud often loses feature points or fails to match between point cloud frames, so a single sensor cannot meet the robustness and real-time requirements of large-scene three-dimensional semantic reconstruction and positioning in an unknown environment.
Disclosure of Invention
The invention aims to provide a semantic mapping and positioning method based on the fusion of a prior laser point cloud and a depth map, which aims to solve the technical problems in the prior art, effectively eliminate the accumulated error of large-scale indoor mapping and positioning and have high mapping and positioning precision and real-time performance.
In order to achieve the purpose, the invention provides the following scheme: the invention provides a semantic mapping and positioning method based on the fusion of a prior laser point cloud and a depth map, which comprises the following steps:
s1, collecting integral prior laser point cloud data of a scene, and preprocessing the collected data;
s2, acquiring a depth image and an RGB image of a scene in real time through a camera fixed on a mobile platform, generating an RGB-D point cloud based on the depth image, and performing initialization registration on preprocessed prior laser point cloud data and the RGB-D point cloud by adopting a deep learning positioning neural network and a nearest neighbor ICP iterative algorithm;
s3, providing camera pose constraint through the registered prior laser point cloud data, correcting the camera pose and completing camera positioning;
s4, constructing a three-dimensional geometric point cloud map by adopting a multithread map segmentation front-back window optimization method based on camera pose constraints provided by prior laser point cloud data;
s5, performing geometric incremental segmentation on the three-dimensional geometric point cloud map, performing object identification and semantic segmentation on each frame of RGB image by adopting an object detector based on a convolutional network, fusing a geometric incremental segmentation result and a semantic segmentation result, and acquiring an object boundary to obtain a 3D geometric segmentation map of scene semantic enhanced geometric segmentation;
s6, updating the 3D geometric segmentation map subjected to semantic enhancement geometric segmentation through the correction result of the camera pose in the step S3, performing semantic association and segmentation probability distribution updating on the objects in the scene based on the updating result, and completing construction of the semantic map.
Preferably, in step S2, the method for performing initial registration on the preprocessed a priori laser point cloud data and the RGB-D point cloud includes:
firstly, constructing a deep learning network, aligning preprocessed prior laser point cloud data with the RGB-D point cloud to obtain continuous video frames marked by a camera pose, and training the deep learning positioning neural network through continuous video frame images and the camera pose;
secondly, inputting RGB image video frames collected by a camera into a trained deep learning positioning neural network to obtain an initial camera pose;
and updating the initial camera pose by adopting an ICP iterative algorithm to complete the initialization registration of the prior laser point cloud data and the RGB-D point cloud.
Preferably, the step S3 specifically includes the following steps:
s3.1, selecting a key frame to be corrected from a depth image acquired by a camera in real time, and generating RGB-D point cloud data through dynamic object elimination and depth image data denoising;
s3.2, acquiring local prior laser point cloud data according to the initial posture of the camera generated by the visual tracking thread;
and S3.3, based on the local prior laser point cloud data, acquiring the camera pose of the key frame to be corrected by adopting a point-to-surface ICP iterative algorithm, and transmitting the camera pose of the key frame to be corrected to a local map through a key frame visibility map.
Preferably, in the step S3.1, RGB-D point cloud data is generated based on the depth information of the camera and the pose of the tracking thread.
Preferably, the step S4 includes:
s4.1, acquiring key frames and 3-D ORB characteristic points of the depth images, and constructing a sparse map;
s4.2, adopting camera pose constraint provided by prior laser point cloud data to locally optimize the constructed sparse map;
s4.3, performing global optimization on the map subjected to local optimization by adopting a multithreading map segmentation optimization method;
and S4.4, further optimizing the map after global optimization by adopting a double-window optimization method to obtain the three-dimensional geometric point cloud map.
Preferably, in step S4.4, the dual-window optimization method includes a front window and a rear window; the front window constraint is a pose-map point constraint, and the front window is used for controlling the construction of a local map; the rear window constraint is a pose-pose constraint between the rear window and the front window key frame, and a pose-map point constraint between the rear window and the front window map point; the rear window is used to control global consistency of the map.
Preferably, the front window comprises a plurality of key frames in the map segment, a plurality of key frames with overlapped front and back map segments, and a map point set observed by the key frames; the back window includes the optimized set of keyframes in the front window.
Preferably, the step S5 specifically includes:
s5.1, segmenting the three-dimensional geometric point cloud map by adopting a normal vector edge analysis method of a concave punishment mechanism and a three-dimensional vertex distance threshold segmentation method to obtain a geometric segmentation map of the depth image;
s5.2, performing semantic segmentation and object identification on each frame of RGB image by adopting an object detector based on a convolutional network to obtain class information and class probability of an object and an object boundary segmentation map;
s5.3, carrying out superposition statistics on the geometric segmentation map and the object boundary segmentation map of the depth image, detecting whether the same object is divided into different parts or not according to the state that different image labels in the geometric segmentation map of the depth image fall into a first object boundary frame of the object boundary segmentation map, if so, updating the boundary of the object in the object boundary segmentation map, and obtaining the 3D geometric segmentation map of the scene semantic enhanced geometric segmentation.
Preferably, in step S6, the segmentation probability of each object in the 3D geometric segmentation map assigned to the semantically enhanced geometric segmentation is updated in real time by using a confidence-based method.
Preferably, the confidence-based method specifically includes:
based on the camera pose correction result in step S3, back-projecting the 3D geometric segmentation map of the semantic enhanced geometric segmentation to the current camera plane to obtain a label rendering segmentation map; and adopting an object detector based on a convolutional network to carry out object identification on the label rendering segmentation map to obtain a second object boundary box and a corresponding segmentation probability, associating the second object boundary box with an object boundary box in the 3D geometric segmentation map of the semantic enhanced geometric segmentation, and updating the object segmentation probability in the 3D geometric segmentation map of the semantic enhanced geometric segmentation through the segmentation probability corresponding to the second object boundary box to complete the construction of the semantic map.
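As an illustration of the confidence-based update described above, the following sketch (all class and method names are hypothetical, not part of the invention) renders the semantic 3D segmentation into the current camera view, runs the object detector on the rendered label map, and fuses the returned probabilities into the stored object segmentation probabilities with a simple averaging rule assumed here for brevity.
```python
def update_segmentation_probabilities(semantic_map, camera_pose, detector, iou_thresh=0.5):
    # Back-project the semantically enhanced 3D segmentation map to the current camera plane.
    label_render = semantic_map.render_labels(camera_pose)
    # Second object bounding boxes and their class probabilities from the CNN detector.
    boxes, labels, probs = detector(label_render)
    for box, label, prob in zip(boxes, labels, probs):
        # Associate the detected box with an object box stored in the 3D segmentation map.
        obj = semantic_map.associate(box, label, iou_thresh)
        if obj is not None:
            # Assumed fusion rule: running average of old and new segmentation probability.
            obj.seg_prob = 0.5 * (obj.seg_prob + float(prob))
    return semantic_map
```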
The invention discloses the following technical effects:
(1) the invention provides a semantic mapping and positioning method based on fusion of a priori laser point cloud and a depth map, which is characterized in that the priori laser point cloud data is used as the knowledge of indoor 3-D scene construction and navigation, the high-precision geometric control information in the laser radar point cloud data of the known scene is utilized to eliminate the accumulated error of an RGB-D SLAM system in large-scale indoor operation, and the mapping and positioning precision is obviously improved;
(2) the invention combines deep learning and geometric SLAM system, realizes automatic initialization and motion recovery, thereby improving the stability of the system in severe environment;
(3) the method adopts a double-window multithreading map segment optimization method, ensures that the map building and positioning system runs in real time and at high precision under the guidance of prior laser point cloud data, constructs a large-scale precise and consistent dense map, and ensures the accuracy and the real-time performance of map construction; compared with centimeter-precision prior laser point cloud data, the standard deviation of 3D map construction in the travel distance of 100m is about 20cm, the relative accumulated error of the camera in a closed loop experiment is about 0.1%, and the relative accumulated error is improved by two orders of magnitude compared with a typical SLAM algorithm (the accumulated error is 3% -5%); in addition, 2D abundant texture image data from a camera and high-precision 3D ground laser point cloud data are automatically fused, so that the defect of insufficient texture information of the laser point cloud data is overcome;
(4) the invention adopts a probability updating mechanism based on the segmentation class labels instead of map elements, greatly reduces the calculation complexity and the consumption of calculation resources, and can ensure the real-time construction of the high-precision semantic map.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a semantic mapping and positioning method based on the fusion of a prior laser point cloud and a depth map according to the present invention;
FIG. 2 shows the prior laser point cloud data before and after preprocessing in an embodiment of the invention; FIG. 2(a) is the prior laser point cloud data before preprocessing, and FIG. 2(b) is the prior laser point cloud data after preprocessing;
FIG. 3 is a schematic diagram of obtaining local prior laser point cloud data according to a camera pose provided by a tracking thread in an embodiment of the present invention;
FIG. 4 shows the registration result of RGB-D depth point cloud and prior lidar point cloud in the embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1, the present embodiment provides a semantic mapping and positioning method based on a priori laser point cloud and depth map fusion, which specifically includes the following steps:
s1, collecting integral prior laser point cloud data of a scene, and preprocessing the collected data;
in the embodiment, a 3D ground LiDAR scanner is adopted to collect prior laser point cloud data; A3D terrestrial LiDAR scanner can provide a measurement range of 100m to 1000m with a measurement accuracy of centimeters or more, a measurement frequency of 300,000 points/second, and a scanning field of view of 100 DEG x 360 deg. And combining the single-station laser scanning data together to form the integral laser point cloud data of the scene.
In this embodiment, the collected prior laser point cloud data is preprocessed with downsampling and a KD-tree data structure: using the CloudCompare point cloud processing software, the prior laser point cloud data is downsampled with a voxel resolution of 5 cm, which reduces the volume of the laser point cloud data while retaining the maximum amount of geometric information with the minimum data size. Because a 3D terrestrial LiDAR scanner produces a large amount of data, this preprocessing effectively improves the efficiency and real-time performance of semantic mapping and positioning. The prior laser point cloud data before and after preprocessing are shown in fig. 2(a) and fig. 2(b), respectively.
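A minimal preprocessing sketch, assuming the Open3D library is used in place of the CloudCompare GUI mentioned above: the prior cloud is voxel-downsampled at 5 cm and indexed with a KD-tree for fast neighbour queries (the file name is hypothetical).
```python
import open3d as o3d

prior_cloud = o3d.io.read_point_cloud("prior_terrestrial_lidar.pcd")  # hypothetical file name
prior_down = prior_cloud.voxel_down_sample(voxel_size=0.05)           # 5 cm voxel grid
prior_kdtree = o3d.geometry.KDTreeFlann(prior_down)                   # KD-tree index for fast queries

# Example query: indices of the 10 prior points nearest to an arbitrary position.
_, idx, _ = prior_kdtree.search_knn_vector_3d([0.0, 0.0, 0.0], 10)
```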
S2, acquiring a depth image and an RGB image of the scene in real time through a camera fixed on a mobile platform, generating an RGB-D point cloud based on the depth image, and performing initialization registration of the preprocessed prior laser point cloud data and the RGB-D point cloud by adopting a deep learning positioning neural network and an ICP (Iterative Closest Point) iterative algorithm; the camera is an RGB-D camera.
If the prior laser point cloud data cannot be accurately matched to the coordinate frame of the RGB-D point cloud, it cannot be used as the basis for subsequent optimization. The method adopts a deep learning positioning neural network to align the preprocessed prior laser point cloud data with the RGB-D point cloud. The deep learning positioning neural network is adapted from GoogLeNet, a 22-layer convolutional network with six inception modules and two additional intermediate classifiers; affine regressors are used to replace all of the classifiers in GoogLeNet. At the same time, the softmax layers in GoogLeNet are removed and each final fully-connected layer is modified to output a 7-dimensional pose vector representing the camera position and orientation.
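A hedged sketch of such a pose-regression network, assuming the torchvision implementation of GoogLeNet: the classifier head is replaced by an affine regressor that outputs a 7-dimensional vector (3-D position plus a unit quaternion for orientation). The class name and layer choices are illustrative, not prescribed by the patent.
```python
import torch
import torch.nn as nn
import torchvision

class PoseRegressor(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        # GoogLeNet backbone without auxiliary classifiers; the softmax classifier is dropped.
        self.backbone = torchvision.models.googlenet(weights=None, aux_logits=False)
        # Replace the final fully connected layer with a 7-D affine pose regressor.
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, 7)

    def forward(self, rgb):                       # rgb: (N, 3, H, W) video frames
        out = self.backbone(rgb)
        t, q = out[:, :3], out[:, 3:]             # translation and quaternion parts
        q = q / q.norm(dim=1, keepdim=True)       # normalise the orientation quaternion
        return t, q
```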
The specific registration method comprises the following steps:
Firstly, a deep learning network is constructed: the prior laser point cloud data is manually aligned with the RGB-D point cloud to obtain continuous video frames labelled with camera poses, and the deep learning positioning neural network is trained with these continuous video frame images and the camera poses obtained from the manual alignment; the trained network yields an average position error of 1 m and an average angle error of 4 degrees, measured relative to the poses generated by manually matching the prior laser point cloud data and the RGB-D point cloud.
Secondly, the RGB image video frames acquired by the camera are input into the trained deep learning positioning neural network to obtain a rough initial camera pose; this rough initial camera pose has an average position error of 1 m and an average angle error of 4 degrees, i.e. the prior laser point cloud data is not yet accurately aligned with the RGB-D point cloud.
And updating the initial camera pose based on the rough initial camera pose by adopting an ICP iterative algorithm to obtain an accurate initial camera pose, finishing the initialization accurate registration of the prior laser point cloud data and the RGB-D point cloud, and finishing the automatic initialization of the RGB-D SLAM algorithm system guided by the laser data in real time. After initialization, the prior laser point cloud data is used as a true value data framework to provide real-time constraint for positioning and mapping.
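The initialization step can be sketched as follows, under the assumption that Open3D's ICP is used and that the network pose is expressed as a 4 x 4 matrix; the 0.5 m correspondence distance is an illustrative choice, not taken from the patent.
```python
import open3d as o3d

def refine_initial_pose(rgbd_cloud, prior_cloud, T_init, max_corr_dist=0.5):
    """T_init: rough 4x4 camera pose predicted by the positioning network."""
    result = o3d.pipelines.registration.registration_icp(
        rgbd_cloud, prior_cloud, max_corr_dist, T_init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation  # refined 4x4 pose: accurate initial registration
```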
S3, providing camera pose constraint through the registered prior laser point cloud data, correcting the camera pose and completing camera positioning; the method specifically comprises the following steps:
s3.1, selecting a key frame to be corrected from a depth image acquired by a camera in real time, and generating RGB-D point cloud data through dynamic object elimination and depth image data denoising;
firstly, selecting a key frame to be corrected:
in a typical visual SLAM approach, errors in camera pose estimation typically accumulate over time. Although it is not necessary to correct the accumulated error for each frame using a priori laser point cloud data, the accumulated error must be kept at a sufficiently low level within the initial range required for the ICP to converge to a global optimum. In order to ensure drift error free and real-time performance, in the present embodiment, the accumulated error of 50 key frames per interval is eliminated by a priori laser point cloud data. The interval of every 50 key frames is chosen because it is experimentally verified that the choice can maintain the best balance between successful convergence of the ICP algorithm and the real-time performance of the system. For example, if the interval of the keyframes is less than 15, the a priori laser point cloud data optimization thread will lag the tracking thread and cause a delay in the camera pose correction of the current frame. Intervals of more than 60 keyframes produce a higher error accumulation and result in failure of ICP to converge to the global optimum because the camera coarse pose accumulation error provided by the tracking thread is large and does not allow the ICP algorithm to converge successfully.
Furthermore, due to the complexity of the test environment, the keyframes to be corrected are further selected according to the actual operating conditions of the proposed algorithm. For example, when the camera rotates rapidly, the tracking image becomes blurred and the feature matching failure rate increases, so the accumulated error of the camera grows accordingly and the interval between correction keyframes needs to be narrowed. The selection criterion for the keyframes to be corrected is therefore whether the change of the camera forward direction between two adjacent keyframes exceeds a preset threshold; if so, the camera is rotating too fast and keyframe correction is required. In this embodiment, the preset threshold is 47 degrees.
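The selection rule just described can be sketched as below (function and variable names are illustrative): a keyframe is sent to the laser-correction thread every 50 keyframes, or earlier when the camera forward direction changes by more than 47 degrees between adjacent keyframes.
```python
import numpy as np

def needs_laser_correction(kf_index, R_prev, R_curr, interval=50, angle_thresh_deg=47.0):
    # Forward direction taken as the third column of the rotation matrix (assumed convention).
    f_prev, f_curr = R_prev[:, 2], R_curr[:, 2]
    cos_angle = np.clip(np.dot(f_prev, f_curr), -1.0, 1.0)
    angle_deg = np.degrees(np.arccos(cos_angle))
    return (kf_index % interval == 0) or (angle_deg > angle_thresh_deg)
```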
Secondly, dynamic object elimination, hole filling and denoising are carried out on the depth image:
the embodiment adopts the consistency of the optical flow to detect the dynamic object and eliminates the detected dynamic object so as to eliminate the influence of the data of the dynamic object;
Additionally, the RGB-D camera uses optical coding technology to acquire depth images of the scene. Compared with a gray-scale image, a depth image carries three-dimensional shape information of objects and is not affected by the illumination direction of the light source, the emission characteristics of the object surface, or shadows, so it expresses the three-dimensional geometry of the target object surface more clearly. Compared with extracting the geometric characteristics of three-dimensional objects from gray-scale images, depth image processing imposes no special restrictions on the geometric and physical characteristics of the scene and directly uses three-dimensional information, which greatly simplifies the recognition and positioning of three-dimensional objects. However, illumination changes of the scene during measurement can still cause a certain instability of the depth data, and the infrared reflection characteristics of object surface materials may generate noise and create holes at object edges and occluded areas, which is unfavorable for subsequent mapping and positioning. Therefore, the depth image needs hole filling and denoising to improve the accuracy of subsequent map construction and positioning.
Hole filling estimates depth values at hole locations from the pixel depth values of the neighborhood around the hole, exploiting spatial correlation. Owing to the influence of the complex environment, the RGB-D camera cannot directly obtain depth information inside a hole; but because the surrounding pixels within a small neighborhood contain continuous depth values, the depth values of the surrounding pixels of the depth map can be used to estimate the missing depth information.
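A minimal hole-filling sketch based on this spatial-correlation idea, assuming SciPy is available: each invalid (zero) depth pixel takes the value of its nearest valid neighbour, which is a simpler stand-in for the neighbourhood estimate described above.
```python
import numpy as np
from scipy import ndimage

def fill_depth_holes(depth):
    """depth: H x W array in metres; 0 marks holes."""
    invalid = depth == 0
    # Indices of the nearest valid pixel for every position in the image.
    idx = ndimage.distance_transform_edt(invalid, return_distances=False, return_indices=True)
    return depth[tuple(idx)]
```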
Thirdly, based on the denoised depth image, generating RGB-D point cloud data of the key frame to be corrected:
after a keyframe to be corrected is selected and depth image noise elimination is performed, RGB-D point cloud data are generated based on depth information of the RGB-D camera and a rough pose of a tracking thread. Because the data volume of the RGB-D point cloud is huge, the efficiency of the ICP iterative algorithm is influenced, and therefore the RGB-D point cloud voxels are downsampled into point clouds with the resolution of 6cm, and the matching of the RGB-D sparse point cloud data of the key frame to be corrected and the prior laser point cloud data is quickly obtained.
S3.2, acquiring local prior laser point cloud data according to the initial posture of the camera generated by the visual tracking thread;
in order to acquire the accurate pose of the camera from the local prior laser point cloud data, the local prior laser point cloud data is required in addition to the RGB-D point cloud generated by the key frame to be corrected according to the depth image information. According to the rough camera pose visually tracked by the key frame to be corrected, filtering the prior laser point cloud data of the whole scene to obtain local point cloud data to be corrected around the rough pose of the key frame to be corrected, as shown in fig. 3, ensuring that the local prior laser point cloud data and the RGB-D point cloud of the key frame to be corrected have higher overlapping degree, and ensuring that a subsequent iterative algorithm converges to a global optimal value; the overlapping degree between the acquired local prior laser point cloud data and the RGB-D point cloud of the key frame to be corrected depends on the accumulated error of the camera pose of the key frame to be corrected. In this embodiment, the filtering method for obtaining the local prior laser point cloud data according to the rough camera pose is as follows:
$$T_{wc} = \begin{bmatrix} a_1 & a_2 & a_3 & t_x \\ b_1 & b_2 & b_3 & t_y \\ c_1 & c_2 & c_3 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
$$d_i = \sqrt{(q_i(x)-t_x)^2 + (q_i(y)-t_y)^2 + (q_i(z)-t_z)^2}$$
$$\theta_i = \arccos\frac{(q_i - t)\cdot \vec{f}}{\lVert q_i - t \rVert\,\lVert \vec{f} \rVert}$$
$$q = \{\, q_i \mid d_i < 7.0 \ \wedge\ \theta_i < \theta_0 + 5^{\circ} \,\}$$
where $a_1, a_2, a_3, b_1, b_2, b_3, c_1, c_2, c_3$ are the components of the rotation matrix of the camera pose, and $T_{wc}$ is the inverse of the camera pose matrix of the current keyframe to be corrected; $Q$ is the prior laser point cloud data and $q_i \in Q$; $t = (t_x, t_y, t_z)$ is the camera centre in the prior laser point cloud data coordinate system; $d_i$ is the distance from the $i$-th prior laser point to the camera centre; $\theta_i$ is the angle between the vector from the camera centre to the $i$-th prior laser point and the forward direction vector $\vec{f}$ of the camera; $\theta_0$ is the maximum field angle of the camera; and $q$ is the set of prior laser points less than 7 metres from the camera centre and within a field of view 5 degrees larger than the forward field angle of the camera, which guarantees a high degree of overlap between the local prior laser point cloud data and the RGB-D point cloud of the keyframe to be corrected. Because the prior laser point cloud data is organized and managed with a KD-tree and has been down-sampled, the local prior laser point cloud data can be obtained efficiently with this filtering algorithm.
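The filtering rule above can be sketched as follows (a NumPy sketch under the stated 7 m and theta0 + 5 degree limits; the forward-axis convention is an assumption):
```python
import numpy as np

def filter_local_prior(points, T_wc, theta0_deg, max_dist=7.0):
    """points: N x 3 prior laser cloud; T_wc: 4x4 camera-to-laser-frame pose of the keyframe."""
    t = T_wc[:3, 3]                      # camera centre (t_x, t_y, t_z)
    forward = T_wc[:3, 2]                # camera forward direction (assumed to be the z axis)
    v = points - t
    dist = np.linalg.norm(v, axis=1)
    cos_theta = v @ forward / (dist * np.linalg.norm(forward) + 1e-12)
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    keep = (dist < max_dist) & (theta < theta0_deg + 5.0)
    return points[keep]
```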
S3.3, based on local prior laser point cloud data, acquiring the camera pose of the key frame to be corrected by adopting a point-to-surface ICP iterative algorithm, and transmitting the camera pose to a local map through a key frame visibility map;
The RGB-D point cloud data and the local accurate prior laser point cloud data around the rough camera pose of the keyframe to be corrected have now both been obtained, so the next step is to obtain the camera pose correction from these two point clouds. The ICP iterative algorithm takes the rough, visually tracked camera pose of the keyframe to be corrected as its initial value; since a laser correction is performed every 50 keyframes, the accumulated error of this rough camera pose is not large. At the same time, because the indoor environment structure satisfies the Manhattan-world assumption and is mainly composed of rectangular blocks in all directions, the ICP iterative algorithm in this embodiment adopts a point-to-plane distance error metric. Considering these factors and the voxel-downsampled sparse RGB-D point cloud data, the ICP iterative algorithm in this embodiment constructs a point-to-plane cost function model and sets the number of iterations to 10, so that it converges to the global optimum robustly and quickly.
The point-to-plane ICP iteration, based on constructing a point-to-plane cost function model, comprises the following steps:
First: select a certain number of points from the RGB-D point cloud data of the keyframe to be corrected according to a uniform sampling rule;
Second: construct corresponding point pairs between the source point cloud data and the target point cloud data using the nearest-point matching rule; the source point cloud data is the RGB-D point cloud data and the target point cloud data is the prior laser point cloud data;
Third: calculate the rotation and translation parameters between the two point clouds from the corresponding point pairs to obtain the camera pose;
Fourth: transform the source point cloud data with the calculated rotation and translation parameters, and construct a cost function model between the transformed point cloud data and the target point cloud data using the point-to-plane error distance; further optimize the camera pose obtained in the third step with the cost function model to obtain the camera pose correction;
Fifth: check the camera pose correction; if it is smaller than a preset threshold or the number of iterations exceeds the maximum number of iterations, stop iterating and output the final rigid transformation parameters; otherwise return to the first step and continue iterating until the condition is met.
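A hedged sketch of this correction step with Open3D: the coarse tracked pose seeds a point-to-plane ICP between the downsampled keyframe RGB-D cloud (source) and the local prior laser cloud (target), capped at 10 iterations as described above; the 0.3 m correspondence distance and the normal-estimation radius are illustrative assumptions.
```python
import open3d as o3d

def correct_keyframe_pose(rgbd_cloud, local_prior_cloud, T_coarse, max_corr_dist=0.3):
    # Point-to-plane ICP needs normals on the target (prior laser) cloud.
    local_prior_cloud.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.2, max_nn=30))
    result = o3d.pipelines.registration.registration_icp(
        rgbd_cloud, local_prior_cloud, max_corr_dist, T_coarse,
        o3d.pipelines.registration.TransformationEstimationPointToPlane(),
        o3d.pipelines.registration.ICPConvergenceCriteria(max_iteration=10))
    return result.transformation  # corrected camera pose of the keyframe
```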
It can be known from the ICP iteration steps that the initial relative positions of the two point sets have a certain influence on the convergence of the algorithm, and the ICP registration is preferably performed under a "close enough" condition, and fig. 4 is a registration result of RGB-D depth point cloud and prior lidar point cloud.
The accurate camera pose of the keyframe to be corrected is obtained through the ICP algorithm. Because the pose of a single corrected keyframe controls only a limited part of the map, the map cannot be sufficiently optimized by using that pose as the only control information, so the pose needs to be propagated to the local map to expand its control range. From the visual SLAM algorithm it is known that local map construction mainly comprises acquiring the camera pose and building the local map by means of camera motion tracking and local map tracking, so the relative pose between each pair of co-visible keyframes is accurate and the accumulated error is small. The acquired accurate pose of the keyframe to be corrected can therefore be propagated locally through the accurate relative poses between co-visible keyframes, correcting the local map points observed by the corresponding keyframes and further expanding the control range of the accurate pose information. Through this accurate pose correction, the accurate camera pose is propagated into the local map, which improves the accuracy of the local map, expands the range of control information in the map, and prepares for the next map segment optimization.
S4, constructing a three-dimensional geometric point cloud map by adopting a multithread map segmentation front-back window optimization method based on camera pose constraints provided by prior laser point cloud data;
Although the prior laser point cloud data could be used as a map model of the scene, it contains only geometric spatial information and no texture information. In addition, occlusion during the laser scanning process causes some detailed parts of the scene to be missing from the laser point cloud data. Therefore, the invention constructs a high-precision map with texture information under the constraint of the prior laser point cloud data, so as to enrich the content and improve the utilization of the prior laser point cloud map and repair gaps in the data.
S4.1, acquiring key frames and 3-D ORB characteristic points of the depth images, and constructing a sparse map;
the map construction method is based on an ORB-SLAM algorithm framework, and the ORB-SLAM algorithm framework has sparse map construction threads. The sparse map is composed of key frames and corresponding 3-D ORB feature points, wherein the key frames and the 3-D ORB feature points are created by the method comprising the following steps:
firstly, establishing 3-D ORB characteristic points through projection according to the pose of a key frame camera and the depth data of the ORB characteristic points;
secondly, a selection strategy of every 50 frames is applied to the candidate objects in the local mapping thread, and the final key frame and the 3-D ORB characteristic points in the local map are preferentially selected.
S4.2, local optimization is carried out on the constructed sparse map by adopting accurate camera pose constraint provided by prior laser point cloud data;
local optimization and global optimization of sparse maps are applied in ORB-SLAM without external constraints. Therefore, after long-distance tracking, errors in the camera pose and the sparse map may accumulate. The invention further optimizes the sparse graph constructed by the ORB-SLAM by using the accurate camera pose constraint provided by the prior laser point cloud data so as to eliminate the accumulated error. Because the optimization operation is performed once every 50 key frames, the optimization frequency is high. Therefore, in order to obtain good real-time performance and realize the accuracy and consistency of sparse map optimization, a multi-thread map segmentation-based front-back window optimization strategy is adopted. After optimization, accurate camera poses of all key frames can be obtained, and a dense point cloud model of a scene is recovered by integrating depth image data of all key frames. Then, a semi-dense map is constructed based on the keyframes, and the map is fused with 2D rich texture information from the image data and high-precision 3D geometric information from ground prior laser point cloud data.
S4.3, performing global optimization on the map subjected to local optimization by adopting a multithreading map segmentation optimization method;
Although the camera pose is acquired through ICP (Iterative Closest Point) matching with the laser point cloud and propagated through the keyframe co-visibility graph, the map has not yet been globally optimized. Therefore, in order to obtain good real-time performance and a globally optimal and consistent map, the invention relies on a multithreaded map segmentation double-window optimization strategy, which specifically comprises: correcting the first and last keyframes of the current map segment with the ICP algorithm, taking the accurate poses obtained from the prior laser point cloud data as ground truth, and optimizing the map segment with a double-window method.
In multi-thread segment-based map optimization, a visual tracking thread runs on a current map segment, and an a priori laser point cloud data optimization thread runs on a previous map segment; when the visual tracking thread starts tracking from t1To t2When the next map segment is obtained, the prior laser point cloud data optimization thread simultaneously completes the optimization of the previous map segment and starts the optimization of the next map segment, so that the real-time laser data optimization and pose tracking performance can be ensured. The pose graph of the visual SLAM consists of a set of keyframe vertices M, a set of three-dimensional points P, and a set of opposing edges ε. Each key frame vertex miSaving absolute pose T of cameraiAnd the absolute pose T of the slave camera is recordediThree-dimensional feature point x observed underkE P, and the pixel observation z of the three-dimensional characteristic points on the imageik. At two position and posture vertexes miAnd mjEdge E betweenijThere is a co-visibility weight wijI.e. at miAnd mjThe number of commonly visible feature points in (a). Two keyframe pose vertices relative pose is implicitly defined as
$$T_{ij} = T_i \, T_j^{-1}$$
After the camera pose correction, the local map around the keyframe to be corrected has accurate pose control information. In a map segment containing 45 keyframes, the keyframe at the front end of the segment is the tail end of the previously optimized segment, and the tail-end part of the segment has also been corrected with the prior laser point cloud data, so both the front and rear ends of the map segment to be optimized have accurate pose control information. Meanwhile, in order to maintain the consistency of the map, in this embodiment each map segment being optimized overlaps the previous map segment by 10 keyframes, so that the maps at both ends share common constraints; at the same time, the map points observed in the map segment and by the already-optimized keyframes are included in the optimization variables of the segment optimization process, common map points are merged, and a map with consistent accuracy and scale is finally obtained.
S4.4, further optimizing the map after global optimization by adopting a double-window optimization method to obtain a three-dimensional geometric point cloud map;
The invention optimizes the globally optimized map of step S4.3 with a double-window optimization strategy. The optimization window is divided into a front window and a rear window. The front window comprises the 45 keyframes in the map segment, the 10 keyframes overlapping between the front and rear map segments, and the set of map points observed by these keyframes; the front-window constraint is a pose-map point constraint and is used to control the accurate construction of the local map. The rear window comprises the set of already-optimized keyframes in the front window; the rear-window constraints are pose-pose constraints with the front-window keyframes and pose-map point constraints with the front-window map points, and the rear window is used to control the global consistency between the map segment and the optimized map. Meanwhile, since some map points in the front window may already have been optimized many times, and rear-window keyframes whose poses are fixed but which are far from the front-window map have little constraining influence, in order to improve optimization efficiency and reduce memory consumption, map points whose number of optimizations exceeds a preset threshold and optimized keyframes far from the map segment (i.e. more than 4 m away) are removed from the current optimization variables; in this embodiment, the preset threshold is 6.
The elimination strategy for front-window map points is as follows: the front window first initially determines the optimized keyframes, including the newly created keyframes and the keyframes overlapping with the previous map segment; the map points observed by these keyframes are included in the optimization variables of the map segment, provided that the number of times a map point has been optimized is less than or equal to the preset threshold. If the number of optimizations of a map point is greater than the preset threshold, the map point is considered accurate enough and is not added to the optimization variables of the map segment. The front-window map point optimization variable set $V_p$ is finally obtained through this screening.
The elimination strategy for rear-window keyframes is as follows: the map points remaining after the elimination of front-window map points, together with the already-optimized keyframes, are added to the fixed-keyframe optimization variables of the rear window as ground-truth constraints of the map segment to ensure the consistency of the front and rear maps. At the same time, considering that fixed keyframes far from the map segment have little constraining influence on it, fixed keyframes far from the map segment are deleted; only fixed keyframes whose minimum distance to the map segment is less than a preset threshold are retained, and their poses are kept fixed during the optimization as ground-truth constraints. The fixed-keyframe optimization variable set $V_{kf\text{-}outer}$ of the rear window is finally obtained through this screening.
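As an illustration of how the two windows could be assembled (all names and attributes here are hypothetical), the following sketch drops front-window map points that have already been optimized more than 6 times and keeps as fixed constraints only rear-window keyframes within 4 m of the segment:
```python
def build_windows(segment_keyframes, overlap_keyframes, rear_keyframes,
                  max_opt_count=6, max_fixed_dist=4.0):
    # Front window: ~45 keyframes of the segment plus 10 overlapping keyframes.
    front_kfs = segment_keyframes + overlap_keyframes
    # V_p: map points observed by the front window, unless already optimized too often.
    front_points = {p for kf in front_kfs for p in kf.observed_points
                    if p.opt_count <= max_opt_count}
    # V_kf-outer: fixed rear-window keyframes close enough to constrain the segment.
    fixed_kfs = [kf for kf in rear_keyframes
                 if kf.distance_to_segment(front_kfs) < max_fixed_dist]
    return front_kfs, front_points, fixed_kfs
```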
The method for constructing the error equation of the front window comprises the following steps:
and obtaining the map point optimization variable of the map section through the screening of the map points. Based on the map point-pose constraint, an error equation is constructed for the map point optimization variable and the keyframe pose variable of the front window, as shown in the following formula:
$$e_{ij} = z_{ij} - h(\xi_i, p_{ij}), \qquad p_{ij} \in V_p$$
where $h(\cdot)$ is the projection function that maps a three-dimensional feature point $p_{ij}$ to pixel coordinates through the camera perspective projection, $z_{ij}$ is the pixel coordinate value in the image of the $j$-th map point observed in the $i$-th front-window keyframe, $\xi_i$ is the Lie-algebra representation of the camera pose, $p_{ij}$ is the world coordinate of the $j$-th map point observed by the keyframe, and $e_{ij}$ is the error between the observed and calculated values. Summing the squared back-projection errors of all map points in the $i$-th keyframe gives:
$$E(i) = \sum_{j=1}^{n} e_{ij}^{T} e_{ij}$$
where $n$ is the number of map points observed by the keyframe. Summing the $E(i)$ errors of all the optimized keyframes in the front window gives the total error of the map:
$$E_l = \sum_{i=1}^{m} E(i)$$
where $kf_i$ denotes the keyframes contained in the front window, $E_l$ is the overall front-window error function based on the map point-pose constraint, and $m$ is the number of keyframes in the front window.
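A numeric sketch of the front-window residual $e_{ij} = z_{ij} - h(\xi_i, p_{ij})$ under a standard pinhole model; here the pose of keyframe $i$ is represented by a rotation matrix and translation vector rather than the Lie-algebra vector $\xi_i$ used in the patent, and the keyframe data structure is assumed.
```python
import numpy as np

def reprojection_error(z_ij, p_ij, R_i, t_i, K):
    p_cam = R_i @ p_ij + t_i            # map point expressed in the camera frame
    u = K @ (p_cam / p_cam[2])          # perspective projection h(.)
    return z_ij - u[:2]                 # 2-D residual e_ij

def front_window_error(front_keyframes):
    total = 0.0
    for kf in front_keyframes:          # E_l = sum_i sum_j e_ij^T e_ij
        for z, p in kf.observations:    # (pixel observation, 3-D map point) pairs
            e = reprojection_error(z, p, kf.R, kf.t, kf.K)
            total += float(e @ e)
    return total
```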
The error equation construction method of the rear window comprises the following steps:
In order to maintain the consistency of the map, the keyframes of the rear window are incorporated into the total error equation. The constraints that the rear window imposes on the front-window map comprise map point-pose constraints and pose-pose constraints, and error equations are constructed for these two kinds of constraints respectively;
the map point-pose constraint error equation is shown as follows:
$$E_{outer} = \sum_{k=1}^{\alpha} \sum_{j} e_{kj}^{T} e_{kj}, \qquad e_{kj} = z_{kj} - h(\xi_k, p_{kj})$$
where $kf_k$ denotes the keyframes contained in the rear window, $kf_k \in V_{kf\text{-}outer}$; $p_{kj} \in V_p$; $\alpha$ is the number of keyframes in the rear window; and $k$ is the index of the keyframes.
The pose-pose constraint error equation is shown as follows:
$$E_{pose} = \sum_{k=1}^{\alpha} \sum_{i=1}^{m} e_{ki}^{T}\, \Lambda_{ki}\, e_{ki}, \qquad e_{ki} = \log_{SE(3)}\!\left(T_{ki}\, T_i\, T_k^{-1}\right)^{\vee}$$
where $e_{ki}$ is the relative pose error in the SE(3) tangent space, $T_k$ is the pose of the rear-window keyframe vertex $v_k$, and $T_i$ is the pose of the front-window keyframe vertex $v_i$; $\Lambda_{ki}$ is the accuracy (information) matrix of the binary constraint $T_{ki}$,
$$\Lambda_{ki} = w_{ki}\, I$$
where $w_{ki}$ is the number of map points observed jointly by the two keyframe poses and $I$ is the identity matrix.
The error equations obtained from the front window and the rear window are jointly balanced and optimized to optimize the map segment. In order to keep the front and rear maps consistent, the keyframe poses $\xi_k$ of the rear window are treated as fixed values during the segment optimization, set as constants to serve as ground-truth constraints. The pose of the end keyframe in the front window is also set to a constant, because it is the accurate pose obtained by ICP matching against the prior laser point cloud data; it is therefore used as a ground-truth constraint rather than as an unknown optimization variable. Since the total error equation is nonlinear, it is differentiated with respect to all unknown optimization variables (the camera pose variables of the front-window keyframes and the three-dimensional coordinate variables of the map points). For each map point-pose constraint error equation $e_{ij}$ of the front window, the optimization variables comprise the 6 pose variables of the camera (i.e. the 6-degree-of-freedom Lie algebra of the camera pose) and the 3 coordinate variables of the map point (i.e. its three-dimensional coordinates $(x, y, z)$). The Jacobian matrix obtained by differentiation is:
$$J_{ij} = \frac{\partial e_{ij}}{\partial x} = \left[\; \frac{\partial e_{ij}}{\partial \xi_i} \quad \frac{\partial e_{ij}}{\partial p_j} \;\right]$$
where $x$ denotes all keyframe pose variables and map point three-dimensional coordinate variables in the front window, $\xi_i$ is the Lie algebra corresponding to the camera pose matrix of keyframe vertex $v_i$, and $p_j$ is the three-dimensional coordinate of the $j$-th map point observed by keyframe vertex $v_i$. Since the error equation is related only to keyframe vertex $v_i$ and the map points it observes, the partial derivatives with respect to other map points and other keyframe poses are zero, $0_{2\times 3}$ and $0_{2\times 6}$ respectively. For a keyframe vertex $v_k$ in the rear window and a front-window map point $p_j$, the generated map point-pose constraint error equation is $e_{kj}$; the camera pose of keyframe vertex $v_k$ is taken as a constant rather than as an optimization variable, and the derivative is taken only with respect to the three-dimensional coordinate variable of the map point:
$$J_{kj} = \frac{\partial e_{kj}}{\partial p_j}$$
where $p_j$ is the three-dimensional coordinate of the front-window map point $j$ observed by rear-window keyframe vertex $v_k$.
For the pose-pose constraint error equation $e_{ki}$ generated between a rear-window keyframe vertex $v_k$ and a front-window keyframe vertex $v_i$, the error equation contains no map point three-dimensional coordinate variable, so the derivative is taken only with respect to the camera pose variable of the front-window keyframe vertex $v_i$:
$$J_{ki} = \frac{\partial e_{ki}}{\partial \xi_i}$$
where $\xi_i$ is the camera pose of front-window keyframe vertex $v_i$.
The Jacobian matrices obtained by differentiating all the error equations are combined to construct the Hessian matrix $H$, and the error cost function model finally simplifies to:
$$L = H\,\Delta x$$
where $H$ is the Hessian matrix of the total error equation. A Schur elimination method is used to marginalize the sparse Hessian matrix $H$, and the update $\Delta x$, i.e. the corrections of all keyframe pose variables and map point three-dimensional coordinate variables, is finally obtained through a Levenberg-Marquardt (LM) adaptive-step iterative algorithm. The camera poses and map point coordinates among the front-window optimization variables thereby reach an accuracy equivalent to that of the laser data, and the global consistency of the front and rear map segments is guaranteed.
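A compact sketch of the Schur elimination mentioned above: the normal equations are split into a pose block and a map-point block, the point block is marginalized, and the reduced pose system is solved first. The dense matrix layout here is purely illustrative; in practice the point block is block-diagonal and inverted block by block.
```python
import numpy as np

def schur_solve(H, b, n_pose):
    """Solve H @ dx = b by marginalizing the map-point block; n_pose = size of the pose block."""
    Hpp, Hpm = H[:n_pose, :n_pose], H[:n_pose, n_pose:]
    Hmp, Hmm = H[n_pose:, :n_pose], H[n_pose:, n_pose:]
    bp, bm = b[:n_pose], b[n_pose:]
    Hmm_inv = np.linalg.inv(Hmm)                       # block-diagonal in a real BA problem
    S = Hpp - Hpm @ Hmm_inv @ Hmp                      # Schur complement
    dx_pose = np.linalg.solve(S, bp - Hpm @ Hmm_inv @ bm)
    dx_point = Hmm_inv @ (bm - Hmp @ dx_pose)
    return np.concatenate([dx_pose, dx_point])         # corrections Delta x
```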
S5, performing geometric incremental segmentation on the three-dimensional geometric point cloud map, performing object identification and semantic segmentation on each frame of RGB image by adopting an object detector based on a convolutional network, fusing a geometric incremental segmentation result and a semantic segmentation result, and acquiring an object boundary to obtain a 3D geometric segmentation map of scene semantic enhanced geometric segmentation; the method specifically comprises the following steps:
S5.1, the prior laser point cloud data of step S4 guide the RGB-D SLAM to construct a high-precision three-dimensional geometric point cloud map of the indoor scene, and geometric incremental segmentation is performed on the constructed map; to ensure the real-time performance of the subsequently constructed semantic map, the depth images acquired by the RGB-D camera are geometrically segmented frame by frame. Since most real-world objects are composed of convex shapes, the method adopts a normal-vector edge analysis with a concavity penalty mechanism together with a three-dimensional vertex distance threshold segmentation to obtain the geometric segmentation map of the depth image.
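A minimal per-frame sketch of such a segmentation is given below, assuming an organized depth image and pinhole intrinsics. The 4-neighbour region growing, the specific convexity test, and the thresholds angle_thresh and dist_thresh are illustrative assumptions; the source only specifies that a concavity-penalised normal-vector edge analysis and a vertex distance threshold are used.

```python
import numpy as np
from collections import deque

def segment_depth(depth, fx, fy, cx, cy, angle_thresh=0.96, dist_thresh=0.03):
    """Grow convex segments over an organized depth image and return a label map.

    Neighbouring pixels stay in one segment only if their back-projected vertices
    are close (distance threshold) and the local surface is not strongly concave
    (normal-vector / concavity penalty)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(np.float64)
    pts = np.dstack(((u - cx) * z / fx, (v - cy) * z / fy, z))   # vertex map

    # normals from finite differences of the vertex map
    dx = np.gradient(pts, axis=1)
    dy = np.gradient(pts, axis=0)
    n = np.cross(dx, dy)
    n /= np.linalg.norm(n, axis=2, keepdims=True) + 1e-9

    labels = -np.ones((h, w), dtype=np.int32)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1 or z[sy, sx] <= 0:
                continue
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < h and 0 <= nx < w) or labels[ny, nx] != -1:
                        continue
                    step = pts[ny, nx] - pts[y, x]
                    close = np.linalg.norm(step) < dist_thresh
                    convex = np.dot(n[y, x], step) <= 0            # concavity penalty
                    similar = np.dot(n[y, x], n[ny, nx]) > angle_thresh
                    if close and (convex or similar):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels
```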
S5.2, performing semantic segmentation and object identification on each frame of RGB image by adopting an object detector based on a convolutional network to obtain class information and class probability of an object and an object boundary segmentation map;
To identify object instances in the scene, an object detector based on a convolutional neural network (CNN) is applied to the current input RGB image. Recent research shows that CNN-based object detectors can achieve real-time performance while maintaining high recognition accuracy; here Fast-RCNN performs real-time frame-by-frame object detection on the input images. Given an input image I_t(u), with pixel coordinates u = (x, y), 0 ≤ x < W, 0 ≤ y < H, Fast-RCNN outputs a set of bounding boxes B_i, 0 < i < K, where K is the number of output bounding boxes, and assigns to each bounding box B_i a classification class c_i and the probability p(c_i) of that class.
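The sketch below runs a per-frame detector in the spirit of this step, but substitutes torchvision's pretrained Faster R-CNN for the Fast-RCNN named above; the score threshold and the helper name detect_objects are assumptions made for illustration.

```python
import torch
import torchvision

# Stand-in detector: torchvision's pretrained Faster R-CNN (not the patent's network).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_objects(rgb_hwc_uint8, score_thresh=0.5):
    """Return bounding boxes B_i, class labels c_i and class probabilities p(c_i)
    for one H x W x 3 uint8 RGB frame."""
    img = torch.from_numpy(rgb_hwc_uint8).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        out = model([img])[0]                 # dict: 'boxes', 'labels', 'scores'
    keep = out["scores"] > score_thresh
    return out["boxes"][keep], out["labels"][keep], out["scores"][keep]
```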
S5.3, the geometric segmentation map of the depth image and the object boundary segmentation map are overlaid and statistics are accumulated; according to whether different segment labels of the geometric segmentation map of the depth image fall into the same first object bounding box of the object boundary segmentation map, it is detected whether one object has been divided into different parts; if so, the boundary of the object in the object boundary segmentation map is updated, yielding the 3D geometric segmentation map of scene semantic enhanced geometric segmentation;
Since the geometric segmentation of the depth image relies only on geometric information, some whole objects in the target scene may be split into different segment parts (for example, a chair split into a backrest part and a seat part), so the segmentation exhibits over-segmentation. The embodiment therefore enhances the geometric segmentation with the semantic information provided, in the form of bounding boxes, by the CNN object detection and semantic segmentation results, specifically as follows:
The geometric segmentation map of the depth image is overlaid with the CNN object detection result of the RGB image and statistics are accumulated. If two segment labels l_n and l_m of the geometric segmentation map of the depth image fall into the same first object bounding box B_i of the object boundary segmentation map and the segmented regions satisfy the following condition, the regions labeled l_n and l_m are regarded as belonging to the same object, and B_i is taken as the boundary of that object:

[Equation image in original: the condition set Ψ, a ratio test over the pixel sets defined below with threshold σ_same]

where Ψ is the set of conditions under which the geometric segments labeled l_n and l_m are different parts of the same object, one term denotes the set of pixels labeled with class l_i, another denotes the set of pixels inside bounding box B_i that are labeled with class l_i, and σ_same is a ratio threshold. The condition examines the target label l_i within bounding box B_i to verify whether label l_i is part of the object instance; if the condition is satisfied, the geometric segments (l_n, l_m) are merged as different parts of the same object.
By adopting this deep-learning object-detection enhanced geometric segmentation strategy, real-time segmentation and higher segmentation precision can be ensured, and complete and consistent semantic information is provided for the subsequent high-precision semantic map construction.
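A minimal sketch of the overlap test above, assuming the condition Ψ reduces to a per-label ratio test (the exact condition appears only as an equation image in the source); sigma_same and the helper name merge_segments_in_box are illustrative.

```python
import numpy as np

def merge_segments_in_box(geo_labels, box, sigma_same=0.8):
    """Merge geometric segment labels that look like parts of one detected object.

    geo_labels : H x W integer label map from the depth-image geometric segmentation
    box        : (x0, y0, x1, y1) first object bounding box B_i from the detector
    """
    x0, y0, x1, y1 = (int(round(v)) for v in box)
    inside = np.zeros(geo_labels.shape, dtype=bool)
    inside[y0:y1, x0:x1] = True

    parts = []
    for lbl in np.unique(geo_labels[inside]):
        if lbl < 0:                                  # skip invalid / background
            continue
        pix = geo_labels == lbl
        # assumed reading of the condition: most of this label's pixels lie in B_i
        if (pix & inside).sum() / pix.sum() > sigma_same:
            parts.append(lbl)

    merged = geo_labels.copy()
    for lbl in parts[1:]:                            # e.g. backrest + seat of a chair
        merged[merged == lbl] = parts[0]
    return merged
```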
S6, the semantically enhanced 3D geometric segmentation map is updated with the camera pose correction result of step S3, and semantic association and segmentation probability distribution updating are performed on the objects in the scene based on the updated result, completing the construction of the semantic map. While traditional methods assign classification class probabilities to each element that makes up the 3D map, the present invention assigns class probabilities to each neural network semantic segmentation instance. In this way, each semantic object instance identified by the neural network segmentation is assigned a discrete probability distribution P(c | I_{1...t}) and the corresponding probability confidence. Assuming N classification classes, the spatial complexity of storing the classification class probabilities is O(N_l), where N_l is the number of segmentation labels, compared with O(N_s) for conventional methods, where N_s is the number of elements of the three-dimensional map. Since N_s >> N_l, the invention reduces memory occupation and computational complexity, and assigning semantics by region rather than by element accords with the more natural way humans identify objects through semantic labels.

Based on this concept, in order to correctly fuse the output of the object detector with the three-dimensional geometric segments, the embodiment adopts a confidence-based method to update in real time the class probability of the label l_i assigned to each classification category. Using the camera pose T_t estimated by the SLAM framework and the 3D position x(k) associated with each surface s_k, the semantically enhanced 3D geometric segmentation map is rendered onto the current image plane, with each segmented region represented by its segmentation class label l_i, generating an associated label-rendered segmentation map. An object detector based on a convolutional network is then applied to the label-rendered segmentation map to obtain a second object bounding box and its corresponding segmentation probability; the second object bounding box is associated with an object bounding box in the semantically enhanced 3D geometric segmentation map, and the object segmentation probability in that map is updated with the segmentation probability corresponding to the second object bounding box, completing the construction of the high-precision semantic map.
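A minimal sketch of the per-instance probability bookkeeping described above: each segmentation instance keeps one discrete class distribution (O(N_l) storage rather than O(N_s)), updated whenever a detection on the label-rendered segmentation map is associated with it. The multiplicative fusion used here is an assumption; the source states only that a confidence-based update is applied.

```python
import numpy as np

class InstanceBelief:
    """Discrete class distribution P(c | I_1..t) kept per segmentation instance."""

    def __init__(self, num_classes):
        self.p = np.full(num_classes, 1.0 / num_classes)    # uniform prior

    def update(self, detected_class, score):
        # Spread the detector confidence over the other classes, fuse, renormalize.
        likelihood = np.full_like(self.p, (1.0 - score) / (len(self.p) - 1))
        likelihood[detected_class] = score
        self.p *= likelihood
        self.p /= self.p.sum()

    def best(self):
        c = int(np.argmax(self.p))
        return c, float(self.p[c])

# After associating a "second object bounding box" with an instance id:
#     beliefs[instance_id].update(detected_class, detection_score)
# refines that instance's segmentation probability in the semantic map.
```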
The above-described embodiments merely illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements of the technical solutions of the present invention made by those skilled in the art without departing from the spirit of the invention shall fall within the protection scope defined by the claims of the present invention.

Claims (10)

1. A semantic mapping and positioning method based on the fusion of a prior laser point cloud and a depth map is characterized by comprising the following steps:
s1, collecting integral prior laser point cloud data of a scene, and preprocessing the collected data;
s2, acquiring a depth image and an RGB image of a scene in real time through a camera fixed on a mobile platform, generating an RGB-D point cloud based on the depth image, and performing initialization registration on preprocessed prior laser point cloud data and the RGB-D point cloud by adopting a deep learning positioning neural network and a nearest neighbor ICP iterative algorithm;
s3, providing camera pose constraint through the registered prior laser point cloud data, correcting the camera pose and completing camera positioning;
s4, constructing a three-dimensional geometric point cloud map by adopting a multithread map segmentation front-back window optimization method based on camera pose constraints provided by prior laser point cloud data;
s5, performing geometric incremental segmentation on the three-dimensional geometric point cloud map, performing object identification and semantic segmentation on each frame of RGB image by adopting an object detector based on a convolutional network, fusing a geometric incremental segmentation result and a semantic segmentation result, and acquiring an object boundary to obtain a 3D geometric segmentation map of scene semantic enhanced geometric segmentation;
s6, updating the 3D geometric segmentation map subjected to semantic enhancement geometric segmentation through the correction result of the camera pose in the step S3, performing semantic association and segmentation probability distribution updating on the objects in the scene based on the updating result, and completing construction of the semantic map.
2. The semantic mapping and localization method based on the fusion of the prior laser point cloud and the depth map of claim 1, wherein in the step S2, the method for performing the initialization registration on the preprocessed prior laser point cloud data and the RGB-D point cloud comprises:
firstly, constructing a deep learning network, aligning preprocessed prior laser point cloud data with the RGB-D point cloud to obtain continuous video frames marked by a camera pose, and training the deep learning positioning neural network through continuous video frame images and the camera pose;
secondly, inputting RGB image video frames collected by a camera into a trained deep learning positioning neural network to obtain an initial camera pose;
and updating the initial camera pose by adopting an ICP iterative algorithm to complete the initialization registration of the prior laser point cloud data and the RGB-D point cloud.
3. The semantic mapping and localization method based on the fusion of the prior laser point cloud and the depth map according to claim 1, wherein the step S3 specifically comprises the following steps:
s3.1, selecting a key frame to be corrected from a depth image acquired by a camera in real time, and generating RGB-D point cloud data through dynamic object elimination and depth image data denoising;
s3.2, acquiring local prior laser point cloud data according to the initial posture of the camera generated by the visual tracking thread;
and S3.3, based on the local prior laser point cloud data, acquiring the camera pose of the key frame to be corrected by adopting a point-to-surface ICP iterative algorithm, and transmitting the camera pose of the key frame to be corrected to a local map through a key frame visibility map.
4. The method of semantic mapping and localization based on a priori laser point cloud and depth map fusion of claim 3, wherein in step S3.1, RGB-D point cloud data is generated based on the depth information of the camera and the pose of the tracking thread.
5. The semantic mapping and localization method based on the fusion of the prior laser point cloud and the depth map of claim 1, wherein the step S4 comprises:
s4.1, acquiring key frames and 3-D ORB characteristic points of the depth images, and constructing a sparse map;
s4.2, adopting camera pose constraint provided by prior laser point cloud data to locally optimize the constructed sparse map;
s4.3, performing global optimization on the map subjected to local optimization by adopting a multithreading map segmentation optimization method;
and S4.4, further optimizing the map after global optimization by adopting a double-window optimization method to obtain the three-dimensional geometric point cloud map.
6. The method for semantic mapping and localization based on a priori laser point cloud and depth map fusion of claim 5, wherein in the step S4.4, the dual-window optimization method comprises a front window and a rear window; the front window constraint is a pose-map point constraint, and the front window is used for controlling the construction of a local map; the rear window constraint is a pose-pose constraint between the rear window and the front window key frame, and a pose-map point constraint between the rear window and the front window map point; the rear window is used to control global consistency of the map.
7. The method of claim 6, wherein the front window comprises a plurality of key frames in a map segment, a plurality of key frames with overlapping front and back map segments, and a set of map points observed by the key frames; the back window includes the optimized set of keyframes in the front window.
8. The semantic mapping and localization method based on the fusion of the prior laser point cloud and the depth map according to claim 1, wherein the step S5 specifically includes:
s5.1, segmenting the three-dimensional geometric point cloud map by adopting a normal vector edge analysis method of a concave punishment mechanism and a three-dimensional vertex distance threshold segmentation method to obtain a geometric segmentation map of the depth image;
s5.2, performing semantic segmentation and object identification on each frame of RGB image by adopting an object detector based on a convolutional network to obtain class information and class probability of an object and an object boundary segmentation map;
s5.3, carrying out superposition statistics on the geometric segmentation map and the object boundary segmentation map of the depth image, detecting whether the same object is divided into different parts or not according to the state that different image labels in the geometric segmentation map of the depth image fall into a first object boundary frame of the object boundary segmentation map, if so, updating the boundary of the object in the object boundary segmentation map, and obtaining the 3D geometric segmentation map of the scene semantic enhanced geometric segmentation.
9. The method for semantic mapping and localization based on a fusion of a priori laser point cloud and a depth map as claimed in claim 8, wherein in step S6, a confidence-based method is used to update the segmentation probability of each object in the 3D geometric segmentation map assigned to the semantically enhanced geometric segmentation in real time.
10. The semantic mapping and localization method based on the fusion of the prior laser point cloud and the depth map according to claim 9, wherein the confidence-based method specifically comprises:
based on the camera pose correction result in step S3, back-projecting the 3D geometric segmentation map of the semantic enhanced geometric segmentation to the current camera plane to obtain a label rendering segmentation map; and adopting an object detector based on a convolutional network to carry out object identification on the label rendering segmentation map to obtain a second object boundary box and a corresponding segmentation probability, associating the second object boundary box with an object boundary box in the 3D geometric segmentation map of the semantic enhanced geometric segmentation, and updating the object segmentation probability in the 3D geometric segmentation map of the semantic enhanced geometric segmentation through the segmentation probability corresponding to the second object boundary box to complete the construction of the semantic map.
CN202011213908.2A 2020-11-04 2020-11-04 Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map Active CN112258618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011213908.2A CN112258618B (en) 2020-11-04 2020-11-04 Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map

Publications (2)

Publication Number Publication Date
CN112258618A true CN112258618A (en) 2021-01-22
CN112258618B CN112258618B (en) 2021-05-14

Family

ID=74268696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011213908.2A Active CN112258618B (en) 2020-11-04 2020-11-04 Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map

Country Status (1)

Country Link
CN (1) CN112258618B (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115690338A (en) * 2021-07-30 2023-02-03 北京图森智途科技有限公司 Map construction method, map construction device, map construction equipment and storage medium

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160154999A1 (en) * 2014-12-02 2016-06-02 Nokia Technologies Oy Objection recognition in a 3d scene
US20180288385A1 (en) * 2017-04-03 2018-10-04 Sony Corporation Using super imposition to render a 3d depth map
CN107862735A (en) * 2017-09-22 2018-03-30 北京航空航天大学青岛研究院 A kind of RGBD method for reconstructing three-dimensional scene based on structural information
CN108509918A (en) * 2018-04-03 2018-09-07 中国人民解放军国防科技大学 Target detection and tracking method fusing laser point cloud and image
CN109597096A (en) * 2018-12-01 2019-04-09 北醒(北京)光子科技有限公司 A kind of laser radar point cloud processing system and method
CN111609852A (en) * 2019-02-25 2020-09-01 北京奇虎科技有限公司 Semantic map construction method, sweeping robot and electronic equipment
CN110378196A (en) * 2019-05-29 2019-10-25 电子科技大学 A kind of road vision detection method of combination laser point cloud data
CN110349250A (en) * 2019-06-28 2019-10-18 浙江大学 A kind of three-dimensional rebuilding method of the indoor dynamic scene based on RGBD camera
CN110666791A (en) * 2019-08-29 2020-01-10 江苏大学 RGBD robot nursing system and method based on deep learning
CN110969649A (en) * 2019-11-29 2020-04-07 上海有个机器人有限公司 Matching evaluation method, medium, terminal and device of laser point cloud and map
CN110956651A (en) * 2019-12-16 2020-04-03 哈尔滨工业大学 Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN111080659A (en) * 2019-12-19 2020-04-28 哈尔滨工业大学 Environmental semantic perception method based on visual information
CN111242000A (en) * 2020-01-09 2020-06-05 电子科技大学 Road edge detection method combining laser point cloud steering
CN111209978A (en) * 2020-04-20 2020-05-29 浙江欣奕华智能科技有限公司 Three-dimensional visual repositioning method and device, computing equipment and storage medium
CN111581313A (en) * 2020-04-25 2020-08-25 华南理工大学 Semantic SLAM robustness improvement method based on instance segmentation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KONG, Liyan: "RGB-D Based 3D Point Cloud Object Segmentation", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112902953A (en) * 2021-01-26 2021-06-04 中国科学院国家空间科学中心 Autonomous pose measurement method based on SLAM technology
CN113156419A (en) * 2021-02-24 2021-07-23 清华大学 Specific language navigation method based on radar and visual multi-mode fusion
CN113222940B (en) * 2021-05-17 2022-07-12 哈尔滨工业大学 Method for automatically grabbing workpiece by robot based on RGB-D image and CAD model
CN113222940A (en) * 2021-05-17 2021-08-06 哈尔滨工业大学 Method for automatically grabbing workpiece by robot based on RGB-D image and CAD model
CN113409331A (en) * 2021-06-08 2021-09-17 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, terminal, and readable storage medium
CN113378694A (en) * 2021-06-08 2021-09-10 北京百度网讯科技有限公司 Method and device for generating target detection and positioning system and target detection and positioning
CN113409331B (en) * 2021-06-08 2024-04-12 Oppo广东移动通信有限公司 Image processing method, image processing device, terminal and readable storage medium
CN113432600A (en) * 2021-06-09 2021-09-24 北京科技大学 Robot instant positioning and map construction method and system based on multiple information sources
CN113432600B (en) * 2021-06-09 2022-08-16 北京科技大学 Robot instant positioning and map construction method and system based on multiple information sources
CN113362247B (en) * 2021-06-11 2023-08-15 山东大学 Semantic real scene three-dimensional reconstruction method and system for laser fusion multi-view camera
CN113362247A (en) * 2021-06-11 2021-09-07 山东大学 Semantic live-action three-dimensional reconstruction method and system of laser fusion multi-view camera
CN113625288A (en) * 2021-06-15 2021-11-09 中国科学院自动化研究所 Camera and laser radar pose calibration method and device based on point cloud registration
CN113689540A (en) * 2021-07-22 2021-11-23 清华大学 Object reconstruction method and device based on RGB video
CN113689540B (en) * 2021-07-22 2024-04-23 清华大学 Object reconstruction method and device based on RGB video
CN113587933A (en) * 2021-07-29 2021-11-02 山东山速机器人科技有限公司 Indoor mobile robot positioning method based on branch-and-bound algorithm
CN113587933B (en) * 2021-07-29 2024-02-02 山东山速机器人科技有限公司 Indoor mobile robot positioning method based on branch-and-bound algorithm
CN113739786A (en) * 2021-07-30 2021-12-03 国网江苏省电力有限公司电力科学研究院 Indoor environment sensing method, device and equipment for quadruped robot
CN113778096A (en) * 2021-09-15 2021-12-10 上海景吾智能科技有限公司 Positioning and model building method and system for indoor robot
CN113808262A (en) * 2021-10-08 2021-12-17 合肥安达创展科技股份有限公司 Building model generation system based on depth map analysis
CN114216452A (en) * 2021-12-06 2022-03-22 北京云迹科技股份有限公司 Robot positioning method and device
CN114216452B (en) * 2021-12-06 2024-03-19 北京云迹科技股份有限公司 Positioning method and device for robot
CN114565616A (en) * 2022-03-03 2022-05-31 湖南大学无锡智能控制研究院 Unstructured road state parameter estimation method and system
CN115019270A (en) * 2022-05-31 2022-09-06 电子科技大学 Automatic driving night target detection method based on sparse point cloud prior information
CN115019270B (en) * 2022-05-31 2024-04-19 电子科技大学 Automatic driving night target detection method based on sparse point cloud priori information
CN114742893A (en) * 2022-06-09 2022-07-12 浙江大学湖州研究院 3D laser data training and rapid positioning method based on deep learning
CN115187843A (en) * 2022-07-28 2022-10-14 中国测绘科学研究院 Depth map fusion method based on object space voxel and geometric feature constraint
CN115187843B (en) * 2022-07-28 2023-03-14 中国测绘科学研究院 Depth map fusion method based on object space voxel and geometric feature constraint
CN115841510A (en) * 2022-09-19 2023-03-24 南京航空航天大学 Method for estimating depth and normal vector of single image in desktop curling scene based on geometric knowledge and deep learning
CN115841510B (en) * 2022-09-19 2023-12-12 南京航空航天大学 Depth and normal vector estimation method of single image in desktop curling scene based on geometric knowledge and deep learning
CN115937383A (en) * 2022-09-21 2023-04-07 北京字跳网络技术有限公司 Method and device for rendering image, electronic equipment and storage medium
CN115937383B (en) * 2022-09-21 2023-10-10 北京字跳网络技术有限公司 Method, device, electronic equipment and storage medium for rendering image
CN115376109A (en) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Obstacle detection method, obstacle detection device, and storage medium
CN116299500A (en) * 2022-12-14 2023-06-23 江苏集萃清联智控科技有限公司 Laser SLAM positioning method and device integrating target detection and tracking
CN116299500B (en) * 2022-12-14 2024-03-15 江苏集萃清联智控科技有限公司 Laser SLAM positioning method and device integrating target detection and tracking
CN116817892A (en) * 2023-08-28 2023-09-29 之江实验室 Cloud integrated unmanned aerial vehicle route positioning method and system based on visual semantic map
CN116817892B (en) * 2023-08-28 2023-12-19 之江实验室 Cloud integrated unmanned aerial vehicle route positioning method and system based on visual semantic map
CN117739954A (en) * 2024-02-07 2024-03-22 未来机器人(深圳)有限公司 Map local updating method and device and electronic equipment

Also Published As

Publication number Publication date
CN112258618B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112258618B (en) Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN111798475B (en) Indoor environment 3D semantic map construction method based on point cloud deep learning
CN109544677B (en) Indoor scene main structure reconstruction method and system based on depth image key frame
CN109166149B (en) Positioning and three-dimensional line frame structure reconstruction method and system integrating binocular camera and IMU
Kim et al. Deep learning based vehicle position and orientation estimation via inverse perspective mapping image
EP3977346A1 (en) Simultaneous localization and mapping method, device, system and storage medium
CN111862201B (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN111201451A (en) Method and device for detecting object in scene based on laser data and radar data of scene
CN113537208A (en) Visual positioning method and system based on semantic ORB-SLAM technology
CN114424250A (en) Structural modeling
Zhao et al. OFM-SLAM: a visual semantic SLAM for dynamic indoor environments
CN111998862B (en) BNN-based dense binocular SLAM method
Zhang et al. Vehicle global 6-DoF pose estimation under traffic surveillance camera
Rangesh et al. Ground plane polling for 6dof pose estimation of objects on the road
Alcantarilla et al. Large-scale dense 3D reconstruction from stereo imagery
Han et al. Urban scene LOD vectorized modeling from photogrammetry meshes
Zhu et al. A review of 6d object pose estimation
CN115018999A (en) Multi-robot-cooperation dense point cloud map construction method and device
Herb et al. Lightweight semantic mesh mapping for autonomous vehicles
CN112950786A (en) Vehicle three-dimensional reconstruction method based on neural network
WO2023030062A1 (en) Flight control method and apparatus for unmanned aerial vehicle, and device, medium and program
Wirges et al. Self-supervised flow estimation using geometric regularization with applications to camera image and grid map sequences
CN114972491A (en) Visual SLAM method, electronic device, storage medium and product
CN114202701A (en) Unmanned aerial vehicle vision repositioning method based on object semantics
Leishman et al. Robust Motion Estimation with RBG-D Cameras

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant