CN113313824B - Three-dimensional semantic map construction method - Google Patents
Classifications
- G06T17/05 — Three-dimensional [3D] modelling; geographic models
- G06T7/10 — Image analysis; segmentation, edge detection
- G06T7/33 — Image registration using feature-based methods
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters (camera calibration)
- G06T2207/10024 — Color image
- G06T2207/10028 — Range image; depth image; 3D point clouds
- G06T2207/20016 — Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
- G06T2207/20221 — Image fusion; image merging
- Y02D10/00 — Energy efficient computing
Abstract
The invention belongs to the technical field of map construction and specifically relates to a three-dimensional semantic map construction method comprising a registration-image thread, a local-map and global-map thread, a semantic-map thread, a fusion thread, and a global thread, all of which can be processed in parallel on a GPU. Pose solving, semantic segmentation, image fusion, matching, and other computations are performed on the scene images concurrently, so the SLAM system runs closer to real time and builds maps faster. Semantic information is fused into the three-dimensional map, enriching its forms of expression, so that unmanned mobile platforms such as drones and robots can understand scene maps along more dimensions, control their motion trajectories more accurately, and deliver better overall performance.
Description
Technical Field
The invention belongs to the technical field of map construction, and particularly relates to a three-dimensional semantic map construction method.
Background
SLAM (Simultaneous Localization and Mapping) is a technique that acquires three-dimensional information of a scene through sensors, allowing a device to localize itself and recognize its environment from the scene information. SLAM comes in two main variants. Laser SLAM, whose scene-data sensor is a lidar, is generally used in the aerospace and automotive industries; it offers high precision but at high cost. Visual SLAM acquires scene image data through a camera at much lower cost and is widely used for autonomous navigation of unmanned aerial vehicles and robots.
In the fields of unmanned aerial vehicles and robots, traditional maps cannot meet diversified application requirements, and with the development of depth sensors, semantic maps have become widely applied to autonomous navigation of unmanned aerial vehicles and robots. Semantic maps typically include spatial attribute information, such as the planar structure of a building and room distribution, as well as semantic attribute information, such as individual room attributes and functions and object class and location information within a room. The goal of semantic map construction is to mark semantic information precisely on a map.
For example, Chinese patent CN111080659A discloses an environmental semantic perception method based on visual information, comprising: acquiring environmental image information with a Kinect V1.0 camera to obtain registered color and depth images; computing the camera's three-dimensional pose through an ORB_SLAM2 process from the ORB feature points extracted in each frame; performing semantic segmentation on each frame to generate semantic color information; synchronously generating a point cloud from the input depth map and the camera's intrinsic matrix; registering the semantic color information into the point cloud to obtain a local semantic point cloud result; fusing the camera pose information with the local semantic point cloud result to obtain new global semantic point cloud information; and representing the fused global semantic point cloud with an octree map to obtain the final three-dimensional octree semantic map. In practice, however, because ORB feature extraction is used, map construction is not fast enough, which seriously affects the response speed and trajectory-control accuracy of the drone or robot and degrades the user experience.
Disclosure of Invention
The invention provides a three-dimensional semantic map construction method to overcome at least one defect of the prior art: based on GPU multithreaded processing, it improves map construction speed and achieves real-time mapping.
In order to solve the technical problems, the invention adopts the following technical scheme:
the three-dimensional semantic map construction method comprises the following steps:
registering image threads, local map and global map threads, semantic map threads, fusion threads and global threads which can be processed in parallel based on a GPU (graphics processor);
the registration image thread is used for acquiring a color image and a depth image of a scene, and preprocessing the color image and the depth image to obtain a registration image;
the local map and global map thread is used for solving the pose between the multi-frame images according to the registration image and the depth image, and performing three-dimensional reconstruction by using the pose, the color image and the depth image to obtain a local map and a global map; the semantic map thread is used for carrying out semantic segmentation on the plurality of registration images by using a PSP Net (Pyramid Scene Parsing Network, pyramid scene analysis network) to obtain two-dimensional semantic images;
the fusion thread is used for respectively fusing the two-dimensional semantic image with the local map and the global map to obtain the local semantic map and the global semantic map;
the global thread is used for matching the local semantic map and the global semantic map to obtain a global consistency dense semantic map.
With this scheme, GPU-based multithreading performs pose solving, semantic segmentation, image fusion, matching, and other computations on the scene images, making the SLAM system more real-time and map construction faster. At the same time, semantic information is fused into the three-dimensional map, enriching its forms of expression, so that unmanned mobile platforms such as drones and robots can understand scene maps along more dimensions, control motion trajectories more accurately, and perform better.
Preferably, the registering image thread specifically includes:
calibrating a depth camera comprising an infrared camera and a color camera to obtain an internal parameter and an external parameter of the depth camera;
respectively utilizing an infrared camera and a color camera in the depth camera to acquire a depth image and a color image of a multi-frame scene;
and registering the depth image and the color image according to the extrinsic and intrinsic parameters to obtain multi-frame registered images.
Preferably, the local map and global map threads include:
performing block division on the multi-frame registered images to obtain a number of image blocks, with overlapping frames between adjacent image blocks;
performing feature extraction on the registration image in each image block by using a Scale Invariant Feature Transform (SIFT) extraction algorithm based on Graphic Processing Unit (GPU) acceleration to obtain feature points, and selecting a coordinate system of a frame of registration image as a world coordinate system;
matching the feature points with the GMS matching algorithm and filtering out mismatches; storing intra-block associations as the local image association match M_1 and the weaker, cross-block associations as the global image association match M_2; solving the pose between registered frames from M_1 and M_2 with the Gauss-Newton method, and performing loop closure detection on the current pose;
and carrying out three-dimensional dense reconstruction on the scene according to the pose and the depth image and the color image obtained in the registration image thread to obtain a local map and a global map.
Preferably, the magnitude of a feature point in the SIFT extraction algorithm is expressed as:

A(x, y) = sqrt[ (I(x+1, y) − I(x−1, y))² + (I(x, y+1) − I(x, y−1))² ]

and its direction as:

θ(x, y) = arctan[ (I(x, y+1) − I(x, y−1)) / (I(x+1, y) − I(x−1, y)) ]

wherein A(x, y) is the magnitude of the feature point, x and y are its pixel position in the image, I(x+1, y), I(x−1, y), I(x, y+1), and I(x, y−1) are its neighbouring pixels in the Gaussian difference pyramid, and θ(x, y) is the orientation of the feature point.
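As a concrete illustration, here is a minimal NumPy sketch of the magnitude and orientation computation above. The function name and the toy patch are illustrative, not from the patent:

```python
import numpy as np

def keypoint_magnitude_orientation(img, x, y):
    """Gradient magnitude A(x, y) and orientation theta(x, y) of a keypoint,
    computed from its four neighbours in one pyramid level."""
    dx = float(img[y, x + 1]) - float(img[y, x - 1])
    dy = float(img[y + 1, x]) - float(img[y - 1, x])
    magnitude = np.hypot(dx, dy)      # A(x, y) = sqrt(dx^2 + dy^2)
    orientation = np.arctan2(dy, dx)  # theta(x, y), in radians
    return magnitude, orientation

# Toy 3x3 patch standing in for one level of the Gaussian difference pyramid
patch = np.array([[0, 1, 0],
                  [0, 2, 4],
                  [0, 3, 0]], dtype=np.float32)
A, theta = keypoint_magnitude_orientation(patch, 1, 1)
```

In a real extractor this is evaluated at every detected extremum, and the orientations of the neighbourhood are accumulated into a histogram to assign the dominant direction.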
Preferably, the probability model in the GMS matching algorithm is:

P = (mean_true − mean_false) / (std_true + std_false)

and the evaluation score of a feature point pair is:

S_ij = Σ_{k=1}^{K} |X_{i_k j_k}|

wherein P is the separation between correct and incorrect matching, p_true denotes a correct match and p_false an incorrect match, mean_true and mean_false are the means of the correct-match and incorrect-match distributions, and std_true and std_false their respective variances; |F_1i| is the number of features in the grid cell of the feature match; i and j are the matched regions in the two frames, k is the current grid-cell index, K is the total number of grid cells, and |X_{i_k j_k}| is the number of matches between the cell pair {i_k, j_k}.
Preferably, the registering the depth image and the color image according to the external and internal parameters specifically includes:
converting coordinates of all pixel points in the depth image to an infrared camera coordinate system;
converting coordinates of all points under the infrared camera coordinate system to a world coordinate system;
converting the coordinates of all points in the world coordinate system to a color camera coordinate system;
mapping the coordinates of all points under the coordinate system of the color camera to a color plane of the normalized plane;
and obtaining a transformation matrix between the infrared camera and the color camera.
Preferably, the semantic map thread specifically includes:
extracting features of the registration images to obtain feature layers;
pooling the feature layers to generate pyramid pooling features;
flattening and upsampling pyramid pooling features;
and performing CONCAT (merging) with the feature layer, and obtaining a local semantic map and a global semantic map through a convolutional neural network.
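The pooling, flattening/upsampling, and CONCAT steps above can be sketched in NumPy. This is a simplified illustration under assumptions: real PSPNet applies a convolution to each pooled branch before upsampling, which is omitted here, and all function names are hypothetical:

```python
import numpy as np

def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) feature map down to (C, bins, bins)."""
    c, h, w = feat.shape
    out = np.zeros((c, bins, bins), dtype=feat.dtype)
    ys = np.linspace(0, h, bins + 1).astype(int)
    xs = np.linspace(0, w, bins + 1).astype(int)
    for i in range(bins):
        for j in range(bins):
            out[:, i, j] = feat[:, ys[i]:ys[i+1], xs[j]:xs[j+1]].mean(axis=(1, 2))
    return out

def upsample_nearest(feat, h, w):
    """Nearest-neighbour upsampling of (C, h0, w0) back to (C, h, w)."""
    c, h0, w0 = feat.shape
    yi = np.arange(h) * h0 // h
    xi = np.arange(w) * w0 // w
    return feat[:, yi][:, :, xi]

def pyramid_pooling(feat, bin_sizes=(1, 2, 3, 6)):
    """Pool at several scales, upsample back, and CONCAT with the input."""
    c, h, w = feat.shape
    branches = [feat]
    for b in bin_sizes:
        branches.append(upsample_nearest(adaptive_avg_pool(feat, b), h, w))
    return np.concatenate(branches, axis=0)  # 5 * C channels here

feat = np.random.rand(4, 12, 12).astype(np.float32)
ppm = pyramid_pooling(feat)
```

The bin sizes match the 1x1, 2x2, 3x3, and 6x6 pooling kernels mentioned later in the embodiment; the concatenated tensor would then be fed to the convolutional classification head.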
Preferably, the specific formula for fusing the local map and the global map with the TSDF model in the local map and global map thread is:

D(v) = (D(v)·W(v) + d_i(v)·w_i(v)) / (W(v) + w_i(v)),  W(v) = W(v) + w_i(v)

and the de-fusion construction is:

D′(v) = (D(v)·W(v) − d_i(v)·w_i(v)) / (W(v) − w_i(v)),  W′(v) = W(v) − w_i(v)

wherein D(v) is the signed distance value of the voxel, W(v) is the voxel weight, d_i(v) and w_i(v) are respectively the projection distance from the voxel to the i-th frame depth image and the integration weight, and D′(v) is the updated voxel signed distance value.
Preferably, the fusion model adopted in the fusion thread is:

C_i(o) = (C_{i−1}(o)·W_{i−1}(o) + c_i(p)·w_i(p)) / (W_{i−1}(o) + w_i(p)),  W_i(o) = W_{i−1}(o) + w_i(p)

wherein C_{i−1}(o) and W_{i−1}(o) are the fused class confidence and reliability weight of the voxel before frame i is fused, and c_i(p) and w_i(p) are the class confidence and reliability weight of pixel p in the i-th frame image.
Preferably, the specific formula for matching the local semantic map and the global semantic map in the global thread is:

Map(v) = (W_local·Map(v, C_{i−1}(o))_local + W_global·Map(v, C_{i−1}(o))_global) / (W_local + W_global)

with de-fusion performed as the inverse of this weighted combination, and the accuracy calculated as:

Accuracy = (k_1·S_1 + k_2·S_2) / S

wherein W_local and W_global are the weight values of the local and global semantic maps, Map(v, C_{i−1}(o))_local and Map(v, C_{i−1}(o))_global are the local and global semantic maps respectively; S_1 and S_2 are the surface areas of the three-dimensional semantic models measured with the MeshLab tool, S is the surface area of the three-dimensional reconstruction model measured with the same tool, and k_1 and k_2 are the weight coefficients of S_1 and S_2.
Compared with the prior art, the invention has the following beneficial effects:
Compared with traditional ORB feature extraction, GPU-accelerated SIFT feature extraction is faster and more robust. In addition, GPU-based multithreaded processing performs semantic segmentation, pose calculation, and image fusion on the registered images simultaneously, and fused images can be released one by one so that the GPU has enough memory to fuse and render images in real time. This achieves real-time map construction and fuses three-dimensional imagery with semantic information, improving the environmental understanding of unmanned mobile platforms such as drones and robots, making their motion more accurate and flexible, and improving product performance.
Drawings
FIG. 1 is a schematic block diagram of a process of a local map and global map thread of a three-dimensional semantic map construction method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a block division in a local map and global map thread of a three-dimensional semantic map construction method according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a semantic map thread in a three-dimensional semantic map construction method according to an embodiment of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the invention. For the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged, or reduced and do not represent actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings, and their descriptions, may be omitted. The positional relationships described in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components. In the description of the present invention, it should be understood that orientations or positional relationships indicated by terms such as "upper", "lower", "left", "right", "long", and "short" are based on the orientations or positional relationships shown in the drawings, merely for convenience of description and simplification; they do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. The terms describing positional relationships in the drawings are therefore exemplary only and are not to be construed as limitations of this patent; those of ordinary skill in the art can understand the specific meaning of the terms above according to specific circumstances.
The technical scheme of the invention is further specifically described by the following specific embodiments with reference to the accompanying drawings:
examples:
as shown in fig. 1, a three-dimensional semantic map construction method includes:
registering image threads, local map and global map threads, semantic map threads, fusion threads and global threads which can be processed in parallel based on the GPU;
the registration image thread is used for acquiring a color image and a depth image of a scene, and preprocessing the color image and the depth image to obtain a registered image; the registered image is a color image;
the local map and global map thread is used for solving the pose between the multi-frame images according to the registration image and the depth image, and performing three-dimensional reconstruction by using the pose, the color image and the depth image to obtain a local map and a global map;
the semantic map thread is used for performing semantic segmentation on the registered images with the PSPNet to obtain two-dimensional semantic images;
the fusion thread is used for respectively fusing the two-dimensional semantic map with the local map and the global map to obtain the local semantic map and the global semantic map;
the global thread is used for matching the local semantic map and the global semantic map to obtain a global consistency dense semantic map.
The registration image thread in this embodiment specifically includes:
calibrating a depth camera comprising an infrared camera and a color camera to obtain its intrinsic and extrinsic parameters; the depth camera may be a Kinect V2: a checkerboard is photographed with the Kinect V2 and the camera is calibrated to obtain the intrinsic matrix

K = [ f_x 0 c_x ; 0 f_y c_y ; 0 0 1 ]

and the extrinsic matrix

T = [ R t ; 0 1 ]

wherein R is a 3×3 rotation matrix, t is a 3×1 translation vector, f_x and f_y are the normalized focal lengths along the image x- and y-axes, and c_x and c_y are the coordinates of the image center point;
respectively utilizing an infrared camera and a color camera in the depth camera to acquire a depth image and a color image of a multi-frame scene;
and registering the depth image and the color image according to the extrinsic and intrinsic parameters to obtain multi-frame registered images.
The local map and global map threads in this embodiment include:
Taking fifteen frames as a unit, the multi-frame registered images are divided into blocks to obtain a number of image blocks, with a three-frame overlap between adjacent blocks; of course, this block size and inter-block overlap are only one reference embodiment and should not be construed as limiting the scheme.
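The block division just described can be sketched as follows; the function name is illustrative:

```python
def split_into_blocks(frames, block_size=15, overlap=3):
    """Split a frame sequence into blocks of `block_size` frames, where each
    block shares `overlap` frames with its predecessor (the embodiment's
    example: fifteen-frame blocks with a three-frame overlap)."""
    step = block_size - overlap
    blocks = []
    start = 0
    while start < len(frames):
        blocks.append(frames[start:start + block_size])
        if start + block_size >= len(frames):
            break
        start += step
    return blocks

# 30 frames -> three blocks: [0..14], [12..26], [24..29]
blocks = split_into_blocks(list(range(30)), block_size=15, overlap=3)
```

The overlapping frames are what link adjacent blocks, so intra-block matches feed the local association match and cross-block matches feed the global one.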
Performing feature extraction on the registration image in each image block by using a Scale Invariant Feature Transform (SIFT) extraction algorithm based on Graphic Processing Unit (GPU) acceleration to obtain feature points, and selecting a coordinate system of a frame of registration image as a world coordinate system;
matching the feature points with the GMS matching algorithm and filtering out mismatches; storing intra-block associations as the local image association match M_1 and the weaker, cross-block associations as the global image association match M_2; solving the pose between registered frames from M_1 and M_2 with the Gauss-Newton method, and performing loop closure detection on the current pose, wherein the pose comprises a local pose and a global pose;
in addition, solving the pose by the gauss newton method in this embodiment specifically includes:
constructing a nonlinear optimization objective function:
X * =argminE align (X),
the specific calculation process is as follows:
R=3N corr +|E|(|D i |+|I i |),
F(X k )=F(X k-1 )+J F (X k-1 )ΔX,
J F (X k-1 ) T J F (X k-1 )ΔX * =-J F (X k-1 ) T F(X k-1 ),
wherein X is the pose of the camera, X * Is the optimal solution of the pose X, E align (X) is the alignment objective function of the coefficient features and the dense luminosity and set constraint, r i (X) residual term for pose representation, N corr Is the total corresponding relation quantity in the image block, |D i I and I i The i is the size of the i frame depth image and the color image after downsampling, which are 64X53 = 3392, the i E is the number of frame pair sets, and the E is a frame pair set comprising a frame pair (i, j), i frame and j frame, F (X) k-1 ) In the form of vector of the residual error term of the pose of the previous frame of image, J F For the jacobian matrix corresponding to the vector, Δx=x k -X k-1 Delta X is the difference between the pose of the current frame and the pose of the previous frame * Deviation value for pose optimal solution, (X) k-1 ) T Is a matrix (X) k-1 ) Is a transposed matrix of (a);
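A minimal sketch of the Gauss-Newton iteration above, applied to a toy least-squares problem rather than the patent's pose-alignment objective (all names illustrative):

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20):
    """Minimize ||F(x)||^2 by repeatedly solving the normal equations
    J^T J dx = -J^T F (the linear system in the text) and updating x."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        F = residual(x)
        J = jacobian(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ F)
        x = x + dx
        if np.linalg.norm(dx) < 1e-12:
            break
    return x

# Toy alignment problem: fit x = (a, b) so that a*t_i + b matches y_i.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])  # exactly y = 2t + 1
res = lambda x: x[0] * t + x[1] - y
jac = lambda x: np.stack([t, np.ones_like(t)], axis=1)
x_star = gauss_newton(res, jac, x0=[0.0, 0.0])
```

In the SLAM setting, x would be the stacked pose parameters, F the residual vector built from the feature correspondences in M_1 and M_2, and each iteration would re-linearize the alignment energy at the current pose.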
and then, carrying out three-dimensional dense reconstruction on the scene according to the pose and the depth image and the color image obtained in the registration image thread to obtain a local map and a global map.
The magnitude of a feature point in the SIFT extraction algorithm in this embodiment is expressed as:

A(x, y) = sqrt[ (I(x+1, y) − I(x−1, y))² + (I(x, y+1) − I(x, y−1))² ]

and its direction as:

θ(x, y) = arctan[ (I(x, y+1) − I(x, y−1)) / (I(x+1, y) − I(x−1, y)) ]

wherein A(x, y) is the magnitude of the feature point, x and y are its pixel position in the image, I(x+1, y), I(x−1, y), I(x, y+1), and I(x, y−1) are its neighbouring pixels in the Gaussian difference pyramid, and θ(x, y) is the orientation of the feature point.
The probability model in the GMS matching algorithm in this embodiment is:

P = (mean_true − mean_false) / (std_true + std_false)

and the evaluation score of a feature point pair is:

S_ij = Σ_{k=1}^{K} |X_{i_k j_k}|

wherein P is the separation between correct and incorrect matching, p_true denotes a correct match and p_false an incorrect match, mean_true and mean_false are the means of the correct-match and incorrect-match distributions, and std_true and std_false their respective variances; |F_1i| is the number of features in the grid cell of the feature match; i and j are the matched regions in the two frames, k is the current grid-cell index, K is the total number of grid cells, and |X_{i_k j_k}| is the number of matches between the cell pair {i_k, j_k}.
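The grid-based match counting that underlies the GMS score can be illustrated as follows. This is a simplified sketch: full GMS also aggregates the 3x3 cell neighbourhood and thresholds the score against the feature density, which is omitted here, and all names and example values are hypothetical:

```python
from collections import defaultdict

def gms_cell_scores(matches, img_shape, grid=(20, 20)):
    """Count feature matches per grid-cell pair {i_k, j_k}: true matches
    concentrate in a few cell pairs, while false matches scatter, so a
    cell pair's count serves as its support score |X_ik jk|."""
    gh, gw = grid
    h, w = img_shape
    def cell(pt):  # pt = (x, y) pixel coordinate
        return (min(int(pt[1] * gh / h), gh - 1),
                min(int(pt[0] * gw / w), gw - 1))
    scores = defaultdict(int)
    for p1, p2 in matches:
        scores[(cell(p1), cell(p2))] += 1
    return scores

# Tiny example: three mutually consistent matches plus one stray match.
matches = [((10, 10), (12, 11)), ((11, 12), (13, 13)),
           ((12, 11), (14, 12)), ((300, 300), (15, 20))]
scores = gms_cell_scores(matches, img_shape=(480, 640))
best = max(scores.values())
```

Matches falling in a cell pair with high support are kept; the stray match lands in a cell pair of its own with support 1 and would be filtered out.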
Registering the depth image and the color image according to the extrinsic and intrinsic parameters in this embodiment specifically includes:

converting the coordinates of all pixel points in the depth image into the infrared camera coordinate system:

P_IR_camera = Z_c · K_IR⁻¹ · p

wherein Z_c is the depth value, i.e. the distance from the object in space to the depth camera, K_IR⁻¹ is the inverse of the infrared camera's intrinsic matrix, p is the pixel coordinate of a point in the depth image, and P_IR_camera is that point's coordinate in the infrared camera coordinate system;

converting the coordinates of all points in the infrared camera coordinate system into the world coordinate system:

P_w = T_wIR⁻¹ · P_IR_camera

wherein T_wIR⁻¹ is the inverse of the transformation matrix from the world coordinate system to the infrared camera coordinate system, and P_w is the world coordinate of the point in the depth image;

converting the coordinates of all points in the world coordinate system into the color camera coordinate system:

P_Color_camera = T_wColor_camera · P_w

wherein T_wColor_camera is the transformation matrix from the world coordinate system to the color camera coordinate system, and P_Color_camera is the color-camera coordinate corresponding to the point in the depth image;

mapping the coordinates of all points in the color camera coordinate system onto the color plane of the normalized plane Z_c = 1:

p_color = K_Color_camera · (P_Color_camera / Z_c)

wherein K_Color_camera is the intrinsic matrix of the color camera, P_Color_camera / Z_c is the normalized mapping plane, and p_color is the pixel point in the final registered image;

letting Z = 1, the pixels of the registered image and the pixels of the depth image then satisfy:

p_color = K_Color_camera · T_IR2Color · Z_c · K_IR⁻¹ · p

Removing the intrinsics of the two cameras, K_Color_camera and K_IR⁻¹, finally yields the transformation matrix between the infrared camera and the color camera:

T_IR2Color = T_wColor_camera · T_wIR⁻¹

which, after expansion and simplification, gives:

R_IR2Color = R_w2Color · R_w2IR⁻¹,  t_IR2Color = t_w2Color − R_w2Color · R_w2IR⁻¹ · t_w2IR

wherein T_wColor_camera is the transformation matrix from the world coordinate system to the color camera coordinate system and T_wIR⁻¹ its inverse counterpart for the infrared camera; R_w2Color is the rotation matrix from world to color-camera coordinates and R_w2IR⁻¹ the inverse of the rotation matrix from world to infrared-camera coordinates; t_w2Color and t_w2IR are the translation vectors from world coordinates to color-camera and infrared-camera coordinates respectively; and T_IR2Color is the 4×4 transformation matrix from the infrared camera to the color camera.
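The full pixel-to-pixel registration chain above can be sketched in NumPy. The intrinsic values and the 2.5 cm baseline below are hypothetical placeholders, not calibration results from the patent:

```python
import numpy as np

def register_depth_to_color(u, v, z, K_ir, K_color, T_ir2color):
    """Map one depth pixel (u, v) with depth z through the chain in the
    text: pixel -> IR camera coords -> color camera coords -> color pixel."""
    # Back-project: P_IR_camera = z * K_IR^{-1} [u, v, 1]^T
    p_ir = z * np.linalg.inv(K_ir) @ np.array([u, v, 1.0])
    # Apply the 4x4 IR-to-color transform T_IR2Color = [R | t]
    p_color = T_ir2color[:3, :3] @ p_ir + T_ir2color[:3, 3]
    # Project onto the normalized plane (Z_c = 1), then apply K_color
    uv1 = K_color @ (p_color / p_color[2])
    return uv1[:2]

# Hypothetical intrinsics and a pure-translation extrinsic (2.5 cm baseline)
K_ir = np.array([[580.0, 0, 320.0], [0, 580.0, 240.0], [0, 0, 1]])
K_color = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1]])
T = np.eye(4)
T[0, 3] = 0.025
uv = register_depth_to_color(320, 240, 1.0, K_ir, K_color, T)
```

Running this over every depth pixel, with T_IR2Color obtained from the calibrated extrinsics as T_wColor_camera · T_wIR⁻¹, produces the registered image.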
The semantic map thread in this embodiment specifically includes:
extracting features of the registration images to obtain feature layers;
pooling the feature layers to generate pyramid pooling features; the sizes of the pooling cores are 1x1,2x2,3x3 and 6x6 respectively;
flattening and upsampling pyramid pooling features;
performing CONCAT with the feature layer, and obtaining a local semantic map and a global semantic map through a convolutional neural network;
the network is trained by adopting a VOC2007 data set containing 21 kinds of information, the PSP Net backbone network is MobileNet V2, the number of training epochs (training generation number) is 140, and the ratio of the training set to the verification set is 9: and 1, performing freezing training on the first 50 epochs, namely freezing a part of training weights to accelerate the training speed. The BacthSize was set to 4, thawing was started when epoch=51, and all weights were trained. It should be noted that, the parameters used in this embodiment are all reference embodiments, and are not to be construed as limiting the present scheme, and in the specific implementation process, the parameters may be changed according to the device performance, training accuracy, and the like.
The local map and global map thread in this embodiment fuses depth frames into the local map and the global map with the TSDF model using the update:

D'(v) = (D(v)·W(v) + d_i(v)·w_i(v)) / (W(v) + w_i(v)),  W'(v) = W(v) + w_i(v)

and removes a previously integrated frame (de-fusion) with:

D'(v) = (D(v)·W(v) - d_i(v)·w_i(v)) / (W(v) - w_i(v)),  W'(v) = W(v) - w_i(v)

wherein D(v) is the signed distance value of the voxel, W(v) is the voxel weight value, d_i(v) and w_i(v) are respectively the projective distance from the voxel to the i-th frame depth image and the integration weight, and D'(v) is the updated voxel signed distance value.
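A minimal sketch of the TSDF integration and de-integration updates used by the local/global map thread (weighted running average over frames; illustrative, not the patented implementation):

```python
import numpy as np

def tsdf_integrate(D, W, d_i, w_i):
    """Fold frame i's projective distance d_i (weight w_i) into the running
    signed-distance D and weight W of each voxel."""
    D_new = (D * W + d_i * w_i) / (W + w_i)
    W_new = W + w_i
    return D_new, W_new

def tsdf_deintegrate(D, W, d_i, w_i):
    """Remove a previously integrated frame (de-fusion): exact inverse of
    tsdf_integrate, so a frame can be re-integrated after pose refinement."""
    W_new = W - w_i
    D_new = (D * W - d_i * w_i) / W_new
    return D_new, W_new
```

De-integration being the exact inverse of integration is what lets the pipeline correct a voxel block when loop closure revises a camera pose.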
The fusion model adopted in the fusion thread in this embodiment is:

C_i(o) = (C_{i-1}(o)·W_{i-1}(o) + c_i^p·w_i^p) / (W_{i-1}(o) + w_i^p),  W_i(o) = W_{i-1}(o) + w_i^p

wherein C_{i-1}(o) and W_{i-1}(o) are respectively the fused class confidence and reliability weight of the voxel after frame i-1, and c_i^p and w_i^p are the class confidence and reliability weight of the pixel p in the i-th frame image.
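The per-voxel class-confidence fusion can be sketched the same way (assumed weighted-average form, mirroring the TSDF update; argument names are illustrative):

```python
def fuse_class_confidence(c_prev, w_prev, c_pix, w_pix):
    """Fuse the class confidence c_pix (reliability weight w_pix) of a pixel in
    the current frame into the voxel's running class confidence c_prev
    (weight w_prev), as a weighted running average."""
    c_new = (c_prev * w_prev + c_pix * w_pix) / (w_prev + w_pix)
    w_new = w_prev + w_pix
    return c_new, w_new
```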
In order to refine the details of the global semantic map with the local semantic map, the global thread in this embodiment matches the local semantic map and the global semantic map by de-fusing the overlapping voxels and re-fusing them with a weighted blend:

Map(v, C(o)) = (W_local · Map(v, C_{i-1}(o))_local + W_global · Map(v, C_{i-1}(o))_global) / (W_local + W_global)

and the accuracy is calculated as:

Accuracy = (k_1·S_1 + k_2·S_2) / S

wherein W_local and W_global are the weight values of the local semantic map and the global semantic map, Map(v, C_{i-1}(o))_local and Map(v, C_{i-1}(o))_global are respectively the fused local semantic map and global semantic map; S_1 and S_2 are the surface areas of the three-dimensional semantic models measured using the MeshLab tool, S is the surface area of the three-dimensional reconstruction model measured using the MeshLab tool, and k_1 and k_2 are the weight coefficients of S_1 and S_2 respectively.
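An illustrative sketch, under the assumption that the matching is a weight-blended average and the accuracy an area-weighted ratio (both forms are inferred from the variable definitions, not stated verbatim in the source):

```python
def match_semantic_maps(map_local, map_global, w_local, w_global):
    """Weighted blend of local and global semantic map values (assumed form)."""
    return (w_local * map_local + w_global * map_global) / (w_local + w_global)

def accuracy(s1, s2, s, k1, k2):
    """Accuracy as a weighted ratio of semantic-model surface areas s1, s2 to
    the reconstruction-model surface area s (assumed form; areas would come
    from a mesh tool such as MeshLab)."""
    return (k1 * s1 + k2 * s2) / s
```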
The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed by the processor create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
While embodiments of the present invention have been shown and described above, it will be understood that they are illustrative and not to be construed as limiting the invention. Variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art without departing from the scope of the invention, and it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention is intended to fall within the protection scope of the claims.
Claims (6)
1. The three-dimensional semantic map construction method is characterized by comprising the following steps of:
a registration image thread, a local map and global map thread, a semantic map thread, a fusion thread and a global thread, which can be processed in parallel based on a GPU;
the registration image thread is used for acquiring a color image and a depth image of a scene, and preprocessing the color image and the depth image to obtain a registration image;
the local map and global map thread is used for solving the pose between the images according to the registration image and the depth image, and performing three-dimensional reconstruction by using the pose, the color image and the depth image to obtain a local map and a global map;
the local map and global map thread comprises:
performing block division on the multi-frame registration image to obtain a plurality of image blocks, wherein frame stacking exists between adjacent image blocks;
carrying out feature extraction on the registration images in each image block by using a SIFT extraction algorithm based on GPU acceleration to obtain feature points, and selecting a coordinate system of a frame of registration images as a world coordinate system;
matching the feature points according to a GMS matching algorithm, filtering out mismatched points, saving the intra-block associations as the local image association matching M1, and saving the associations with poor intra-block relevance as the global image association matching M2;
according to M1 and M2, solving the pose between each frame of registration images by using the Gauss-Newton method, and carrying out loop detection on the current pose;
according to the pose and the depth image and the color image obtained in the registration image thread, carrying out three-dimensional dense reconstruction on the scene to obtain a local map and a global map;
the semantic map thread is used for carrying out semantic segmentation on the plurality of registration images by using the PSP Net to obtain a two-dimensional semantic image;
the fusion thread is used for respectively fusing the two-dimensional semantic image with the local map and the global map to obtain a local semantic map and a global semantic map;
the global thread is used for matching the local semantic map and the global semantic map to obtain a global consistency dense semantic map.
2. The three-dimensional semantic map building method according to claim 1, wherein the registering image thread specifically comprises:
calibrating a depth camera comprising an infrared camera and a color camera to obtain an internal parameter and an external parameter of the depth camera;
respectively utilizing an infrared camera and a color camera in the depth camera to continuously acquire a depth image and a color image of a multi-frame scene;
and registering the depth image and the color image according to the external participation and the internal reference to obtain a multi-frame registration image.
3. The method of claim 2, wherein the local map and global map threads comprise:
performing block division on the multi-frame registration image to obtain a plurality of image blocks, wherein frame stacking exists between adjacent image blocks;
carrying out feature extraction on the registration images in each image block by using a SIFT extraction algorithm based on GPU acceleration to obtain feature points, and selecting a coordinate system of a frame of registration images as a world coordinate system;
matching the feature points according to a GMS matching algorithm, filtering out mismatched points, saving the intra-block associations as the local image association matching M1, and saving the associations with poor intra-block relevance as the global image association matching M2;
according to said M1 and M2, solving the pose between each frame of registration images by using the Gauss-Newton method, and carrying out loop detection on the current pose;
and carrying out three-dimensional dense reconstruction on the scene according to the pose and the depth image and the color image obtained in the registration image thread to obtain a local map and a global map.
4. A three-dimensional semantic map building method according to claim 3, wherein the probability model in the GMS matching algorithm is:

P = (mean_true - mean_false) / (std_true + std_false)

and the evaluation score formula of the feature point pair is:

S_ij = Σ_{k=1}^{K} |X_{i_k, j_k}|

wherein P measures the separation between correct matches and incorrect matches, p_true is a correct match, p_false is an incorrect match, mean_true and mean_false are the means of the correct-match and incorrect-match score distributions, and std_true and std_false are their standard deviations; |F_1i| is the number of features in the feature point matching grid; i and j are the matching point regions in the two frames of images respectively, k is the current grid index, K is the total number of grids, and |X_{i_k, j_k}| is the number of matches between the cell pair {i_k, j_k}.
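A minimal illustration of the GMS-style grid scoring in claim 4 (the 3x3 cell neighbourhood and the separability measure are standard choices in GMS; this is a sketch, not the claimed implementation):

```python
def gms_cell_score(match_counts):
    """Score S_ij for a grid-cell pair: total number of matches over the K
    neighbouring cell pairs {i_k, j_k} (K = 9 for a 3x3 neighbourhood)."""
    return sum(match_counts)

def partition_quality(mean_true, mean_false, std_true, std_false):
    """P: how well the correct-match and incorrect-match score distributions
    separate; a larger P means the grid score discriminates matches better."""
    return (mean_true - mean_false) / (std_true + std_false)
```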
5. The three-dimensional semantic map construction method according to claim 2, wherein registering the depth image and the color image according to the external participation and the internal reference specifically comprises:
converting coordinates of all pixel points in the depth image to an infrared camera coordinate system;
converting points under the infrared camera coordinate system into a world coordinate system;
converting points in the world coordinate system into a color camera coordinate system;
mapping points under a color camera coordinate system to a color plane of the normalized plane;
and obtaining a transformation matrix between the infrared camera and the color camera.
6. A three-dimensional semantic map construction method according to claim 3, wherein the semantic map thread specifically comprises:
extracting features of the registration images to obtain feature layers;
pooling the feature layers to generate pyramid pooling features;
flattening and upsampling the pyramid pooling feature;
merging with the feature layer, and obtaining a local semantic map and a global semantic map through a convolutional neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110394816.7A CN113313824B (en) | 2021-04-13 | 2021-04-13 | Three-dimensional semantic map construction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113313824A CN113313824A (en) | 2021-08-27 |
CN113313824B true CN113313824B (en) | 2024-03-15 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116817892B (en) * | 2023-08-28 | 2023-12-19 | 之江实验室 | Cloud integrated unmanned aerial vehicle route positioning method and system based on visual semantic map |
CN117788306A (en) * | 2023-12-18 | 2024-03-29 | 上海贝特威自动化科技有限公司 | Multithreading-based multi-focal-length tab image fusion method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111080659A (en) * | 2019-12-19 | 2020-04-28 | 哈尔滨工业大学 | Environmental semantic perception method based on visual information |
Non-Patent Citations (3)
Title |
---|
Application of the SSE instruction set in the image reconstruction algorithm of a 60Co container CT system; Song Qi, Luo Zhiyu, Cong Peng; Nuclear Electronics & Detection Technology (01); full text * |
Semantic map construction based on laser SLAM and deep learning; He Song, Sun Jing, Guo Lejiang, Chen Liang; Computer Technology and Development (09); full text * |
Video stabilization algorithm based on feature matching and motion compensation; Tang Jialin, Zheng Jiefeng, Li Xiying, Su Binghua; Application Research of Computers (02); full text * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||