CN113313824B - Three-dimensional semantic map construction method - Google Patents

Three-dimensional semantic map construction method

Info

Publication number
CN113313824B
CN113313824B
Authority
CN
China
Prior art keywords
image
map
global
matching
semantic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110394816.7A
Other languages
Chinese (zh)
Other versions
CN113313824A (en)
Inventor
刘立林
罗志宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110394816.7A priority Critical patent/CN113313824B/en
Publication of CN113313824A publication Critical patent/CN113313824A/en
Application granted granted Critical
Publication of CN113313824B publication Critical patent/CN113313824B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05Geographic models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/30Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of map construction, and particularly relates to a three-dimensional semantic map construction method comprising a registration image thread, a local map and global map thread, a semantic map thread, a fusion thread and a global thread that can be processed in parallel on a GPU. Pose solving, semantic segmentation, image fusion, matching and other computations are performed on the scene images at the same time, so the SLAM system has stronger real-time performance and builds the map faster. Semantic information is fused onto the three-dimensional map, enriching the map's forms of expression, so that unmanned mobile platform devices such as unmanned aerial vehicles and robots can understand the scene map in more dimensions and control their motion trajectories more accurately, improving the performance of the unmanned mobile platform.

Description

Three-dimensional semantic map construction method
Technical Field
The invention belongs to the technical field of map construction, and particularly relates to a three-dimensional semantic map construction method.
Background
SLAM (Simultaneous Localization and Mapping) is a technique for acquiring three-dimensional information of a scene through a sensor, allowing a platform to localize itself and recognize its environment from the scene information. SLAM includes laser SLAM and visual SLAM. In laser SLAM the sensor acquiring scene data is a lidar; it is generally used in the aerospace and automotive industries and offers high precision at a high cost. Visual SLAM acquires scene image data through a camera, costs less, and is generally used for autonomous navigation of unmanned aerial vehicles and robots.
In the fields of unmanned aerial vehicles and robots, traditional maps cannot meet diversified application requirements. With the development of depth sensors, semantic maps have become widely used for autonomous navigation of unmanned aerial vehicles and robots. Semantic maps typically include spatial attribute information, such as the planar structure of a building and its room distribution, as well as semantic attribute information, such as the attributes and functions of individual rooms and the class and location of objects within a room. The goal of semantic map construction is to mark semantic information on the map precisely.
Chinese patent CN111080659A discloses an environmental semantic perception method based on visual information, comprising: acquiring environmental image information with a Kinect V1.0 camera to obtain a registered color image and depth image; based on the registered color and depth images, calculating the three-dimensional pose of the camera through an ORB_SLAM2 process from the ORB feature points extracted from each frame to obtain the camera pose information; performing semantic segmentation on each frame of image to generate semantic color information; generating a point cloud synchronously from the input depth map and the camera intrinsic matrix; registering the semantic color information into the point cloud to obtain a local semantic point cloud result; fusing the camera pose information with the local semantic point cloud result to obtain new global semantic point cloud information; and representing the fused global semantic point cloud information with an octree map to obtain the final three-dimensional octree semantic map. In practice, however, it was found that because ORB feature extraction is adopted and map construction is not fast enough, the response speed and the accuracy of motion trajectory control of the unmanned aerial vehicle or robot are seriously affected, resulting in a poor user experience.
Disclosure of Invention
To overcome at least one defect in the prior art, the invention provides a three-dimensional semantic map construction method which, based on GPU multi-thread processing, improves the map construction speed and achieves real-time map construction.
In order to solve the technical problems, the invention adopts the following technical scheme:
The three-dimensional semantic map construction method comprises the following steps:
a registration image thread, a local map and global map thread, a semantic map thread, a fusion thread and a global thread which can be processed in parallel based on a GPU (graphics processing unit);
the registration image thread is used for acquiring a color image and a depth image of a scene, and preprocessing the color image and the depth image to obtain a registration image;
the local map and global map thread is used for solving the pose between the multi-frame images according to the registration image and the depth image, and performing three-dimensional reconstruction by using the pose, the color image and the depth image to obtain a local map and a global map;
the semantic map thread is used for carrying out semantic segmentation on the plurality of registration images by using a PSPNet (Pyramid Scene Parsing Network) to obtain two-dimensional semantic images;
the fusion thread is used for respectively fusing the two-dimensional semantic image with the local map and the global map to obtain the local semantic map and the global semantic map;
the global thread is used for matching the local semantic map and the global semantic map to obtain a global consistency dense semantic map.
According to this scheme, GPU-based multithreading performs pose solving, semantic segmentation, image fusion, matching and other computations on the scene images in parallel, so the SLAM system has stronger real-time performance and builds the map faster. At the same time, semantic information is fused onto the three-dimensional map, enriching the map's forms of expression, so that unmanned mobile platform devices such as unmanned aerial vehicles and robots can understand the scene map in more dimensions, control their motion trajectories more accurately, and improve the performance of the unmanned mobile platform.
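For illustration only, the following minimal Python sketch shows how pipeline stages of this kind could run as parallel threads handing data to one another through queues. The stage bodies are trivial stand-ins and every name is an assumption made for the example, not the patented implementation; the global consistency thread is omitted for brevity.

```python
# Illustrative skeleton only: four of the five threads run in parallel and
# exchange data through queues. The stage functions are stand-ins, not the
# patent's actual registration, reconstruction, segmentation or fusion code.
import queue
import threading

frames_for_maps, frames_for_semantics = queue.Queue(), queue.Queue()
local_maps, semantic_images, fused_maps = queue.Queue(), queue.Queue(), queue.Queue()

def registration_thread(n_frames=5):
    for i in range(n_frames):
        frame = {"id": i, "color": None, "depth": None}   # stand-in for a registered RGB-D frame
        frames_for_maps.put(frame)
        frames_for_semantics.put(frame)
    frames_for_maps.put(None)            # poison pills stop the downstream stages
    frames_for_semantics.put(None)

def map_thread():                        # pose solving + dense reconstruction (stubbed)
    while (f := frames_for_maps.get()) is not None:
        local_maps.put({"id": f["id"], "geometry": "local/global map"})
    local_maps.put(None)

def semantic_thread():                   # PSPNet segmentation (stubbed)
    while (f := frames_for_semantics.get()) is not None:
        semantic_images.put({"id": f["id"], "labels": "2D semantic image"})
    semantic_images.put(None)

def fusion_thread():                     # fuse 2D semantics into the 3D maps (stubbed)
    while True:
        m, s = local_maps.get(), semantic_images.get()
        if m is None or s is None:
            break
        fused_maps.put({"id": m["id"], "semantic_map": (m["geometry"], s["labels"])})

threads = [threading.Thread(target=t) for t in
           (registration_thread, map_thread, semantic_thread, fusion_thread)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(fused_maps.qsize(), "fused local/global semantic map updates")
```

On an actual GPU the same decomposition would map to separate streams or workers, which is what lets pose solving, segmentation and fusion overlap in time.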
Preferably, the registration image thread specifically includes:
calibrating a depth camera comprising an infrared camera and a color camera to obtain the internal parameters and external parameters of the depth camera;
respectively utilizing the infrared camera and the color camera in the depth camera to acquire a depth image and a color image of a multi-frame scene;
and registering the depth image and the color image according to the external parameters and the internal parameters to obtain a multi-frame registration image.
Preferably, the local map and global map threads include:
performing block division on the multi-frame registration image to obtain a plurality of image blocks, wherein overlapping frames exist between adjacent image blocks;
performing feature extraction on the registration image in each image block by using a Scale Invariant Feature Transform (SIFT) extraction algorithm based on Graphics Processing Unit (GPU) acceleration to obtain feature points, and selecting the coordinate system of one frame of registration image as the world coordinate system;
matching the feature points according to a GMS matching algorithm, filtering out mismatching points, saving the intra-block associations as the local image association matching M1, and saving the associations with poor intra-block relevance as the global image association matching M2; solving the pose between each frame of registration images according to M1 and M2 by using the Gauss-Newton method, and carrying out loop detection on the current pose;
and carrying out three-dimensional dense reconstruction on the scene according to the pose and the depth image and the color image obtained in the registration image thread to obtain a local map and a global map.
Preferably, the magnitude of a feature point in the SIFT extraction algorithm is specifically expressed as:
A(x, y) = sqrt( (I(x+1, y) - I(x-1, y))^2 + (I(x, y+1) - I(x, y-1))^2 )
and the direction is specifically expressed as:
θ(x, y) = arctan( (I(x, y+1) - I(x, y-1)) / (I(x+1, y) - I(x-1, y)) )
wherein A(x, y) is the magnitude of the feature point, x and y are the pixel position of the feature point in the image, I(x+1, y), I(x-1, y), I(x, y+1) and I(x, y-1) are the adjacent pixels of the feature point in the Gaussian differential pyramid, and θ(x, y) is the direction of the feature point.
Preferably, the probability model in the GMS matching algorithm is:
the evaluation score formula of the feature point pair is as follows:
wherein P is the difference between correct matches and incorrect matches, p_true and p_false are the probabilities for a correct match and an incorrect match respectively, mean_true and mean_false are the means for correct and incorrect matches, and std_true and std_false are the corresponding variances; |F_1i| is the number of features in the grid cell of the matched feature point; i and j are the matched-point regions in the two image frames, k is the index of the current grid cell, K is the total number of grid cells, and, for a cell pair {i_k, j_k}, the corresponding term is the number of matches between the two cells.
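As a rough illustration of the feature matching and mismatch filtering described above, the sketch below pairs OpenCV's CPU SIFT with the GMS matcher from opencv-contrib; the GPU-accelerated SIFT assumed by the scheme is not shown, and the parameter values are library defaults rather than values taken from the patent.

```python
# Sketch of matching one pair of registered frames inside an image block:
# SIFT features, brute-force matching, then GMS grid-based mismatch filtering.
# Requires opencv-contrib-python for cv2.xfeatures2d.matchGMS.
import cv2

def match_block_pair(img1, img2):
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY) if img1.ndim == 3 else img1
    g2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY) if img2.ndim == 3 else img2
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(g1, None)
    kp2, des2 = sift.detectAndCompute(g2, None)
    raw = cv2.BFMatcher(cv2.NORM_L2).match(des1, des2)
    # GMS keeps matches whose grid neighbourhoods receive consistent support
    good = cv2.xfeatures2d.matchGMS((g1.shape[1], g1.shape[0]),
                                    (g2.shape[1], g2.shape[0]),
                                    kp1, kp2, raw, withRotation=False,
                                    withScale=False, thresholdFactor=6.0)
    return kp1, kp2, good
```

Pairs of registered frames within and across blocks would be matched this way, with the strong intra-block matches kept as M1 and the remaining associations kept as M2.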
Preferably, the registering the depth image and the color image according to the external and internal parameters specifically includes:
converting coordinates of all pixel points in the depth image to an infrared camera coordinate system;
converting coordinates of all points under the infrared camera coordinate system to a world coordinate system;
converting the coordinates of all points in the world coordinate system to a color camera coordinate system;
mapping the coordinates of all points under the coordinate system of the color camera to a color plane of the normalized plane;
and obtaining a transformation matrix between the infrared camera and the color camera.
Preferably, the semantic map thread specifically includes:
extracting features of the registration images to obtain feature layers;
pooling the feature layers to generate pyramid pooling features;
flattening and upsampling pyramid pooling features;
and performing CONCAT (merging) with the feature layer, and obtaining a local semantic map and a global semantic map through a convolutional neural network.
Preferably, the specific formula for fusing the local map and the global map by using the TSDF model in the local map and global map thread is as follows:
D(v) = Σ_i D_i(v)·w_i(v) / Σ_i w_i(v),   W(v) = Σ_i w_i(v)
and the specific formula of the de-fusion construction is as follows:
D'(v) = ( D(v)·W(v) - D_i(v)·w_i(v) ) / ( W(v) - w_i(v) ),   W'(v) = W(v) - w_i(v)
wherein D(v) is the signed distance value of the voxel, W(v) is the voxel weight value, D_i(v) and w_i(v) are respectively the projection distance from the voxel to the i-th frame depth image and the integration weight, and D'(v) is the updated voxel signed distance value.
Preferably, the fusion model adopted in the fusion thread is as follows:
wherein C_{i-1}(o) and W_{i-1}(o) are respectively the fused class confidence and the reliability weight of the voxel corresponding to the i-th frame, and the remaining two terms are the class confidence and the reliability weight of the pixel p in the i-th frame image.
Preferably, the specific formula for matching the local semantic map and the global semantic map in the global thread is as follows:
de-fusion:
the accuracy calculation formula is:
wherein W_local and W_global are the weight values of the local semantic map and the global semantic map, and Map(v, C_{i-1}(o))_local and Map(v, C_{i-1}(o))_global are respectively the local semantic map and the global semantic map; S1 and S2 are the surface areas of the three-dimensional semantic models measured with the MeshLab tool, S is the surface area of the three-dimensional reconstruction model measured with the MeshLab tool, and k1 and k2 are respectively the weight coefficients of S1 and S2.
Compared with the prior art, the beneficial effects are that:
Compared with traditional ORB feature extraction, GPU-accelerated SIFT feature extraction is faster and more robust. In addition, GPU-based multi-thread processing can perform semantic segmentation, pose calculation and image fusion on the registered images simultaneously, and fused images can be released one by one, so the GPU has enough memory to fuse and render images in real time, achieving real-time map construction. Fusing the three-dimensional image with semantic information improves the environmental understanding of unmanned mobile platforms such as unmanned aerial vehicles and robots, makes their motion more accurate and flexible, and improves product performance.
Drawings
FIG. 1 is a schematic block diagram of a process of a local map and global map thread of a three-dimensional semantic map construction method according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a block division in a local map and global map thread of a three-dimensional semantic map construction method according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a semantic map thread in a three-dimensional semantic map construction method according to an embodiment of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the invention; for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions; it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted. The positional relationship described in the drawings are for illustrative purposes only and are not to be construed as limiting the invention.
The same or similar reference numbers in the drawings of embodiments of the invention correspond to the same or similar components; in the description of the present invention, it should be understood that, if there are orientations or positional relationships indicated by terms "upper", "lower", "left", "right", "long", "short", etc., based on the orientations or positional relationships shown in the drawings, this is merely for convenience in describing the present invention and simplifying the description, and is not an indication or suggestion that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, so that the terms describing the positional relationships in the drawings are merely for exemplary illustration and are not to be construed as limitations of the present patent, and that it is possible for those of ordinary skill in the art to understand the specific meaning of the terms described above according to specific circumstances.
The technical scheme of the invention is further specifically described by the following specific embodiments with reference to the accompanying drawings:
examples:
as shown in fig. 1, a three-dimensional semantic map construction method includes:
registering image threads, local map and global map threads, semantic map threads, fusion threads and global threads which can be processed in parallel based on the GPU;
the registration image thread is used for acquiring a color image and a depth image of a scene, and preprocessing the color image and the depth image to obtain a registration image; wherein the registered image is colored;
the local map and global map thread is used for solving the pose between the multi-frame images according to the registration image and the depth image, and performing three-dimensional reconstruction by using the pose, the color image and the depth image to obtain a local map and a global map;
the semantic map thread is used for carrying out semantic segmentation on the plurality of registration images by using the PSP Net to obtain a two-dimensional semantic image;
the fusion thread is used for respectively fusing the two-dimensional semantic map with the local map and the global map to obtain the local semantic map and the global semantic map;
the global thread is used for matching the local semantic map and the global semantic map to obtain a global consistency dense semantic map.
The registration image thread in this embodiment specifically includes:
calibrating a depth camera comprising an infrared camera and a color camera to obtain the internal parameters and external parameters of the depth camera; the depth camera may be a Kinect V2: specifically, the Kinect V2 photographs a checkerboard and the camera is calibrated to obtain the camera intrinsic matrix
K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
and the extrinsic matrix [R | t], wherein R is a 3x3 rotation matrix, t is a 3x1 translation vector, f_x and f_y are respectively the normalized focal lengths along the image x-axis and y-axis, and c_x and c_y are the coordinates of the image center point;
respectively utilizing the infrared camera and the color camera in the depth camera to acquire a depth image and a color image of a multi-frame scene;
and registering the depth image and the color image according to the external parameters and the internal parameters to obtain a multi-frame registration image.
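A conventional way to obtain such intrinsic and extrinsic parameters is chessboard calibration. The sketch below is a generic OpenCV example (board size, square size and the image list are assumptions) that would be applied to the infrared camera and the color camera separately.

```python
# Generic chessboard calibration sketch: returns the intrinsic matrix K, the
# distortion coefficients, and per-view extrinsics (rvec, tvec). Board geometry
# and the input images are placeholders.
import cv2
import numpy as np

def calibrate(images, board=(9, 6), square=0.025):
    objp = np.zeros((board[0] * board[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board[0], 0:board[1]].T.reshape(-1, 2) * square
    obj_pts, img_pts, size = [], [], None
    for img in images:
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, board)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)
            size = gray.shape[::-1]
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    # K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]; each (rvec, tvec) gives one extrinsic [R | t]
    return K, dist, rvecs, tvecs
```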
The local map and global map threads in this embodiment include:
taking fifteen frames of images as a unit, performing block division on the multi-frame registration images to obtain a plurality of image blocks, with an overlap of three frames between adjacent image blocks; of course, the block size and the number of overlapping frames between image blocks are only one reference embodiment and should not be construed as limiting the present solution;
performing feature extraction on the registration image in each image block by using a Scale Invariant Feature Transform (SIFT) extraction algorithm based on Graphics Processing Unit (GPU) acceleration to obtain feature points, and selecting the coordinate system of one frame of registration image as the world coordinate system;
matching the feature points according to a GMS matching algorithm, filtering out mismatching points, saving the intra-block associations as the local image association matching M1, and saving the associations with poor intra-block relevance as the global image association matching M2; solving the pose between each frame of registration images according to M1 and M2 by using the Gauss-Newton method, and carrying out loop detection on the current pose, wherein the pose comprises a local pose and a global pose;
in addition, solving the pose by the gauss newton method in this embodiment specifically includes:
constructing a nonlinear optimization objective function:
X * =argminE align (X),
the specific calculation process is as follows:
R=3N corr +|E|(|D i |+|I i |),
F(X k )=F(X k-1 )+J F (X k-1 )ΔX,
J F (X k-1 ) T J F (X k-1 )ΔX * =-J F (X k-1 ) T F(X k-1 ),
wherein X is the pose of the camera, X * Is the optimal solution of the pose X, E align (X) is the alignment objective function of the coefficient features and the dense luminosity and set constraint, r i (X) residual term for pose representation, N corr Is the total corresponding relation quantity in the image block, |D i I and I i The i is the size of the i frame depth image and the color image after downsampling, which are 64X53 = 3392, the i E is the number of frame pair sets, and the E is a frame pair set comprising a frame pair (i, j), i frame and j frame, F (X) k-1 ) In the form of vector of the residual error term of the pose of the previous frame of image, J F For the jacobian matrix corresponding to the vector, Δx=x k -X k-1 Delta X is the difference between the pose of the current frame and the pose of the previous frame * Deviation value for pose optimal solution, (X) k-1 ) T Is a matrix (X) k-1 ) Is a transposed matrix of (a);
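The normal equations above are the standard Gauss-Newton update. A generic sketch of the iteration is given below with a toy residual; it illustrates the linear-algebra step only, not the patent's alignment energy E_align.

```python
# Generic Gauss-Newton iteration solving J^T J dX = -J^T F(X) at each step.
# The residual below is a toy line-fitting problem, not the SLAM alignment energy.
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=10):
    x = x0.astype(float)
    for _ in range(iters):
        F = residual(x)
        J = jacobian(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ F)   # dX* from the normal equations
        x = x + dx
    return x

# Toy usage: recover (a, b) in y = a*x + b.
xs = np.linspace(0.0, 1.0, 20)
ys = 2.0 * xs + 0.5
residual = lambda p: p[0] * xs + p[1] - ys
jacobian = lambda p: np.stack([xs, np.ones_like(xs)], axis=1)
print(gauss_newton(residual, jacobian, np.array([0.0, 0.0])))   # ~ [2.0, 0.5]
```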
and then, carrying out three-dimensional dense reconstruction on the scene according to the pose and the depth image and the color image obtained in the registration image thread to obtain a local map and a global map.
The magnitude of a feature point in the SIFT extraction algorithm in this embodiment is specifically expressed as:
A(x, y) = sqrt( (I(x+1, y) - I(x-1, y))^2 + (I(x, y+1) - I(x, y-1))^2 )
and the direction is specifically expressed as:
θ(x, y) = arctan( (I(x, y+1) - I(x, y-1)) / (I(x+1, y) - I(x-1, y)) )
wherein A(x, y) is the magnitude of the feature point, x and y are the pixel position of the feature point in the image, I(x+1, y), I(x-1, y), I(x, y+1) and I(x, y-1) are the adjacent pixels of the feature point in the Gaussian differential pyramid, and θ(x, y) is the direction of the feature point.
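A direct NumPy transcription of these two expressions might look as follows; boundary handling is omitted, and arctan2 is used in place of the plain arctangent to avoid division by zero.

```python
# Gradient magnitude A(x, y) and orientation theta(x, y) at a pixel of a
# Gaussian-pyramid level I, written from the two formulas above.
import numpy as np

def sift_magnitude_orientation(I, x, y):
    dx = float(I[y, x + 1]) - float(I[y, x - 1])   # I(x+1, y) - I(x-1, y)
    dy = float(I[y + 1, x]) - float(I[y - 1, x])   # I(x, y+1) - I(x, y-1)
    A = np.sqrt(dx ** 2 + dy ** 2)                 # magnitude
    theta = np.arctan2(dy, dx)                     # orientation
    return A, theta

I = np.random.rand(8, 8).astype(np.float32)
print(sift_magnitude_orientation(I, 3, 3))
```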
The probability model in the GMS matching algorithm in this embodiment is:
the evaluation score formula of the feature point pair is as follows:
wherein P is the difference between correct matches and incorrect matches, p_true and p_false are the probabilities for a correct match and an incorrect match respectively, mean_true and mean_false are the means for correct and incorrect matches, and std_true and std_false are the corresponding variances; |F_1i| is the number of features in the grid cell of the matched feature point; i and j are the matched-point regions in the two image frames, k is the index of the current grid cell, K is the total number of grid cells, and, for a cell pair {i_k, j_k}, the corresponding term is the number of matches between the two cells.
The registration of the depth image and the color image according to the external parameters and the internal parameters in this embodiment specifically includes: converting the coordinates of all pixel points in the depth image into the infrared camera coordinate system, with the specific formula:
P_IR_camera = Z_c · K_IR_camera^(-1) · p
wherein Z_c is the depth value, i.e. the distance from the object in space to the depth camera, K_IR_camera^(-1) is the inverse of the intrinsic matrix of the infrared camera, p is the pixel coordinate of a point in the depth image, and P_IR_camera is the coordinate of that pixel point in the infrared camera coordinate system;
converting the coordinates of all points in the infrared camera coordinate system into the world coordinate system, with the specific formula:
P_w = T_w2IR^(-1) · P_IR_camera
wherein T_w2IR^(-1) is the inverse of the transformation matrix from the world coordinate system to the infrared camera coordinate system, and P_w is the world coordinate of the point in the depth image;
converting the coordinates of all points in the world coordinate system into the color camera coordinate system, with the specific formula:
P_Color_camera = T_wColor_camera · P_w
wherein T_wColor_camera is the transformation matrix from the world coordinate system to the color camera coordinate system, and P_Color_camera is the color camera coordinate corresponding to the point in the depth image;
mapping the coordinates of all points in the color camera coordinate system onto the color plane of the normalized plane Z_c = 1, with the specific formula:
p_color = (1 / Z_c) · K_Color_camera · P_Color_camera
wherein K_Color_camera is the intrinsic matrix of the color camera and p_color is the pixel point in the final registered image on the normalized mapping plane;
letting Z = 1, the relationship between the pixels of the registered image and the pixels of the depth image is obtained by chaining the formulas above; eliminating the intrinsic matrices K_Color_camera and K_IR_camera of the two cameras finally yields the transformation matrix between the infrared camera and the color camera:
T_IR2Color = T_wColor_camera · T_w2IR^(-1)
which after expansion and simplification gives:
R_IR2Color = R_w2Color · R_w2IR^(-1),   t_IR2Color = t_w2Color - R_IR2Color · t_w2IR
wherein T_wColor_camera is the transformation matrix from the world coordinate system to the color camera coordinate system, T_w2IR^(-1) is the inverse of the transformation matrix from the world coordinate system to the infrared camera coordinate system, T_IR2Color is the 4x4 transformation matrix from the infrared camera to the color camera, R_w2Color is the rotation matrix from world coordinates to color camera coordinates, R_w2IR^(-1) is the inverse of the rotation matrix from world coordinates to infrared camera coordinates, t_w2Color is the translation vector from world coordinates to color camera coordinates, and t_w2IR is the translation vector from world coordinates to infrared camera coordinates.
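Putting the chain of transformations together, a NumPy sketch of mapping one depth pixel into the color image could look as follows; the calibration values in the usage line are placeholders, not Kinect V2 parameters.

```python
# Map a depth pixel (u, v) with depth z into the color image: back-project with
# the IR intrinsics, transform with T_IR2Color = T_w2Color * inv(T_w2IR), and
# re-project with the color intrinsics. All calibration inputs are placeholders.
import numpy as np

def depth_pixel_to_color(u, v, z, K_ir, K_color, T_w2ir, T_w2color):
    T_ir2color = T_w2color @ np.linalg.inv(T_w2ir)           # 4x4 IR -> color transform
    p_ir = z * np.linalg.inv(K_ir) @ np.array([u, v, 1.0])   # point in IR camera coords
    p_color = T_ir2color @ np.append(p_ir, 1.0)              # point in color camera coords
    uv = K_color @ (p_color[:3] / p_color[2])                # project onto the Z = 1 plane
    return uv[:2]

K = np.array([[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]])
print(depth_pixel_to_color(100, 80, 1.2, K, K, np.eye(4), np.eye(4)))
```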
The semantic map thread in this embodiment specifically includes:
extracting features of the registration images to obtain feature layers;
pooling the feature layers to generate pyramid pooling features, wherein the sizes of the pooling kernels are 1x1, 2x2, 3x3 and 6x6 respectively;
flattening and upsampling the pyramid pooling features;
performing CONCAT with the feature layer, and obtaining a local semantic map and a global semantic map through a convolutional neural network;
the network is trained on the VOC2007 dataset containing 21 classes; the PSPNet backbone network is MobileNet V2, the number of training epochs is 140, and the ratio of the training set to the validation set is 9:1. The first 50 epochs use frozen training, i.e. part of the training weights are frozen to speed up training, with the batch size set to 4; unfreezing starts at epoch = 51, after which all weights are trained. It should be noted that the parameters used in this embodiment are reference values only and are not to be construed as limiting the present scheme; in a specific implementation they may be changed according to device performance, training accuracy and the like.
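For reference, a minimal PyTorch-style sketch of the pyramid pooling stage with the 1x1, 2x2, 3x3 and 6x6 bins described above is shown below; the channel counts and the backbone feature size are illustrative assumptions, not the trained configuration of this embodiment.

```python
# Minimal pyramid pooling module: pool the backbone feature map at four scales,
# project each branch with a 1x1 convolution, upsample, and concatenate with the
# original features (the CONCAT step). Channel sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch=320, branch_ch=80, bins=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b), nn.Conv2d(in_ch, branch_ch, 1))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(branch(x), size=(h, w), mode="bilinear",
                                align_corners=False) for branch in self.branches]
        return torch.cat([x] + pooled, dim=1)

feats = torch.randn(1, 320, 32, 32)     # e.g. MobileNetV2 backbone features (assumed shape)
print(PyramidPooling()(feats).shape)    # torch.Size([1, 640, 32, 32])
```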
The specific formula for fusing the local map and the global map by using the TSDF model in the local map and global map thread in this embodiment is as follows:
D(v) = Σ_i D_i(v)·w_i(v) / Σ_i w_i(v),   W(v) = Σ_i w_i(v)
and the specific formula of the de-fusion construction is as follows:
D'(v) = ( D(v)·W(v) - D_i(v)·w_i(v) ) / ( W(v) - w_i(v) ),   W'(v) = W(v) - w_i(v)
wherein D(v) is the signed distance value of the voxel, W(v) is the voxel weight value, D_i(v) and w_i(v) are respectively the projection distance from the voxel to the i-th frame depth image and the integration weight, and D'(v) is the updated voxel signed distance value.
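The per-voxel update and its removal follow directly from these weighted-average formulas; the snippet below treats a single voxel with scalar values for clarity.

```python
# Single-voxel TSDF integration and de-integration following the weighted-average
# formulas above; D is the signed distance, W the accumulated weight.
def tsdf_integrate(D, W, d_i, w_i):
    D_new = (D * W + d_i * w_i) / (W + w_i)
    return D_new, W + w_i

def tsdf_deintegrate(D, W, d_i, w_i):
    W_new = W - w_i
    D_new = (D * W - d_i * w_i) / W_new if W_new > 0 else 0.0
    return D_new, W_new

D, W = 0.0, 0.0
D, W = tsdf_integrate(D, W, 0.04, 1.0)    # integrate frame 1
D, W = tsdf_integrate(D, W, 0.02, 1.0)    # integrate frame 2
print(tsdf_deintegrate(D, W, 0.04, 1.0))  # remove frame 1 again -> (0.02, 1.0)
```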
The fusion model adopted in the fusion thread in this embodiment is:
wherein C_{i-1}(o) and W_{i-1}(o) are respectively the fused class confidence and the reliability weight of the voxel corresponding to the i-th frame, and the remaining two terms are the class confidence and the reliability weight of the pixel p in the i-th frame image.
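Purely as an illustration of this kind of voxel-wise semantic fusion, the snippet below assumes a weighted running average of per-voxel class confidences, analogous to the TSDF update above; this assumed form is not necessarily the exact fusion model of the embodiment.

```python
# Assumed illustration: fuse the class confidences of the projected pixel p into
# the voxel's stored confidences by a weighted running average.
import numpy as np

def fuse_semantics(C_prev, W_prev, c_pixel, w_pixel):
    # C_prev: voxel class-confidence vector, W_prev: its reliability weight;
    # c_pixel / w_pixel: class confidences and weight of the pixel in frame i.
    C_new = (C_prev * W_prev + c_pixel * w_pixel) / (W_prev + w_pixel)
    return C_new, W_prev + w_pixel

C, W = np.array([0.2, 0.8]), 1.0                       # e.g. two classes
print(fuse_semantics(C, W, np.array([0.6, 0.4]), 1.0))  # -> (array([0.4, 0.6]), 2.0)
```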
In order to perfect the details of the global semantic map by utilizing the local semantic map, the local semantic map and the global semantic map are matched in the global thread in the embodiment, and the specific formula is as follows:
de-fusion:
the accuracy calculation formula is:
wherein W_local and W_global are the weight values of the local semantic map and the global semantic map, and Map(v, C_{i-1}(o))_local and Map(v, C_{i-1}(o))_global are respectively the fused local semantic map and the fused global semantic map; S1 and S2 are the surface areas of the three-dimensional semantic models measured with the MeshLab tool, S is the surface area of the three-dimensional reconstruction model measured with the MeshLab tool, and k1 and k2 are respectively the weight coefficients of S1 and S2.
The present invention is described with reference to flowchart illustrations or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application, it being understood that each flowchart illustration or block in the flowchart illustrations or block diagrams, and combinations of flowcharts or blocks in the flowchart illustrations or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are desired to be protected by the following claims.

Claims (6)

1. The three-dimensional semantic map construction method is characterized by comprising the following steps of:
registering image threads, local map and global map threads, semantic map threads, fusion threads and global threads which can be processed in parallel based on the GPU;
the registration image thread is used for acquiring a color image and a depth image of a scene, and preprocessing the color image and the depth image to obtain a registration image;
the local map and global map thread is used for solving the pose between the images according to the registration image and the depth image, and performing three-dimensional reconstruction by using the pose, the color image and the depth image to obtain a local map and a global map;
the local map and global map thread comprises:
performing block division on the multi-frame registration image to obtain a plurality of image blocks, wherein overlapping frames exist between adjacent image blocks;
carrying out feature extraction on the registration images in each image block by using a SIFT extraction algorithm based on GPU acceleration to obtain feature points, and selecting a coordinate system of a frame of registration images as a world coordinate system;
matching the feature points according to a GMS matching algorithm, filtering out mismatching points, saving the intra-block associations as the local image association matching M1, and saving the associations with poor intra-block relevance as the global image association matching M2;
solving the pose between each frame of registration images according to M1 and M2 by using the Gauss-Newton method, and carrying out loop detection on the current pose;
according to the pose and the depth image and the color image obtained in the registration image thread, carrying out three-dimensional dense reconstruction on the scene to obtain a local map and a global map;
the semantic map thread is used for carrying out semantic segmentation on the plurality of registration images by using the PSP Net to obtain a two-dimensional semantic image;
the fusion thread is used for respectively fusing the two-dimensional semantic image with the local map and the global map to obtain a local semantic map and a global semantic map;
the global thread is used for matching the local semantic map and the global semantic map to obtain a global consistency dense semantic map.
2. The three-dimensional semantic map building method according to claim 1, wherein the registering image thread specifically comprises:
calibrating a depth camera comprising an infrared camera and a color camera to obtain an internal parameter and an external parameter of the depth camera;
respectively utilizing an infrared camera and a color camera in the depth camera to continuously acquire a depth image and a color image of a multi-frame scene;
and registering the depth image and the color image according to the external parameters and the internal parameters to obtain a multi-frame registration image.
3. The method of claim 2, wherein the local map and global map threads comprise:
performing block division on the multi-frame registration image to obtain a plurality of image blocks, wherein overlapping frames exist between adjacent image blocks;
carrying out feature extraction on the registration images in each image block by using a SIFT extraction algorithm based on GPU acceleration to obtain feature points, and selecting a coordinate system of a frame of registration images as a world coordinate system;
matching the feature points according to a GMS matching algorithm, filtering out mismatching points, saving the intra-block associations as the local image association matching M1, and saving the associations with poor intra-block relevance as the global image association matching M2;
solving the pose between each frame of registration images according to said M1 and M2 by using the Gauss-Newton method, and carrying out loop detection on the current pose;
and carrying out three-dimensional dense reconstruction on the scene according to the pose and the depth image and the color image obtained in the registration image thread to obtain a local map and a global map.
4. A three-dimensional semantic map building method according to claim 3, wherein the probability model in the GMS matching algorithm is:
the evaluation score formula of the feature point pair is as follows:
wherein P is the difference between correct matches and incorrect matches, p_true and p_false are the probabilities for a correct match and an incorrect match respectively, mean_true and mean_false are the means for correct and incorrect matches, and std_true and std_false are the corresponding variances; |F_1i| is the number of features in the grid cell of the matched feature point; i and j are the matched-point regions in the two image frames, k is the index of the current grid cell, K is the total number of grid cells, and, for a cell pair {i_k, j_k}, the corresponding term is the number of matches between the two cells.
5. The three-dimensional semantic map construction method according to claim 2, wherein registering the depth image and the color image according to the external parameters and the internal parameters specifically comprises:
converting coordinates of all pixel points in the depth image to an infrared camera coordinate system;
converting points under the infrared camera coordinate system into a world coordinate system;
converting points in the world coordinate system into a color camera coordinate system;
mapping points under a color camera coordinate system to a color plane of the normalized plane;
and obtaining a transformation matrix between the infrared camera and the color camera.
6. A three-dimensional semantic map construction method according to claim 3, wherein the semantic map thread specifically comprises:
extracting features of the registration images to obtain feature layers;
pooling the feature layers to generate pyramid pooling features;
flattening and upsampling the pyramid pooling feature;
merging with the feature layer, and obtaining a local semantic map and a global semantic map through a convolutional neural network.
CN202110394816.7A 2021-04-13 2021-04-13 Three-dimensional semantic map construction method Active CN113313824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110394816.7A CN113313824B (en) 2021-04-13 2021-04-13 Three-dimensional semantic map construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110394816.7A CN113313824B (en) 2021-04-13 2021-04-13 Three-dimensional semantic map construction method

Publications (2)

Publication Number Publication Date
CN113313824A CN113313824A (en) 2021-08-27
CN113313824B true CN113313824B (en) 2024-03-15

Family

ID=77372349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110394816.7A Active CN113313824B (en) 2021-04-13 2021-04-13 Three-dimensional semantic map construction method

Country Status (1)

Country Link
CN (1) CN113313824B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116817892B (en) * 2023-08-28 2023-12-19 之江实验室 Cloud integrated unmanned aerial vehicle route positioning method and system based on visual semantic map
CN117788306A (en) * 2023-12-18 2024-03-29 上海贝特威自动化科技有限公司 Multithreading-based multi-focal-length tab image fusion method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080659A (en) * 2019-12-19 2020-04-28 哈尔滨工业大学 Environmental semantic perception method based on visual information

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080659A (en) * 2019-12-19 2020-04-28 哈尔滨工业大学 Environmental semantic perception method based on visual information

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Application of the SSE instruction set in the image reconstruction algorithm of a 60Co container CT system; Song Qi; Luo Zhiyu; Cong Peng; Nuclear Electronics & Detection Technology (01); full text *
Semantic map construction based on laser SLAM and deep learning; He Song; Sun Jing; Guo Lejiang; Chen Liang; Computer Technology and Development (09); full text *
Video stabilization algorithm based on feature matching and motion compensation; Tang Jialin; Zheng Jiefeng; Li Xiying; Su Binghua; Application Research of Computers (02); full text *

Also Published As

Publication number Publication date
CN113313824A (en) 2021-08-27

Similar Documents

Publication Publication Date Title
CN111429514B (en) Laser radar 3D real-time target detection method integrating multi-frame time sequence point cloud
CN109682381B (en) Omnidirectional vision based large-view-field scene perception method, system, medium and equipment
CN111862126B (en) Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm
CN113819890B (en) Distance measuring method, distance measuring device, electronic equipment and storage medium
CN109726627A (en) A kind of detection method of neural network model training and common ground line
CN113705521A (en) Head pose estimation method combined with YOLO-MobilenetV3 face detection
Ding et al. Vehicle pose and shape estimation through multiple monocular vision
CN110825101A (en) Unmanned aerial vehicle autonomous landing method based on deep convolutional neural network
CN113313824B (en) Three-dimensional semantic map construction method
CN111998862B (en) BNN-based dense binocular SLAM method
US20240013505A1 (en) Method, system, medium, equipment and terminal for inland vessel identification and depth estimation for smart maritime
CN112489099B (en) Point cloud registration method and device, storage medium and electronic equipment
Mseddi et al. YOLOv5 based visual localization for autonomous vehicles
CN110148177A (en) For determining the method, apparatus of the attitude angle of camera, calculating equipment, computer readable storage medium and acquisition entity
CN110260866A (en) A kind of robot localization and barrier-avoiding method of view-based access control model sensor
Li et al. Aruco marker detection under occlusion using convolutional neural network
CN108225273A (en) A kind of real-time runway detection method based on sensor priori
CN115578460A (en) Robot grabbing method and system based on multi-modal feature extraction and dense prediction
CN114358133B (en) Method for detecting looped frames based on semantic-assisted binocular vision SLAM
CN111626241A (en) Face detection method and device
CN117523461B (en) Moving target tracking and positioning method based on airborne monocular camera
CN114494435A (en) Rapid optimization method, system and medium for matching and positioning of vision and high-precision map
Crombez et al. Using dense point clouds as environment model for visual localization of mobile robot
CN115953471A (en) Indoor scene multi-scale vector image retrieval and positioning method, system and medium
Li-Chee-Ming et al. Determination of UAS trajectory in a known environment from FPV video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant