CN111340939A - Indoor three-dimensional semantic map construction method

Info

Publication number: CN111340939A (application CN202010108398.6A; granted as CN111340939B)
Authority: CN (China)
Inventors: 赵芳, 曾碧
Assignee (original and current): Guangdong University of Technology
Priority/filing date: 2020-02-21
Publication date: 2020-06-26 (CN111340939A); grant date: 2023-04-18 (CN111340939B)
Legal status: Active (granted)

Classifications

    • G06T17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G01C21/206 Instruments for performing navigational calculations specially adapted for indoor navigation
    • G01C21/32 Structuring or formatting of map data
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G06T5/77
    • G06T7/10 Segmentation; edge detection
    • G06T7/33 Image registration using feature-based methods
    • G06T2207/10024 Color image
    • G06T2207/20221 Image fusion; image merging
    • G06T2207/30204 Marker
    • Y02T10/40 Engine management systems

Abstract

The invention belongs to the field of three-dimensional reconstruction and scene understanding, and specifically relates to an indoor three-dimensional semantic map construction method, aimed at enabling a home service robot to understand the semantic information of its surroundings, facilitating human-machine interaction, and executing high-level intelligent operations. The method first acquires images of an indoor scene with an RGB-D sensor, performs target detection or semantic segmentation on the two-dimensional color image to obtain the corresponding semantic information, simultaneously repairs the depth image for three-dimensional reconstruction, and finally fuses the image semantic information into the three-dimensional map to obtain the indoor three-dimensional semantic map. The technical scheme of the invention achieves rapid and accurate three-dimensional information perception, which is of great significance for home service robots, and is also applicable to indoor augmented reality, three-dimensional interior design, and similar applications.

Description

Indoor three-dimensional semantic map construction method
Technical Field
The invention relates to the field of three-dimensional reconstruction and scene understanding, in particular to an indoor three-dimensional semantic map construction method and system.
Background
Rapid and accurate three-dimensional information perception is a key technology for emerging applications such as home service robots, indoor augmented reality, and three-dimensional interior design. In recent years, with the development of depth sensors (e.g., Microsoft Kinect, Intel RealSense), three-dimensional scanning technology has advanced greatly. The depth and color images collected by these sensors can conveniently be used to generate dense three-dimensional models of a scanned object, which has in turn promoted research on the construction of three-dimensional semantic maps of indoor scenes. Semantic maps can be widely applied in fields such as robotics, navigation, and human-computer interaction. An indoor semantic map typically includes spatial attribute information, such as the floor structure of a building and the distribution of rooms, as well as semantic attribute information, such as the attributes and functions of individual rooms and the class and location of objects within a room. The goal of semantic map building is to accurately label semantic information on the map.
A literature survey of the prior art shows the following. Document 1 (Wu Hao. Research on robot map construction [D]. Jinan: Shandong University, 2011.) uses QR Code technology to paste two-dimensional codes as artificial landmarks on large objects in a semi-unknown home environment, so as to construct a semantic map that can describe object-room affiliation relationships. Document 2 (Zhao Cheng. Indoor hierarchical map construction and navigation system based on visual-voice interaction [D]. Xiamen: Xiamen University, 2014.) realizes a bottom-up grid-topology-semantic multi-level map through visual human-body tracking and voice labeling, but relies on manual intervention during map construction. Document 3 (SHENG W, DU J, CHENG Q, et al. Robot semantic mapping through human activity recognition: a wearable sensing and computing approach [J]. Robotics and Autonomous Systems, 2015, 68(C): 47-58.) creatively proposes to use wearable devices to recognize human actions and to establish a Bayesian framework based on the relationship between human actions and object types to construct semantic maps, but wearing such devices is somewhat cumbersome in practical applications.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an indoor three-dimensional semantic map construction method based on an RGB-D sensor, which can construct a map containing both room semantic information and room-object semantic information, so that a robot can execute high-level intelligent operations and better serve humans.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the invention provides a method for constructing an indoor three-dimensional semantic map, which comprises the following steps:
step S1, data acquisition; collecting color depth RGB-D image information of an indoor environment by using an RGB-D sensor, wherein the color depth RGB-D image information comprises an RGB image and a depth image;
step S2, obtaining semantic information: carrying out target detection or semantic segmentation on the acquired two-dimensional RGB image by using a deep learning algorithm to obtain corresponding semantic information;
step S3, repairing the depth image;
step S4, building a three-dimensional map of the indoor environment: constructing a three-dimensional map by using the repaired indoor environment RGB-D image;
And step S5, forming a three-dimensional semantic map: the targets with semantic information obtained in step S2 are fused with the indoor three-dimensional map obtained in step S4 through coordinate position transformation, and the map is assigned and annotated with labels to form the indoor environment three-dimensional semantic map.
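As a purely illustrative aid (not part of the claimed method), the following Python sketch shows how steps S1 to S5 fit together; every helper in it is a hypothetical stub standing in for the components described above (detection, depth repair, reconstruction, fusion), not an implementation of the invention.

```python
# End-to-end sketch of steps S1-S5. All helpers are hypothetical stubs.
import numpy as np

def detect_objects(rgb):                 # S2: semantic information
    return []                            # would return (label, box) pairs

def repair_depth(depth):                 # S3: depth image restoration
    return depth                         # would apply equations (1)-(3)

def depth_to_points(depth, pose):
    return np.zeros((0, 3))              # stub back-projection to 3-D points

def reconstruct(rgb, depth):             # S4: three-dimensional mapping
    pose = np.eye(4)                     # would come from pose optimization
    return depth_to_points(depth, pose), pose

def fuse_semantics(map_points, map_labels, points, detections):
    pass                                 # S5: label assignment via coordinate transform

def build_semantic_map(rgbd_stream):     # S1: stream of (rgb, depth) frames
    map_points, map_labels = [], []
    for rgb, depth in rgbd_stream:
        detections = detect_objects(rgb)
        points, pose = reconstruct(rgb, repair_depth(depth))
        fuse_semantics(map_points, map_labels, points, detections)
    return map_points, map_labels
```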
In a preferred embodiment, the specific steps of step S1 are as follows:
the user can scan the indoor environment by holding the equipment with the RGB-D sensor or by the mobile robot with the RGB-D sensor to obtain continuous RGB-D images.
In a preferred embodiment, the target detection method in step S2 is YOLOv3.
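For context only, the sketch below shows what YOLOv3 inference on the RGB image might look like using OpenCV's DNN module; the configuration/weight file names are assumptions, and the snippet requires the standard publicly available YOLOv3 files.

```python
import cv2
import numpy as np

# Sketch: YOLOv3 inference with OpenCV's DNN module. The file names
# "yolov3.cfg"/"yolov3.weights" are assumptions about local files.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")

def detect(rgb, conf_thresh=0.5, nms_thresh=0.4):
    h, w = rgb.shape[:2]
    blob = cv2.dnn.blobFromImage(rgb, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outs = net.forward(net.getUnconnectedOutLayersNames())
    boxes, confidences, class_ids = [], [], []
    for out in outs:                     # each row: cx, cy, bw, bh, obj, class scores
        for row in out:
            scores = row[5:]
            cls = int(np.argmax(scores))
            conf = float(scores[cls])
            if conf > conf_thresh:
                cx, cy, bw, bh = row[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(conf)
                class_ids.append(cls)
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    return [(class_ids[i], boxes[i], confidences[i]) for i in np.array(keep).flatten()]
```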
In a preferred embodiment, the step S3 uses a parallelized real-time depth image restoration algorithm based on the CUDA technique.
In a preferred embodiment, the step S4 adopts an improved BundleFusion three-dimensional reconstruction algorithm.
The invention provides an indoor three-dimensional semantic map construction system in a second aspect, which comprises a data acquisition module, a three-dimensional dense reconstruction module and a semantic fusion dense reconstruction module;
the data acquisition module acquires color-depth RGB-D image information of the indoor environment and splits it into an RGB image and a depth image, which are respectively subjected to RGB image target detection/semantic segmentation and CUDA depth image restoration;
the three-dimensional dense reconstruction module performs inter-frame correspondence matching on the input aligned color and depth data streams, then performs global pose optimization to correct the overall drift, keeping the model continuously and dynamically updated throughout the reconstruction process;
the semantic fusion dense reconstruction module performs target detection or semantic segmentation on the images acquired by the camera and integrates the resulting image semantics into the three-dimensional dense point cloud reconstruction through a Bayesian-update-based fusion algorithm, realizing the construction of a service-robot-oriented indoor scene three-dimensional semantic map.
In a preferred scheme, the CUDA depth image restoration method specifically includes the following steps:
the invalid points on each depth image are filtered using equation (1).
Figure BDA0002389154770000031
In the formula: i isdestIs a restored image IsrcFor the original image, ω (i, j) is the weight of the filter at point (i, j), ΩinvAs an area of invalid points on the image, omeganIs a neighborhood of pixels, omega, with invalid points removedpIs that the standard quantity is calculated by the formula (2);
Figure BDA0002389154770000032
the weight ω (i, j) is linearly related to the spatial domain and the value domain of the pixel point at the same time, the closer the distance is, the smaller the pixel value change is, the higher the correlation is, and the filter kernel function is defined as follows:
Figure BDA0002389154770000033
in the formula:
Figure BDA0002389154770000034
is the standard deviation of a spatial gaussian function,
Figure BDA0002389154770000035
is the standard deviation of the value domain gaussian function, x, y are the abscissa of the pixel within the filter window, I, j are the pixel coordinates of the invalid point currently being processed, I represents the value of a certain pixel on the depth image.
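A plain-Python CPU sketch of equations (1)-(3) follows; the window size and the σ values are illustrative assumptions, not values specified by the invention (the parallel CUDA mapping is discussed in Example 3).

```python
import numpy as np

def repair_depth(src, win=5, sigma_s=3.0, sigma_r=30.0):
    """Fill invalid (zero) depth pixels per equations (1)-(3).

    win, sigma_s and sigma_r are illustrative choices, not patent values.
    """
    dst = src.astype(np.float64).copy()
    H, W = src.shape
    r = win // 2
    inv_i, inv_j = np.nonzero(src == 0)          # Omega_inv: invalid points
    for i, j in zip(inv_i, inv_j):
        num, den = 0.0, 0.0                      # den is Omega_p of eq. (2)
        for x in range(max(0, i - r), min(H, i + r + 1)):
            for y in range(max(0, j - r), min(W, j + r + 1)):
                if src[x, y] == 0:               # Omega_n excludes invalid points
                    continue
                w_s = np.exp(-((x - i) ** 2 + (y - j) ** 2) / (2 * sigma_s ** 2))
                # Range term compares against the center value, as eq. (3) is stated
                # (zero at an invalid center).
                w_r = np.exp(-((float(src[x, y]) - float(src[i, j])) ** 2)
                             / (2 * sigma_r ** 2))
                w = w_s * w_r                    # kernel of eq. (3)
                num += w * src[x, y]
                den += w
        if den > 0:
            dst[i, j] = num / den                # eq. (1)
    return dst
```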
In a preferred embodiment, in the three-dimensional dense reconstruction module:
in terms of matching, a coarse-to-fine parallel global optimization method is used: sparse SIFT feature points are first used for coarse registration, and dense photometric and geometric constraints are then used for finer registration;
in terms of pose optimization, a hierarchical local-to-global optimization method with two layers is used: on the lowest layer, every 10 consecutive frames form a chunk, with the first frame as the key frame, and local pose optimization is performed on all frames within the chunk; on the second layer, only the key frames of all chunks are correlated with one another and then globally optimized; the advantage of this method is that the key frames can be decoupled, reducing storage and the amount of data to be processed;
in terms of dense scene reconstruction, reconstruction errors caused by accumulated drift or by computation in featureless regions are corrected based on the pose estimates.
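The two-layer pose-optimization structure described above can be illustrated with the following sketch; `optimize_local` and `optimize_global` are hypothetical placeholders for the intra-chunk and key-frame-only alignment steps, not the actual optimizer.

```python
# Sketch of the two-layer structure: consecutive frames grouped into chunks
# of 10, with each chunk's first frame serving as its key frame.
CHUNK_SIZE = 10

def make_chunks(frames):
    chunks = [frames[k:k + CHUNK_SIZE] for k in range(0, len(frames), CHUNK_SIZE)]
    keyframes = [chunk[0] for chunk in chunks]   # first frame of each chunk
    return chunks, keyframes

def optimize_hierarchically(frames, optimize_local, optimize_global):
    chunks, keyframes = make_chunks(frames)
    for chunk in chunks:          # layer 1: local pose optimization inside a chunk
        optimize_local(chunk)
    optimize_global(keyframes)    # layer 2: only key frames correlated globally

# Usage with no-op optimizers, just to show the control flow:
optimize_hierarchically(list(range(25)), lambda c: None, lambda k: None)
```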
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the method for constructing the indoor three-dimensional semantic map provided by the invention establishes the three-dimensional scene map by scanning the surrounding environment of the indoor scene by using the RGB-D sensor, and meanwhile, obtains semantic information (walls, doors and windows, the ground, various furniture and the like) which can enable the robot to automatically understand the surrounding environment by using a deep learning algorithm, and finally realizes the construction of the three-dimensional semantic map of the indoor scene; the method has important significance for the home service robot to really understand the surrounding environment and achieve the real purpose of intelligent semantic perception, and has important reference value for acquiring scene three-dimensional information for emerging applications such as indoor augmented reality and three-dimensional indoor design.
Drawings
Fig. 1 is a flowchart of the indoor scene three-dimensional semantic map construction method according to the present invention.
Fig. 2 is a schematic flow chart of the indoor scene three-dimensional semantic map construction system according to the present invention.
Fig. 3 is an original depth image generated by the Kinect.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The invention provides a method for constructing an indoor three-dimensional semantic map, which comprises the following steps:
step S1, data acquisition; collecting color depth RGB-D image information of an indoor environment by using an RGB-D sensor, wherein the color depth RGB-D image information comprises an RGB image and a depth image;
step S2, obtaining semantic information: carrying out target detection or semantic segmentation on the acquired two-dimensional RGB image by using a deep learning algorithm to obtain corresponding semantic information;
step S3, repairing the depth image;
step S4, building a three-dimensional map of the indoor environment: constructing a three-dimensional map by using the repaired indoor environment RGB-D image;
And step S5, forming a three-dimensional semantic map: the targets with semantic information obtained in step S2 are fused with the indoor three-dimensional map obtained in step S4 through coordinate position transformation, and the map is assigned and annotated with labels to form the indoor environment three-dimensional semantic map.
In a preferred embodiment, the specific steps of step S1 are as follows:
the user can scan the indoor environment by holding the equipment with the RGB-D sensor or by the mobile robot with the RGB-D sensor to obtain continuous RGB-D images.
In a preferred embodiment, the target detection method in step S2 is YOLOv3.
In a preferred embodiment, the step S3 uses a parallelized real-time depth image restoration algorithm based on the CUDA technique.
In a preferred embodiment, the step S4 adopts an improved BundleFusion three-dimensional reconstruction algorithm.
Example 2
The invention provides an indoor three-dimensional semantic map construction system in a second aspect, which comprises a data acquisition module, a three-dimensional dense reconstruction module and a semantic fusion dense reconstruction module;
the data acquisition module acquires color-depth RGB-D image information of the indoor environment and splits it into an RGB image and a depth image, which are respectively subjected to RGB image target detection/semantic segmentation and CUDA depth image restoration;
the three-dimensional dense reconstruction module performs inter-frame correspondence matching on the input aligned color and depth data streams, then performs global pose optimization to correct the overall drift, keeping the model continuously and dynamically updated throughout the reconstruction process;
the semantic fusion dense reconstruction module performs target detection or semantic segmentation on the images acquired by the camera and integrates the resulting image semantics into the three-dimensional dense point cloud reconstruction through a Bayesian-update-based fusion algorithm, realizing the construction of a service-robot-oriented indoor scene three-dimensional semantic map.
In a preferred scheme, the CUDA depth image restoration method specifically includes the following steps:
the invalid points on each depth image are filtered using equation (1).
Figure BDA0002389154770000051
In the formula: i isdestIs a restored image IsrcFor the original image, ω (i, j) is the weight of the filter at point (i, j), ΩinvAs an area of invalid points on the image, omeganIs a neighborhood of pixels, omega, with invalid points removedpIs that the standard quantity is calculated by the formula (2);
Figure BDA0002389154770000052
the weight ω (i, j) is linearly related to the spatial domain and the value domain of the pixel point at the same time, the closer the distance is, the smaller the pixel value change is, the higher the correlation is, and the filter kernel function is defined as follows:
Figure BDA0002389154770000061
in the formula:
Figure BDA0002389154770000062
is the standard deviation of a spatial gaussian function,
Figure BDA0002389154770000063
is the standard deviation of the value domain gaussian function, x, y are the abscissa of the pixel within the filter window, I, j are the pixel coordinates of the invalid point currently being processed, I represents the value of a certain pixel on the depth image.
In a preferred embodiment, in the three-dimensional dense reconstruction module:
in terms of matching, a coarse-to-fine parallel global optimization method is used: sparse SIFT feature points are first used for coarse registration, and dense photometric and geometric constraints are then used for finer registration;
in terms of pose optimization, a hierarchical local-to-global optimization method with two layers is used: on the lowest layer, every 10 consecutive frames form a chunk, with the first frame as the key frame, and local pose optimization is performed on all frames within the chunk; on the second layer, only the key frames of all chunks are correlated with one another and then globally optimized; the advantage of this method is that the key frames can be decoupled, reducing storage and the amount of data to be processed;
in terms of dense scene reconstruction, reconstruction errors caused by accumulated drift or by computation in featureless regions are corrected based on the pose estimates.
Example 3
The embodiment of the invention provides a detailed flow diagram of the indoor scene three-dimensional semantic map construction method. The method mainly comprises three modules: data acquisition, three-dimensional dense reconstruction, and semantic fusion dense reconstruction.
Data acquisition uses an RGB-D sensor. In embodiments of the invention, a user may scan the indoor environment holding a device equipped with a depth sensor (e.g., Kinect v2), or a mobile robot equipped with a depth sensor may do so, collecting continuous image data. The RGB-D image data comprise an RGB color image and a depth image. The depth image directly reflects the real three-dimensional environment information, as shown in Fig. 3. Owing to the device itself, object surface materials, occlusion between regions, and the like, the original depth image generated by the Kinect contains a large number of invalid regions such as black edges and black holes, which greatly affects its usability. The embodiment of the invention uses a parallelized real-time depth image restoration algorithm based on the CUDA technique to achieve real-time, effective restoration of the depth image on the mobile robot.
In this embodiment, to parallelize the image restoration program, the image is partitioned. The Kinect v2 depth image is 512 × 424 pixels; the 12 rows of pixels at the top and the 12 rows at the bottom of the image are omitted, and 32 × 20 blocks are used, forming a 16 × 20 grid. Once partitioning is complete, the image is uploaded to the GPU, which executes the restoration program in parallel, filtering the invalid points on each image using equation (1); a launch-configuration sketch follows the equations below.
$$I_{dest}(i,j)=\frac{1}{\Omega_p}\sum_{(x,y)\in\Omega_n}\omega(x,y)\,I_{src}(x,y),\qquad (i,j)\in\Omega_{inv} \tag{1}$$

In the formula: $I_{dest}$ is the restored image, $I_{src}$ is the original image, $\omega(i,j)$ is the weight of the filter at point $(i,j)$, $\Omega_{inv}$ is the set of invalid points on the image, $\Omega_n$ is the pixel neighborhood with the invalid points removed, and $\Omega_p$ is the normalization quantity calculated by equation (2):

$$\Omega_p=\sum_{(x,y)\in\Omega_n}\omega(x,y) \tag{2}$$

The weight $\omega$ depends jointly on the spatial domain and the value domain of the pixels: the closer the distance and the smaller the change in pixel value, the higher the correlation. The filter kernel function is defined as follows:

$$\omega(x,y)=\exp\!\left(-\frac{(x-i)^2+(y-j)^2}{2\sigma_s^2}\right)\exp\!\left(-\frac{\left(I(x,y)-I(i,j)\right)^2}{2\sigma_r^2}\right) \tag{3}$$

In the formula: $\sigma_s$ is the standard deviation of the spatial Gaussian function, $\sigma_r$ is the standard deviation of the value-domain Gaussian function, $(x,y)$ are the coordinates of a pixel within the filter window, $(i,j)$ are the coordinates of the invalid point currently being processed, and $I$ denotes the value of a pixel on the depth image.
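The block/grid partitioning described above might be expressed as follows with Numba's CUDA bindings; the choice of Numba is an assumption (the patent does not name a CUDA framework), the filter body is omitted, and a CUDA-capable GPU is required to run it.

```python
from numba import cuda
import numpy as np

H, W, MARGIN = 424, 512, 12            # Kinect v2 depth size; 12 rows trimmed top and bottom
BLOCK = (20, 32)                       # the patent's 32 x 20 block, ordered (rows, cols)
GRID = ((H - 2 * MARGIN) // BLOCK[0],  # the 16 x 20 grid, ordered (rows, cols):
        W // BLOCK[1])                 # (400 / 20, 512 / 32) = (20, 16) blocks

@cuda.jit
def repair_kernel(src, dst):
    # One thread per pixel inside the trimmed region; each thread would apply
    # the filter of equations (1)-(3) at its pixel (filter body omitted here).
    r, c = cuda.grid(2)
    r += MARGIN
    if r < H - MARGIN and c < W:
        dst[r, c] = src[r, c]          # placeholder for the repaired value

depth = np.zeros((H, W), dtype=np.float32)
out = cuda.device_array_like(depth)
repair_kernel[GRID, BLOCK](cuda.to_device(depth), out)   # parallel launch on the GPU
```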
The three-dimensional dense reconstruction module in Fig. 2 is mainly based on the BundleFusion algorithm. In the embodiment of the invention, invalid-point repair is first performed on the acquired original depth image so that sensor noise does not accumulate into key-point matching errors. Inter-frame correspondence matching is then performed on the input aligned color and depth data streams, followed by global pose optimization to correct the overall drift; the model is kept continuously and dynamically updated throughout the reconstruction process.
In the aspect of matching, a coarse-to-fine parallel global optimization method is used. First a coarser registration is performed using sparse SIFT feature points, and then a finer registration is performed using dense photometric and geometric constraints.
In terms of pose optimization, a hierarchical local-to-global optimization method with two layers is used. On the lowest layer, every 10 consecutive frames form a chunk, with the first frame as the key frame, and local pose optimization is performed on all frames within the chunk. On the second layer, only the key frames of all chunks are correlated with one another and then globally optimized. The advantage of this method is that the key frames can be decoupled, reducing storage and the amount of data to be processed.
In terms of dense scene reconstruction, the key point is the symmetric update of the model: when an updated pose estimate for a frame is to be applied, the frame's old contribution is removed and the frame is re-integrated at the new pose. On this basis, as long as better pose estimates are obtained, reconstruction errors caused by accumulated drift or by computation in featureless regions can be corrected, making the model increasingly accurate.
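As a toy illustration of this symmetric update, consider a single TSDF voxel holding a value and a weight: the usual weighted-average integration is inverted to de-integrate a frame's old observation before re-integrating it at the corrected pose. This is a sketch of the general technique under those assumptions, not the patented code.

```python
# Toy symmetric update of one TSDF voxel (value d, weight w).
def integrate(d, w, d_f, w_f):
    """Standard weighted running average: add a frame's SDF sample d_f."""
    return (d * w + d_f * w_f) / (w + w_f), w + w_f

def deintegrate(d, w, d_f, w_f):
    """Inverse of integrate(): remove a previously added sample d_f."""
    if w - w_f <= 0:
        return 0.0, 0.0                      # voxel no longer observed
    return (d * w - d_f * w_f) / (w - w_f), w - w_f

# Usage: remove the frame's old SDF sample, then integrate the new one.
d, w = 0.05, 4.0                             # current voxel state
d, w = deintegrate(d, w, d_f=0.08, w_f=1.0)  # old (drifted) observation out
d, w = integrate(d, w, d_f=0.03, w_f=1.0)    # re-observed at corrected pose
```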
The semantic information in the semantic fusion dense reconstruction module of Fig. 2 can be obtained by target detection or by semantic segmentation. Thanks to the development of deep learning in recent years, the computer vision field has achieved many remarkable results, including target detection and semantic segmentation of images. A strong target detection family is the YOLO series, which meets the requirements of real-time detection tasks; among these, YOLOv3 balances speed and accuracy by varying the size of the model structure. A strong semantic segmentation method, DeepLabv3, reaches a mean accuracy of 85.2%. These algorithms perform target detection or semantic segmentation on the images acquired by the camera, and the resulting image semantics are integrated into the three-dimensional dense point cloud reconstruction through a Bayesian-update-based fusion algorithm, realizing the construction of a service-robot-oriented indoor scene three-dimensional semantic map.
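A minimal sketch of such a Bayesian label update for a single map point is given below; the class count and the per-frame likelihood values are illustrative assumptions, not parameters from the patent.

```python
import numpy as np

# The stored class distribution of a map point is multiplied by each new
# per-frame class likelihood and renormalized:
#   p(l | z_1..t)  is proportional to  p(z_t | l) * p(l | z_1..t-1)
NUM_CLASSES = 21                               # assumed: background + 20 classes

def init_label_prob():
    return np.full(NUM_CLASSES, 1.0 / NUM_CLASSES)   # uniform prior

def bayes_update(prob, likelihood):
    posterior = prob * likelihood
    s = posterior.sum()
    return posterior / s if s > 0 else prob    # keep prior if evidence degenerates

point_prob = init_label_prob()
frame_likelihood = np.full(NUM_CLASSES, 0.02)  # per-frame class scores from the
frame_likelihood[5] = 0.6                      # detector/segmenter at this point's pixel
point_prob = bayes_update(point_prob, frame_likelihood)
print(point_prob.argmax())                     # current best label for the point
```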
In summary, the method scans the surroundings of the indoor scene with an RGB-D sensor to build a three-dimensional scene map, while a deep learning algorithm acquires the semantic information (walls, doors and windows, the floor, various pieces of furniture, etc.) that enables the robot to autonomously understand its surroundings, finally constructing a three-dimensional semantic map of the indoor scene. This is of great significance for home service robots to truly understand their surroundings and achieve genuinely intelligent semantic perception, and it also has important reference value for acquiring three-dimensional scene information in emerging applications such as indoor augmented reality and three-dimensional interior design.
The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (8)

1. An indoor three-dimensional semantic map construction method is characterized by comprising the following steps:
s1, data acquisition; collecting color depth RGB-D image information of an indoor environment by using an RGB-D sensor, wherein the color depth RGB-D image information comprises an RGB image and a depth image;
s2, obtaining semantic information: carrying out target detection or semantic segmentation on the acquired two-dimensional RGB image by using a deep learning algorithm to obtain corresponding semantic information;
s3, repairing the depth image;
s4, constructing an indoor environment three-dimensional map: constructing a three-dimensional map by using the repaired indoor environment RGB-D image;
s5, forming a three-dimensional semantic map: and fusing the target with semantic information obtained in the step S2 with the indoor three-dimensional map obtained in the step S4 through coordinate position conversion, and carrying out assignment and labeling on the map by using a label to form the indoor environment three-dimensional semantic map.
2. The indoor three-dimensional semantic map construction method according to claim 1, wherein the specific steps of the step S1 are as follows:
the user can scan the indoor environment by holding the equipment with the RGB-D sensor or by the mobile robot with the RGB-D sensor to obtain continuous RGB-D images.
3. The indoor three-dimensional semantic map construction method according to claim 2, wherein the target detection method in the step S2 is YOLOv3.
4. The indoor three-dimensional semantic map construction method according to claim 3, wherein the step S3 uses a parallelized real-time depth image restoration algorithm based on CUDA technology.
5. The indoor three-dimensional semantic map construction method according to claim 3, wherein the step S4 adopts an improved BundleFusion three-dimensional reconstruction algorithm.
6. An indoor three-dimensional semantic map construction system based on the method of any one of claims 1 to 5, characterized by comprising a data acquisition module, a three-dimensional dense reconstruction module, and a semantic fusion dense reconstruction module;
the data acquisition module acquires color-depth RGB-D image information of the indoor environment and splits it into an RGB image and a depth image, which are respectively subjected to RGB image target detection/semantic segmentation and CUDA depth image restoration;
the three-dimensional dense reconstruction module performs inter-frame correspondence matching on the input aligned color and depth data streams, then performs global pose optimization to correct the overall drift, keeping the model continuously and dynamically updated throughout the reconstruction process;
the semantic fusion dense reconstruction module performs target detection or semantic segmentation on the images acquired by the camera and integrates the resulting image semantics into the three-dimensional dense point cloud reconstruction through a Bayesian-update-based fusion algorithm, realizing the construction of a service-robot-oriented indoor scene three-dimensional semantic map.
7. The indoor three-dimensional semantic map construction system according to claim 6, wherein the CUDA depth image restoration comprises the following specific steps:
the invalid points on each depth image are filtered using equation (1).
Figure FDA0002389154760000021
In the formula: i isdestIs a restored image IsrcFor the original image, ω (i, j) is the weight of the filter at point (i, j), ΩinvAs an area of invalid points on the image, omeganIs a neighborhood of pixels, omega, with invalid points removedpIs that the standard quantity is calculated by the formula (2);
Figure FDA0002389154760000022
the weight ω (i, j) is linearly related to the spatial domain and the value domain of the pixel point at the same time, the closer the distance is, the smaller the pixel value change is, the higher the correlation is, and the filter kernel function is defined as follows:
Figure FDA0002389154760000023
in the formula:
Figure FDA0002389154760000024
is the standard deviation of a spatial gaussian function,
Figure FDA0002389154760000025
is the standard deviation of the value domain gaussian function, x, y are the abscissa of the pixel within the filter window, I, j are the pixel coordinates of the invalid point currently being processed, I represents the value of a certain pixel on the depth image.
8. The indoor three-dimensional semantic map construction system according to claim 6, wherein, in the three-dimensional dense reconstruction module:
in terms of matching, a coarse-to-fine parallel global optimization method is used: sparse SIFT feature points are first used for coarse registration, and dense photometric and geometric constraints are then used for finer registration;
in terms of pose optimization, a hierarchical local-to-global optimization method with two layers is used: on the lowest layer, every 10 consecutive frames form a chunk, with the first frame as the key frame, and local pose optimization is performed on all frames within the chunk; on the second layer, only the key frames of all chunks are correlated with one another and then globally optimized;
in terms of dense scene reconstruction, reconstruction errors caused by accumulated drift or by computation in featureless regions are corrected based on the pose estimates.
CN202010108398.6A 2020-02-21 2020-02-21 Indoor three-dimensional semantic map construction method Active CN111340939B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010108398.6A CN111340939B (en) 2020-02-21 2020-02-21 Indoor three-dimensional semantic map construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010108398.6A CN111340939B (en) 2020-02-21 2020-02-21 Indoor three-dimensional semantic map construction method

Publications (2)

Publication Number Publication Date
CN111340939A 2020-06-26
CN111340939B (en) 2023-04-18

Family

ID=71187107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010108398.6A Active CN111340939B (en) 2020-02-21 2020-02-21 Indoor three-dimensional semantic map construction method

Country Status (1)

Country Link
CN (1) CN111340939B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017079918A1 (en) * 2015-11-11 2017-05-18 中国科学院深圳先进技术研究院 Indoor scene scanning reconstruction method and apparatus
WO2018129715A1 (en) * 2017-01-13 2018-07-19 浙江大学 Simultaneous positioning and dense three-dimensional reconstruction method
CN109658449A (en) * 2018-12-03 2019-04-19 华中科技大学 A kind of indoor scene three-dimensional rebuilding method based on RGB-D image
CN110243370A (en) * 2019-05-16 2019-09-17 西安理工大学 A kind of three-dimensional semantic map constructing method of the indoor environment based on deep learning

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022021661A1 (en) * 2020-07-27 2022-02-03 深圳大学 Gaussian process-based visual positioning method, system, and storage medium
CN113447012A (en) * 2021-05-10 2021-09-28 天津大学 Service robot 2D semantic map generation method and device based on deep learning
CN113467267A (en) * 2021-07-28 2021-10-01 珠海格力电器股份有限公司 Control method of intelligent home system and intelligent home system
CN114494267A (en) * 2021-11-30 2022-05-13 北京国网富达科技发展有限责任公司 Substation and cable tunnel scene semantic construction system and method
CN116311023A (en) * 2022-12-27 2023-06-23 广东长盈科技股份有限公司 Equipment inspection method and system based on 5G communication and virtual reality
CN117132727A (en) * 2023-10-23 2023-11-28 光轮智能(北京)科技有限公司 Map data acquisition method, computer readable storage medium and electronic device
CN117132727B (en) * 2023-10-23 2024-02-06 光轮智能(北京)科技有限公司 Map data acquisition method, computer readable storage medium and electronic device

Also Published As

Publication number Publication date
CN111340939B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN111340939B (en) Indoor three-dimensional semantic map construction method
CN111798475B (en) Indoor environment 3D semantic map construction method based on point cloud deep learning
US10127670B2 (en) Computer vision systems and methods for detecting and modeling features of structures in images
CN107292234B (en) Indoor scene layout estimation method based on information edge and multi-modal features
WO2021004416A1 (en) Method and apparatus for establishing beacon map on basis of visual beacons
CN112634451A (en) Outdoor large-scene three-dimensional mapping method integrating multiple sensors
CN114365201A (en) Structural annotation
CN112396595B (en) Semantic SLAM method based on point-line characteristics in dynamic environment
CN111860651B (en) Monocular vision-based semi-dense map construction method for mobile robot
CN111462210A (en) Monocular line feature map construction method based on epipolar constraint
CN112734765A (en) Mobile robot positioning method, system and medium based on example segmentation and multi-sensor fusion
CN115272596A (en) Multi-sensor fusion SLAM method oriented to monotonous texture-free large scene
CN112396656A (en) Outdoor mobile robot pose estimation method based on fusion of vision and laser radar
CN114332394A (en) Semantic information assistance-based dynamic scene three-dimensional reconstruction method
CN111998862A (en) Dense binocular SLAM method based on BNN
CN116619358A (en) Self-adaptive positioning optimization and mapping method for autonomous mining robot
Yin et al. CoMask: Corresponding mask-based end-to-end extrinsic calibration of the camera and LiDAR
Yu et al. Accurate and robust visual localization system in large-scale appearance-changing environments
Zhou et al. A state-of-the-art review on SLAM
Zhang et al. Accurate real-time SLAM based on two-step registration and multimodal loop detection
Zhao et al. A review of visual SLAM for dynamic objects
Zhang et al. Hybrid iteration and optimization-based three-dimensional reconstruction for space non-cooperative targets with monocular vision and sparse lidar fusion
CN108491826A (en) A kind of extraction method of remote sensing image building
CN113744397B (en) Real-time object-level semantic map construction and updating method and device
Chen et al. An improved Snake model for refinement of LiDAR-derived building roof contours using aerial images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant