CN111340939A - Indoor three-dimensional semantic map construction method - Google Patents
- Publication number
- CN111340939A (application number CN202010108398.6A)
- Authority
- CN
- China
- Prior art keywords
- dimensional
- indoor
- image
- semantic
- rgb
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/20—Instruments for performing navigational calculations
- G01C21/206—Instruments for performing navigational calculations specially adapted for indoor navigation
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/26—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
- G01C21/28—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
- G01C21/30—Map- or contour-matching
- G01C21/32—Structuring or formatting of map data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G06T5/77—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30204—Marker
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention belongs to the field of three-dimensional reconstruction and scene understanding, and in particular relates to an indoor three-dimensional semantic map construction method. It aims to enable a home service robot to understand the semantic information of its surrounding environment, which facilitates human-machine interaction and the execution of high-level intelligent operations. The method first acquires images of an indoor scene with an RGB-D sensor, performs target detection or semantic segmentation on the two-dimensional color image to obtain the corresponding semantic information, repairs the depth image and uses it for three-dimensional reconstruction, and finally fuses the image semantic information into the three-dimensional map to obtain the indoor three-dimensional semantic map. The technical scheme of the invention can realize rapid and accurate three-dimensional information perception, is of great significance for home service robots, and is also suitable for applications such as indoor augmented reality and three-dimensional indoor design.
Description
Technical Field
The invention relates to the field of three-dimensional reconstruction and scene understanding, in particular to an indoor three-dimensional semantic map construction method and system.
Background
Rapid and accurate three-dimensional information perception is a key technology for emerging applications such as home service robots, indoor augmented reality and three-dimensional indoor design. In recent years, with the development of depth sensors (e.g., Microsoft Kinect, Intel RealSense), three-dimensional scanning technology has advanced greatly. The depth maps and color maps collected by these sensors can conveniently be used to generate a dense three-dimensional model of the scanned object, which has promoted research on three-dimensional semantic map construction for indoor scenes. Semantic maps can be widely applied in fields such as robotics, navigation and human-computer interaction. An indoor semantic map typically includes spatial attribute information, such as the floor structure of a building and its room distribution, as well as semantic attribute information, such as the attributes and functions of individual rooms and the class and location of objects within a room. The goal of semantic map construction is to accurately label semantic information on a map.
A search of the prior-art literature shows the following. Document 1 (Wu Hao. Robot map construction research [D]. Jinan: Shandong University, 2011.) uses QR Code technology, pasting two-dimensional codes as artificial landmarks on large objects in a semi-unknown home environment, to construct a semantic map that describes object-room affiliation relationships. Document 2 (Zhao Cheng. Indoor hierarchical map construction and navigation system based on visual-voice interaction [D]. Xiamen: Xiamen University, 2014.) builds a grid-topology-semantic multi-level map from the bottom up using visual human tracking and voice labeling, but relies on manual intervention during map construction. Document 3 (SHENG W, DU J, CHENG Q, et al. Robot semantic mapping through human activity recognition: A wearable sensing and computing approach [J]. Robotics and Autonomous Systems, 2015, 68(C): 47-58.) proposes using wearable devices to recognize human actions and establishes a Bayesian framework based on the relationship between human actions and object types to construct semantic maps, but wearing such devices is somewhat cumbersome in practical applications.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides an indoor three-dimensional semantic map construction method based on an RGB-D sensor, which can construct a map containing room semantic information and room object semantic information so that a robot can execute high-level intelligent operation and better serve human beings.
In order to solve the technical problems, the technical scheme of the invention is as follows:
the invention provides a method for constructing an indoor three-dimensional semantic map, which comprises the following steps:
step S1, data acquisition; collecting color depth RGB-D image information of an indoor environment by using an RGB-D sensor, wherein the color depth RGB-D image information comprises an RGB image and a depth image;
step S2, obtaining semantic information: carrying out target detection or semantic segmentation on the acquired two-dimensional RGB image by using a deep learning algorithm to obtain corresponding semantic information;
step S3, repairing the depth image;
step S4, building a three-dimensional map of the indoor environment: constructing a three-dimensional map by using the repaired indoor environment RGB-D image;
and step S5, forming a three-dimensional semantic map: and fusing the target with semantic information obtained in the step S2 with the indoor three-dimensional map obtained in the step S4 through coordinate position conversion, and carrying out assignment and labeling on the map by using a label to form the indoor environment three-dimensional semantic map.
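The coordinate position conversion in step S5 can be illustrated with a short sketch. It assumes a pinhole camera model; the intrinsics `fx, fy, cx, cy`, the 4x4 row-major camera-to-world pose, and all function names are hypothetical choices for illustration and are not specified in the text.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy):
    """Pixel (u, v) with metric depth -> 3D point in the camera frame (pinhole model)."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def to_world(point_cam, pose):
    """Apply an assumed 4x4 row-major camera-to-world transform to a camera-frame point."""
    x, y, z = point_cam
    return tuple(
        pose[r][0] * x + pose[r][1] * y + pose[r][2] * z + pose[r][3]
        for r in range(3)
    )


def label_points(box, label, depth, pose, intrinsics):
    """Attach a 2D detection label to the world-frame point of every valid pixel in the box.

    box is (u0, v0, u1, v1) with exclusive upper bounds; a depth of 0 marks an invalid pixel.
    """
    u0, v0, u1, v1 = box
    labeled = []
    for v in range(v0, v1):
        for u in range(u0, u1):
            d = depth[v][u]
            if d > 0:
                point = to_world(backproject(u, v, d, *intrinsics), pose)
                labeled.append((point, label))
    return labeled
```

Each returned pair can then be matched to the nearest element of the reconstructed map to assign its label; how that matching is done is left open here.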
In a preferred embodiment, the specific steps of step S1 are as follows:
the user can scan the indoor environment by holding the equipment with the RGB-D sensor or by the mobile robot with the RGB-D sensor to obtain continuous RGB-D images.
In a preferred embodiment, the target detection method in step S2 is YOLOv3.
In a preferred embodiment, the step S3 uses a parallelized real-time depth image restoration algorithm based on the CUDA technique.
In a preferred embodiment, the step S4 employs a modified three-dimensional reconstruction BundleFusion algorithm.
The invention provides an indoor three-dimensional semantic map construction system in a second aspect, which comprises a data acquisition module, a three-dimensional dense reconstruction module and a semantic fusion dense reconstruction module;
the data acquisition module acquires color depth RGB-D image information of an indoor environment and divides the color depth RGB-D image information into an RGB image and a depth image; respectively carrying out RGB image target detection/semantic segmentation and CUDA depth image restoration;
the three-dimensional dense reconstruction module performs corresponding relation matching between frames on the input aligned color and depth data streams, then performs global pose optimization, corrects the overall drift, and keeps the model in a continuously dynamic updating state in the whole reconstruction process;
the semantic fusion dense reconstruction module is used for carrying out target detection or semantic segmentation on the image acquired by the camera, integrating the semantic result of the obtained image into three-dimensional dense point cloud reconstruction through a fusion algorithm based on Bayes updating, and realizing the construction of an indoor scene three-dimensional semantic map facing the service robot.
In a preferred scheme, the CUDA depth image restoration method specifically includes the following steps:
the invalid points on each depth image are filtered using equation (1).
In the formula: I_dest is the restored image, I_src is the original image, ω(i, j) is the weight of the filter at point (i, j), Ω_inv is the region of invalid points on the image, Ω_n is the pixel neighborhood with invalid points removed, and ω_p is the normalization factor calculated by formula (2);
the weight ω(i, j) is related to both the spatial domain and the value domain of the pixel: the closer the pixel is and the smaller the change in pixel value, the higher the correlation; the filter kernel function is defined as follows:
in the formula: one parameter is the standard deviation of the spatial Gaussian function and the other is the standard deviation of the value-domain Gaussian function; x, y are the coordinates of a pixel within the filter window; i, j are the pixel coordinates of the invalid point currently being processed; and I represents the value of a pixel on the depth image.
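Formulas (1) and (2) and the kernel function are not reproduced in this text. One form consistent with the symbol definitions above, assuming a standard bilateral-filter kernel, is sketched below. The symbol names \sigma_s and \sigma_r for the two standard deviations are assumptions, and how the value-domain term is evaluated at an invalid point, where I(i, j) carries no measurement, is not stated in the text.

```latex
\begin{align}
I_{dest}(i,j) &= \frac{1}{\omega_p} \sum_{(x,y)\in\Omega_n} \omega(x,y)\, I_{src}(x,y),
  \qquad (i,j)\in\Omega_{inv} \tag{1}\\
\omega_p &= \sum_{(x,y)\in\Omega_n} \omega(x,y) \tag{2}\\
\omega(x,y) &= \exp\!\left(-\frac{(x-i)^2+(y-j)^2}{2\sigma_s^2}
  -\frac{\bigl(I(x,y)-I(i,j)\bigr)^2}{2\sigma_r^2}\right)
\end{align}
```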
In a preferred embodiment, said three-dimensional dense reconstruction module,
in the aspect of matching, a coarse-to-fine parallel global optimization method is used; sparse SIFT feature points are first used for coarse registration, and dense photometric and geometric constraints are then used for finer registration;
in the aspect of pose optimization, a hierarchical local-to-global optimization method with two levels is used; on the lowest level, every 10 consecutive frames form a chunk, the first frame of the chunk is used as its key frame, and local pose optimization is carried out on all frames in the chunk; on the second level, only the key frames of all chunks are correlated with one another and then globally optimized; the advantage of this method is that key frames are separated out, which reduces storage and the amount of data to be processed;
in the aspect of dense scene reconstruction, reconstruction errors caused by accumulated drift or by computation in featureless regions are corrected based on the pose estimation.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the method for constructing the indoor three-dimensional semantic map provided by the invention establishes the three-dimensional scene map by scanning the surrounding environment of the indoor scene by using the RGB-D sensor, and meanwhile, obtains semantic information (walls, doors and windows, the ground, various furniture and the like) which can enable the robot to automatically understand the surrounding environment by using a deep learning algorithm, and finally realizes the construction of the three-dimensional semantic map of the indoor scene; the method has important significance for the home service robot to really understand the surrounding environment and achieve the real purpose of intelligent semantic perception, and has important reference value for acquiring scene three-dimensional information for emerging applications such as indoor augmented reality and three-dimensional indoor design.
Drawings
Fig. 1 is a flowchart of a method for constructing an indoor scene three-dimensional semantic map according to the present invention.
FIG. 2 is a schematic flow chart of an indoor scene three-dimensional semantic map construction system according to the present invention;
FIG. 3 is an original depth image generated by Kinect;
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
The invention provides a method for constructing an indoor three-dimensional semantic map, which comprises the following steps:
step S1, data acquisition; collecting color depth RGB-D image information of an indoor environment by using an RGB-D sensor, wherein the color depth RGB-D image information comprises an RGB image and a depth image;
step S2, obtaining semantic information: carrying out target detection or semantic segmentation on the acquired two-dimensional RGB image by using a deep learning algorithm to obtain corresponding semantic information;
step S3, repairing the depth image;
step S4, building a three-dimensional map of the indoor environment: constructing a three-dimensional map by using the repaired indoor environment RGB-D image;
and step S5, forming a three-dimensional semantic map: and fusing the target with semantic information obtained in the step S2 with the indoor three-dimensional map obtained in the step S4 through coordinate position conversion, and carrying out assignment and labeling on the map by using a label to form the indoor environment three-dimensional semantic map.
In a preferred embodiment, the specific steps of step S1 are as follows:
the user can scan the indoor environment by holding the equipment with the RGB-D sensor or by the mobile robot with the RGB-D sensor to obtain continuous RGB-D images.
In a preferred embodiment, the target detection method in step S2 is YOLOv3.
In a preferred embodiment, the step S3 uses a parallelized real-time depth image restoration algorithm based on the CUDA technique.
In a preferred embodiment, the step S4 employs a modified three-dimensional reconstruction BundleFusion algorithm.
Example 2
The invention provides an indoor three-dimensional semantic map construction system in a second aspect, which comprises a data acquisition module, a three-dimensional dense reconstruction module and a semantic fusion dense reconstruction module;
the data acquisition module acquires color depth RGB-D image information of an indoor environment and divides the color depth RGB-D image information into an RGB image and a depth image; respectively carrying out RGB image target detection/semantic segmentation and CUDA depth image restoration;
the three-dimensional dense reconstruction module performs corresponding relation matching between frames on the input aligned color and depth data streams, then performs global pose optimization, corrects the overall drift, and keeps the model in a continuously dynamic updating state in the whole reconstruction process;
the semantic fusion dense reconstruction module is used for carrying out target detection or semantic segmentation on the image acquired by the camera, integrating the semantic result of the obtained image into three-dimensional dense point cloud reconstruction through a fusion algorithm based on Bayes updating, and realizing the construction of an indoor scene three-dimensional semantic map facing the service robot.
In a preferred scheme, the CUDA depth image restoration method specifically includes the following steps:
the invalid points on each depth image are filtered using equation (1).
In the formula: I_dest is the restored image, I_src is the original image, ω(i, j) is the weight of the filter at point (i, j), Ω_inv is the region of invalid points on the image, Ω_n is the pixel neighborhood with invalid points removed, and ω_p is the normalization factor calculated by formula (2);
the weight ω(i, j) is related to both the spatial domain and the value domain of the pixel: the closer the pixel is and the smaller the change in pixel value, the higher the correlation; the filter kernel function is defined as follows:
in the formula: one parameter is the standard deviation of the spatial Gaussian function and the other is the standard deviation of the value-domain Gaussian function; x, y are the coordinates of a pixel within the filter window; i, j are the pixel coordinates of the invalid point currently being processed; and I represents the value of a pixel on the depth image.
In a preferred embodiment, said three-dimensional dense reconstruction module,
in the aspect of matching, a coarse-to-fine parallel global optimization method is used; sparse SIFT feature points are first used for coarse registration, and dense photometric and geometric constraints are then used for finer registration;
in the aspect of pose optimization, a hierarchical local-to-global optimization method with two levels is used; on the lowest level, every 10 consecutive frames form a chunk, the first frame of the chunk is used as its key frame, and local pose optimization is carried out on all frames in the chunk; on the second level, only the key frames of all chunks are correlated with one another and then globally optimized; the advantage of this method is that key frames are separated out, which reduces storage and the amount of data to be processed;
in the aspect of dense scene reconstruction, reconstruction errors caused by accumulated drift or by computation in featureless regions are corrected based on the pose estimation.
Example 3
The embodiment of the invention provides a detailed flow diagram of an indoor scene three-dimensional semantic map construction method. The method mainly comprises three modules of data acquisition, three-dimensional dense reconstruction, semantic fusion dense reconstruction and the like.
Data acquisition uses an RGB-D sensor. In embodiments of the invention, the indoor environment may be scanned by a user holding a device equipped with a depth sensor (e.g., Kinect v2) or by a mobile robot equipped with a depth sensor, collecting continuous image data. The RGB-D image data includes an RGB color image and a depth image. The depth image directly reflects real three-dimensional environment information, as shown in fig. 3. Owing to limitations of the device itself, object surface materials, occlusion and similar factors, the original depth image generated by the Kinect contains a large number of invalid regions such as black edges and black holes, which greatly affects its use. The embodiment of the invention uses a parallelized real-time depth image restoration algorithm based on CUDA technology to restore the depth image effectively and in real time on the mobile robot.
In this embodiment, the image is partitioned in order to parallelize the image restoration program. The depth image of the Kinect v2 is 512 × 424 pixels; the 12 rows of pixels at the top and the 12 rows at the bottom are discarded, and blocks of 32 × 20 pixels are used, forming a grid of 16 × 20 blocks. After partitioning, the image is uploaded to the GPU, which executes the image restoration program in parallel, and the invalid points on each image are filtered using formula (1).
In the formula: I_dest is the restored image, I_src is the original image, ω(i, j) is the weight of the filter at point (i, j), Ω_inv is the region of invalid points on the image, Ω_n is the pixel neighborhood with invalid points removed, and ω_p is the normalization factor calculated by formula (2).
The weight ω(i, j) is related to both the spatial domain and the value domain of the pixel: the closer the pixel is and the smaller the change in pixel value, the higher the correlation. The filter kernel function is defined as follows:
in the formula: one parameter is the standard deviation of the spatial Gaussian function and the other is the standard deviation of the value-domain Gaussian function; x, y are the coordinates of a pixel within the filter window; i, j are the pixel coordinates of the invalid point currently being processed; and I represents the value of a pixel on the depth image.
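The block partition and the invalid-point filter described above can be sketched on the CPU as follows. This is a sequential illustration only, not the CUDA implementation: the window `radius`, the values of `sigma_s` and `sigma_r`, the use of 0 as the invalid marker, and the choice of the mean of valid neighbors as the reference depth in the value-domain term are all assumptions made for the example.

```python
import math


def launch_grid(width=512, height=424, crop_rows=12, block=(32, 20)):
    """Number of blocks per axis after dropping crop_rows at the top and at the bottom."""
    usable_rows = height - 2 * crop_rows  # 424 - 24 = 400
    assert width % block[0] == 0 and usable_rows % block[1] == 0
    return width // block[0], usable_rows // block[1]


def repair_depth(src, radius=2, sigma_s=1.5, sigma_r=0.1, invalid=0.0):
    """Fill each invalid pixel with a normalized, Gaussian-weighted mean of valid neighbors."""
    h, w = len(src), len(src[0])
    dest = [row[:] for row in src]
    for i in range(h):
        for j in range(w):
            if src[i][j] != invalid:
                continue  # valid pixels are copied unchanged
            nbrs = [(x, y)
                    for x in range(max(0, i - radius), min(h, i + radius + 1))
                    for y in range(max(0, j - radius), min(w, j + radius + 1))
                    if src[x][y] != invalid]
            if not nbrs:
                continue  # no valid neighbor in the window: leave the hole
            # Assumed reference value, since the invalid pixel has no depth of its own.
            ref = sum(src[x][y] for x, y in nbrs) / len(nbrs)
            num = den = 0.0
            for x, y in nbrs:
                wgt = math.exp(-((x - i) ** 2 + (y - j) ** 2) / (2 * sigma_s ** 2)
                               - (src[x][y] - ref) ** 2 / (2 * sigma_r ** 2))
                num += wgt * src[x][y]
                den += wgt  # den plays the role of the normalization factor
            if den > 0.0:
                dest[i][j] = num / den
    return dest
```

With the default arguments, `launch_grid()` reproduces the 16 × 20 grid stated in the text. In a GPU version each pixel of a 32 × 20 block would typically be handled by one thread, which is why the per-pixel loop body has no dependence on other output pixels.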
The three-dimensional dense reconstruction module in fig. 2 is mainly completed based on a BundleFusion algorithm, and according to the embodiment of the invention, invalid point repairing processing is firstly performed on the acquired original depth image so as to solve the problem that the matching error of the key point is accumulated due to the existence of noise in the sensor. And then carrying out corresponding relation matching between frames on the input aligned color and depth data streams, then carrying out global pose optimization, correcting the overall drift, and keeping the model in a continuously dynamic updating state in the whole reconstruction process.
In the aspect of matching, a coarse-to-fine parallel global optimization method is used. First a coarser registration is performed using sparse SIFT feature points, and then a finer registration is performed using dense photometric and geometric constraints.
In terms of pose optimization, a hierarchical local-to-global optimization method is used. The method is divided into two layers in total, on the lowest layer, each continuous 10 frames form a chunk, the first frame is used as a key frame, and then local pose optimization is carried out on all frames in the chunk. On the second level, only all the chunk's key frames are used for inter-correlation and then global optimization. The method has the advantages of being capable of separating out key frames and reducing storage and data to be processed.
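The two-level grouping can be sketched as below. The sketch assumes non-overlapping chunks and represents frames by their indices; the function names and the dictionary layout are hypothetical, and the pose optimization itself is not shown.

```python
def build_chunks(num_frames, chunk_size=10):
    """Group consecutive frame indices into chunks; the first frame of each chunk is its key frame."""
    chunks = []
    for start in range(0, num_frames, chunk_size):
        frames = list(range(start, min(start + chunk_size, num_frames)))
        chunks.append({"key_frame": frames[0], "frames": frames})
    return chunks


def global_level(chunks):
    """Only the key frames take part in the second (global) optimization level."""
    return [chunk["key_frame"] for chunk in chunks]
```

For a sequence of 25 frames this yields three chunks, so the global level works on 3 key frames instead of 25 frames, which is the storage and processing saving the text refers to.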
In terms of dense scene reconstruction, the key point is the symmetric update of the model: if an updated frame estimate is to be added, the old frame is removed and then re-integrated at the new pose. Based on the method, reconstruction errors caused by accumulated drift or calculation in the featureless area can be corrected as long as better attitude estimation is carried out, so that the model is more and more accurate.
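The symmetric update can be illustrated for a single map cell. This is a minimal sketch, assuming the map stores a running weighted average of distance samples per voxel; the `Voxel` class and `reintegrate` helper are hypothetical names, and the projection of a frame into the volume at a given pose is omitted.

```python
class Voxel:
    """Running weighted average of distance samples for one voxel."""

    def __init__(self):
        self.d = 0.0  # averaged value
        self.w = 0.0  # accumulated weight

    def integrate(self, sample, weight=1.0):
        """Add one sample to the running average."""
        self.d = (self.d * self.w + sample * weight) / (self.w + weight)
        self.w += weight

    def deintegrate(self, sample, weight=1.0):
        """Remove a previously integrated sample (the inverse of integrate)."""
        remaining = self.w - weight
        if remaining <= 1e-9:
            self.d, self.w = 0.0, 0.0  # nothing left: reset the voxel
        else:
            self.d = (self.d * self.w - sample * weight) / remaining
            self.w = remaining


def reintegrate(voxel, old_sample, new_sample, weight=1.0):
    """Replace a frame's old contribution with its contribution at the updated pose."""
    voxel.deintegrate(old_sample, weight)
    voxel.integrate(new_sample, weight)
```

Because removal exactly undoes integration, a frame whose pose estimate improves can be moved without rebuilding the rest of the model.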
The semantic information in the semantic fusion dense reconstruction module in fig. 2 can be obtained by a target detection or semantic segmentation method. Benefiting from the development of deep learning in recent years, the computer vision field has achieved many remarkable results, including in target detection and semantic segmentation of images. Among the better target detection algorithms is the YOLO series, which can meet the requirements of real-time detection tasks; YOLOv3 balances speed and precision by changing the size of the model structure. Among the better semantic segmentation methods, DeepLabv3 reaches an average precision of 85.2 percent. These algorithms are used to perform target detection or semantic segmentation on the images acquired by the camera, and the resulting image semantics are integrated into the three-dimensional dense point cloud reconstruction through a fusion algorithm based on Bayesian updating, thereby constructing an indoor scene three-dimensional semantic map for service robots.
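A fusion step based on Bayesian updating can be sketched for one map point as follows. The sketch assumes that each frame supplies a per-class probability for the point, that frames are treated as independent observations, and that the belief starts uniform; the function names and the dictionary representation are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Posterior over classes: multiply prior by the per-class observation, then normalize."""
    unnormalized = {c: prior[c] * likelihood.get(c, 0.0) for c in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(prior)  # the observation carries no usable evidence
    return {c: p / total for c, p in unnormalized.items()}


def fuse_observations(classes, observations):
    """Start from a uniform belief and fold in one observation per frame."""
    belief = {c: 1.0 / len(classes) for c in classes}
    for obs in observations:
        belief = bayes_update(belief, obs)
    return belief
```

Two consistent observations of 0.8 for "chair" raise the belief from 0.5 to about 0.94, so repeated agreement across frames sharpens the label while a single wrong detection is outweighed.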
The method comprises the steps of scanning the surrounding environment of the indoor scene by using an RGB-D sensor to establish a three-dimensional scene map, and acquiring semantic information (walls, doors and windows, the ground, various furniture and the like) which enables a robot to automatically understand the surrounding environment by using a deep learning algorithm, so that the three-dimensional semantic map of the indoor scene is constructed finally; the method has important significance for the home service robot to really understand the surrounding environment and achieve the real purpose of intelligent semantic perception, and has important reference value for acquiring scene three-dimensional information for emerging applications such as indoor augmented reality and three-dimensional indoor design.
The terms describing positional relationships in the drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.
Claims (8)
1. An indoor three-dimensional semantic map construction method is characterized by comprising the following steps:
s1, data acquisition; collecting color depth RGB-D image information of an indoor environment by using an RGB-D sensor, wherein the color depth RGB-D image information comprises an RGB image and a depth image;
s2, obtaining semantic information: carrying out target detection or semantic segmentation on the acquired two-dimensional RGB image by using a deep learning algorithm to obtain corresponding semantic information;
s3, repairing the depth image;
s4, constructing an indoor environment three-dimensional map: constructing a three-dimensional map by using the repaired indoor environment RGB-D image;
s5, forming a three-dimensional semantic map: and fusing the target with semantic information obtained in the step S2 with the indoor three-dimensional map obtained in the step S4 through coordinate position conversion, and carrying out assignment and labeling on the map by using a label to form the indoor environment three-dimensional semantic map.
2. The indoor three-dimensional semantic map construction method according to claim 1, wherein the specific steps of the step S1 are as follows:
the user can scan the indoor environment by holding the equipment with the RGB-D sensor or by the mobile robot with the RGB-D sensor to obtain continuous RGB-D images.
3. The indoor three-dimensional semantic map construction method according to claim 2, wherein the target detection method in the step S2 is YOLOv3.
4. The indoor three-dimensional semantic map construction method according to claim 3, wherein the step S3 uses a parallelized real-time depth image restoration algorithm based on CUDA technology.
5. The indoor three-dimensional semantic map construction method according to claim 3, wherein the step S4 adopts a modified BundleFusion three-dimensional reconstruction algorithm.
6. An indoor three-dimensional semantic map construction system based on the method of claims 1-5, characterized by comprising a data acquisition module, a three-dimensional dense reconstruction module and a semantic fusion dense reconstruction module;
the data acquisition module acquires color-depth (RGB-D) image information of an indoor environment and divides it into an RGB image and a depth image, on which target detection or semantic segmentation and CUDA depth image restoration are carried out, respectively;
the three-dimensional dense reconstruction module performs inter-frame correspondence matching on the input aligned color and depth data streams, then performs global pose optimization to correct the overall drift, and keeps the model in a continuously and dynamically updated state throughout the reconstruction process;
the semantic fusion dense reconstruction module carries out target detection or semantic segmentation on the image acquired by the camera and integrates the resulting image semantics into the three-dimensional dense point cloud reconstruction through a fusion algorithm based on Bayesian updating, thereby constructing a service-robot-oriented three-dimensional semantic map of the indoor scene.
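The fusion based on Bayesian updating can be illustrated with a minimal sketch of a recursive Bayes update of a per-point class distribution. It assumes that the per-frame class probabilities from the detector or segmenter are treated as independent observations; the function name `bayes_update` and the three-class example are hypothetical and not taken from the claims.

```python
import numpy as np

def bayes_update(prior, observation, eps=1e-12):
    """Multiply the stored class distribution of a map point by the class
    probabilities observed in a new frame, then renormalize."""
    prior = np.asarray(prior, dtype=float)
    posterior = prior * np.asarray(observation, dtype=float)
    total = posterior.sum()
    if total < eps:            # degenerate observation: keep the prior
        return prior
    return posterior / total

# Usage: one map point observed in two frames, starting from a uniform prior.
belief = np.full(3, 1.0 / 3.0)
for obs in ([0.6, 0.3, 0.1], [0.7, 0.2, 0.1]):
    belief = bayes_update(belief, obs)
label = int(np.argmax(belief))   # index of the most probable class
```

Repeated consistent observations sharpen the distribution, so a single wrong per-frame prediction has limited influence on the label finally assigned to the point.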
7. The indoor three-dimensional semantic map construction system according to claim 6, wherein the CUDA depth image restoration comprises the following specific steps:
the invalid points on each depth image are filtered using equation (1);
in the formula: $I_{dest}$ is the restored image, $I_{src}$ is the original image, $\omega(i, j)$ is the weight of the filter at point $(i, j)$, $\Omega_{inv}$ is the region of invalid points on the image, $\Omega_n$ is the pixel neighborhood with invalid points removed, and $\omega_p$ is the normalization term, calculated by formula (2);
the weight $\omega(i, j)$ is linearly related to both the spatial domain and the value domain of the pixel: the closer the distance and the smaller the pixel value change, the higher the correlation; the filter kernel function is defined as follows:
in the formula: the two kernel parameters are the standard deviation of the spatial Gaussian function and the standard deviation of the value-domain Gaussian function; x, y are the coordinates of a pixel within the filter window; i, j are the pixel coordinates of the invalid point currently being processed; and I represents the value of a pixel on the depth image.
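The restoration described in this claim can be approximated by the following minimal sketch, a CPU reference in NumPy rather than the CUDA implementation named in the claim. It fills each invalid pixel with a normalized, bilateral-style weighted average of the valid pixels in its window. The parameter names `radius`, `sigma_s`, `sigma_r`, the convention that a value of 0 marks an invalid pixel, and the use of the median of the valid neighbors as the reference value for the value-domain term are all assumptions of this sketch, not details from the patent.

```python
import numpy as np

def repair_depth(depth, radius=2, sigma_s=2.0, sigma_r=0.1):
    """Fill invalid (<= 0) depth pixels from the valid pixels around them."""
    h, w = depth.shape
    src = depth.astype(float)
    out = src.copy()
    for i, j in zip(*np.nonzero(src <= 0)):          # invalid points
        i, j = int(i), int(j)
        i0, i1 = max(i - radius, 0), min(i + radius + 1, h)
        j0, j1 = max(j - radius, 0), min(j + radius + 1, w)
        win = src[i0:i1, j0:j1]
        valid = win > 0                                # neighborhood without invalid points
        if not valid.any():
            continue                                   # nothing to interpolate from
        rows, cols = np.mgrid[i0:i1, j0:j1]
        ref = np.median(win[valid])                    # ASSUMPTION: reference depth for the value term
        w_space = np.exp(-((rows - i) ** 2 + (cols - j) ** 2) / (2 * sigma_s ** 2))
        w_value = np.exp(-((win - ref) ** 2) / (2 * sigma_r ** 2))
        weights = w_space * w_value * valid            # zero weight on invalid pixels
        norm = weights.sum()                           # normalization term
        if norm > 0:
            out[i, j] = (weights * win).sum() / norm
    return out
```

Each invalid pixel reads only from the original image and writes only its own output, so the loop body is independent per pixel, which is what makes one-thread-per-pixel parallelization on a GPU natural.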
8. The indoor three-dimensional semantic map construction system according to claim 6, wherein, in the three-dimensional dense reconstruction module:
for matching, a coarse-to-fine parallel global optimization method is used: sparse SIFT feature points are used for coarse registration, and dense photometric and geometric constraints are then used for finer registration;
for pose optimization, a hierarchical local-to-global optimization method with two layers is used: on the lowest layer, every 10 consecutive frames form a chunk with the first frame as the key frame, and local pose optimization is performed over all frames in the chunk; on the second layer, only the chunk key frames are associated with one another and then globally optimized;
for dense scene reconstruction, reconstruction errors caused by accumulated drift or by computation in featureless regions are corrected based on the pose estimation.
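The two-layer frame organization in claim 8 can be sketched as simple bookkeeping. The function name `build_chunks` and the dictionary layout are hypothetical; the sketch only groups frame indices and selects key frames, and does not perform any pose optimization.

```python
def build_chunks(num_frames, chunk_size=10):
    """Split frame indices 0..num_frames-1 into consecutive chunks.
    The first frame of each chunk serves as its key frame."""
    chunks = []
    for start in range(0, num_frames, chunk_size):
        frames = list(range(start, min(start + chunk_size, num_frames)))
        chunks.append({"keyframe": frames[0], "frames": frames})
    return chunks

# Usage: 25 frames give three chunks; the last chunk is shorter.
chunks = build_chunks(25)
keyframes = [c["keyframe"] for c in chunks]   # inputs to the second, global layer
```

Local optimization would run over the `frames` of each chunk, while the global layer would see only `keyframes`, which keeps the global problem roughly ten times smaller than the raw frame count.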
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010108398.6A CN111340939B (en) | 2020-02-21 | 2020-02-21 | Indoor three-dimensional semantic map construction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111340939A (en) | 2020-06-26 |
CN111340939B (en) | 2023-04-18 |
Family ID=71187107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010108398.6A Active CN111340939B (en) | 2020-02-21 | 2020-02-21 | Indoor three-dimensional semantic map construction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111340939B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113447012A (en) * | 2021-05-10 | 2021-09-28 | 天津大学 | Service robot 2D semantic map generation method and device based on deep learning |
CN113467267A (en) * | 2021-07-28 | 2021-10-01 | 珠海格力电器股份有限公司 | Control method of intelligent home system and intelligent home system |
WO2022021661A1 (en) * | 2020-07-27 | 2022-02-03 | 深圳大学 | Gaussian process-based visual positioning method, system, and storage medium |
CN114494267A (en) * | 2021-11-30 | 2022-05-13 | 北京国网富达科技发展有限责任公司 | Substation and cable tunnel scene semantic construction system and method |
CN116311023A (en) * | 2022-12-27 | 2023-06-23 | 广东长盈科技股份有限公司 | Equipment inspection method and system based on 5G communication and virtual reality |
CN117132727A (en) * | 2023-10-23 | 2023-11-28 | 光轮智能(北京)科技有限公司 | Map data acquisition method, computer readable storage medium and electronic device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017079918A1 (en) * | 2015-11-11 | 2017-05-18 | 中国科学院深圳先进技术研究院 | Indoor scene scanning reconstruction method and apparatus |
WO2018129715A1 (en) * | 2017-01-13 | 2018-07-19 | 浙江大学 | Simultaneous positioning and dense three-dimensional reconstruction method |
CN109658449A (en) * | 2018-12-03 | 2019-04-19 | 华中科技大学 | A kind of indoor scene three-dimensional rebuilding method based on RGB-D image |
CN110243370A (en) * | 2019-05-16 | 2019-09-17 | 西安理工大学 | A kind of three-dimensional semantic map constructing method of the indoor environment based on deep learning |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022021661A1 (en) * | 2020-07-27 | 2022-02-03 | 深圳大学 | Gaussian process-based visual positioning method, system, and storage medium |
CN113447012A (en) * | 2021-05-10 | 2021-09-28 | 天津大学 | Service robot 2D semantic map generation method and device based on deep learning |
CN113467267A (en) * | 2021-07-28 | 2021-10-01 | 珠海格力电器股份有限公司 | Control method of intelligent home system and intelligent home system |
CN114494267A (en) * | 2021-11-30 | 2022-05-13 | 北京国网富达科技发展有限责任公司 | Substation and cable tunnel scene semantic construction system and method |
CN116311023A (en) * | 2022-12-27 | 2023-06-23 | 广东长盈科技股份有限公司 | Equipment inspection method and system based on 5G communication and virtual reality |
CN117132727A (en) * | 2023-10-23 | 2023-11-28 | 光轮智能(北京)科技有限公司 | Map data acquisition method, computer readable storage medium and electronic device |
CN117132727B (en) * | 2023-10-23 | 2024-02-06 | 光轮智能(北京)科技有限公司 | Map data acquisition method, computer readable storage medium and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN111340939B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111340939B (en) | Indoor three-dimensional semantic map construction method | |
CN111798475B (en) | Indoor environment 3D semantic map construction method based on point cloud deep learning | |
US10127670B2 (en) | Computer vision systems and methods for detecting and modeling features of structures in images | |
CN107292234B (en) | Indoor scene layout estimation method based on information edge and multi-modal features | |
WO2021004416A1 (en) | Method and apparatus for establishing beacon map on basis of visual beacons | |
CN112634451A (en) | Outdoor large-scene three-dimensional mapping method integrating multiple sensors | |
CN114365201A (en) | Structural annotation | |
CN112396595B (en) | Semantic SLAM method based on point-line characteristics in dynamic environment | |
CN111860651B (en) | Monocular vision-based semi-dense map construction method for mobile robot | |
CN111462210A (en) | Monocular line feature map construction method based on epipolar constraint | |
CN112734765A (en) | Mobile robot positioning method, system and medium based on example segmentation and multi-sensor fusion | |
CN115272596A (en) | Multi-sensor fusion SLAM method oriented to monotonous texture-free large scene | |
CN112396656A (en) | Outdoor mobile robot pose estimation method based on fusion of vision and laser radar | |
CN114332394A (en) | Semantic information assistance-based dynamic scene three-dimensional reconstruction method | |
CN111998862A (en) | Dense binocular SLAM method based on BNN | |
CN116619358A (en) | Self-adaptive positioning optimization and mapping method for autonomous mining robot | |
Yin et al. | CoMask: Corresponding mask-based end-to-end extrinsic calibration of the camera and LiDAR | |
Yu et al. | Accurate and robust visual localization system in large-scale appearance-changing environments | |
Zhou et al. | A state-of-the-art review on SLAM | |
Zhang et al. | Accurate real-time SLAM based on two-step registration and multimodal loop detection | |
Zhao et al. | A review of visual SLAM for dynamic objects | |
Zhang et al. | Hybrid iteration and optimization-based three-dimensional reconstruction for space non-cooperative targets with monocular vision and sparse lidar fusion | |
CN108491826A (en) | A kind of extraction method of remote sensing image building | |
CN113744397B (en) | Real-time object-level semantic map construction and updating method and device | |
Chen et al. | An improved Snake model for refinement of LiDAR-derived building roof contours using aerial images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||