CN115421158A - Self-supervised learning solid-state lidar three-dimensional semantic mapping method and device
- Publication number: CN115421158A
- Application number: CN202211387608.5A
- Authority
- CN
- China
- Prior art keywords: point cloud, dimensional, semantic, state, self
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01C—MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
- G01C21/00—Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
- G01C21/38—Electronic maps specially adapted for navigation; Updating thereof
- G01C21/3804—Creation or updating of map data
- G01C21/3833—Creation or updating of map data characterised by the source of data
- G01C21/3841—Data obtained from two or more sources, e.g. probe vehicles
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/86—Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/521—Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a self-supervised learning solid-state lidar three-dimensional semantic mapping method and device. A lightweight lidar/inertial combined mapping algorithm based on Kalman filtering performs three-dimensional scene reconstruction, and the three-dimensional point cloud map is semantically segmented by a point cloud semantic segmentation model. This gives good semantic segmentation of the three-dimensional coordinate point cloud and improves both point cloud segmentation understanding and semantic segmentation accuracy.
Description
Technical Field
The invention relates to the technical field of three-dimensional point cloud semantic segmentation, in particular to a solid-state laser radar three-dimensional semantic mapping method and device for self-supervised learning.
Background
Semantic segmentation is a typical computer vision problem: raw data (e.g., flat images) are taken as input and converted into segmented regions with semantic meaning, forming a semantic map that both robots and humans can understand. The advent of lightweight solid-state lidar has reduced the cost, weight, and size of lidar. Unlike rotating lidar, small solid-state lidar has a smaller field of view and an irregular scanning pattern. The invention concerns a dense three-dimensional point cloud semantic mapping method and device for light, small solid-state lidar. The method and device can be used on mobile robots as well as handheld devices.
With the rapid adoption of robots, applications such as logistics delivery, household robots, and medical robots require robots that can navigate and map autonomously in a variety of complex indoor and outdoor environments. Executing robot tasks requires autonomous navigation, mapping, and scene understanding. Lidar can directly acquire information such as the three-dimensional geometric position and reflection intensity of targets, with a measuring range of up to hundreds of meters and high accuracy. However, obtaining only a three-dimensional point cloud map is not sufficient for autonomous indoor robot navigation. A human can distinguish roads, walls, and other objects in a point cloud map, but to the robot the map is merely a set of points with unknown meaning. For the robot to effectively identify semantically meaningful objects and regions in the point cloud, semantic segmentation and mapping are required. In recent years, the rapid growth of computing power has enabled deep-learning-based three-dimensional point cloud semantic segmentation algorithms to develop quickly and improve in performance, making this a research hotspot.
Three-dimensional semantic mapping can be divided into two steps: semantic segmentation and understanding of the three-dimensional point cloud, and dense three-dimensional mapping. Recent research results focus on these two aspects. For three-dimensional point cloud mapping, Zhang et al. proposed a lidar odometry and mapping algorithm based on batch optimization, achieving real-time three-dimensional point cloud mapping. Lin et al. further proposed a localization and mapping algorithm suited to small solid-state lidar, realizing feature extraction under a limited field of view as well as motion compensation and point cloud matching under irregular sampling. Xu et al. fused lidar point cloud features with inertial measurements using an extended Kalman filter, achieving more efficient and robust localization and mapping. For point cloud semantic segmentation, Qi et al. project the point cloud into a strip image, perform semantic segmentation on the image, and map the results back to the point cloud to build and segment it; this algorithm is only applicable to rotating lidar. At present there is no dense three-dimensional semantic mapping method for small solid-state lidar.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a self-supervised learning solid-state lidar three-dimensional semantic mapping method and device, applicable to mobile robots and handheld devices. The method constructs an encoder-decoder deep neural network, semantically segments the real-time input three-dimensional point cloud based on self-supervised learning, and uses Kalman filtering to fuse lidar point cloud data, semantic segmentation results, and inertial data to build a map, achieving three-dimensional scene reconstruction. In addition, the reflectivity of the lidar signal is introduced into the self-supervised point cloud segmentation network, effectively improving point cloud segmentation understanding.
In order to achieve the purpose, the invention provides a solid-state laser radar three-dimensional semantic mapping method for self-supervision learning, which comprises the following steps:
step 1, constructing a point cloud semantic segmentation model of a coding-decoding structure, and carrying out self-supervision training on the point cloud semantic segmentation model based on a synchronously acquired RGB image and a three-dimensional point cloud data set;
step 2, collecting real-time three-dimensional point cloud and real-time inertia data through a small solid-state laser radar and an inertia measurement element, fusing the real-time three-dimensional point cloud and the real-time inertia data based on a Kalman filter, and outputting a dense three-dimensional point cloud map composed of the real-time point cloud data under global coordinates;
step 3, inputting the real-time three-dimensional point cloud into the trained point cloud semantic segmentation model to obtain a point cloud segmentation result corresponding to each point cloud coordinate at the current moment, and corresponding the point cloud segmentation result corresponding to each point cloud coordinate with the dense three-dimensional point cloud map coordinate to generate a three-dimensional semantic map;
Step 4, deploying the trained point cloud semantic segmentation model on a computing device with an ARM + GPU architecture, constructing a semantic mapping system together with the small solid-state lidar and the inertial measurement unit, acquiring three-dimensional point cloud and inertial data and updating the map at a fixed frequency, performing real-time semantic segmentation, and generating the three-dimensional semantic map. Since the point cloud segmentation network has already been trained, RGB images no longer need to be acquired synchronously.
In order to achieve the above object, the present invention further provides a solid-state lidar three-dimensional semantic graph building apparatus for self-supervised learning, comprising:
the system comprises an RGB camera, a laser radar and an inertia measuring device, wherein the RGB camera, the laser radar and the inertia measuring device are respectively used for collecting RGB images, three-dimensional point cloud data and inertia data;
the data acquisition module is connected with the RGB camera, the laser radar and the inertia measurement device and is used for acquiring RGB images, three-dimensional point cloud data and inertia data;
the point cloud semantic segmentation model is connected with the data acquisition module and is used for performing point cloud semantic segmentation on the three-dimensional point cloud to obtain a point cloud segmentation result corresponding to each point cloud coordinate;
the self-supervision training module is connected with the data acquisition module and the point cloud semantic segmentation model and is used for carrying out self-supervision training on the point cloud semantic segmentation model according to the synchronously acquired RGB image and the three-dimensional point cloud data set;
the three-dimensional point cloud map building module is connected with the data acquisition module and used for fusing the real-time three-dimensional point cloud and the real-time inertial data and outputting a dense three-dimensional point cloud map consisting of the real-time point cloud data under the global coordinate;
and the three-dimensional semantic map building module, connected with the point cloud semantic segmentation model and the three-dimensional point cloud map building module, for generating the three-dimensional semantic map by matching the point cloud segmentation result corresponding to each point cloud coordinate with the dense three-dimensional point cloud map coordinates.
The invention provides a self-supervised learning solid-state lidar three-dimensional semantic mapping method and device. In addition, the reflectivity of the lidar signal is introduced into the point cloud segmentation network, improving point cloud segmentation understanding.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of a solid-state lidar three-dimensional semantic mapping method in an embodiment of the invention;
fig. 2 is a block diagram of a solid-state lidar three-dimensional semantic graph establishing device in the embodiment of the invention.
The implementation, functional features and advantages of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
In addition, the descriptions related to "first", "second", etc. in the present invention are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, but it must be based on the realization of the technical solutions by those skilled in the art, and when the technical solutions are contradictory to each other or cannot be realized, such a combination of the technical solutions should not be considered to exist, and is not within the protection scope of the present invention.
Example 1
Fig. 1 shows a method for building a three-dimensional semantic map of a solid-state lidar for self-supervised learning, which includes the following steps:
step 1, constructing a point cloud semantic segmentation model of a coding-decoding structure, and performing self-supervision training on the point cloud semantic segmentation model based on a synchronously acquired RGB image and a three-dimensional point cloud data set;
step 2, collecting real-time three-dimensional point cloud and real-time Inertial data through a small solid-state laser radar and an Inertial Measurement Unit (IMU), fusing the real-time three-dimensional point cloud and the real-time Inertial data based on a Kalman filter, and outputting a dense three-dimensional point cloud map composed of the real-time point cloud data under the global coordinate;
step 3, inputting the real-time three-dimensional point cloud into the trained point cloud semantic segmentation model to obtain a point cloud segmentation result corresponding to each point cloud coordinate at the current moment, and corresponding the point cloud segmentation result corresponding to each point cloud coordinate with the dense three-dimensional point cloud map coordinate to generate a three-dimensional semantic map;
Step 4, the trained point cloud semantic segmentation model is deployed on a computing device with an ARM + GPU architecture and, together with the small solid-state lidar and the inertial measurement unit, forms the semantic mapping system; three-dimensional point cloud and inertial data are acquired and the map is updated at a frequency of 50 Hz, real-time semantic segmentation is performed, and the three-dimensional semantic map is generated.
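To show how these steps interlock at run time, the following is a minimal single-thread sketch of the 50 Hz acquisition-and-update cycle; all module names (`lidar.read`, `imu.read`, `mapper.update`, `seg_model`, `semantic_map.fuse`) are hypothetical placeholders for the components described above, not APIs defined by the patent.

```python
import time

CYCLE = 1.0 / 50.0   # 50 Hz acquisition and map-update frequency

def mapping_loop(lidar, imu, mapper, seg_model, semantic_map):
    """One-thread sketch of the real-time semantic mapping pipeline."""
    while True:
        t0 = time.monotonic()
        scan = lidar.read()                    # real-time 3D point cloud
        inertial = imu.read()                  # real-time inertial data
        pose = mapper.update(scan, inertial)   # Kalman-filter fusion (step 2)
        labels = seg_model(scan)               # per-point semantics (step 3)
        semantic_map.fuse(scan, labels, pose)  # global semantic map (steps 3-4)
        time.sleep(max(0.0, CYCLE - (time.monotonic() - t0)))
```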
In this embodiment, the point cloud semantic segmentation model includes an encoding layer, a decoding layer, and a semantic prediction layer. The input to the model is a point cloud set $P \in \mathbb{R}^{N \times d}$, where $N$ is the number of points and $d$ is the feature dimension of each point; in this embodiment $d = 4$, comprising the three-dimensional coordinates $(x, y, z)$ and the reflectivity $r$.
The encoding layer adopts a 3-layer encoder and is mainly used to extract features from the input three-dimensional point cloud. It processes the point cloud data in sequence, reducing the number of points while increasing the dimension of each point, finally reducing the point count from $N$ and raising the point cloud feature dimension from 4 to 128. Each encoder consists of a local feature aggregator and a random sampling layer, specifically:
the local feature aggregator consists of a local space encoder and an attention mechanism and aims to expand the receptive field of each point so as to extract more effective features;
the random sampling layer is used for accelerating the extraction of point features and improving the operation efficiency.
The decoding layer uses a 3-layer decoder. Through each layer, the $K$ nearest neighboring points are found for each input point with a K-nearest-neighbor (KNN) algorithm, and the point cloud feature set is upsampled by nearest-neighbor interpolation. Finally, the upsampled features are concatenated with the features obtained by the encoder to give the feature vector output by the decoding layer.
The semantic prediction layer maps the feature vector obtained from the decoding layer, through a fully connected layer and the rectified linear unit (ReLU) activation, to per-point semantic scores $S \in \mathbb{R}^{N \times c}$, where $c$ is the number of semantic classes and $N$ is the number of points in the three-dimensional point cloud.
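For concreteness, the following is a minimal sketch of the encoder-decoder segmentation model just described, assuming a RandLA-Net-style layout in which shared point-wise MLPs stand in for the local feature aggregator; the layer widths, the 4x random-sampling ratio, and the class count are illustrative assumptions, not values fixed by the patent.

```python
# Minimal sketch of the 3-layer encoder / 3-layer decoder segmentation model.
# Random sampling shrinks the cloud per layer; nearest-neighbor interpolation
# upsamples; the head maps features to per-point class scores.
import torch
import torch.nn as nn

def nn_interpolate(sparse_xyz, sparse_feat, dense_xyz):
    """Copy each dense point's features from its nearest sparse point."""
    idx = torch.cdist(dense_xyz, sparse_xyz).argmin(dim=2)        # (B, Nd)
    idx = idx.unsqueeze(-1).expand(-1, -1, sparse_feat.shape[-1])
    return torch.gather(sparse_feat, 1, idx)                      # (B, Nd, C)

class PointSegNet(nn.Module):
    def __init__(self, num_classes=13):
        super().__init__()
        # 3 encoder layers: feature dims 4 -> 32 -> 64 -> 128
        self.enc1 = nn.Sequential(nn.Linear(4, 32), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.enc3 = nn.Sequential(nn.Linear(64, 128), nn.ReLU())
        # 3 decoder layers: concat upsampled features with encoder skips
        self.dec3 = nn.Sequential(nn.Linear(128 + 64, 64), nn.ReLU())
        self.dec2 = nn.Sequential(nn.Linear(64 + 32, 32), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Linear(32 + 4, 32), nn.ReLU())
        # semantic prediction: fully connected layers + ReLU -> class scores
        self.head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                                  nn.Linear(64, num_classes))

    def _down(self, xyz, feat):
        """Random sampling layer: keep a random quarter of the points."""
        keep = torch.randperm(feat.shape[1], device=feat.device)
        keep = keep[: max(1, feat.shape[1] // 4)]
        return xyz[:, keep], feat[:, keep]

    def forward(self, pts):             # pts: (B, N, 4) = x, y, z, reflectivity
        xyz0, f0 = pts[..., :3], pts
        xyz1, f1 = self._down(xyz0, self.enc1(f0))
        xyz2, f2 = self._down(xyz1, self.enc2(f1))
        xyz3, f3 = self._down(xyz2, self.enc3(f2))
        u2 = self.dec3(torch.cat([nn_interpolate(xyz3, f3, xyz2), f2], -1))
        u1 = self.dec2(torch.cat([nn_interpolate(xyz2, u2, xyz1), f1], -1))
        u0 = self.dec1(torch.cat([nn_interpolate(xyz1, u1, xyz0), f0], -1))
        return self.head(u0)            # (B, N, num_classes) per-point scores

scores = PointSegNet()(torch.randn(2, 1024, 4))  # smoke test
```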
In the specific implementation process, the self-supervision training of the point cloud semantic segmentation model specifically comprises the following steps:
Step 1.1, three-dimensional point cloud coordinates are put in one-to-one correspondence with RGB image pixels through their geometric relationship; a trained RGB image segmentation model produces a semantic segmentation result for each pixel in the RGB image, and point cloud semantic labels are obtained through the correspondence between three-dimensional point cloud coordinates and RGB image pixels. In this embodiment, a label obtained through the RGB image correspondence is called a self-supervised semantic label;
Step 1.2, the point features output by the point cloud segmentation network are compared with the self-supervised semantic labels obtained through the RGB image correspondence, and the parameters of each level of the point cloud semantic segmentation model are adjusted iteratively so that the semantic results output by the model approach the self-supervised semantic labels; this iterative parameter adjustment is the self-supervised learning training;
Step 1.3, after a certain number of iterations, the model whose output scores are closest to the self-supervised semantic labels is retained, forming the point cloud semantic segmentation model.
It should be noted that the RGB image is only used for self-supervised training of the point cloud semantic segmentation model, and the trained point cloud semantic segmentation model can perform point cloud semantic segmentation only by inputting point cloud data, so as to obtain a point cloud segmentation result corresponding to each point cloud coordinate at the current time.
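As an illustration of the label transfer in step 1.1, the sketch below projects each lidar point into the RGB image using camera intrinsics and a lidar-to-camera extrinsic transform and reads off the pixel's semantic label; the matrices `K` and `T_cam_lidar` are assumed to come from sensor calibration and are not specified by the patent.

```python
import numpy as np

def point_cloud_labels(points, seg_image, K, T_cam_lidar):
    """Assign each 3D point the semantic label of the pixel it projects to.

    points:      (N, 3) lidar-frame coordinates
    seg_image:   (H, W) per-pixel labels from the trained RGB segmentation model
    K:           (3, 3) camera intrinsics
    T_cam_lidar: (4, 4) lidar -> camera extrinsic transform
    Returns (N,) labels, -1 for points outside the image or behind the camera.
    """
    N = points.shape[0]
    pts_h = np.hstack([points, np.ones((N, 1))])       # homogeneous coordinates
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]         # camera frame
    z = pts_cam[:, 2]
    uv = (K @ pts_cam.T).T[:, :2] / np.clip(z[:, None], 1e-6, None)
    u = uv[:, 0].round().astype(int)
    v = uv[:, 1].round().astype(int)
    H, W = seg_image.shape
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels = np.full(N, -1, dtype=np.int64)
    labels[valid] = seg_image[v[valid], u[valid]]      # sample pixel labels
    return labels
```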
In this embodiment, the map building process of the dense three-dimensional point cloud map specifically includes:
Step 2.1, inertial integration is performed on the real-time inertial data to obtain the first state quantity of the system (including position, attitude, velocity, and IMU error terms). With the inertial measurement noise $\mathbf{w}_i$ set to $\mathbf{0}$, the integration is:

$$\hat{\mathbf{x}}_{i+1} = \hat{\mathbf{x}}_i \boxplus \left( \Delta t \, \mathbf{f}(\hat{\mathbf{x}}_i, \mathbf{u}_i, \mathbf{0}) \right)$$

where $\mathbf{f}$ denotes the inertial integration function of $\hat{\mathbf{x}}_i$, $\mathbf{u}_i$, and $\mathbf{w}_i$ (here $\mathbf{w}_i = \mathbf{0}$); the operator $\boxplus$, defined through the exponential map, expresses the inertial-integration state update; $\Delta t = \tau_{i+1} - \tau_i$ is the interval between two adjacent IMU sampling instants $\tau_i$ and $\tau_{i+1}$ within one lidar scan; $i$ is the index of the IMU measurement sample; $\mathbf{u}_i$ is the measurement data of the IMU (i.e., the real-time inertial data of the $i$-th IMU sample); and $\hat{\mathbf{x}}_i$, $\hat{\mathbf{x}}_{i+1}$ denote the first state quantity of the system at the $i$-th and $(i+1)$-th sampling instants.
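A minimal numeric sketch of this forward integration step follows, assuming a simplified state of position, velocity, and rotation (IMU bias and noise terms omitted) and realizing the $\boxplus$ update as composition with the SO(3) exponential map; it illustrates the update $\hat{\mathbf{x}}_{i+1} = \hat{\mathbf{x}}_i \boxplus (\Delta t\,\mathbf{f}(\hat{\mathbf{x}}_i, \mathbf{u}_i, \mathbf{0}))$ rather than the patent's full state vector.

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-9:
        return np.eye(3)
    a = phi / theta
    A = np.array([[0, -a[2], a[1]], [a[2], 0, -a[0]], [-a[1], a[0], 0]])
    return np.eye(3) + np.sin(theta) * A + (1 - np.cos(theta)) * (A @ A)

GRAVITY = np.array([0.0, 0.0, -9.81])

def imu_propagate(state, gyro, accel, dt):
    """One forward-integration step: state = (p, v, R) in the global frame."""
    p, v, R = state
    R_next = R @ so3_exp(gyro * dt)       # attitude update on SO(3)
    a_world = R @ accel + GRAVITY         # rotate specific force, add gravity
    v_next = v + a_world * dt
    p_next = p + v * dt + 0.5 * a_world * dt**2
    return p_next, v_next, R_next
```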
step 2.2, based on the real-time three-dimensional point cloud, performing state updating on the first state quantity by adopting Kalman filtering to obtain a second state quantity of the system, wherein the implementation process comprises the following steps:
First, the point cloud residual $\mathbf{z}_j$ of the lidar three-dimensional point cloud is computed as:

$$\mathbf{z}_j = \mathbf{n}_j^{T} \left( \hat{\mathbf{p}}_j - \mathbf{q}_j \right)$$

where $\mathbf{q}_j$ is the location of the nearest feature point on the map, $\mathbf{n}_j$ is the normal vector (or edge orientation) of the plane (or edge) on which the point lies, and $\hat{\mathbf{p}}_j$ is the point cloud position estimated from the IMU state.
An iterated Kalman filter is then used to update the state estimate iteratively until the point cloud residual $\mathbf{z}$ converges. The iterative update is:

$$\hat{\mathbf{x}}_k^{\kappa+1} = \hat{\mathbf{x}}_k^{\kappa} \boxplus \left( -\mathbf{K}\mathbf{z}_k^{\kappa} - \left(\mathbf{I} - \mathbf{K}\mathbf{H}\right)\left(\mathbf{J}^{\kappa}\right)^{-1}\left(\hat{\mathbf{x}}_k^{\kappa} \boxminus \hat{\mathbf{x}}_k\right) \right)$$

where $\hat{\mathbf{x}}_k^{\kappa+1}$ denotes the generated value of the state after the $(\kappa+1)$-th Kalman iteration at time $t_k$ ($t_k$ being the scan end time of the $k$-th lidar scan), $\hat{\mathbf{x}}_k^{\kappa}$ the generated value after the $\kappa$-th iteration, $\mathbf{I}$ the identity matrix, $\mathbf{H}$ the observation matrix, and $\mathbf{J}^{\kappa}$ the partial derivative of the error-state dynamics with respect to the state, with $\mathrm{Exp}$ the exponential map and $\mathrm{Log}$ the logarithmic map; the error-state dynamic model relates the true value of the state at time $t_k$ to its generated value, and $\tilde{\mathbf{x}}$ denotes the error between the generated value and the estimated value. The Kalman gain and the covariance propagation are

$$\mathbf{K} = \left( \mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H} + \hat{\mathbf{P}}^{-1} \right)^{-1}\mathbf{H}^{T}\mathbf{R}^{-1}, \qquad \hat{\mathbf{P}}_{i+1} = \mathbf{F}_{x}\hat{\mathbf{P}}_{i}\mathbf{F}_{x}^{T} + \mathbf{F}_{w}\mathbf{Q}\mathbf{F}_{w}^{T}$$

where $\hat{\mathbf{P}}$ denotes the covariance matrix, $\hat{\mathbf{P}}_{i+1}$ and $\hat{\mathbf{P}}_{i}$ the IMU state covariances at the $(i+1)$-th and $i$-th sampling instants, $\mathbf{K}$ the intermediate variable generated from the covariance matrix $\hat{\mathbf{P}}$ and the partial derivative matrix $\mathbf{H}$, $\mathbf{F}_x$ and $\mathbf{F}_w$ the partial derivative matrices of $\mathbf{f}$ with respect to $\mathbf{x}$ and $\mathbf{w}$ respectively, $\mathbf{Q}$ the covariance of the IMU noise $\mathbf{w}$, and the superscript $T$ the matrix transpose.
In the above iterative update of the state estimate, once the point cloud residual $\mathbf{z}$ has converged, the optimal state estimate, i.e. the second state quantity, is obtained:

$$\bar{\mathbf{x}}_k = \hat{\mathbf{x}}_k^{\kappa+1}$$

where $\bar{\mathbf{x}}_k$ denotes the second state quantity of the system at the $k$-th sampling instant.
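To make the correction step concrete, here is a small numeric sketch of the point-to-plane residual and one Kalman-style update using the gain $\mathbf{K} = (\mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H} + \hat{\mathbf{P}}^{-1})^{-1}\mathbf{H}^{T}\mathbf{R}^{-1}$; for simplicity the manifold $\boxplus$ of the patent is replaced by plain vector addition, and all dimensions and noise values are illustrative.

```python
import numpy as np

def point_to_plane_residuals(points_world, nearest_pts, normals):
    """z_j = n_j^T (p_j - q_j): signed distance of each estimated point
    from the plane of its nearest map feature."""
    return np.sum(normals * (points_world - nearest_pts), axis=1)

def iekf_update(x, P, H, R, z):
    """One iterated-Kalman-style correction on a plain vector state."""
    R_inv = np.linalg.inv(R)
    K = np.linalg.inv(H.T @ R_inv @ H + np.linalg.inv(P)) @ H.T @ R_inv
    x_new = x - K @ z                       # shift state against the residual
    P_new = (np.eye(len(x)) - K @ H) @ P    # posterior covariance
    return x_new, P_new

# toy usage: a 6-D state observed through 4 plane residuals
rng = np.random.default_rng(0)
x, P = np.zeros(6), np.eye(6)
H = rng.standard_normal((4, 6))
R = 0.01 * np.eye(4)
z = rng.standard_normal(4)
x, P = iekf_update(x, P, H, R, z)
```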
Step 2.3, a backward state update is applied to the second state quantity to obtain the third state quantity of the system, refining the position and attitude estimates and improving localization accuracy. The backward update is:

$$\check{\mathbf{x}}_{i-1} = \check{\mathbf{x}}_{i} \boxplus \left( -\Delta t \, \mathbf{f}(\check{\mathbf{x}}_{i}, \mathbf{u}_i, \mathbf{0}) \right)$$

where $\check{\mathbf{x}}_{i-1}$ and $\check{\mathbf{x}}_{i}$ denote the third state quantity of the system at the $(i-1)$-th and $i$-th sampling instants.
Step 2.4, based on the third state quantity, the transformation matrix ${}^{I}\mathbf{T}_{L}$ from the lidar coordinate system to the IMU coordinate system and the updated transformation estimate ${}^{G}\hat{\mathbf{T}}_{I_k}$ from the IMU coordinate system $I_k$ at time $t_k$ to the global coordinate system $G$ are obtained; through ${}^{I}\mathbf{T}_{L}$ and ${}^{G}\hat{\mathbf{T}}_{I_k}$, the point cloud coordinates of each frame, expressed in its own coordinate system within one lidar scan, are converted into the coordinate system at the scan end time, giving the global coordinates:

$$ {}^{G}\hat{\mathbf{p}}_j = {}^{G}\hat{\mathbf{T}}_{I_k} \; {}^{I}\mathbf{T}_{L} \; {}^{L_k}\mathbf{p}_j, \qquad j = 1, \ldots, m $$

where ${}^{L_k}\mathbf{p}_j$ denotes the lidar point cloud coordinates of the $j$-th feature point in the frame $L_k$ of the $k$-th lidar scan; $G$ denotes the global coordinate system; ${}^{G}\hat{\mathbf{p}}_j$ is the global coordinate of the $j$-th feature point after conversion; a transformation matrix $\mathbf{T}$ maps from the coordinate system of its lower-right subscript to that of its upper-left superscript; $I$ denotes the IMU coordinate system and $L$ the lidar coordinate system, so ${}^{I}\mathbf{T}_{L}$ is the transformation from the lidar coordinate system to the IMU coordinate system and ${}^{G}\hat{\mathbf{T}}_{I_k}$ the updated transformation from the IMU coordinate system $I_k$ at time $t_k$ to the global coordinate system $G$; and $m$ is the number of point clouds.
Step 2.5, all feature points at each time are added to the existing map according to their global coordinates, yielding the three-dimensional point cloud map in the global coordinate system.
In the global mapping process of step 2.5, each point cloud has a fixed serial number and a feature quantity, the feature quantity being the point cloud segmentation result produced by the semantic segmentation model in step 3. By matching the local point cloud semantic segmentation results of step 3 with the coordinates of the generated global three-dimensional point cloud map, the global three-dimensional semantic map is obtained.
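As a sketch of steps 2.4-2.5 and the semantic fusion described above, the snippet below chains the two transforms ${}^{G}\hat{\mathbf{T}}_{I_k}$ and ${}^{I}\mathbf{T}_{L}$ to move one scan into the global frame and appends the points, with their per-point labels from step 3, to the global semantic map; the 4x4 matrices stand in for the filter's estimates.

```python
import numpy as np

def scan_to_global(points_lidar, T_imu_from_lidar, T_global_from_imu):
    """Apply G_p = G_T_I * I_T_L * L_p to every point of one scan."""
    T = T_global_from_imu @ T_imu_from_lidar        # chain the two transforms
    pts_h = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
    return (T @ pts_h.T).T[:, :3]

def append_to_semantic_map(global_map, points_lidar, labels,
                           T_imu_from_lidar, T_global_from_imu):
    """global_map: list of (x, y, z, label) rows; labels come from the
    segmentation model of step 3, aligned with points_lidar by index."""
    pts_g = scan_to_global(points_lidar, T_imu_from_lidar, T_global_from_imu)
    for p, lbl in zip(pts_g, labels):
        global_map.append((p[0], p[1], p[2], int(lbl)))
    return global_map
```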
Example 2
On the basis of the solid-state laser radar three-dimensional semantic map building method for the self-supervised learning in the embodiment 1, the embodiment also discloses a solid-state laser radar three-dimensional semantic map building device for the self-supervised learning, and referring to fig. 2, the device mainly comprises an RGB camera, a laser radar, an inertia measuring device, a data acquisition module, a point cloud semantic segmentation model, a self-supervised training module, a three-dimensional point cloud map building module and a three-dimensional semantic map building module. Specifically, the method comprises the following steps:
the RGB camera, the laser radar and the inertia measuring device are carried on a carrier such as a mobile robot or handheld equipment and are respectively used for collecting RGB images, three-dimensional point cloud data and inertia data;
the data acquisition module is connected with the RGB camera, the laser radar and the inertia measurement device and is used for acquiring the acquired RGB image, the three-dimensional point cloud data and the inertia data;
the point cloud semantic segmentation model is connected with the data acquisition module and is used for performing point cloud semantic segmentation on the three-dimensional point cloud to obtain a point cloud segmentation result corresponding to each point cloud coordinate;
the self-supervision training module is connected with the data acquisition module and the point cloud semantic segmentation model and is used for carrying out self-supervision training on the point cloud semantic segmentation model according to the synchronously acquired RGB image and the three-dimensional point cloud data set;
the three-dimensional point cloud map building module is connected with the data acquisition module and used for fusing the real-time three-dimensional point cloud and the real-time inertial data and outputting a dense three-dimensional point cloud map consisting of the real-time point cloud data under the global coordinate;
the three-dimensional semantic map building module is connected with the point cloud semantic segmentation model and the three-dimensional point cloud map building module and is used for enabling the point cloud segmentation result corresponding to each point cloud coordinate to correspond to the dense three-dimensional point cloud map coordinate to generate the three-dimensional semantic map.
In a specific application process, each functional module of the solid-state lidar three-dimensional semantic map building device is the same as that in embodiment 1, and therefore, the description thereof is omitted in this embodiment.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention, and all modifications and equivalents of the present invention, which are made by the contents of the present specification and the accompanying drawings, or directly/indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (9)
1. A solid-state laser radar three-dimensional semantic mapping method for self-supervision learning is characterized by comprising the following steps:
step 1, constructing a point cloud semantic segmentation model of a coding-decoding structure, and carrying out self-supervision training on the point cloud semantic segmentation model based on a synchronously acquired RGB image and a three-dimensional point cloud data set;
step 2, collecting real-time three-dimensional point cloud and real-time inertia data through a small solid-state laser radar and an inertia measurement element, fusing the real-time three-dimensional point cloud and the real-time inertia data based on a Kalman filter, and outputting a dense three-dimensional point cloud map composed of the real-time point cloud data under global coordinates;
step 3, inputting the real-time three-dimensional point cloud into the trained point cloud semantic segmentation model to obtain a point cloud segmentation result corresponding to each point cloud coordinate at the current moment, and corresponding the point cloud segmentation result corresponding to each point cloud coordinate with the dense three-dimensional point cloud map coordinate to generate a three-dimensional semantic map;
and 4, deploying the trained point cloud semantic segmentation model on computing equipment with an ARM + GPU architecture, constructing a semantic map building system together with a small solid-state laser radar and an inertia measurement element, carrying out three-dimensional point cloud, inertia data acquisition and map updating according to fixed frequency, carrying out real-time semantic segmentation, and generating a three-dimensional semantic map.
2. The self-supervised learning solid-state lidar three-dimensional semantic mapping method according to claim 1, wherein in the step 1, the point cloud semantic segmentation model comprises:
the encoding layer is used for extracting the characteristics of the input three-dimensional point cloud;
the decoding layer is used for finding K nearest adjacent points for each input point on the basis of the characteristics extracted by the coding layer, then carrying out up-sampling on the point cloud characteristic set through a nearest neighbor difference algorithm, and connecting the characteristics obtained by the up-sampling with the characteristics obtained by the coder to obtain a characteristic vector output by the decoding layer;
a semantic prediction layer for mapping the feature vector obtained from the decoding layer, through a fully connected layer and the rectified linear unit (ReLU) activation, to per-point semantic scores $S \in \mathbb{R}^{N \times c}$, where $c$ is the number of semantic classes and $N$ is the number of points in the three-dimensional point cloud.
3. The self-supervised learning solid-state lidar three-dimensional semantic mapping method according to claim 2, wherein in the step 1, the point cloud semantic segmentation model is self-supervised trained, specifically:
step 1.1, corresponding three-dimensional point cloud coordinates and RGB image pixels one by one through a geometric relationship, obtaining a semantic segmentation result of each pixel in an RGB image by adopting a trained RGB image segmentation model, and obtaining a self-supervision semantic label through the corresponding relationship between the three-dimensional point cloud coordinates and the RGB image pixels;
step 1.2, comparing the point features output by the point cloud segmentation network with the self-supervised semantic labels obtained through the RGB image correspondence, and iteratively adjusting the parameters of each level of the point cloud semantic segmentation model so that the semantic results output by the model approach the self-supervised semantic labels;
and step 1.3, after a certain number of iterations, retaining the model whose output scores are closest to the self-supervised semantic labels, thereby forming the point cloud semantic segmentation model.
4. The self-supervised learning solid-state lidar three-dimensional semantic mapping method according to claim 1, 2 or 3, wherein the step 2 specifically comprises:
step 2.1, performing inertia integration on the real-time inertia data to obtain a first initial state quantity of the system;
2.2, based on the real-time three-dimensional point cloud, performing state updating on the first state quantity by adopting Kalman filtering to obtain a second state quantity;
step 2.3, updating the reverse state of the second state quantity to obtain a third state quantity of the system;
step 2.4, obtaining a conversion matrix from the laser radar coordinate system to an inertial coordinate system and an estimation updating conversion matrix from the inertial coordinate system to a global coordinate system based on the third state quantity, and converting point cloud coordinates of each frame of self coordinate system in one scanning of the laser radar to coordinates of a coordinate system at the last scanning moment to obtain global coordinates;
and 2.5, adding all the feature points at each time to the existing map according to the global coordinate to obtain the dense three-dimensional point cloud map under the global coordinate system.
5. The self-supervised learning solid-state lidar three-dimensional semantic mapping method according to claim 4, wherein in step 2.1 the inertial measurement noise $\mathbf{w}_i$ is set to $\mathbf{0}$ and the inertial integration is:

$$\hat{\mathbf{x}}_{i+1} = \hat{\mathbf{x}}_i \boxplus \left( \Delta t \, \mathbf{f}(\hat{\mathbf{x}}_i, \mathbf{u}_i, \mathbf{0}) \right)$$

where $\mathbf{f}$ denotes the inertial integration function of $\hat{\mathbf{x}}_i$, $\mathbf{u}_i$, and $\mathbf{w}_i$; the operator $\boxplus$, defined through the exponential map, expresses the inertial-integration state update; $\Delta t = \tau_{i+1} - \tau_i$ is the interval between two adjacent sampling instants $\tau_i$ and $\tau_{i+1}$ within one radar scan; $i$ is the index of the inertial data sample and $\mathbf{u}_i$ the real-time inertial data of the $i$-th IMU sample; $\hat{\mathbf{x}}_i$ and $\hat{\mathbf{x}}_{i+1}$ denote the first state quantity of the system at the $i$-th and $(i+1)$-th sampling instants.
6. The self-supervised learning solid-state lidar three-dimensional semantic mapping method according to claim 5, wherein the step 2.2 is specifically as follows:
first, the point cloud residual $\mathbf{z}_j$ of the real-time three-dimensional point cloud is computed as:

$$\mathbf{z}_j = \mathbf{n}_j^{T} \left( \hat{\mathbf{p}}_j - \mathbf{q}_j \right)$$

where $\mathbf{q}_j$ is the location of the nearest feature point on the map, $\mathbf{n}_j$ is the normal vector or edge orientation of the plane or edge on which the point lies, and $\hat{\mathbf{p}}_j$ is the point cloud position estimated by the IMU;

an iterated Kalman filter is used to update the state estimate iteratively until the point cloud residual $\mathbf{z}$ converges, the iterative update being:

$$\hat{\mathbf{x}}_k^{\kappa+1} = \hat{\mathbf{x}}_k^{\kappa} \boxplus \left( -\mathbf{K}\mathbf{z}_k^{\kappa} - \left(\mathbf{I} - \mathbf{K}\mathbf{H}\right)\left(\mathbf{J}^{\kappa}\right)^{-1}\left(\hat{\mathbf{x}}_k^{\kappa} \boxminus \hat{\mathbf{x}}_k\right) \right), \qquad \mathbf{K} = \left( \mathbf{H}^{T}\mathbf{R}^{-1}\mathbf{H} + \hat{\mathbf{P}}^{-1} \right)^{-1}\mathbf{H}^{T}\mathbf{R}^{-1}$$

where $\hat{\mathbf{x}}_k^{\kappa+1}$ and $\hat{\mathbf{x}}_k^{\kappa}$ denote the generated values of the state after the $(\kappa+1)$-th and $\kappa$-th Kalman iterations at time $t_k$, $t_k$ being the scan end time of the $k$-th lidar scan; $\mathbf{I}$ is the identity matrix, $\mathbf{H}$ the observation matrix, $\mathbf{J}^{\kappa}$ the partial derivative of the error-state dynamics with respect to the state, $\mathrm{Exp}$ the exponential map and $\mathrm{Log}$ the logarithmic map; the error-state dynamic model relates the true value of the state at time $t_k$ to its generated value, and $\tilde{\mathbf{x}}$ denotes the error between the generated value and the estimated value;

the state covariance is propagated as

$$\hat{\mathbf{P}}_{i+1} = \mathbf{F}_{x}\hat{\mathbf{P}}_{i}\mathbf{F}_{x}^{T} + \mathbf{F}_{w}\mathbf{Q}\mathbf{F}_{w}^{T}$$

where $\hat{\mathbf{P}}$ denotes the covariance matrix, $\hat{\mathbf{P}}_{i+1}$ and $\hat{\mathbf{P}}_{i}$ the IMU state covariances at the $(i+1)$-th and $i$-th sampling instants, $\mathbf{K}$ the intermediate variable generated from the covariance matrix $\hat{\mathbf{P}}$ and the partial derivative matrix $\mathbf{H}$, $\mathbf{F}_x$ and $\mathbf{F}_w$ the partial derivative matrices of $\mathbf{f}$ with respect to $\mathbf{x}$ and $\mathbf{w}$ respectively, $\mathbf{Q}$ the covariance of the IMU noise $\mathbf{w}$, and the superscript $T$ the matrix transpose;

in the above iterative update of the state estimate, once the point cloud residual $\mathbf{z}$ has converged, the optimal state estimate, i.e. the second state quantity, is obtained as:

$$\bar{\mathbf{x}}_k = \hat{\mathbf{x}}_k^{\kappa+1}$$
7. The method according to claim 6, wherein in step 2.3 the backward update of the second state quantity is specifically:

$$\check{\mathbf{x}}_{i-1} = \check{\mathbf{x}}_{i} \boxplus \left( -\Delta t \, \mathbf{f}(\check{\mathbf{x}}_{i}, \mathbf{u}_i, \mathbf{0}) \right)$$
8. The self-supervised learning solid-state lidar three-dimensional semantic mapping method according to claim 4, wherein in step 2.4 the conversion of the point cloud coordinates of each frame's own coordinate system within one lidar scan into the coordinates of the global coordinate system is specifically:

$$ {}^{G}\hat{\mathbf{p}}_j = {}^{G}\hat{\mathbf{T}}_{I_k} \; {}^{I}\mathbf{T}_{L} \; {}^{L_k}\mathbf{p}_j, \qquad j = 1, \ldots, m $$

where ${}^{L_k}\mathbf{p}_j$ denotes the lidar point cloud coordinates of the $j$-th feature point in the frame $L_k$ of the $k$-th lidar scan; $G$ denotes the global coordinate system; ${}^{G}\hat{\mathbf{p}}_j$ is the global coordinate of the $j$-th feature point after conversion; a transformation matrix $\mathbf{T}$ maps from the coordinate system of its lower-right subscript to that of its upper-left superscript; $I$ denotes the inertial coordinate system and $L$ the lidar coordinate system, so ${}^{I}\mathbf{T}_{L}$ is the transformation from the lidar coordinate system to the inertial coordinate system and ${}^{G}\hat{\mathbf{T}}_{I_k}$ the updated transformation from the inertial coordinate system $I_k$ at time $t_k$ to the global coordinate system $G$; and $m$ is the number of lidar point clouds.
9. The utility model provides a solid-state laser radar three-dimensional semantic map building device of self-supervised learning which characterized in that includes:
the system comprises an RGB camera, a laser radar and an inertia measuring device, wherein the RGB camera, the laser radar and the inertia measuring device are respectively used for collecting RGB images, three-dimensional point cloud data and inertia data;
the data acquisition module is connected with the RGB camera, the laser radar and the inertia measurement device and is used for acquiring RGB images, three-dimensional point cloud data and inertia data;
the point cloud semantic segmentation model is connected with the data acquisition module and is used for performing point cloud semantic segmentation on the three-dimensional point cloud to obtain a point cloud segmentation result corresponding to each point cloud coordinate;
the self-supervision training module is connected with the data acquisition module and the point cloud semantic segmentation model and is used for carrying out self-supervision training on the point cloud semantic segmentation model according to the synchronously acquired RGB image and the three-dimensional point cloud data set;
the three-dimensional point cloud map building module is connected with the data acquisition module and used for fusing the real-time three-dimensional point cloud and the real-time inertial data and outputting a dense three-dimensional point cloud map consisting of the real-time point cloud data under the global coordinate;
and the three-dimensional semantic map building module, connected with the point cloud semantic segmentation model and the three-dimensional point cloud map building module, for generating the three-dimensional semantic map by matching the point cloud segmentation result corresponding to each point cloud coordinate with the dense three-dimensional point cloud map coordinates.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202211387608.5A | 2022-11-07 | 2022-11-07 | Self-supervised learning solid-state lidar three-dimensional semantic mapping method and device |
Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN115421158A | 2022-12-02 |
| CN115421158B | 2023-04-07 |
Family

ID=84207166

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202211387608.5A (granted as CN115421158B, active) | Self-supervised learning solid-state lidar three-dimensional semantic mapping method and device | 2022-11-07 | 2022-11-07 |

Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN115421158B (en) |
Citations (9)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| WO2018094719A1 * | 2016-11-28 | 2018-05-31 | 深圳市大疆创新科技有限公司 | Method for generating point cloud map, computer system, and device |
| CN110097553A * | 2019-04-10 | 2019-08-06 | 东南大学 | Semantic mapping system based on simultaneous localization and mapping and three-dimensional semantic segmentation |
| US10929694B1 * | 2020-01-22 | 2021-02-23 | Tsinghua University | Lane detection method and system based on vision and lidar multi-level fusion |
| CN112634451A * | 2021-01-11 | 2021-04-09 | 福州大学 | Outdoor large-scene three-dimensional mapping method integrating multiple sensors |
| CN113128591A * | 2021-04-14 | 2021-07-16 | 中山大学 | Rotation-robust point cloud classification method based on self-supervised learning |
| CN113763423A * | 2021-08-03 | 2021-12-07 | 中国北方车辆研究所 | Systematic target recognition and tracking method based on multi-modal data |
| CN114898322A * | 2022-06-13 | 2022-08-12 | 中国第一汽车股份有限公司 | Driving environment identification method and device, vehicle and storage medium |
| CN114926469A * | 2022-04-26 | 2022-08-19 | 中南大学 | Semantic segmentation model training method, semantic segmentation method, storage medium and terminal |
| CN115222919A * | 2022-07-27 | 2022-10-21 | 徐州徐工矿业机械有限公司 | Sensing system and method for constructing color point cloud map of mobile machine |
Non-Patent Citations (2)

| Title |
| --- |
| Łukasz Sobczak et al., "LiDAR Point Cloud Generation for SLAM Algorithm Evaluation", Sensors * |
| Wang Jinke et al., "Status and challenges of multi-source fusion SLAM", Journal of Image and Graphics (中国图象图形学报) * |
Cited By (8)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN116229057A * | 2022-12-22 | 2023-06-06 | 之江实验室 | Method and device for three-dimensional laser radar point cloud semantic segmentation based on deep learning |
| CN116229057B * | 2022-12-22 | 2023-10-27 | 之江实验室 | Method and device for three-dimensional laser radar point cloud semantic segmentation based on deep learning |
| WO2024130776A1 * | 2022-12-22 | 2024-06-27 | 之江实验室 | Three-dimensional lidar point cloud semantic segmentation method and apparatus based on deep learning |
| CN115638788A * | 2022-12-23 | 2023-01-24 | 安徽蔚来智驾科技有限公司 | Semantic vector map construction method, computer equipment and storage medium |
| CN115638788B * | 2022-12-23 | 2023-03-21 | 安徽蔚来智驾科技有限公司 | Semantic vector map construction method, computer equipment and storage medium |
| CN116778162A * | 2023-06-25 | 2023-09-19 | 南京航空航天大学 | Weak supervision large aircraft appearance point cloud semantic segmentation method based on geometric feature guidance |
| CN117517864A * | 2023-11-08 | 2024-02-06 | 南京航空航天大学 | Laser radar-based power transmission line near electricity early warning method and device |
| CN117517864B * | 2023-11-08 | 2024-04-26 | 南京航空航天大学 | Laser radar-based power transmission line near electricity early warning method and device |
Also Published As

| Publication number | Publication date |
| --- | --- |
| CN115421158B | 2023-04-07 |
Legal Events

| Date | Code | Title |
| --- | --- | --- |
| | PB01 | Publication |
| | SE01 | Entry into force of request for substantive examination |
| | GR01 | Patent grant |