CN111080659A - Environmental semantic perception method based on visual information - Google Patents

Environmental semantic perception method based on visual information

Info

Publication number
CN111080659A
CN111080659A (Application CN201911317441.3A)
Authority
CN
China
Prior art keywords
semantic
information
point cloud
map
camera
Prior art date
Legal status
Pending
Application number
CN201911317441.3A
Other languages
Chinese (zh)
Inventor
白成超
郭继峰
郑红星
刘天航
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201911317441.3A
Publication of CN111080659A
Legal status: Pending (current)

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL > G06T7/00 Image analysis > G06T7/10 Segmentation; Edge detection > G06T7/11 Region-based segmentation
    • G06T7/00 Image analysis > G06T7/50 Depth or shape recovery
    • G06T7/00 Image analysis > G06T7/90 Determination of colour characteristics
    • G06T2207/00 Indexing scheme for image analysis or image enhancement > G06T2207/10 Image acquisition modality > G06T2207/10024 Color image
    • G06T2207/10 Image acquisition modality > G06T2207/10028 Range image; Depth image; 3D point clouds

Abstract

The invention provides an environmental semantic perception method based on visual information, which comprises the following steps: acquiring environmental image information with a Kinect V1.0 camera to obtain a registered color image and depth image; on the basis of the registered color and depth images, solving the three-dimensional pose of the camera from the ORB feature points extracted in each frame through the ORB_SLAM2 process to obtain camera pose information; performing semantic segmentation on each frame of image to generate semantic color information; synchronously generating a point cloud from the input depth map and the camera intrinsic matrix; registering the semantic color information into the point cloud to obtain a local semantic point cloud result; fusing the camera pose information with the local semantic point cloud result to obtain new global semantic point cloud information; and expressing the fused global semantic point cloud information with an octree map to obtain the final three-dimensional octree semantic map. The invention provides deeper human-like understanding for the environment detection of extraterrestrial celestial body rovers.

Description

Environmental semantic perception method based on visual information
Technical Field
The invention relates to an environmental semantic perception method based on visual information, and belongs to the technical field of artificial intelligence information.
Background
Effective perception of the environment by a mobile platform can be achieved through simultaneous localization and mapping: obstacle information in the environment is obtained and, at the same time, the relative relationship between the platform and the environment is recovered, which is a key step toward platform autonomy. However, as platforms and detection payloads continue to develop, more task scenarios and requirements appear, and recognizing only the appearance and geometric characteristics of targets cannot solve the problems actually encountered. During the inspection of extraterrestrial celestial bodies, two patches of terrain with similar appearance can both be reconstructed in three dimensions by traditional recognition alone, but the difference between them is hard to distinguish. For earlier detection tasks it was enough to recognize whether an obstacle lay ahead and whether it could be passed, but such prior cognition is far from sufficient; as detection time and scale increase, the environment needs to be understood at the semantic level, that is, not only whether a target exists but also what it is, which is the core of rover intelligence. The same problem matters in autonomous driving research: vehicles, pedestrians, roadblocks and other highly random targets are encountered while driving. Suppose a collision cannot be avoided and only the existence of an obstacle ahead is recognized, without the obstacle being judged effectively; the significance of semantic understanding then stands out, because once it is understood that one obstacle is a person and the other is a pile of grass, the correct decision can easily be made. As this example shows, semantic awareness of the environment determines the correctness and effectiveness of task execution. Furthermore, semantic cognition of the environment is closer to the way humans understand their surroundings, and research on this problem is gradually becoming a hotspot in the field.
Semantic segmentation can be understood as dividing the image input into different semantically interpretable classes; a common segmentation architecture is the convolutional neural network. In 2017, researchers presented a deep-learning-based approach to semantic segmentation and proposed a solution named DeepLab, which mainly consists of a deep convolutional network, atrous (dilated) convolution and a fully connected conditional random field. Atrous filtering effectively controls the resolution at which feature responses are computed and enlarges the receptive field of the filters so that more semantic information can be fused; multi-scale segmentation of targets is achieved with atrous spatial pyramid pooling (ASPP); and finally, combining the deep convolutional neural network with a probabilistic graphical model improves the precise localization of target boundaries. To address the fact that existing semantic segmentation methods do not use neural network parameters efficiently, Chaurasia et al. exploited encoder representations to achieve efficient semantic segmentation and proposed the LinkNet solution, which allows training without a significant increase in parameters. Zhao et al. provided a real-time semantic segmentation framework for high-resolution images, the image cascade network (ICNet), which quickly achieves high-quality semantic segmentation by introducing a cascade feature fusion unit. Schneider et al. proposed a new multimodal convolutional neural network architecture for semantic segmentation and target detection that uses complementary inputs in addition to color information; the advantage of this combined model is mid-level fusion, which lets the network exploit cross-modal interdependence. To address scene parsing in an unrestricted open-vocabulary environment, Zhao et al. proposed the pyramid scene parsing network (PSPNet), which realizes global semantic understanding through region-based semantic fusion; the results show that PSPNet provides a very good framework for pixel-level prediction. In addition, U-Net, SegNet, DeconvNet, RefineNet, PixelNet and other methods also show good segmentation results, and scholars have proposed end-to-end segmentation models and adversarial-training-based approaches, opening new directions for subsequent research.
Based on the above background investigation and analysis, the demand for environmental semantic perception is growing steadily and indicates the direction of future development. The invention therefore provides a new semantic perception method on the basis of existing perception technology, providing support for the environmental semantic understanding of rovers.
Disclosure of Invention
The invention provides an environmental semantic perception method based on visual information, aiming to solve the insufficient deep semantic understanding capability of existing environment perception while providing reliable environment perception information for the subsequent planning and control stage.
An environmental semantic perception method based on visual information, the perception method comprising the following steps:
Step one: acquiring environmental image information with a Kinect V1.0 camera to obtain a registered color image and depth image, and simultaneously executing step two and step three;
Step two: on the basis of the registered color and depth images, solving the three-dimensional pose of the camera from the ORB feature points extracted in each frame through the ORB_SLAM2 process to obtain camera pose information, and then executing step five;
Step three: based on the published color image, performing semantic segmentation on each frame of image to generate semantic color information, and synchronously generating a point cloud from the input depth map and the camera intrinsic matrix;
Step four: registering the semantic color information generated in step three into the point cloud generated in step three to obtain a local semantic point cloud result;
Step five: fusing the camera pose information obtained in step two with the local semantic point cloud result generated in step four to obtain new global semantic point cloud information;
Step six: expressing the fused global semantic point cloud information obtained in step five with an octree map to obtain the final three-dimensional octree semantic map.
Further, in the second step, specifically, the registered color map and depth map are obtained by combining OpenNI and OpenCV.
Further, in step two, specifically, the three main parallel threads of ORB_SLAM2 are as follows:
localizing the camera pose of each frame through features matched in the local map, using motion-only BA (Bundle Adjustment) to minimize the reprojection error;
managing and optimizing the local map based on local BA;
performing loop detection and correcting the accumulated drift based on pose graph optimization.
Further, in step two, specifically, the ORB_SLAM2 process is optimized by bundle adjustment.
Further, in step three, specifically, the point cloud is a three-dimensional point cloud.
Further, in step three, specifically, the pyramid scene parsing network PSPNet is adopted as the model implementing the semantic segmentation network.
Further, in step five, specifically, the fusion of the camera pose information with the local semantic point cloud result generated in step four adopts a maximum-confidence fusion mode.
Further, in step six, specifically, when the point cloud is inserted into the three-dimensional map, the points are first filtered through a voxel filter to down-sample them; then the points are inserted into an Octomap, and free space is cleared by ray casting so as to update the internal nodes of the Octomap, i.e., the voxels at lower resolution; finally, the updated Octomap is organized for visualization.
The main advantages of the invention are:
the invention realizes semantic reconstruction of the detection environment based on visual information, has the capabilities of synchronous three-dimensional reconstruction, semantic segmentation and spatial semantic representation, provides deeper human-like understanding for environment detection of the extraterrestrial celestial body inspection tour, and provides more reliable information input for task planning and decision analysis. The invention belongs to the direction of artificial intelligence information technology, and improves the high-level semantic understanding capability compared with the prior art.
Drawings
FIG. 1 is a framework diagram of the RGB-D semantic SLAM implementation of the environmental semantic perception method based on visual information according to the present invention;
FIG. 2 is a schematic diagram of the input information conversion process;
FIG. 3 is the ORB_SLAM2 implementation framework;
FIG. 4 is a schematic diagram of an octree map;
FIG. 5 is a schematic diagram of the pyramid scene parsing network framework;
FIG. 6 is a diagram of the reconstruction results in the dataset environment based on ORB_SLAM2;
FIG. 7 is a diagram illustrating the RGBD-based three-dimensional semantic mapping result;
FIG. 8 is a diagram of the final result and semantic labels;
FIG. 9 is a diagram illustrating the indoor environment mapping result based on ORB_SLAM2;
FIG. 10 is a diagram illustrating the RGBD-based three-dimensional semantic mapping result;
FIG. 11 is a diagram of the final result and semantic labels.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides an embodiment of an environmental semantic perception method based on visual information, where the perception method includes the following steps:
Step one: acquiring environmental image information with a Kinect V1.0 camera to obtain a registered color image and depth image, and simultaneously executing step two and step three;
Step two: on the basis of the registered color and depth images, solving the three-dimensional pose of the camera from the ORB feature points extracted in each frame through the ORB_SLAM2 process (simultaneous localization and mapping) to obtain camera pose information, and then executing step five;
Step three: based on the published color image, performing semantic segmentation on each frame of image to generate semantic color information, and synchronously generating a point cloud from the input depth map and the camera intrinsic matrix;
Step four: registering the semantic color information generated in step three into the point cloud generated in step three to obtain a local semantic point cloud result;
Step five: fusing the camera pose information obtained in step two with the local semantic point cloud result generated in step four to obtain new global semantic point cloud information;
Step six: expressing the fused global semantic point cloud information obtained in step five with an octree map to obtain the final three-dimensional octree semantic map.
Specifically, the invention uses a depth camera to realize pose estimation, semantic segmentation and global/local semantic reconstruction of visual point cloud information, improving the capability to understand environmental information. The rover's environment perception is thus not limited to geometric three-dimensional understanding; it also gains an understanding of the semantic attributes of obstacles, which benefits task execution and path planning and greatly improves the intelligence of the platform.
Referring to fig. 2, in the present preferred embodiment, in step two, specifically, the color map and the depth map after registration are obtained by combining OpenNI and OpenCV.
Referring to fig. 3, in the present preferred embodiment, in step two, specifically, the three main parallel threads of ORB_SLAM2 are as follows:
localizing the camera pose of each frame through features matched in the local map, using motion-only BA to minimize the reprojection error;
managing and optimizing the local map based on local BA;
performing loop detection and correcting the accumulated drift based on pose graph optimization.
In the preferred embodiment of this section, in step two, specifically, the ORB_SLAM2 process is optimized by bundle adjustment.
Specifically, referring to figs. 2-3, regarding ORB_SLAM2: in 2017 Mur-Artal et al. proposed an open-source SLAM solution applicable to monocular, stereo and RGB-D cameras, namely ORB_SLAM2. Compared with the earlier monocular ORB_SLAM system, its application range is wider and no longer limited to monocular vision, and the whole framework includes loop closure detection, relocalization and map reuse. Second, introducing bundle adjustment (BA) optimization at the back end yields higher accuracy than real-time methods based on iterative closest point (ICP) or photometric and depth error minimization. Third, by using both near and far stereo point matches together with monocular observations, the final accuracy is better than direct stereo matching. Fourth, a lightweight localization mode is provided in which the visual odometry tracks unreconstructed regions and matches against map points that allow zero-drift localization, effectively solving the localization problem when mapping is not possible. The system has already been applied in various scenarios, such as handheld environment reconstruction devices, UAV environment reconstruction and autonomous driving of unmanned vehicles in large-scale environments. The invention therefore uses ORB_SLAM2 as the back end of the SLAM to solve the camera pose: the SLAM system maintains accurate global localization over long time scales, its requirements on the operating environment are modest, and real-time operation can be achieved on a CPU.
1) System input
The input to the system is the color and depth images collected by the camera. For each frame, a set of feature points is extracted; for the Kinect V1.0 camera adopted by the invention, 1000 points are extracted from each 640 × 480 image. For image acquisition, a combination of OpenNI and OpenCV is adopted, because OpenCV cannot drive the sensor directly and the image format delivered by OpenNI cannot be processed directly; the operation flow is shown in fig. 2. The availability of the sensor device is detected through OpenNI, the data streams are collected, and the streams are converted into a form usable by OpenCV, i.e., a picture format on which subsequent operations can be performed. The obtained image information is stored as color pictures and depth pictures.
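As an illustration of this acquisition path, the sketch below uses OpenCV's OpenNI capture backend to grab registered color and depth frames from a Kinect-class sensor. It is a minimal example under the assumption that OpenCV was built with OpenNI support; the depth-visualization scaling is arbitrary, and this is not the patent's own acquisition code.

```python
import cv2

# Open the Kinect V1.0 through the OpenNI backend (requires OpenCV built with OpenNI).
cap = cv2.VideoCapture(cv2.CAP_OPENNI)
# Ask the driver to register the depth map to the color image.
cap.set(cv2.CAP_PROP_OPENNI_REGISTRATION, 1)

while cap.grab():
    ok_d, depth = cap.retrieve(flag=cv2.CAP_OPENNI_DEPTH_MAP)  # 16-bit depth in mm, 640x480
    ok_c, color = cap.retrieve(flag=cv2.CAP_OPENNI_BGR_IMAGE)  # 8-bit BGR color, 640x480
    if not (ok_d and ok_c):
        continue
    cv2.imshow("color", color)
    cv2.imshow("depth", (depth / 4500.0 * 255).astype("uint8"))  # crude scaling for display
    if cv2.waitKey(1) == 27:   # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```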
2) System architecture and operation
In operation, the system has three main parallel threads: first, the camera pose of each frame is localized through features matched in the local map, using motion-only BA to minimize the reprojection error; second, the local map is managed and optimized based on local BA; third, loop detection is performed and the accumulated drift is corrected by pose graph optimization. After that, a fourth thread can run a full BA optimization to give the optimal structure and motion solution. In addition, a place recognition module based on DBoW2 is embedded for relocalization in case of tracking failure or for reinitialization in already reconstructed scenes. The system also maintains a covisibility graph, i.e., a graph connecting any two keyframes that observe common points, together with a minimum spanning tree connecting all keyframes; these graph structures allow the retrieval of local windows of keyframes so that tracking and local mapping operate locally. The system uses the same ORB features for the tracking, mapping and recognition tasks; they are robust to rotation and scale and invariant to camera auto-gain, auto-exposure and illumination changes. They are also fast to read, extract and match, which is an advantage for real-time operation.
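To make the feature pipeline concrete, the sketch below extracts ORB features (1000 per frame, matching the configuration described above) and matches two frames with a Hamming-distance brute-force matcher. It is an illustrative OpenCV snippet, not the ORB_SLAM2 implementation itself.

```python
import cv2

def extract_and_match(img1, img2, n_features=1000):
    """Extract ORB keypoints/descriptors from two grayscale frames and match them."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    # ORB descriptors are binary, so Hamming distance is the appropriate metric.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    return kp1, kp2, matches

# Example usage with two consecutive grayscale frames:
# kp1, kp2, matches = extract_and_match(prev_gray, curr_gray)
```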
3) Bundle Adjustment (BA) optimization
The map point three-dimensional coordinates $X_{w,j} \in \mathbb{R}^3$ and keyframe poses $T_{iw} \in SE(3)$, where $w$ denotes the world frame, are optimized by minimizing the sum of reprojection errors with respect to the matched keypoints $x_{i,j} \in \mathbb{R}^2$. The error term for the observation of map point $j$ in keyframe $i$ is:

$e_{i,j} = x_{i,j} - \pi_i(T_{iw}, X_{w,j})$ (1)

where $\pi_i$ is the projection equation:

$\pi_i(T_{iw}, X_{w,j}) = \left[\, f_{i,u}\,\dfrac{x_{i,j}}{z_{i,j}} + c_{i,u}, \;\; f_{i,v}\,\dfrac{y_{i,j}}{z_{i,j}} + c_{i,v} \,\right]^{T}$ (2)

$[\, x_{i,j} \;\; y_{i,j} \;\; z_{i,j} \,]^{T} = R_{iw} X_{w,j} + t_{iw}$ (3)

where $R_{iw} \in SO(3)$ and $t_{iw} \in \mathbb{R}^3$ are the rotation and translation parts of $T_{iw}$, and $(f_{i,u}, f_{i,v})$ and $(c_{i,u}, c_{i,v})$ are the camera intrinsic parameters associated with keyframe $i$. The cost function to be minimized is:

$C = \sum_{i,j} \rho_h\!\left( e_{i,j}^{T}\, \Omega_{i,j}^{-1}\, e_{i,j} \right)$ (4)

where $\rho_h$ is the Huber robust kernel function and $\Omega_{i,j} = \sigma_{i,j}^{2} I_{2 \times 2}$ is the covariance matrix related to the scale at which the keypoint was detected. For a full BA, all points and keyframes are optimized, with the first keyframe fixed as the origin. In local BA, all points contained in the local area are optimized while the subset of keyframes is fixed. In pose graph optimization or motion-only BA, all points are fixed and only the camera poses are optimized. Pose graph optimization under the SE(3) constraint is given below.

First, given a pose graph with binary edges, the error of an edge is defined as:

$e_{i,j} = \log_{SE(3)}\!\left( T_{ij}\, T_{jw}\, T_{wi} \right)$ (5)

where $T_{ij}$ is the relative constraint of the edge; $\log_{SE(3)}$ maps the composed transform into the tangent space, so the error is a vector in $\mathbb{R}^6$. The goal is to optimize the keyframe poses in SE(3) space by minimizing the cost function:

$C = \sum_{i,j} e_{i,j}^{T}\, \Lambda_{i,j}\, e_{i,j}$ (6)

in the formula, $\Lambda_{i,j}$ is the information matrix of the edge. Although this method is a rough approximation of a full BA, it converges faster and better than BA.
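The reprojection error and robust cost of equations (1)-(4) can be written down directly. The sketch below evaluates them with NumPy for a single keyframe; it is an illustration of the cost being minimized, not the optimizer actually used inside ORB_SLAM2, and the default Huber threshold is a typical choice rather than a value specified by the patent.

```python
import numpy as np

def reprojection_error(X_w, T_iw, K, x_obs):
    """e = x_obs - pi(T_iw, X_w) for one map point observed in keyframe i.

    X_w   : (3,) map point in the world frame
    T_iw  : (4, 4) pose transforming world coordinates into the keyframe's camera frame
    K     : (3, 3) intrinsic matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    x_obs : (2,) observed keypoint in pixels
    """
    p_cam = T_iw[:3, :3] @ X_w + T_iw[:3, 3]          # equation (3)
    x, y, z = p_cam
    proj = np.array([K[0, 0] * x / z + K[0, 2],        # equation (2)
                     K[1, 1] * y / z + K[1, 2]])
    return x_obs - proj                                # equation (1)

def huber_cost(errors, sigmas, delta=np.sqrt(5.991)):
    """Sum of Huber-robustified squared errors, equation (4), with Omega = sigma^2 * I."""
    cost = 0.0
    for e, s in zip(errors, sigmas):
        r2 = float(e @ e) / s**2                       # e^T Omega^{-1} e
        cost += r2 if r2 <= delta**2 else 2 * delta * np.sqrt(r2) - delta**2
    return cost
```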
In this preferred embodiment, in step three, specifically, the point cloud is a three-dimensional point cloud.
Specifically, before insertion into the three-dimensional map, the environmental structure information is stored in the form of a point cloud for message passing. A point cloud is a set of unordered points, each containing the coordinates of the point in some reference frame. The depth image is first registered to the reference frame of the color image. Then the real-world coordinates of each pixel are computed from its position on the image, its depth and the camera intrinsics, generating the point cloud information.
In the pinhole camera model, given a pixel with pixel coordinates $(x, y)$ and depth $d$, its real-world coordinates $(X, Y, Z)$ in the camera optical-center coordinate system are computed by:

$X = \dfrac{(x - c_x)\, d}{f_x}, \qquad Y = \dfrac{(y - c_y)\, d}{f_y}, \qquad Z = d$

where $f_x$, $f_y$ are the focal lengths of the camera and $(c_x, c_y)$ are the pixel coordinates of the optical-axis center on the image. In addition to position and RGB information, semantic information is also stored in the point cloud; different point types are used for different semantic fusion methods. Since the invention adopts maximum-confidence fusion to realize three-dimensional semantic reconstruction, the point cloud data structure contains three-dimensional position information, RGB color information, semantic color information and semantic confidence information.
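A minimal NumPy sketch of this back-projection, producing the point type described above (position, RGB color, semantic color, semantic confidence), is given below. The flat array layout is an assumption made for illustration, not the patent's actual data structure.

```python
import numpy as np

def depth_to_semantic_cloud(depth, rgb, sem_color, sem_conf, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a registered depth image into a semantic point cloud.

    depth     : (H, W) uint16 depth in millimetres (hence depth_scale to metres)
    rgb       : (H, W, 3) color image registered to the depth frame
    sem_color : (H, W, 3) per-pixel semantic color from the segmentation network
    sem_conf  : (H, W) per-pixel semantic confidence
    Returns an (N, 10) array: [X, Y, Z, R, G, B, Rs, Gs, Bs, conf] per valid point.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64) * depth_scale
    valid = z > 0                                  # drop pixels with no depth reading
    x = (u - cx) * z / fx                          # X = (x - cx) d / fx
    y = (v - cy) * z / fy                          # Y = (y - cy) d / fy
    pts = np.stack([x[valid], y[valid], z[valid]], axis=1)
    return np.hstack([pts,
                      rgb[valid].astype(np.float64),
                      sem_color[valid].astype(np.float64),
                      sem_conf[valid][:, None]])
```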
Referring to fig. 5, in the preferred embodiment of this section, in step three, specifically, the pyramid scene parsing network PSPNet is used as the model implementing the semantic segmentation network.
Specifically, the main purpose of semantic segmentation is to distinguish the semantic content of an image. Compared with target recognition and localization, semantic segmentation is closer to real applications: target recognition indicates whether the object to be recognized exists in the image, localization gives the relative spatial relationship of the recognized objects, and semantic segmentation distinguishes the environment semantically, giving an understanding of every frame of the image. Semantic-level environment perception is what practical applications need most, because semantic cognition combined with prior knowledge allows the attributes of the environment to be judged better, planning constraints to be considered from more aspects, and a safer, more optimal trajectory to be obtained.
In recent years, with the rise of artificial intelligence technology, semantic segmentation has received more and more attention. Combined with neural networks, it has taken effect in many fields, such as intelligent robots, autonomous driving and medical imaging, providing support for the high-level understanding of different task scenarios and for the conversion from measured information to abstract semantic understanding. An extraterrestrial celestial body rover also needs this capability in order to carry out inspection tasks autonomously: while detecting the obstacles ahead, it should know what the obstacles are, know the current terrain, and know whether it is unsuitable to proceed.
At present, mature deep networks such as AlexNet, VGG-16, GoogLeNet and ResNet achieve good results in image semantic segmentation. The invention adopts the pyramid scene parsing network (PSPNet) as the model implementing the CNN semantic segmentation network. FIG. 5 shows the structure of the network model: the input is the collected color image of the scene and the output is a score map containing the category information. To implement this process, the input image is first processed with ResNet to generate a feature map; second, pyramid pooling is performed on the generated feature map to obtain feature maps at different resolutions; then each pooled feature map is convolved and the results are stacked together with the upsampled feature map to form the final feature map; finally, the category score map is obtained through a convolution.
When the method is implemented on the unmanned vehicle platform, the image acquired by the Kinect V1.0 is first resized to the input size of the CNN semantic segmentation network. A Softmax activation function is applied when mapping to the class output, so that a probability distribution is generated (the scores sum to 1). Then, following the semantic fusion method, the semantic label with the highest probability is selected for each pixel, and this probability is called the semantic confidence of the associated semantic category label. Finally, the semantic labels are decoded into RGB colors according to the color map, giving the semantic information and its representation.
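The post-processing just described (softmax, per-pixel argmax, confidence, color decoding) can be sketched as follows. The score-map shape and the color palette are assumptions for illustration, and the network itself (PSPNet inference) is treated as a black box here.

```python
import numpy as np

def decode_semantics(score_map, palette):
    """Turn a raw CNN score map into semantic colors and confidences.

    score_map : (C, H, W) raw class scores from the segmentation network
    palette   : (C, 3) RGB color assigned to each of the C semantic classes
    Returns (sem_color, sem_conf): (H, W, 3) colors and (H, W) confidences in [0, 1].
    """
    # Softmax over the class axis so every pixel carries a probability distribution.
    scores = score_map - score_map.max(axis=0, keepdims=True)   # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=0, keepdims=True)

    labels = probs.argmax(axis=0)              # most probable class per pixel
    sem_conf = probs.max(axis=0)               # its probability = semantic confidence
    sem_color = palette[labels]                # decode class index to RGB color
    return sem_color, sem_conf
```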
In this preferred embodiment of the present invention, in the fifth step, specifically, the fusion of the camera pose information with the local semantic point cloud result generated in the fourth step adopts the maximum-confidence fusion mode.
Specifically, semantic segmentation of each frame yields the semantic label of every pixel in that frame; in a continuously moving environment, the semantic values at consecutive moments must be fused to achieve global semantic understanding. When point cloud fusion is executed, the invention adopts maximum-confidence fusion: the fusion involves the highest-confidence semantic color produced by the CNN semantic segmentation network together with the confidence stored in the generated point cloud, and the same information is also stored in every voxel of the Octomap. When a point cloud is inserted into the Octomap and a voxel receives a new measurement, the two pieces of semantic information are fused together.
If the two semantic colors are the same, the semantic color is kept and the confidence becomes the average of the two confidences. If the two semantic colors differ, the semantics with the higher confidence are retained, and the confidence is penalized by a factor of 0.9 for the inconsistency. This also ensures that the semantic information can still be updated even when its confidence is already very high. The advantage of the method is that only one piece of semantic information is stored, which improves memory efficiency. The pseudo code is shown in Table 1:
Table 1: Semantic fusion, maximum-confidence fusion (the pseudo code is given as a figure in the original document and is not reproduced here).
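In place of the pseudo code, a short sketch of the fusion rule described above is given, assuming the 0.9 penalty is a multiplicative factor applied to the winning confidence; this interpretation and the function signature are assumptions, not the patent's exact pseudo code.

```python
def fuse_semantics(color_a, conf_a, color_b, conf_b, penalty=0.9):
    """Maximum-confidence fusion of two semantic measurements for one voxel.

    Each measurement is (RGB semantic color, confidence). Returns the fused pair.
    """
    if color_a == color_b:
        # Same class: keep the color, average the confidences.
        return color_a, 0.5 * (conf_a + conf_b)
    # Different classes: keep the more confident one, penalize for the disagreement
    # so that even a high-confidence label can still be revised later.
    if conf_a >= conf_b:
        return color_a, conf_a * penalty
    return color_b, conf_b * penalty

# Example: fuse_semantics((0, 255, 0), 0.8, (255, 0, 0), 0.6) -> ((0, 255, 0), 0.72)
```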
Referring to fig. 4, in the present preferred embodiment, in step six, specifically, when a point cloud is inserted into the three-dimensional map, the points are first filtered through a voxel filter to down-sample them; then the points are inserted into the Octomap, and free space within a certain range is cleared by ray casting so as to update the internal nodes of the Octomap, i.e., the voxels at lower resolution; finally, the updated Octomap is organized for visualization.
Specifically, regarding the octree map:
the three-dimensional reconstruction terrain can be represented in various forms, the three-dimensional reconstruction terrain can be divided into a measurement map and a topological map, and in order to effectively improve the map representation in a large-scale environment, the Octomap is used as the three-dimensional map representation. Octmap represents a large bounded space as an octree occupying a grid (voxel). Each node in the octree represents a voxel of a particular size, depending on its level in the tree. Each parent node of the octree is subdivided into 8 children nodes until the best resolution is reached. An illustration of an octree is shown in FIG. 4. Thus, a three-dimensional map of a large scale can be efficiently stored in the memory.
Octomap models the sensor with hit and miss probabilities and updates the occupancy of voxels probabilistically from successive measurements. Testing showed that a resolution of 2 cm is suitable for the invention: it provides good detail for characterizing the environment while maintaining the real-time efficiency of map insertion. In addition, Octomap can distinguish free space from unknown space.
Regarding inserting the point cloud into the map:
when inserting a point cloud in a three-dimensional map, the points are first filtered by a voxel filter to down-sample the points. These points are then inserted into Octomap. And the free space within a certain range is eliminated by utilizing ray projection. And then updating the internal nodes of the Octomap, namely the voxels with lower resolution. And finally, sorting the updated Octomap to realize visualization.
The voxel filter is used to down-sample the point cloud. Its principle is to retain only one point per voxel of a given size (the resolution). Since only one point is needed to update an octree node, the resolution of the voxel filter is set to the same value as the octree resolution. Such a filter greatly improves performance because it reduces the number of points, especially points far from the sensor, for which ray casting is time-consuming. For a Kinect V1.0 with an image size of 640 × 480, 307200 points would need to be inserted; after voxel filtering, between 15000 and 60000 points remain depending on their distances, which greatly reduces storage and improves the utilization of effective points.
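A simple NumPy voxel-grid filter that keeps one point per occupied voxel, as described above, might look like the following; production systems typically use PCL's VoxelGrid or Octomap's own tools, so this is only an illustrative sketch.

```python
import numpy as np

def voxel_filter(points, resolution=0.02):
    """Keep one point per occupied voxel of side `resolution` (metres).

    points : (N, D) array whose first three columns are X, Y, Z; extra columns
             (color, semantics, confidence) are carried along with the kept point.
    """
    voxel_idx = np.floor(points[:, :3] / resolution).astype(np.int64)
    # np.unique over rows gives the index of the first point seen in each voxel.
    _, keep = np.unique(voxel_idx, axis=0, return_index=True)
    return points[np.sort(keep)]

# A 640x480 Kinect frame (~307200 points) typically shrinks to a few tens of
# thousands of points at a 2 cm resolution, in line with the text above.
```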
When the point cloud is inserted into the Octomap, only the voxels at the highest resolution (the leaf nodes) are updated: their occupancy probabilities, RGB colors, semantic colors and confidences. The semantic color and confidence are updated according to the maximum-confidence semantic fusion method. Considering the limited measurement range and efficiency of the depth camera, only points within a certain distance of the origin (the optical center of the camera) are inserted; this maximum range is set to 5 meters in the invention. For the occupancy probability, following the derivation in the octree mapping literature, given the observations $z_1, \dots, z_T$ up to time $T$, the occupancy recorded by the $n$-th leaf node is:

$P(n \mid z_{1:T}) = \left[ 1 + \dfrac{1 - P(n \mid z_T)}{P(n \mid z_T)} \cdot \dfrac{1 - P(n \mid z_{1:T-1})}{P(n \mid z_{1:T-1})} \cdot \dfrac{P(n)}{1 - P(n)} \right]^{-1}$
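This recursive update is usually carried out in log-odds form, which turns the product into a sum. A small sketch of that equivalent update is shown below; the clamping bounds are a common practical choice, not values specified by the patent.

```python
import math

def log_odds(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))

def update_occupancy(p_prior_1_to_t1, p_meas_t, p_init=0.5, l_min=-2.0, l_max=3.5):
    """One recursive occupancy update P(n | z_{1:T}) in log-odds form.

    p_prior_1_to_t1 : P(n | z_{1:T-1}), the voxel's occupancy before this measurement
    p_meas_t        : P(n | z_T), the inverse sensor model for the new measurement
    p_init          : prior P(n), usually 0.5 so its log-odds term vanishes
    """
    l = log_odds(p_prior_1_to_t1) + log_odds(p_meas_t) - log_odds(p_init)
    l = max(l_min, min(l_max, l))            # clamping keeps the map updatable
    return 1.0 - 1.0 / (1.0 + math.exp(l))   # back to a probability

# Example: a voxel at 0.6 hit twice with a 0.7 "hit" model climbs to roughly 0.89.
```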
to clear free space, when a point is inserted in Octomap, ray casting may be performed to clear all voxels on the straight line between the origin and the end point. When the endpoint is far from the origin, this can be a very expensive operation because many octree searches are performed. In order to eliminate the necessary free space while maintaining reasonable operating efficiency, the present invention only projects light to a limited extent.
Color and semantic information at lower resolutions are then obtained by updating the internal nodes of the octree: the occupancy probability of a parent node is set to the maximum over its eight child nodes, its color is set to the average of its children, and its semantic information is the fusion of the children's semantics.
Finally, identical child nodes can be pruned in the Octomap to reduce the size of the map data. In the Octomap source implementation, children are pruned if they all have the same occupancy. Since semantic information must be stored on the leaf nodes, a node's children are pruned here only if they all have the same occupancy probability, the same semantic color and the same semantic confidence; in practice the probability of children being pruned is therefore low.
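A compact sketch of these internal-node rules (maximum occupancy, mean color, fused semantics, and the stricter pruning condition) is given below; the node representation is an assumption made for illustration, not Octomap's actual data structure.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Node:
    occupancy: float                    # occupancy probability
    color: Tuple[float, float, float]   # RGB color
    sem_color: Tuple[int, int, int]     # semantic color (class)
    sem_conf: float                     # semantic confidence
    children: List["Node"] = field(default_factory=list)

def update_parent(parent: Node) -> None:
    """Propagate child information to an internal node as described in the text."""
    kids = parent.children
    parent.occupancy = max(k.occupancy for k in kids)                 # max of children
    parent.color = tuple(sum(k.color[i] for k in kids) / len(kids)    # mean color
                         for i in range(3))
    # Semantics of the parent = maximum-confidence fusion of the children's semantics.
    sem_color, sem_conf = kids[0].sem_color, kids[0].sem_conf
    for k in kids[1:]:
        if k.sem_color == sem_color:
            sem_conf = 0.5 * (sem_conf + k.sem_conf)             # agree: average
        elif k.sem_conf > sem_conf:
            sem_color, sem_conf = k.sem_color, 0.9 * k.sem_conf  # disagree: keep stronger, penalize
        else:
            sem_conf *= 0.9
    parent.sem_color, parent.sem_conf = sem_color, sem_conf

def can_prune(parent: Node) -> bool:
    """Children may be collapsed only if occupancy, semantic color and confidence all match."""
    kids = parent.children
    return len(kids) == 8 and all(
        k.occupancy == kids[0].occupancy
        and k.sem_color == kids[0].sem_color
        and k.sem_conf == kids[0].sem_conf
        for k in kids)
```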
The specific embodiment of the invention:
(1) validating parameter settings
Based on the above method, algorithm verification was completed in two environments. The simulation verification is performed on the ADE20K dataset released by MIT, which provides a good test reference for scene perception and semantic understanding; the complex-environment test is carried out in a laboratory environment containing people, tables, chairs, cabinets, books and so on, for which the ISAP laboratory of Harbin Institute of Technology was selected.
Meanwhile, a whale XQ unmanned vehicle platform is selected as the experimental test platform, carrying a Kinect V1.0 depth vision camera with intrinsics $f_x = 517.306408$, $f_y = 516.469215$, $c_x = 318.643040$, $c_y = 255.313989$, tangential distortion coefficients $k_1 = 0.262383$, $k_2 = -0.953104$, and radial distortion coefficients $p_1 = -0.005358$, $p_2 = 0.002628$, $p_3 = 1.163314$, from which the effective depth range of the camera can be calculated:

[Formula for the effective depth range given as a figure in the original document; not reproduced here.]
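For reference, these intrinsics can be packed into an OpenCV camera matrix and distortion vector as sketched below. The mapping of the listed coefficients onto OpenCV's (k1, k2, p1, p2, k3) ordering is an assumption made for illustration, since naming conventions differ from the text above.

```python
import numpy as np
import cv2

# Intrinsic matrix of the Kinect V1.0 used in the tests (values from the text above).
K = np.array([[517.306408, 0.0,        318.643040],
              [0.0,        516.469215, 255.313989],
              [0.0,        0.0,        1.0]])

# Distortion vector; the assignment to OpenCV's (k1, k2, p1, p2, k3) slots is an
# assumption for this sketch, since the text labels the coefficients differently.
dist = np.array([0.262383, -0.953104, -0.005358, 0.002628, 1.163314])

def undistort(image):
    """Remove lens distortion from a raw 640x480 Kinect color frame."""
    return cv2.undistort(image, K, dist)
```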
in the process of physical testing, the acquisition frequency of a color image and a depth image of the Kinect V1.0 camera is 30Hz, the acquisition frequency of the vibration sensor is 100Hz, the frequency of a feature vector is 1.6Hz, and the running frequency of ORB _ SLAM2 is 15 Hz.
(2) Test results
Open data set testing:
semantic reconstruction test analysis under the environment of a public data set is given, and ORB _ SLAM2 sparse environment reconstruction and dense three-dimensional semantic reconstruction based on the method provided by the invention are respectively completed based on an ADE20K data set, wherein a test result based on ORB _ SLAM2 is given in FIG. 6, sparse point cloud reconstruction is given on the left side in the graph, only the change trend of the environment can be seen approximately, the graph is usually used for auxiliary navigation and provides detection feedback of front obstacles for a platform, and an image schematic and feature point detection result of a key frame is given on the right side.
FIG. 7 then shows the RGBD-based three-dimensional semantic mapping result: 52 s of video data from the dataset were collected, and the mapping frequency is 0.9 Hz. The left side shows the semantic map built during the traverse and the right side shows images from the corresponding data; with the parameters set in the experiment, the test environment is reconstructed reasonably. Combined with FIG. 8, the semantic judgment of the environment is basically consistent with reality: for example, the colors of typical scene elements such as the ground, tables, chairs and walls in the reconstructed map agree with the semantic labels, and the green track in the figure is the trajectory of the camera motion. It should be noted that judging the semantic labeling accuracy globally would require both the true semantic information of the reconstructed point cloud and the estimated point cloud semantics during testing; however, measuring such ground truth is very difficult, and the point cloud selection cannot be guaranteed to be identical between experiments. The method therefore judges correctness from the semantic color information, and in the subsequent physical tests the reliability of the labels is fed back indirectly through the planning success rate of the experimental platform based on the semantic information. Some datasets do provide ground-truth values for comparison, but such data are very limited, and the ADE20K data used for verification here do not provide them, so the judgment is still made by comparing the semantic color of the point cloud with the actual values; since the cognitive capability is ultimately intended for physical application, this judgment approach is workable.
Physical environment testing:
the invention carries out physical test in a complex laboratory environment, walks for a circle around a laboratory walkway, acquires 84s of video data, and places a lawn in a scene for distinguishing terrains made of different materials, wherein the scene comprises common articles such as floors, walls, people, tables, chairs, curtains, glass, lockers, bags, garbage cans and the like. Consistent with the data set testing thought, firstly, a sparse reconstruction test based on the ORB _ SLAM2 is given, as shown in fig. 9, the left side shows an environment three-dimensional point cloud reconstruction, the right side shows a part of key frame and feature point detection results in the process, and the method can also show only the rough shape of the environment, and is difficult to represent information such as the specific shape of an object in the real environment, but for a patrol task, the environment is unknown, the information of each frame is important for both scientific detection and navigation, and the environment which runs outside the ground can be expressed abundantly in practice, so that the method based on the invention is used for carrying out an experiment on the dense three-dimensional semantic reconstruction in the environment.
FIG. 10 shows the RGBD-based three-dimensional semantic mapping result; the mapping frequency of this process is 0.4 Hz. The left side of the figure is the semantic mapping result of the test, with different colors representing different semantic information, and the right side shows images acquired around the ISAP laboratory during the test. The traversed environment contains many object types and can be regarded as complex scene processing.
The final semantic mapping result in the indoor complex environment is shown in FIG. 11, where the green track is the actual traveled path. Referring to the semantic labels shown on the right, the experiment demonstrates semantic recognition and reconstruction of 16 object classes such as walls, floors, people, doors, glass windows, storage cabinets, boxes, chairs, curtains and grass. Comparison with the actually measured environment shows a good semantic mapping effect for the traversed environment. Compared with the reconstruction based on ORB_SLAM2, although the rough environment shapes are similar and both indicate whether obstacles exist, the semantic map carries more meaning: for example, a laboratory student sitting close to the wall on the left side of the figure is effectively segmented from the environment and represented with a distinct color value in the semantic map, whereas the former can only report that some object exists. Meanwhile, since the main purpose of the invention is to improve the terrain perception capability of the rover, five classes of objects in the test environment (walls, floors, grass, storage cabinets and doors) were selected and 1000 point cloud points were randomly sampled; by comparing the predicted semantic value and the true semantic value of each point, a statistical result of the labeling accuracy is given, with the statistic defined by the following formula:
$\text{accuracy} = \dfrac{N_{\text{correct}}}{N_{\text{total}}} \times 100\%$

where $N_{\text{correct}}$ is the number of sampled points whose predicted semantic value equals the true semantic value and $N_{\text{total}}$ is the total number of sampled points of that class.
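The statistic can be computed directly from the sampled points, as in the brief sketch below (the array names are illustrative).

```python
import numpy as np

def labeling_accuracy(pred_labels, true_labels):
    """Percentage of sampled points whose predicted semantic label equals the ground-truth label."""
    pred = np.asarray(pred_labels)
    true = np.asarray(true_labels)
    return 100.0 * float(np.mean(pred == true))

# Usage: for each of the five selected classes, pass the 1000 sampled points'
# predicted and true semantic values to obtain the per-class accuracy in percent.
```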
the results are shown in table 2:
Table 2: Semantic reconstruction label accuracy (the per-class values are given as a figure in the original document and are not reproduced here).
The labeling accuracy of all five object classes is above 90%, and the selected classes are ones that are usually easy to confuse. Compared with the conditions of extraterrestrial celestial body inspection, where the objects and environment are relatively simple and the driving speed is low, a cognitive result of even higher accuracy can be expected. Semantic mapping can therefore provide rich semantic information along the inspection path and give the rover the ability to understand its environment, so that it can plan an optimal inspection path and execute autonomously assigned tasks purposefully. This is certainly the development trend of future rovers, and the research of the invention can provide a useful reference for it.

Claims (8)

1. An environmental semantic perception method based on visual information is characterized by comprising the following steps:
step one: acquiring environmental image information with a Kinect V1.0 camera to obtain a registered color image and depth image, and simultaneously executing step two and step three;
step two: on the basis of the registered color and depth images, solving the three-dimensional pose of the camera from the ORB feature points extracted in each frame through the ORB_SLAM2 process to obtain camera pose information, and then executing step five;
step three: based on the published color image, performing semantic segmentation on each frame of image to generate semantic color information, and synchronously generating a point cloud from the input depth map and the camera intrinsic matrix;
step four: registering semantic color information generated in the third step into the point cloud generated in the third step to obtain a local semantic point cloud result;
step five: fusing the camera pose information obtained in the step two with the local semantic point cloud result generated in the step four to obtain new global semantic point cloud information;
step six: and expressing the fused global semantic point cloud information obtained in the fifth step by using an octree map to obtain a final three-dimensional octree semantic map.
2. The method as claimed in claim 1, wherein in step two, the color map and the depth map after registration are obtained by combining OpenNI and OpenCV.
3. The method for semantic perception of an environment based on visual information as claimed in claim 1, wherein in step two, specifically, the three main parallel threads of ORB_SLAM2 are as follows:
the camera pose of each frame is localized through features matched in the local map, using motion-only BA to minimize the reprojection error;
the management and optimization of the local map are realized based on the local BA;
and performing loop detection, and correcting the accumulated drift based on pose graph optimization.
4. The method as claimed in claim 1, wherein in step two, specifically, the ORB_SLAM2 process is optimized by bundle adjustment.
5. The method for sensing environmental semantics based on visual information according to claim 1, wherein in step three, specifically, the point cloud is a three-dimensional point cloud.
6. The method for sensing environmental semantics based on visual information according to claim 1, wherein in step three, specifically, the pyramid scene parsing network PSPNet is adopted as the model implementing the semantic segmentation network.
7. The visual information-based environment semantic perception method according to claim 1, wherein in the fifth step, a maximum confidence fusion mode is adopted as a fusion mode in which the camera pose information is fused with the local semantic point cloud result generated in the fourth step.
8. The method for sensing environmental semantics based on visual information according to claim 1, wherein in step six, specifically, when the point cloud is inserted into the three-dimensional map, the points are first filtered through a voxel filter to down-sample them; then the points are inserted into an Octomap, and free space is cleared by ray casting so as to update the internal nodes of the Octomap, i.e., the voxels at lower resolution; finally, the updated Octomap is organized for visualization.
CN201911317441.3A 2019-12-19 2019-12-19 Environmental semantic perception method based on visual information Pending CN111080659A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911317441.3A CN111080659A (en) 2019-12-19 2019-12-19 Environmental semantic perception method based on visual information


Publications (1)

Publication Number Publication Date
CN111080659A (en) 2020-04-28

Family

ID=70315751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911317441.3A Pending CN111080659A (en) 2019-12-19 2019-12-19 Environmental semantic perception method based on visual information

Country Status (1)

Country Link
CN (1) CN111080659A (en)



Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160071278A1 (en) * 2013-06-21 2016-03-10 National University Of Ireland, Maynooth Method for Mapping an Environment
CN106204705A (en) * 2016-07-05 2016-12-07 长安大学 A kind of 3D point cloud segmentation method based on multi-line laser radar
CN108052672A (en) * 2017-12-29 2018-05-18 北京师范大学 Promote structural knowledge map construction system and method using group study behavior
CN109117718A (en) * 2018-07-02 2019-01-01 东南大学 A kind of semantic map structuring of three-dimensional towards road scene and storage method
CN109615698A (en) * 2018-12-03 2019-04-12 哈尔滨工业大学(深圳) Multiple no-manned plane SLAM map blending algorithm based on the detection of mutual winding
CN109697753A (en) * 2018-12-10 2019-04-30 智灵飞(北京)科技有限公司 A kind of no-manned plane three-dimensional method for reconstructing, unmanned plane based on RGB-D SLAM
CN110243370A (en) * 2019-05-16 2019-09-17 西安理工大学 A kind of three-dimensional semantic map constructing method of the indoor environment based on deep learning
CN110956651A (en) * 2019-12-16 2020-04-03 哈尔滨工业大学 Terrain semantic perception method based on fusion of vision and vibrotactile sense

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TONG LIU et al.: "A robust fusion method for RGB-D SLAM", 2013 Chinese Automation Congress *
常思雨: "Research on Semantic Maps Based on Visual SLAM" (基于视觉SLAM的语义地图研究), China Master's Theses Full-text Database, Information Science and Technology *
张震 et al.: "An RGB-D SLAM Algorithm Combining ORB Features and a Visual Dictionary" (一种结合ORB特征和视觉词典的RGB-D SLAM算法), Computer Engineering and Applications *
胡正乙 et al.: "A Real-time 3D Reconstruction Algorithm for Indoor Scenes Based on RGB-D" (基于RGB-D的室内场景实时三维重建算法), Journal of Northeastern University (Natural Science) *

Cited By (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111563442A (en) * 2020-04-29 2020-08-21 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar
CN111563442B (en) * 2020-04-29 2023-05-02 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar
CN111583322A (en) * 2020-05-09 2020-08-25 北京华严互娱科技有限公司 Depth learning-based 2D image scene depth prediction and semantic segmentation method and system
CN111583331A (en) * 2020-05-12 2020-08-25 北京轩宇空间科技有限公司 Method and apparatus for simultaneous localization and mapping
CN111583331B (en) * 2020-05-12 2023-09-01 北京轩宇空间科技有限公司 Method and device for simultaneous localization and mapping
CN111667523B (en) * 2020-06-08 2023-10-31 深圳阿米嘎嘎科技有限公司 Multi-mode multi-source-based deep data refining method and system
CN111667523A (en) * 2020-06-08 2020-09-15 深圳阿米嘎嘎科技有限公司 Multi-mode multi-source based deep data refining method and system
CN111784709A (en) * 2020-07-07 2020-10-16 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111784709B (en) * 2020-07-07 2023-02-17 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable medium
CN111968129B (en) * 2020-07-15 2023-11-07 上海交通大学 Instant positioning and map construction system and method with semantic perception
CN111968129A (en) * 2020-07-15 2020-11-20 上海交通大学 Instant positioning and map construction system and method with semantic perception
CN111950426A (en) * 2020-08-06 2020-11-17 东软睿驰汽车技术(沈阳)有限公司 Target detection method and device and delivery vehicle
CN112198859A (en) * 2020-09-07 2021-01-08 西安交通大学 Method, system and device for testing automatic driving vehicle in vehicle ring under mixed scene
CN112000130A (en) * 2020-09-07 2020-11-27 哈尔滨工业大学 Unmanned aerial vehicle's multimachine cooperation high accuracy is built and is drawn positioning system
CN112198859B (en) * 2020-09-07 2022-02-11 西安交通大学 Method, system and device for testing automatic driving vehicle in vehicle ring under mixed scene
US11315271B2 (en) * 2020-09-30 2022-04-26 Tsinghua University Point cloud intensity completion method and system based on semantic segmentation
CN112233124A (en) * 2020-10-14 2021-01-15 华东交通大学 Point cloud semantic segmentation method and system based on countermeasure learning and multi-modal learning
CN112233124B (en) * 2020-10-14 2022-05-17 华东交通大学 Point cloud semantic segmentation method and system based on countermeasure learning and multi-modal learning
CN112258633B (en) * 2020-10-23 2023-02-28 华中科技大学鄂州工业技术研究院 SLAM technology-based scene high-precision reconstruction method and device
CN112258633A (en) * 2020-10-23 2021-01-22 华中科技大学鄂州工业技术研究院 High-precision scene reconstruction method and device based on SLAM technology
CN112258618B (en) * 2020-11-04 2021-05-14 中国科学院空天信息创新研究院 Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN112258618A (en) * 2020-11-04 2021-01-22 中国科学院空天信息创新研究院 Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN112348921B (en) * 2020-11-05 2024-03-29 上海汽车集团股份有限公司 Drawing construction method and system based on visual semantic point cloud
CN112348921A (en) * 2020-11-05 2021-02-09 上海汽车集团股份有限公司 Mapping method and system based on visual semantic point cloud
CN112419461A (en) * 2020-11-16 2021-02-26 北京理工大学 Collaborative unmanned system joint semantic mapping method
CN112509061A (en) * 2020-12-14 2021-03-16 济南浪潮高新科技投资发展有限公司 Multi-camera visual positioning method, system, electronic device and medium
CN112509061B (en) * 2020-12-14 2024-03-22 山东浪潮科学研究院有限公司 Multi-camera visual positioning method, system, electronic device and medium
CN112857314A (en) * 2020-12-30 2021-05-28 惠州学院 Bimodal terrain identification method, hardware system and sensor installation method thereof
CN112884802A (en) * 2021-02-24 2021-06-01 电子科技大学 Anti-attack method based on generation
CN112927211B (en) * 2021-03-09 2023-08-25 电子科技大学 Universal attack countermeasure method based on depth three-dimensional detector, storage medium and terminal
CN112927211A (en) * 2021-03-09 2021-06-08 电子科技大学 Universal anti-attack method based on depth three-dimensional detector, storage medium and terminal
CN113313824A (en) * 2021-04-13 2021-08-27 中山大学 Three-dimensional semantic map construction method
CN113313824B (en) * 2021-04-13 2024-03-15 中山大学 Three-dimensional semantic map construction method
CN113393522A (en) * 2021-05-27 2021-09-14 湖南大学 6D pose estimation method based on monocular RGB camera regression depth information
CN113743413A (en) * 2021-07-30 2021-12-03 的卢技术有限公司 Visual SLAM method and system combining image semantic information
CN113743413B (en) * 2021-07-30 2023-12-01 的卢技术有限公司 Visual SLAM method and system combining image semantic information
CN114356078A (en) * 2021-12-15 2022-04-15 之江实验室 Method and device for detecting human intention based on gazing target and electronic equipment
CN114356078B (en) * 2021-12-15 2024-03-19 之江实验室 Person intention detection method and device based on fixation target and electronic equipment

Similar Documents

Publication Publication Date Title
CN111080659A (en) Environmental semantic perception method based on visual information
CN110956651B (en) Terrain semantic perception method based on fusion of vision and vibrotactile sense
Chen et al. Suma++: Efficient lidar-based semantic slam
Kim et al. Remove, then revert: Static point cloud map construction using multiresolution range images
US11030525B2 (en) Systems and methods for deep localization and segmentation with a 3D semantic map
Zhang et al. Instance segmentation of lidar point clouds
Wojek et al. Monocular visual scene understanding: Understanding multi-object traffic scenes
Sekkat et al. SynWoodScape: Synthetic surround-view fisheye camera dataset for autonomous driving
CN109509230A (en) A kind of SLAM method applied to more camera lens combined type panorama cameras
Paz et al. Probabilistic semantic mapping for urban autonomous driving applications
CN111210518A (en) Topological map generation method based on visual fusion landmark
Jeong et al. Multimodal sensor-based semantic 3D mapping for a large-scale environment
CN110781262A (en) Semantic map construction method based on visual SLAM
CN110728751A (en) Construction method of indoor 3D point cloud semantic map
CN113516664A (en) Visual SLAM method based on semantic segmentation dynamic points
Budvytis et al. Large scale joint semantic re-localisation and scene understanding via globally unique instance coordinate regression
CN115388902A (en) Indoor positioning method and system, AR indoor positioning navigation method and system
CN111767854B (en) SLAM loop detection method combined with scene text semantic information
Bieder et al. Exploiting multi-layer grid maps for surround-view semantic segmentation of sparse lidar data
Nagy et al. 3D CNN based phantom object removing from mobile laser scanning data
Khoche et al. Semantic 3d grid maps for autonomous driving
Li et al. Multi-modal neural feature fusion for automatic driving through perception-aware path planning
Zhang et al. Front vehicle detection based on multi-sensor fusion for autonomous vehicle
CN116664851A (en) Automatic driving data extraction method based on artificial intelligence
CN116597122A (en) Data labeling method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200428