CN109559320A - Method and system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network - Google Patents

Method and system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network

Info

Publication number
CN109559320A
CN109559320A
Authority
CN
China
Prior art keywords
semantic
point
mapping
visual slam
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811388678.6A
Other languages
Chinese (zh)
Other versions
CN109559320B (en)
Inventor
朱煜
黄俊健
陈旭东
郑兵兵
倪光耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology
Publication of CN109559320A
Application granted
Publication of CN109559320B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network, comprising: (1) an embedded development processor acquires the color and depth information of the current environment through an RGB-D camera; (2) feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained; (3) pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information; (4) errors introduced by semantic segmentation are reduced through manifold clustering; (5) semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points. The invention further relates to a system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network. With this method and system, the spatial grid map carries higher-level semantic information and better meets the demands of real-time mapping.

Description

Method and system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network
Technical field
The present invention relates to the field of real-time localization and mapping for unmanned systems, and more particularly to the field of semantic segmentation in image processing; it specifically relates to a method and system for realizing visual SLAM semantic mapping based on an atrous (dilated) convolution deep neural network.
Background technique
Unmanned systems have developed rapidly in recent years; autonomous driving, robots and unmanned aerial vehicles are all typical unmanned systems. Visual SLAM (Simultaneous Localization and Mapping) systems have been widely used for the localization and path planning of unmanned systems, such as ORB-SLAM proposed by Mur-Artal et al. in 2015 (Mur-Artal R, Montiel J M M, Tardós J D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System [J]. IEEE Transactions on Robotics, 2015, 31(5): 1147-1163). The spatial grid map established by a visual SLAM system contains only low-level information, such as color and range information, which hinders the robot's understanding of the current scene. We therefore introduce a deep-learning-based semantic segmentation network into the map-building process of the visual SLAM system, giving the robot semantic and scene understanding of the current scene.
The purpose of semantic segmentation is to achieve accurate segmentation between all kinds of targets for scene understanding; it can be used in autonomous driving or robotics to help identify targets and their relations. The DeepLab deep neural network architecture proposed by Google is now widely used in the semantic segmentation field (L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv:1606.00915, 2016). However, typical semantic segmentation networks compute too slowly for real-time use and are difficult to deploy in embedded systems. In addition, semantic segmentation suffers from blurred segmentation of object edge contours, false detections and missed detections.
We apply semantic segmentation to semantic mapping in a visual SLAM system, so that every coordinate point of the constructed spatial grid map carries high-level semantic information and the robot can understand the targets of the current scene at the semantic level; errors introduced by semantic segmentation are reduced through a spatial manifold clustering algorithm, making the constructed semantic map more accurate.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the above prior art by providing a method and system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network that combines deep learning with visual SLAM, gives the robot a semantic-level understanding of scene targets, and reduces semantic segmentation errors.
To achieve the above purpose, the method and system of the present invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network are as follows:
The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network is mainly characterized in that the method comprises the following steps:
(1) An embedded development processor acquires the color and depth information of the current environment through an RGB-D camera;
(2) Feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(3) Pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(4) Errors introduced by semantic segmentation are reduced through manifold clustering;
(5) Semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points.
Wherein, the embedded processor in step (1) includes an NVIDIA Jetson TX2 system.
Preferably, step (2) comprises the following steps:
(2.1) Extract image feature points using visual SLAM techniques and perform feature matching to obtain feature point matching pairs;
(2.2) Solve the current camera pose from 3D point pairs;
(2.3) Refine the pose estimate using graph optimization (bundle adjustment);
(2.4) Eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data.
Preferably, the pixel-level semantic segmentation of the images in step (3) specifically comprises the following steps:
(3.1) Pass through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.2) Pass through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.3) Classify the image according to the extraction results.
Preferably, step (3.1) further includes the design of the feature extraction layer, specifically comprising the following steps:
(3.1.1) Change the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) Partially replace Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) Change the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1.
Preferably, step (3.2) further includes the design of the multi-scale extraction layer, specifically comprising the following steps:
(3.2.1) Perform multi-scale processing based on spatial pyramid pooling;
(3.2.2) Extract feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) Fuse the image pooling features into the module, merge the feature maps into a single feature through a 1 × 1 convolution, and feed the result to a Softmax layer for per-pixel semantic classification.
Preferably, step (4) specifically comprises the following steps:
(4.1) Compute the tangent-plane normal vector of each spatial point;
(4.2) Search for a point x_i not yet assigned a class, and judge whether all points have been clustered; if so, continue with step (4.5); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(4.3) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judge whether α_ij < σ or α_ij > 175°; if so, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q; otherwise, continue with step (4.4);
(4.4) Judge whether queue q is non-empty; if so, let x_i = q_1 and continue with step (4.3); otherwise return to step (4.2);
(4.5) Extract the k classes containing the most points, and assign the remaining points to the nearest class.
Preferably, the computation of the tangent-plane normal vector in step (4.1) is specifically:
The tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
Preferably, step (5) comprises the following steps:
(5.1) According to the precision characteristics of the RGB-D camera, remove points with excessive or invalid depth values;
(5.2) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large;
(5.3) Fill all spatial points into a voxel grid so that each grid cell retains only one spatial point.
The system, based on the above method, for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network is mainly characterized in that the system comprises:
an embedded development processor, for constructing the visual SLAM semantic map;
an RGB-D camera, connected to the embedded development processor, for acquiring color data and depth data;
a map builder, which at runtime combines deep learning with visual SLAM and realizes visual SLAM semantic mapping through the embedded development processor and the RGB-D camera, specifically performing the following steps:
(1) the embedded development processor acquires the color and depth information of the current environment through the RGB-D camera;
(2) feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(3) pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(4) errors introduced by semantic segmentation are reduced through manifold clustering;
(5) semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points.
Preferably, the embedded processor in step (1) includes an NVIDIA Jetson TX2 system.
Preferably, step (2) comprises the following steps:
(2.1) Extract image feature points using visual SLAM techniques and perform feature matching to obtain feature point matching pairs;
(2.2) Solve the current camera pose from 3D point pairs;
(2.3) Refine the pose estimate using graph optimization (bundle adjustment);
(2.4) Eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data.
Preferably, the pixel-level semantic segmentation of the images in step (3) specifically comprises the following steps:
(3.1) Pass through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.2) Pass through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.3) Classify the image according to the extraction results.
Preferably, step (3.1) further includes the design of the feature extraction layer, specifically comprising the following steps:
(3.1.1) Change the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) Partially replace Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) Change the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1.
Preferably, step (3.2) further includes the design of the multi-scale extraction layer, specifically comprising the following steps:
(3.2.1) Perform multi-scale processing based on spatial pyramid pooling;
(3.2.2) Extract feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) Fuse the image pooling features into the module, merge the feature maps into a single feature through a 1 × 1 convolution, and feed the result to a Softmax layer for per-pixel semantic classification.
Preferably, step (4) specifically comprises the following steps:
(4.1) Compute the tangent-plane normal vector of each spatial point;
(4.2) Search for a point x_i not yet assigned a class, and judge whether all points have been clustered; if so, continue with step (4.5); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(4.3) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judge whether α_ij < σ or α_ij > 175°; if so, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q; otherwise, continue with step (4.4);
(4.4) Judge whether queue q is non-empty; if so, let x_i = q_1 and continue with step (4.3); otherwise return to step (4.2);
(4.5) Extract the k classes containing the most points, and assign the remaining points to the nearest class.
Preferably, the computation of the tangent-plane normal vector in step (4.1) is specifically:
The tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
Preferably, step (5) comprises the following steps:
(5.1) According to the precision characteristics of the RGB-D camera, remove points with excessive or invalid depth values;
(5.2) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large;
(5.3) Fill all spatial points into a voxel grid so that each grid cell retains only one spatial point.
Using the method for the invention for building figure function based on empty convolution deep neural network realization vision SLAM semanteme And system, system use embedded development processor, by the collected color data of RGB-D camera and depth data, benefit With vision SLAM technology, image characteristic point is extracted, carries out characteristic matching, the method for recycling Bundle Adjustment obtains More accurate robot pose estimation, the cumulative errors for eliminating interframe are detected using winding.Letter is positioned in real time obtaining robot It is refreshing using depth is improved using a kind of empty convolution design method for GoogLeNet deep neural network while breath Semantic segmentation result combination vision SLAM system is obtained building for semantic class by the feature extraction through the real-time semantic segmentation of network implementations Figure.And clustered by manifold and eliminate error brought by optimization semantic segmentation, after building figure by Octree, spatial network map tool There is more advanced semantic information, and the semantic map constructed is more accurate.The improvement of network improves the real-time place of system Time loss of the semantic segmentation network of reason ability, this method and system on NVIDIA Jetson TX2 platform is 0.099s/ Width meets use demand during building figure in real time.
Detailed description of the invention
Fig. 1 is a flowchart of the method of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Fig. 2 is a flowchart of the semantic segmentation in the method of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Fig. 3 is a schematic diagram of the atrous convolution in the method of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Fig. 4 is a schematic diagram of experimental results of the method of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Fig. 5 is a schematic diagram of the NVIDIA Jetson TX2 processor of the method and system of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Specific embodiment
In order to describe the technical content of the present invention more clearly, a further description is given below in combination with specific embodiments.
The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network comprises the following steps:
(1) An embedded development processor acquires the color and depth information of the current environment through an RGB-D camera;
(2) Feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(2.1) Extract image feature points using visual SLAM techniques and perform feature matching to obtain feature point matching pairs;
(2.2) Solve the current camera pose from 3D point pairs;
(2.3) Refine the pose estimate using graph optimization (bundle adjustment);
(2.4) Eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data;
(3) Pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(3.1) Pass through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.1.1) Change the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) Partially replace Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) Change the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1;
(3.2) Pass through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.2.1) Perform multi-scale processing based on spatial pyramid pooling;
(3.2.2) Extract feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) Fuse the image pooling features into the module, merge the feature maps into a single feature through a 1 × 1 convolution, and feed the result to a Softmax layer for per-pixel semantic classification;
(3.3) Classify the image according to the extraction results;
(4) Errors introduced by semantic segmentation are reduced through manifold clustering;
(4.1) Compute the tangent-plane normal vector of each spatial point;
(4.2) Search for a point x_i not yet assigned a class, and judge whether all points have been clustered; if so, continue with step (4.5); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(4.3) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judge whether α_ij < σ or α_ij > 175°; if so, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q; otherwise, continue with step (4.4);
(4.4) Judge whether queue q is non-empty; if so, let x_i = q_1 and continue with step (4.3); otherwise return to step (4.2);
(4.5) Extract the k classes containing the most points, and assign the remaining points to the nearest class;
(5) Semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points;
(5.1) According to the precision characteristics of the RGB-D camera, remove points with excessive or invalid depth values;
(5.2) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large;
(5.3) Fill all spatial points into a voxel grid so that each grid cell retains only one spatial point.
In a preferred embodiment of the present invention, the embedded processor in step (1) includes an NVIDIA Jetson TX2 system.
In a preferred embodiment of the present invention, the computation of the tangent-plane normal vector in step (4.1) is specifically:
The tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
The system, based on the above method, for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network comprises:
an embedded development processor, for constructing the visual SLAM semantic map;
an RGB-D camera, connected to the embedded development processor, for acquiring color data and depth data;
a map builder, which at runtime combines deep learning with visual SLAM and realizes visual SLAM semantic mapping through the embedded development processor and the RGB-D camera, specifically performing the following steps:
(1) The embedded development processor acquires the color and depth information of the current environment through the RGB-D camera;
(2) Feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(2.1) Extract image feature points using visual SLAM techniques and perform feature matching to obtain feature point matching pairs;
(2.2) Solve the current camera pose from 3D point pairs;
(2.3) Refine the pose estimate using graph optimization (bundle adjustment);
(2.4) Eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data;
(3) Pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(3.1) Pass through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.1.1) Change the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) Partially replace Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) Change the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1;
(3.2) Pass through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.2.1) Perform multi-scale processing based on spatial pyramid pooling;
(3.2.2) Extract feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) Fuse the image pooling features into the module, merge the feature maps into a single feature through a 1 × 1 convolution, and feed the result to a Softmax layer for per-pixel semantic classification;
(3.3) Classify the image according to the extraction results;
(4) Errors introduced by semantic segmentation are reduced through manifold clustering;
(4.1) Compute the tangent-plane normal vector of each spatial point;
(4.2) Search for a point x_i not yet assigned a class, and judge whether all points have been clustered; if so, continue with step (4.5); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(4.3) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judge whether α_ij < σ or α_ij > 175°; if so, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q; otherwise, continue with step (4.4);
(4.4) Judge whether queue q is non-empty; if so, let x_i = q_1 and continue with step (4.3); otherwise return to step (4.2);
(4.5) Extract the k classes containing the most points, and assign the remaining points to the nearest class;
(5) Semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points;
(5.1) According to the precision characteristics of the RGB-D camera, remove points with excessive or invalid depth values;
(5.2) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large;
(5.3) Fill all spatial points into a voxel grid so that each grid cell retains only one spatial point.
In a preferred embodiment of the present invention, the embedded processor in step (1) includes an NVIDIA Jetson TX2 system.
In a preferred embodiment of the present invention, the computation of the tangent-plane normal vector in step (4.1) is specifically:
The tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
In a specific embodiment, the present invention relates to the technical field of real-time localization and mapping for unmanned robot systems, being a visual SLAM semantic mapping method and system based on an atrous convolution deep neural network. The system runs on an embedded development processor and, from the color and depth data collected by the RGB-D camera, uses visual SLAM techniques to extract image feature points and perform feature matching; bundle adjustment then yields a more accurate robot pose estimate, and loop closure detection eliminates the inter-frame accumulated error. While the robot's real-time localization is obtained, an atrous convolution design for the GoogLeNet deep neural network allows the improved network to perform real-time semantic segmentation; combining the segmentation results with the visual SLAM system yields semantic-level mapping, manifold clustering reduces the errors introduced by semantic segmentation, and after octree mapping the spatial grid map carries higher-level semantic information and the constructed semantic map is more accurate.
Based on the above system, the method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network comprises the following steps:
(1) Using the embedded development processor, acquire the color and depth information of the current environment through the RGB-D camera;
(2) Extract image feature points from the images collected by the camera using visual SLAM techniques and perform feature matching to obtain feature point matching pairs; solve the current camera pose from 3D point pairs; refine the pose estimate using graph optimization (bundle adjustment); eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data;
(3) Perform pixel-level semantic segmentation on the images using deep learning, and map the results into space using the relation between the image coordinate system and the world coordinate system, so that each spatial point carries semantic label information (a back-projection sketch follows this list);
(4) Reduce the errors introduced by semantic segmentation using manifold clustering;
(5) Perform semantic mapping: stitch the spatial point clouds together to finally obtain a point cloud semantic map composed of dense discrete points.
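Step (3) above hinges on the image-to-world mapping. The patent does not write out the projection equations, so the following is a minimal Python sketch under the usual pinhole-camera assumption; the intrinsics K and the pose (R, t) come from the SLAM front end, and all names are illustrative rather than the patent's.

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R, t):
    """Back-project pixel (u, v) with metric depth into world coordinates.
    K: 3x3 pinhole intrinsics; (R, t): camera-to-world pose from visual SLAM."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * depth / fx,   # pinhole model, camera frame
                      (v - cy) * depth / fy,
                      depth])
    return R @ p_cam + t                       # transform into the world frame
```

Each back-projected point keeps the semantic class of its source pixel, which is how the spatial points acquire their semantic label information.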
In the above example, the embedded processor in step (1) includes the NVIDIA Jetson TX2 system and devices of the same category.
In the above example, a general visual SLAM pipeline and its local improvements are used in step (2).
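The patent likewise leaves the 3D-3D pose solver of step (2) unspecified. A common closed-form choice for matched RGB-D point pairs is the SVD-based alignment of Arun et al.; the sketch below shows that standard method, not necessarily the exact solver used here.

```python
import numpy as np

def solve_pose_3d3d(P, Q):
    """Closed-form rigid pose from matched 3D points, with Q ≈ R @ P + t.
    P, Q: (3, N) arrays of corresponding points from two frames."""
    mu_p = P.mean(axis=1, keepdims=True)   # centroid of each point set
    mu_q = Q.mean(axis=1, keepdims=True)
    W = (Q - mu_q) @ (P - mu_p).T          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(W)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # reflection guard
    R = U @ D @ Vt                         # optimal rotation
    t = mu_q - R @ mu_p                    # optimal translation
    return R, t
```

In a full pipeline, this closed-form estimate would seed the bundle adjustment refinement of step (2.3).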
In the above example, the semantic segmentation network in step (3) specifically includes the following structure:
(31) a feature extraction layer;
(32) a multi-scale extraction layer;
(33) a classification layer.
In the above example, the feature extraction layer in step (31) specifically includes the following structure:
(311) The GoogLeNet network is used as the front-end feature extraction layer of the DeepLab model;
(312) The stride of the max pooling layer after Inception (3b) in the GoogLeNet network is changed to 1, expanding the feature size while keeping the output resolution unchanged;
(313) Inception (4a) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(314) Inception (4b) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(315) Inception (4c) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(316) Inception (4d) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(317) Inception (4e) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(318) The stride of the max pooling layer after Inception (4e) in the GoogLeNet network is changed to 1, expanding the feature size while keeping the output resolution unchanged.
For the original GoogLeNet, an input of size 224 gives a feature output of size 7, a 32-fold reduction. After the strides of the last two pooling layers are changed to 1 and the original ordinary convolutions are changed to atrous convolutions, an input of size 321 yields a feature map of size 41, only an 8-fold reduction, thereby expanding the feature size.
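As an illustration of the stride and dilation substitution just described, the following PyTorch sketch contrasts a stride-2 pooling stage with the modified stride-1 pooling followed by a dilated 3 × 3 convolution; the channel counts are placeholders, not the patent's actual Inception dimensions.

```python
import torch
import torch.nn as nn

# Original branch: stride-2 max pooling halves the feature resolution.
pool_s2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

# Modified branch: pooling stride set to 1, and the following 3x3 convolution
# uses dilation 2 so its receptive field matches the stride-2 pipeline.
pool_s1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
conv_atrous = nn.Conv2d(480, 512, kernel_size=3, dilation=2, padding=2)

x = torch.randn(1, 480, 41, 41)
print(pool_s2(x).shape)               # torch.Size([1, 480, 21, 21])
print(conv_atrous(pool_s1(x)).shape)  # torch.Size([1, 512, 41, 41]); resolution kept
```

The modified branch preserves the 41 × 41 resolution of the example above while keeping the receptive field of each feature point unchanged.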
In the above example, the multi-scale layer in step (32) specifically includes the following structure:
(321) Multi-scale processing is performed based on spatial pyramid pooling;
(322) The spatial pyramid pooling model is optimized: receptive-field features at different scales are extracted using a 1 × 1 convolution and atrous convolutions with different sampling rates (6, 12, 18);
(323) The image pooling features are fused into the module, all resulting feature maps are concatenated (Concat) after a 1 × 1 convolution to obtain the final features, which are then fed to a Softmax layer for per-pixel semantic classification.
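A minimal PyTorch sketch of this multi-scale head follows, assuming 256 intermediate channels and a generic class count; the patent fixes only the branch structure, the sampling rates (6, 12, 18), the Concat fusion and the Softmax classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleHead(nn.Module):
    """1x1 conv + atrous convs at rates 6/12/18 + global image pooling,
    concatenated (Concat), fused by a 1x1 conv, then per-pixel Softmax."""
    def __init__(self, in_ch=1024, mid_ch=256, num_classes=21):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, mid_ch, 1)
        self.atrous = nn.ModuleList(
            nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r) for r in (6, 12, 18))
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, mid_ch, 1))
        self.fuse = nn.Conv2d(5 * mid_ch, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [self.conv1x1(x)] + [branch(x) for branch in self.atrous]
        feats.append(F.interpolate(self.image_pool(x), size=(h, w),
                                   mode='bilinear', align_corners=False))
        return self.fuse(torch.cat(feats, dim=1)).softmax(dim=1)
```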
In the above example, the manifold clustering in step (4) specifically includes the following steps:
(41) Compute the tangent-plane normal vector of every spatial point, and set the current cluster class c = 0;
(42) Search for a point x_i not yet assigned a class; if all points have been clustered, execute step (45); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(43) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01; if α_ij < σ or α_ij > 175°, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q;
(44) If queue q is non-empty, let x_i = q_1 and continue with step (43); otherwise return to step (42);
(45) Extract the k classes containing the most points, and assign the remaining points to the nearest class.
Wherein, the tangent-plane normal vector in step (41) is computed as follows:
Let the n spatial points form the matrix X ∈ R^(3×n), with covariance matrix Σ = E[(X − μ)(X − μ)^T].
Let w ∈ R^(3×1) be the unit normal vector of the plane; Z = w^T X is the projection length of the n points on this unit normal. The model is:
min_w w^T Σ w
s.t. w^T w = 1
Solving with Lagrange multipliers gives L(w, a) = w^T Σ w − a(w^T w − 1);
setting the partial derivative with respect to w to zero yields Σ w = a w.
Since w is unit-length, a is the corresponding eigenvalue: w^T Σ w = a w^T w = a. And since the covariance matrix is positive semidefinite, the normal w is the unit eigenvector of Σ corresponding to the smallest eigenvalue.
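The derivation reduces to an eigen-decomposition of the local 3 × 3 covariance matrix. Below is a NumPy sketch of steps (41) and (43); the angle threshold σ is not fixed by the patent, so sigma_deg = 10 is an assumed value.

```python
import numpy as np

def tangent_plane_normal(neighbors):
    """Step (41): unit normal of the local tangent plane, i.e. the eigenvector
    of the neighborhood covariance matrix with the smallest eigenvalue.
    neighbors: (n, 3) array of spatial points around the query point."""
    X = neighbors - neighbors.mean(axis=0)   # center the points (X - mu)
    cov = X.T @ X / len(neighbors)           # 3x3 covariance matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    return eigvecs[:, 0]                     # smallest-eigenvalue eigenvector w

def same_cluster(v_i, v_j, sigma_deg=10.0):
    """Angle test of step (43): normals that are nearly parallel (alpha < sigma)
    or nearly anti-parallel (alpha > 175 deg, a flipped normal) are grouped."""
    cos_a = np.clip(v_i @ v_j / (np.linalg.norm(v_i) * np.linalg.norm(v_j)),
                    -1.0, 1.0)
    alpha = np.degrees(np.arccos(cos_a))
    return alpha < sigma_deg or alpha > 175.0
```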
In the above example, the mapping algorithm in step (5) specifically includes the following steps:
(51) When each frame's point cloud is generated, remove points with excessive or invalid depth values according to the precision characteristics of the RGB-D camera;
(52) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large, eliminating isolated noise points while retaining the dense spatial points;
(53) Using a voxel grid, fill all spatial point clouds into the grid so that each grid cell retains only one spatial point; this is equivalent to downsampling the point cloud and saves considerable memory.
Wherein, the spatial grid map is built with an octree data structure.
A spatial cuboid is divided into eight regions; likewise, each subregion continues to be divided into eight regions, and an octree map is thus created dynamically.
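A NumPy sketch of the three filtering stages of steps (51) to (53) follows; the depth limit, neighbor count N, outlier threshold and voxel size are illustrative values the patent does not fix, and the brute-force neighbor search is written for clarity rather than speed.

```python
import numpy as np

def filter_cloud(points, depth_max=4.0, n_neighbors=10, std_ratio=2.0, voxel=0.02):
    """points: (n, 3) array in meters, with z as the depth axis."""
    # (51) drop points with invalid or overly large depth values
    points = points[(points[:, 2] > 0) & (points[:, 2] < depth_max)]

    # (52) statistical filtering: remove points whose mean distance to their
    # N nearest neighbors is far above the average
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    mean_knn = np.sort(d, axis=1)[:, :n_neighbors].mean(axis=1)
    points = points[mean_knn < mean_knn.mean() + std_ratio * mean_knn.std()]

    # (53) voxel grid: keep one representative point per occupied grid cell
    _, idx = np.unique(np.floor(points / voxel).astype(np.int64),
                       axis=0, return_index=True)
    return points[idx]
```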
The visual SLAM semantic mapping method of the invention based on atrous convolutional neural networks is described in detail below with reference to the drawings and specific embodiments.
The flow of the visual SLAM semantic mapping method and system based on atrous convolutional neural networks is shown in Fig. 1:
From the image data collected by the RGB-D camera, frames with low similarity are selected as keyframes; a keyframe contains the color image, the depth image and the current pose. Semantic segmentation is first performed on the color image through the feature extraction layer and the multi-scale layer of GoogLeNet improved with atrous convolution, producing a raw semantic point cloud. The raw semantic point cloud is filtered, manifold clustering is performed in combination with the depth image, and finally octree mapping is performed together with the pose information. The network improvements raise the real-time processing capability of the system, which runs in real time on the NVIDIA Jetson TX2 embedded platform.
In the flow of the visual SLAM semantic mapping method and system based on atrous convolutional neural networks, the semantic information of the image is obtained through a deep learning semantic segmentation network. Its flow, shown in Fig. 2, is divided into three parts: feature extraction, multi-scale extraction and classification.
The atrous convolution used in the visual SLAM semantic mapping method and system based on atrous convolutional neural networks is shown in Fig. 3:
Convolution and pooling are treated as the same kind of operation. Take the middle violet points as the input; the green part of the figure is the ordinary convolution pipeline, where features are obtained after convolution (or pooling) operations with strides 2, 1, 2 and 1. The receptive field of the top-layer feature point is the entire input layer.
To expand the feature size, atrous convolution changes all strides to 1 (the pink part of the figure). After the stride of the first convolution layer is changed, with a dilation rate of 1, the number of features obtained doubles. For the second convolution operation, the dilation rate is set to 2, i.e., the kernel is applied to points spaced one apart, again yielding twice the features of the original ordinary convolution while keeping the receptive field of the feature points unchanged. For the third convolution operation, whose stride is also changed to 1, the dilation rate should likewise be 2 to keep the same receptive field. For the fourth convolution operation, the dilation rate must be 4 to keep the receptive field unchanged.
The following must be noted when using atrous convolution:
S1. When the stride of a preceding convolution layer changes from s_old to s_new, all subsequent convolution layers must perform atrous convolution with dilation rate s_old / s_new in order to keep the receptive field unchanged;
S2. The dilation rate of the current layer's atrous convolution is given by:
dilation = ∏_{n=1}^{N} (s_old^(n) / s_new^(n))
where N is the number of stride changes in the preceding layers and s_old^(n) / s_new^(n) is the n-th stride change.
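Rule S2 can be checked against the Fig. 3 walkthrough, in which the strides 2, 1, 2, 1 are all changed to 1 and the dilation rates come out as 1, 2, 2 and 4; a small Python sketch:

```python
def dilation_schedule(old_strides, new_strides):
    """Rule S2: the dilation of each layer is the running product of the
    stride ratios s_old / s_new over all preceding changed layers."""
    dilations, factor = [], 1
    for s_old, s_new in zip(old_strides, new_strides):
        dilations.append(factor)      # dilation applied at this layer
        factor *= s_old // s_new      # accumulate this layer's stride change
    return dilations

print(dilation_schedule([2, 1, 2, 1], [1, 1, 1, 1]))  # [1, 2, 2, 4]
```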
The visual SLAM semantic mapping results based on atrous convolutional neural networks are shown in Fig. 4. The images were tested in two scenes: an office scene on the left and a laboratory scene on the right. The first row shows the semantic mapping output of this system, in which chairs, people and plants are marked in red, pink and green respectively; the second row shows the map built by conventional visual SLAM, which carries no semantic information. The experimental results show that the invention enables the robot to understand the main targets in the current scene well. The software and algorithms of the invention run on the NVIDIA Jetson TX2 embedded platform, whose processor is shown schematically in Fig. 5.
Using the method for the invention for building figure function based on empty convolution deep neural network realization vision SLAM semanteme And system, system use embedded development processor, by the collected color data of RGB-D camera and depth data, benefit With vision SLAM technology, image characteristic point is extracted, carries out characteristic matching, the method for recycling Bundle Adjustment obtains More accurate robot pose estimation, the cumulative errors for eliminating interframe are detected using winding.Letter is positioned in real time obtaining robot It is refreshing using depth is improved using a kind of empty convolution design method for GoogLeNet deep neural network while breath Semantic segmentation result combination vision SLAM system is obtained building for semantic class by the feature extraction through the real-time semantic segmentation of network implementations Figure.And clustered by manifold and eliminate error brought by optimization semantic segmentation, after building figure by Octree, spatial network map tool There is more advanced semantic information, and the semantic map constructed is more accurate.The improvement of network improves the real-time place of system Time loss of the semantic segmentation network of reason ability, this method and system on NVIDIA Jetson TX2 platform is 0.099s/ Width meets use demand during building figure in real time.
In this specification, the invention has been described with reference to specific embodiments. It will, however, be evident that various modifications and changes may be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive.

Claims (18)

1. A method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network, characterized in that the method comprises the following steps:
(1) an embedded development processor acquires the color and depth information of the current environment through an RGB-D camera;
(2) feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(3) pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(4) errors introduced by semantic segmentation are reduced through manifold clustering;
(5) semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points.
2. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that the embedded processor in step (1) comprises an NVIDIA Jetson TX2 system.
3. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that step (2) comprises the following steps:
(2.1) extracting image feature points using visual SLAM techniques and performing feature matching to obtain feature point matching pairs;
(2.2) solving the current camera pose from 3D point pairs;
(2.3) refining the pose estimate using graph optimization (bundle adjustment);
(2.4) eliminating inter-frame accumulated error through loop closure detection, and obtaining scene spatial point cloud data.
4. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that the pixel-level semantic segmentation of the images in step (3) specifically comprises the following steps:
(3.1) passing through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.2) passing through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.3) classifying the image according to the extraction results.
5. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 4, characterized in that step (3.1) further includes the design of the feature extraction layer, specifically comprising the following steps:
(3.1.1) changing the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) partially replacing Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) changing the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1.
6. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 4, characterized in that step (3.2) further includes the design of the multi-scale extraction layer, specifically comprising the following steps:
(3.2.1) performing multi-scale processing based on spatial pyramid pooling;
(3.2.2) extracting feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) fusing the image pooling features into the module, merging the feature maps into a single feature through a 1 × 1 convolution, and feeding the result to a Softmax layer for per-pixel semantic classification.
7. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that step (4) specifically comprises the following steps:
(4.1) computing the tangent-plane normal vector of each spatial point;
(4.2) searching for a point x_i not yet assigned a class, and judging whether all points have been clustered; if so, continuing with step (4.5); otherwise, assigning x_i the class c = c + 1 and creating an empty queue q;
(4.3) computing the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judging whether α_ij < σ or α_ij > 175°; if so, classifying x_j and x_i together, with x_j receiving class c, and pushing each qualifying x_j into queue q; otherwise, continuing with step (4.4);
(4.4) judging whether queue q is non-empty; if so, letting x_i = q_1 and continuing with step (4.3); otherwise returning to step (4.2);
(4.5) extracting the k classes containing the most points, and assigning the remaining points to the nearest class.
8. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that the computation of the tangent-plane normal vector in step (4.1) is specifically:
the tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
9. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that step (5) comprises the following steps:
(5.1) according to the precision characteristics of the RGB-D camera, removing points with excessive or invalid depth values;
(5.2) removing isolated spatial points by statistical filtering: computing the mean distance from each spatial point to its N nearest spatial points, and removing points whose mean distance is too large;
(5.3) filling all spatial points into a voxel grid so that each grid cell retains only one spatial point.
10. A system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network, characterized in that the system comprises:
an embedded development processor, for constructing the visual SLAM semantic map;
an RGB-D camera, connected to the embedded development processor, for acquiring color data and depth data;
a map builder, which at runtime combines deep learning with visual SLAM and realizes visual SLAM semantic mapping through the embedded development processor and the RGB-D camera, specifically performing the following steps:
(1) the embedded development processor acquires the color and depth information of the current environment through the RGB-D camera;
(2) feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(3) pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(4) errors introduced by semantic segmentation are reduced through manifold clustering;
(5) semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points.
11. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 10, characterized in that the embedded processor in step (1) comprises an NVIDIA Jetson TX2 system.
12. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 10, characterized in that step (2) comprises the following steps:
(2.1) extracting image feature points using visual SLAM techniques and performing feature matching to obtain feature point matching pairs;
(2.2) solving the current camera pose from 3D point pairs;
(2.3) refining the pose estimate using graph optimization (bundle adjustment);
(2.4) eliminating inter-frame accumulated error through loop closure detection, and obtaining scene spatial point cloud data.
13. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 10, characterized in that the pixel-level semantic segmentation of the images in step (3) specifically comprises the following steps:
(3.1) passing through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.2) passing through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.3) classifying the image according to the extraction results.
14. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 13, characterized in that step (3.1) further includes the design of the feature extraction layer, specifically comprising the following steps:
(3.1.1) changing the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) partially replacing Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) changing the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1.
15. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 13, characterized in that step (3.2) further includes the design of the multi-scale extraction layer, specifically comprising the following steps:
(3.2.1) performing multi-scale processing based on spatial pyramid pooling;
(3.2.2) extracting feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) fusing the image pooling features into the module, merging the feature maps into a single feature through a 1 × 1 convolution, and feeding the result to a Softmax layer for per-pixel semantic classification.
16. The system for realizing a visual SLAM semantic mapping function based on a dilated convolution deep neural network according to claim 10, characterized in that step (4) specifically comprises the following steps:
(4.1) calculating the tangent-plane normal vector of each spatial point;
(4.2) searching for a point xi with no assigned class, and judging whether all points have been clustered; if so, continuing to step (4.5); otherwise, assigning xi the class c = c + 1 and creating an empty queue q;
(4.3) calculating the angle αij between the tangent-plane normal vector vi of the spatial point xi and the normal vector vj of every point xj within a distance of 0.01 from it, and judging whether αij < σ or αij > 175°; if so, classifying xj with xi, assigning xj the class c, and pushing every qualifying xj into the queue q; otherwise, continuing to step (4.4);
(4.4) judging whether the queue q is non-empty; if so, letting xi = q1 and continuing to step (4.3); otherwise, continuing to step (4.1);
(4.5) extracting the k classes with the most points from the clustering result, and assigning the remaining points to classes according to the nearest-neighbor principle.
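A minimal numpy/scipy sketch of one region-growing reading of steps (4.2)-(4.4): a cluster grows over neighbors within a 0.01 radius whose normals are nearly parallel (αij < σ) or nearly anti-parallel (αij > 175°). The value σ = 5° and the simplified queue handling are assumptions; step (4.5)'s top-k reassignment is omitted.

```python
import numpy as np
from collections import deque
from scipy.spatial import cKDTree

def cluster_by_normals(points, normals, radius=0.01, sigma_deg=5.0):
    """Group points whose tangent-plane normals are nearly (anti-)parallel."""
    tree = cKDTree(points)
    labels = np.full(len(points), -1)    # -1 = no class assigned yet (step 4.2)
    c = -1
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        c += 1                            # new class c = c + 1
        labels[seed] = c
        q = deque([seed])                 # queue q, seeded with xi
        while q:                          # step (4.4): process queue until empty
            i = q.popleft()
            for j in tree.query_ball_point(points[i], r=radius):   # step (4.3)
                if labels[j] != -1:
                    continue
                cos_a = np.clip(np.dot(normals[i], normals[j]), -1.0, 1.0)
                ang = np.degrees(np.arccos(cos_a))
                if ang < sigma_deg or ang > 175.0:   # parallel or anti-parallel normals
                    labels[j] = c
                    q.append(j)
    return labels
```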
17. The system for realizing a visual SLAM semantic mapping function based on a dilated convolution deep neural network according to claim 10, characterized in that the calculation of the tangent-plane normal vector of a spatial point in step (4.1) is specifically:
calculating the tangent-plane normal vector of the spatial point according to the following formula [formula not reproduced in the source text]:
wherein w ∈ R3×1 is the unit normal vector of the plane, and a is the eigenvalue.
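A common reading consistent with the symbols stated in claim 17 (w the unit plane normal, a an eigenvalue) is PCA plane fitting: w is the eigenvector of the neighborhood scatter matrix C belonging to its smallest eigenvalue a, i.e. Cw = aw. A minimal numpy sketch under that assumption, not the patent's verified formula:

```python
import numpy as np

def tangent_plane_normal(neighborhood):
    """Estimate the tangent-plane unit normal w of a point from its neighbors (N x 3)."""
    centered = neighborhood - neighborhood.mean(axis=0)
    C = centered.T @ centered                # 3x3 scatter (covariance) matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    w = eigvecs[:, 0]                        # eigenvector of the smallest eigenvalue a
    return w / np.linalg.norm(w)             # unit normal, w in R^{3x1}
```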
18. The system for realizing a visual SLAM semantic mapping function based on a dilated convolution deep neural network according to claim 10, characterized in that step (5) comprises the following steps:
(5.1) removing point clouds whose depth values are too large or invalid, according to the precision characteristics of the RGB-D camera;
(5.2) removing isolated spatial points by a statistical filtering method: calculating, for each spatial point, the mean distance to its N nearest spatial points, and removing the spatial points whose mean distance is too large;
(5.3) filling all spatial point clouds into a spatial grid according to the spatial-grid principle, so that each grid cell retains only one spatial point.
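A minimal numpy/scipy sketch of the three filters in step (5). The thresholds (maximum depth, N, the outlier cutoff, and the grid/voxel size) are illustrative assumptions; the claim leaves them unspecified.

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_cloud(points, max_depth=4.0, n_neighbors=8, std_mul=1.0, voxel=0.01):
    # (5.1) drop points whose depth is invalid or too large for the RGB-D camera
    z = points[:, 2]
    points = points[(z > 0) & (z < max_depth)]

    # (5.2) statistical filtering: mean distance to the N nearest neighbors
    tree = cKDTree(points)
    d, _ = tree.query(points, k=n_neighbors + 1)   # nearest result is the point itself
    mean_d = d[:, 1:].mean(axis=1)
    points = points[mean_d < mean_d.mean() + std_mul * mean_d.std()]

    # (5.3) spatial grid: keep a single point per occupied cell
    keys = np.floor(points / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[idx]
```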
CN201811388678.6A 2018-09-18 2018-11-21 Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network Active CN109559320B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018110885315 2018-09-18
CN201811088531 2018-09-18

Publications (2)

Publication Number Publication Date
CN109559320A true CN109559320A (en) 2019-04-02
CN109559320B CN109559320B (en) 2022-11-18

Family

ID=65866933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811388678.6A Active CN109559320B (en) 2018-09-18 2018-11-21 Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network

Country Status (1)

Country Link
CN (1) CN109559320B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102024262A (en) * 2011-01-06 2011-04-20 西安电子科技大学 Method for performing image segmentation by using manifold spectral clustering
CN105787510A (en) * 2016-02-26 2016-07-20 华东理工大学 System and method for realizing subway scene classification based on deep learning
CN107358189A (en) * 2017-07-07 2017-11-17 北京大学深圳研究生院 Multi-object detection method for indoor environments based on object extraction
CN107480603A (en) * 2017-07-27 2017-12-15 大连和创懒人科技有限公司 Synchronous mapping and object segmentation method based on SLAM and a depth camera
CN108230337A (en) * 2017-12-31 2018-06-29 厦门大学 Method for realizing a semantic SLAM system based on a mobile terminal
CN109636905A (en) * 2018-12-07 2019-04-16 东北大学 Environment semantic mapping method based on deep convolutional neural networks
WO2021018690A1 (en) * 2019-07-31 2021-02-04 Continental Automotive Gmbh Method for determining an environmental model of a scene
CN111462135A (en) * 2020-03-31 2020-07-28 华东理工大学 Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
ANESTIS ZAGANIDIS et al.: "Integrating Deep Semantic Segmentation Into 3-D Point Cloud Registration", IEEE Robotics and Automation Letters *
SUPERVAN: "A compilation of research ideas and results on combining deep learning with SLAM", https://www.cnblogs.com/chaofn/p/9334685.html *
YU ZHU et al.: "Real-Time Semantic Mapping of Visual SLAM Based on DCNN", Communications in Computer and Information Science *
LIN Zhipeng et al.: "Manifold dimensionality-reduction least-squares regression subspace segmentation", Information Technology and Network Security *
PAN Zhuojin et al.: "Research on semantic SLAM incorporating dilated convolutional neural networks", Modern Electronics Technique *
BAI Yunhan: "Research on semantic map construction based on the SLAM algorithm and deep neural networks", Computer Applications and Software *
Intel China Research Institute: "Column | The importance of semantic SLAM, do you know?", https://zhidx.com/p/92828.html *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097553A (en) * 2019-04-10 2019-08-06 东南大学 Semantic mapping system based on simultaneous localization and mapping and three-dimensional semantic segmentation
CN110046677A (en) * 2019-04-26 2019-07-23 山东大学 Data preprocessing method, map construction method, loop-closure detection method and system
CN110146098A (en) * 2019-05-06 2019-08-20 北京猎户星空科技有限公司 Robot map extension method and device, control equipment and storage medium
CN110146098B (en) * 2019-05-06 2021-08-20 北京猎户星空科技有限公司 Robot map extension method and device, control equipment and storage medium
CN110197215A (en) * 2019-05-22 2019-09-03 深圳市牧月科技有限公司 Ground-perception point cloud semantic segmentation method for autonomous driving
CN110146099A (en) * 2019-05-31 2019-08-20 西安工程大学 Simultaneous localization and mapping method based on deep learning
CN110378345B (en) * 2019-06-04 2022-10-04 广东工业大学 Dynamic scene SLAM method based on YOLACT instance segmentation model
CN110378345A (en) * 2019-06-04 2019-10-25 广东工业大学 Dynamic scene SLAM method based on YOLACT instance segmentation model
CN110276286B (en) * 2019-06-13 2022-03-04 中国电子科技集团公司第二十八研究所 Embedded panoramic video stitching system based on TX2
CN110276286A (en) * 2019-06-13 2019-09-24 中国电子科技集团公司第二十八研究所 Embedded panoramic video stitching system based on TX2
CN110264572A (en) * 2019-06-21 2019-09-20 哈尔滨工业大学 Terrain modeling method and system integrating geometric characteristics and mechanical characteristics
CN110264572B (en) * 2019-06-21 2021-07-30 哈尔滨工业大学 Terrain modeling method and system integrating geometric characteristics and mechanical characteristics
CN110297491A (en) * 2019-07-02 2019-10-01 湖南海森格诺信息技术有限公司 Semantic navigation method and system based on multiple structured-light binocular IR cameras
CN111670417A (en) * 2019-07-05 2020-09-15 深圳市大疆创新科技有限公司 Semantic map construction method, semantic map construction system, mobile platform and storage medium
WO2021003587A1 (en) * 2019-07-05 2021-01-14 深圳市大疆创新科技有限公司 Semantic map building method and system, and movable platforms and storage medium
CN110363178A (en) * 2019-07-23 2019-10-22 上海黑塞智能科技有限公司 Airborne laser point cloud classification method based on local and global deep feature embedding
CN110363178B (en) * 2019-07-23 2021-10-15 上海黑塞智能科技有限公司 Airborne laser point cloud classification method based on local and global depth feature embedding
CN110533716A (en) * 2019-08-20 2019-12-03 西安电子科技大学 Semantic SLAM system and method based on 3D constraints
CN110533716B (en) * 2019-08-20 2022-12-02 西安电子科技大学 Semantic SLAM system and method based on 3D constraint
CN110544307A (en) * 2019-08-29 2019-12-06 广州高新兴机器人有限公司 Semantic map construction method based on convolutional neural network and computer storage medium
CN110619299A (en) * 2019-09-12 2019-12-27 北京影谱科技股份有限公司 Object recognition SLAM method and device based on grid
CN110781262A (en) * 2019-10-21 2020-02-11 中国科学院计算技术研究所 Semantic map construction method based on visual SLAM
CN110827305A (en) * 2019-10-30 2020-02-21 中山大学 Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment
CN110910405B (en) * 2019-11-20 2023-04-18 湖南师范大学 Brain tumor segmentation method and system based on multi-scale cavity convolutional neural network
CN110910405A (en) * 2019-11-20 2020-03-24 湖南师范大学 Brain tumor segmentation method and system based on multi-scale cavity convolutional neural network
CN110956651A (en) * 2019-12-16 2020-04-03 哈尔滨工业大学 Terrain semantic perception method based on fusion of vision and vibrotactile sense
WO2021249575A1 (en) * 2020-06-09 2021-12-16 全球能源互联网研究院有限公司 Area semantic learning and map point identification method for power transformation operation scene
CN111797938B (en) * 2020-07-15 2022-03-15 燕山大学 Semantic information and VSLAM fusion method for sweeping robot
CN111797938A (en) * 2020-07-15 2020-10-20 燕山大学 Semantic information and VSLAM fusion method for sweeping robot
CN113191367A (en) * 2021-05-25 2021-07-30 华东师范大学 Semantic segmentation method based on dense scale dynamic network
CN115240115A (en) * 2022-07-27 2022-10-25 河南工业大学 Visual SLAM loop detection method combining semantic features and bag-of-words model
CN116657348A (en) * 2023-06-02 2023-08-29 浙江正源丝绸科技有限公司 Silk pretreatment method and system
CN116657348B (en) * 2023-06-02 2023-11-21 浙江正源丝绸科技有限公司 Silk pretreatment method and system

Also Published As

Publication number Publication date
CN109559320B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN109559320A Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network
JP6830707B1 (en) Person re-identification method that combines random batch mask and multi-scale expression learning
CN106127204B Multi-directional meter-reading region detection algorithm based on fully convolutional neural networks
CN108052896B (en) Human body behavior identification method based on convolutional neural network and support vector machine
CN107392964B Indoor SLAM method combining indoor feature points and structural lines
CN109740665A Method and system for detecting ship targets in occluded images based on expert-knowledge constraints
CN110378281A Group activity recognition method based on pseudo-3D convolutional neural networks
CN108734143A Binocular-vision-based online detection method for power transmission lines by an inspection robot
CN110097553A Semantic mapping system based on simultaneous localization and mapping and three-dimensional semantic segmentation
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN108960184A Pedestrian re-identification method based on a heterogeneous-parts deep neural network
CN109341703A Visual SLAM algorithm using CNN feature detection over the full cycle
CN110378997A Dynamic scene mapping and localization method based on ORB-SLAM2
CN114972418A Maneuvering multi-target tracking method based on combination of kernel adaptive filtering and YOLOX detection
CN109035329A Camera pose estimation optimization method based on deep features
CN113221625A (en) Method for re-identifying pedestrians by utilizing local features of deep learning
US11361534B2 (en) Method for glass detection in real scenes
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN109816714A Point-cloud object type recognition method based on three-dimensional convolutional neural networks
CN110334584A Gesture recognition method based on region-based fully convolutional networks
CN114241226A (en) Three-dimensional point cloud semantic segmentation method based on multi-neighborhood characteristics of hybrid model
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
CN117197676A (en) Target detection and identification method based on feature fusion
CN114358133B (en) Method for detecting looped frames based on semantic-assisted binocular vision SLAM
CN111339967A (en) Pedestrian detection method based on multi-view graph convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant