CN109559320A - Method and system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network - Google Patents

Method and system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network

Info

Publication number
CN109559320A
CN109559320A
Authority
CN
China
Prior art keywords
semantic
point
mapping
visual slam
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811388678.6A
Other languages
Chinese (zh)
Other versions
CN109559320B (en)
Inventor
朱煜
黄俊健
陈旭东
郑兵兵
倪光耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China University of Science and Technology
Publication of CN109559320A
Application granted
Publication of CN109559320B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network, comprising: (1) an embedded development processor acquires the color and depth information of the current environment through an RGB-D camera; (2) feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained; (3) pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information; (4) errors introduced by semantic segmentation are reduced through manifold clustering; (5) semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points. The invention further relates to a system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network. With this method and system, the spatial grid map carries higher-level semantic information and better meets the demands of real-time mapping.

Description

Method and system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network
Technical field
The present invention relates to the field of real-time localization and mapping for unmanned systems, and more particularly to the field of semantic segmentation in image processing; it specifically relates to a method and system for realizing visual SLAM semantic mapping based on an atrous (dilated) convolution deep neural network.
Background technique
Unmanned systems have developed rapidly in recent years; autonomous driving, robots and unmanned aerial vehicles are all typical unmanned systems. Visual SLAM (Simultaneous Localization and Mapping) systems have been widely used for the localization and path planning of unmanned systems, such as ORB-SLAM proposed by Mur-Artal et al. in 2015 (Mur-Artal R, Montiel J M M, Tardós J D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System [J]. IEEE Transactions on Robotics, 2015, 31(5): 1147-1163). The spatial grid map established by a visual SLAM system contains only low-level information, such as color and range information, which hinders the robot's understanding of the current scene. We therefore introduce a deep-learning-based semantic segmentation network into the map-building process of the visual SLAM system, giving the robot semantic and scene understanding of the current scene.
The purpose of semantic segmentation is to achieve accurate segmentation between all kinds of targets for scene understanding; it can be used in autonomous driving or robotics to help identify targets and their relations. The DeepLab deep neural network architecture proposed by Google is now widely used in the semantic segmentation field (L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. arXiv:1606.00915, 2016). However, typical semantic segmentation networks compute too slowly for real-time use and are difficult to deploy in embedded systems. In addition, semantic segmentation suffers from blurred segmentation of object edge contours, false detections and missed detections.
We apply semantic segmentation to semantic mapping in a visual SLAM system, so that every coordinate point of the constructed spatial grid map carries high-level semantic information and the robot can understand the targets of the current scene at the semantic level; errors introduced by semantic segmentation are reduced through a spatial manifold clustering algorithm, making the constructed semantic map more accurate.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the above prior art by providing a method and system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network that combines deep learning with visual SLAM, gives the robot a semantic-level understanding of scene targets, and reduces semantic segmentation errors.
To achieve the above purpose, the method and system of the present invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network are as follows:
The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network is mainly characterized in that the method comprises the following steps:
(1) An embedded development processor acquires the color and depth information of the current environment through an RGB-D camera;
(2) Feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(3) Pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(4) Errors introduced by semantic segmentation are reduced through manifold clustering;
(5) Semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points.
Wherein, the embedded processor in step (1) includes an NVIDIA Jetson TX2 system.
Preferably, step (2) comprises the following steps:
(2.1) Extract image feature points using visual SLAM techniques and perform feature matching to obtain feature point matching pairs;
(2.2) Solve the current camera pose from 3D point pairs;
(2.3) Refine the pose estimate using graph optimization (bundle adjustment);
(2.4) Eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data.
Preferably, the pixel-level semantic segmentation of the images in step (3) specifically comprises the following steps:
(3.1) Pass through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.2) Pass through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.3) Classify the image according to the extraction results.
Preferably, step (3.1) further includes the design of the feature extraction layer, specifically comprising the following steps:
(3.1.1) Change the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) Partially replace Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) Change the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1.
Preferably, step (3.2) further includes the design of the multi-scale extraction layer, specifically comprising the following steps:
(3.2.1) Perform multi-scale processing based on spatial pyramid pooling;
(3.2.2) Extract feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) Fuse the image pooling features into the module, merge the feature maps into a single feature through a 1 × 1 convolution, and feed the result to a Softmax layer for per-pixel semantic classification.
Preferably, step (4) specifically comprises the following steps:
(4.1) Compute the tangent-plane normal vector of each spatial point;
(4.2) Search for a point x_i not yet assigned a class, and judge whether all points have been clustered; if so, continue with step (4.5); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(4.3) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judge whether α_ij < σ or α_ij > 175°; if so, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q; otherwise, continue with step (4.4);
(4.4) Judge whether queue q is non-empty; if so, let x_i = q_1 and continue with step (4.3); otherwise return to step (4.2);
(4.5) Extract the k classes containing the most points, and assign the remaining points to the nearest class.
Preferably, the computation of the tangent-plane normal vector in step (4.1) is specifically:
The tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
Preferably, step (5) comprises the following steps:
(5.1) According to the precision characteristics of the RGB-D camera, remove points with excessive or invalid depth values;
(5.2) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large;
(5.3) Fill all spatial points into a voxel grid so that each grid cell retains only one spatial point.
The system, based on the above method, for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network is mainly characterized in that the system comprises:
an embedded development processor, for constructing the visual SLAM semantic map;
an RGB-D camera, connected to the embedded development processor, for acquiring color data and depth data;
a map builder, which at runtime combines deep learning with visual SLAM and realizes visual SLAM semantic mapping through the embedded development processor and the RGB-D camera, specifically performing the following steps:
(1) the embedded development processor acquires the color and depth information of the current environment through the RGB-D camera;
(2) feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(3) pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(4) errors introduced by semantic segmentation are reduced through manifold clustering;
(5) semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points.
Preferably, the embedded processor in step (1) includes an NVIDIA Jetson TX2 system.
Preferably, step (2) comprises the following steps:
(2.1) Extract image feature points using visual SLAM techniques and perform feature matching to obtain feature point matching pairs;
(2.2) Solve the current camera pose from 3D point pairs;
(2.3) Refine the pose estimate using graph optimization (bundle adjustment);
(2.4) Eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data.
Preferably, the pixel-level semantic segmentation of the images in step (3) specifically comprises the following steps:
(3.1) Pass through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.2) Pass through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.3) Classify the image according to the extraction results.
Preferably, step (3.1) further includes the design of the feature extraction layer, specifically comprising the following steps:
(3.1.1) Change the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) Partially replace Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) Change the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1.
Preferably, step (3.2) further includes the design of the multi-scale extraction layer, specifically comprising the following steps:
(3.2.1) Perform multi-scale processing based on spatial pyramid pooling;
(3.2.2) Extract feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) Fuse the image pooling features into the module, merge the feature maps into a single feature through a 1 × 1 convolution, and feed the result to a Softmax layer for per-pixel semantic classification.
Preferably, step (4) specifically comprises the following steps:
(4.1) Compute the tangent-plane normal vector of each spatial point;
(4.2) Search for a point x_i not yet assigned a class, and judge whether all points have been clustered; if so, continue with step (4.5); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(4.3) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judge whether α_ij < σ or α_ij > 175°; if so, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q; otherwise, continue with step (4.4);
(4.4) Judge whether queue q is non-empty; if so, let x_i = q_1 and continue with step (4.3); otherwise return to step (4.2);
(4.5) Extract the k classes containing the most points, and assign the remaining points to the nearest class.
Preferably, the computation of the tangent-plane normal vector in step (4.1) is specifically:
The tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
Preferably, step (5) comprises the following steps:
(5.1) According to the precision characteristics of the RGB-D camera, remove points with excessive or invalid depth values;
(5.2) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large;
(5.3) Fill all spatial points into a voxel grid so that each grid cell retains only one spatial point.
Using the method for the invention for building figure function based on empty convolution deep neural network realization vision SLAM semanteme And system, system use embedded development processor, by the collected color data of RGB-D camera and depth data, benefit With vision SLAM technology, image characteristic point is extracted, carries out characteristic matching, the method for recycling Bundle Adjustment obtains More accurate robot pose estimation, the cumulative errors for eliminating interframe are detected using winding.Letter is positioned in real time obtaining robot It is refreshing using depth is improved using a kind of empty convolution design method for GoogLeNet deep neural network while breath Semantic segmentation result combination vision SLAM system is obtained building for semantic class by the feature extraction through the real-time semantic segmentation of network implementations Figure.And clustered by manifold and eliminate error brought by optimization semantic segmentation, after building figure by Octree, spatial network map tool There is more advanced semantic information, and the semantic map constructed is more accurate.The improvement of network improves the real-time place of system Time loss of the semantic segmentation network of reason ability, this method and system on NVIDIA Jetson TX2 platform is 0.099s/ Width meets use demand during building figure in real time.
Detailed description of the invention
Fig. 1 is a flowchart of the method of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Fig. 2 is a flowchart of the semantic segmentation in the method of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Fig. 3 is a schematic diagram of the atrous convolution in the method of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Fig. 4 is a schematic diagram of experimental results of the method of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Fig. 5 is a schematic diagram of the NVIDIA Jetson TX2 processor of the method and system of the invention for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network.
Specific embodiment
In order to describe the technical content of the present invention more clearly, a further description is given below in combination with specific embodiments.
The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network comprises the following steps:
(1) An embedded development processor acquires the color and depth information of the current environment through an RGB-D camera;
(2) Feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(2.1) Extract image feature points using visual SLAM techniques and perform feature matching to obtain feature point matching pairs;
(2.2) Solve the current camera pose from 3D point pairs;
(2.3) Refine the pose estimate using graph optimization (bundle adjustment);
(2.4) Eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data;
(3) Pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(3.1) Pass through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.1.1) Change the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) Partially replace Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) Change the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1;
(3.2) Pass through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.2.1) Perform multi-scale processing based on spatial pyramid pooling;
(3.2.2) Extract feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) Fuse the image pooling features into the module, merge the feature maps into a single feature through a 1 × 1 convolution, and feed the result to a Softmax layer for per-pixel semantic classification;
(3.3) Classify the image according to the extraction results;
(4) Errors introduced by semantic segmentation are reduced through manifold clustering;
(4.1) Compute the tangent-plane normal vector of each spatial point;
(4.2) Search for a point x_i not yet assigned a class, and judge whether all points have been clustered; if so, continue with step (4.5); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(4.3) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judge whether α_ij < σ or α_ij > 175°; if so, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q; otherwise, continue with step (4.4);
(4.4) Judge whether queue q is non-empty; if so, let x_i = q_1 and continue with step (4.3); otherwise return to step (4.2);
(4.5) Extract the k classes containing the most points, and assign the remaining points to the nearest class;
(5) Semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points;
(5.1) According to the precision characteristics of the RGB-D camera, remove points with excessive or invalid depth values;
(5.2) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large;
(5.3) Fill all spatial points into a voxel grid so that each grid cell retains only one spatial point.
In a preferred embodiment of the present invention, the embedded processor in step (1) includes an NVIDIA Jetson TX2 system.
In a preferred embodiment of the present invention, the computation of the tangent-plane normal vector in step (4.1) is specifically:
The tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
The system, based on the above method, for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network comprises:
an embedded development processor, for constructing the visual SLAM semantic map;
an RGB-D camera, connected to the embedded development processor, for acquiring color data and depth data;
a map builder, which at runtime combines deep learning with visual SLAM and realizes visual SLAM semantic mapping through the embedded development processor and the RGB-D camera, specifically performing the following steps:
(1) The embedded development processor acquires the color and depth information of the current environment through the RGB-D camera;
(2) Feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(2.1) Extract image feature points using visual SLAM techniques and perform feature matching to obtain feature point matching pairs;
(2.2) Solve the current camera pose from 3D point pairs;
(2.3) Refine the pose estimate using graph optimization (bundle adjustment);
(2.4) Eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data;
(3) Pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(3.1) Pass through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.1.1) Change the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) Partially replace Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) Change the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1;
(3.2) Pass through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.2.1) Perform multi-scale processing based on spatial pyramid pooling;
(3.2.2) Extract feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) Fuse the image pooling features into the module, merge the feature maps into a single feature through a 1 × 1 convolution, and feed the result to a Softmax layer for per-pixel semantic classification;
(3.3) Classify the image according to the extraction results;
(4) Errors introduced by semantic segmentation are reduced through manifold clustering;
(4.1) Compute the tangent-plane normal vector of each spatial point;
(4.2) Search for a point x_i not yet assigned a class, and judge whether all points have been clustered; if so, continue with step (4.5); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(4.3) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judge whether α_ij < σ or α_ij > 175°; if so, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q; otherwise, continue with step (4.4);
(4.4) Judge whether queue q is non-empty; if so, let x_i = q_1 and continue with step (4.3); otherwise return to step (4.2);
(4.5) Extract the k classes containing the most points, and assign the remaining points to the nearest class;
(5) Semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points;
(5.1) According to the precision characteristics of the RGB-D camera, remove points with excessive or invalid depth values;
(5.2) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large;
(5.3) Fill all spatial points into a voxel grid so that each grid cell retains only one spatial point.
In a preferred embodiment of the present invention, the embedded processor in step (1) includes an NVIDIA Jetson TX2 system.
In a preferred embodiment of the present invention, the computation of the tangent-plane normal vector in step (4.1) is specifically:
The tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
In a specific embodiment, the present invention relates to the technical field of real-time localization and mapping for unmanned robot systems, being a visual SLAM semantic mapping method and system based on an atrous convolution deep neural network. The system runs on an embedded development processor and, from the color and depth data collected by the RGB-D camera, uses visual SLAM techniques to extract image feature points and perform feature matching; bundle adjustment then yields a more accurate robot pose estimate, and loop closure detection eliminates the inter-frame accumulated error. While the robot's real-time localization is obtained, an atrous convolution design for the GoogLeNet deep neural network allows the improved network to perform real-time semantic segmentation; combining the segmentation results with the visual SLAM system yields semantic-level mapping, manifold clustering reduces the errors introduced by semantic segmentation, and after octree mapping the spatial grid map carries higher-level semantic information and the constructed semantic map is more accurate.
Based on the above system, the method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network comprises the following steps:
(1) Using the embedded development processor, acquire the color and depth information of the current environment through the RGB-D camera;
(2) Extract image feature points from the images collected by the camera using visual SLAM techniques and perform feature matching to obtain feature point matching pairs; solve the current camera pose from 3D point pairs; refine the pose estimate using graph optimization (bundle adjustment); eliminate inter-frame accumulated error through loop closure detection, and obtain scene spatial point cloud data;
(3) Perform pixel-level semantic segmentation on the images using deep learning, and map the results into space using the relation between the image coordinate system and the world coordinate system, so that each spatial point carries semantic label information (a back-projection sketch follows this list);
(4) Reduce the errors introduced by semantic segmentation using manifold clustering;
(5) Perform semantic mapping: stitch the spatial point clouds together to finally obtain a point cloud semantic map composed of dense discrete points.
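Step (3) above hinges on the image-to-world mapping. The patent does not write out the projection equations, so the following is a minimal Python sketch under the usual pinhole-camera assumption; the intrinsics K and the pose (R, t) come from the SLAM front end, and all names are illustrative rather than the patent's.

```python
import numpy as np

def pixel_to_world(u, v, depth, K, R, t):
    """Back-project pixel (u, v) with metric depth into world coordinates.
    K: 3x3 pinhole intrinsics; (R, t): camera-to-world pose from visual SLAM."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    p_cam = np.array([(u - cx) * depth / fx,   # pinhole model, camera frame
                      (v - cy) * depth / fy,
                      depth])
    return R @ p_cam + t                       # transform into the world frame
```

Each back-projected point keeps the semantic class of its source pixel, which is how the spatial points acquire their semantic label information.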
In the above example, the embedded processor in step (1) includes the NVIDIA Jetson TX2 system and devices of the same category.
In the above example, a general visual SLAM pipeline and its local improvements are used in step (2).
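The patent likewise leaves the 3D-3D pose solver of step (2) unspecified. A common closed-form choice for matched RGB-D point pairs is the SVD-based alignment of Arun et al.; the sketch below shows that standard method, not necessarily the exact solver used here.

```python
import numpy as np

def solve_pose_3d3d(P, Q):
    """Closed-form rigid pose from matched 3D points, with Q ≈ R @ P + t.
    P, Q: (3, N) arrays of corresponding points from two frames."""
    mu_p = P.mean(axis=1, keepdims=True)   # centroid of each point set
    mu_q = Q.mean(axis=1, keepdims=True)
    W = (Q - mu_q) @ (P - mu_p).T          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(W)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # reflection guard
    R = U @ D @ Vt                         # optimal rotation
    t = mu_q - R @ mu_p                    # optimal translation
    return R, t
```

In a full pipeline, this closed-form estimate would seed the bundle adjustment refinement of step (2.3).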
In the above example, the semantic segmentation network in step (3) specifically includes the following structure:
(31) a feature extraction layer;
(32) a multi-scale extraction layer;
(33) a classification layer.
In the above example, the feature extraction layer in step (31) specifically includes the following structure:
(311) The GoogLeNet network is used as the front-end feature extraction layer of the DeepLab model;
(312) The stride of the max pooling layer after Inception (3b) in the GoogLeNet network is changed to 1, expanding the feature size while keeping the output resolution unchanged;
(313) Inception (4a) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(314) Inception (4b) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(315) Inception (4c) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(316) Inception (4d) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(317) Inception (4e) in the GoogLeNet network is partially replaced with atrous convolution, setting the dilation rate to 2 and the pooling to 5 × 5, expanding the feature size;
(318) The stride of the max pooling layer after Inception (4e) in the GoogLeNet network is changed to 1, expanding the feature size while keeping the output resolution unchanged.
For the original GoogLeNet, an input of size 224 gives a feature output of size 7, a 32-fold reduction. After the strides of the last two pooling layers are changed to 1 and the original ordinary convolutions are changed to atrous convolutions, an input of size 321 yields a feature map of size 41, only an 8-fold reduction, thereby expanding the feature size.
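As an illustration of the stride and dilation substitution just described, the following PyTorch sketch contrasts a stride-2 pooling stage with the modified stride-1 pooling followed by a dilated 3 × 3 convolution; the channel counts are placeholders, not the patent's actual Inception dimensions.

```python
import torch
import torch.nn as nn

# Original branch: stride-2 max pooling halves the feature resolution.
pool_s2 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

# Modified branch: pooling stride set to 1, and the following 3x3 convolution
# uses dilation 2 so its receptive field matches the stride-2 pipeline.
pool_s1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)
conv_atrous = nn.Conv2d(480, 512, kernel_size=3, dilation=2, padding=2)

x = torch.randn(1, 480, 41, 41)
print(pool_s2(x).shape)               # torch.Size([1, 480, 21, 21])
print(conv_atrous(pool_s1(x)).shape)  # torch.Size([1, 512, 41, 41]); resolution kept
```

The modified branch preserves the 41 × 41 resolution of the example above while keeping the receptive field of each feature point unchanged.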
In the above example, the multi-scale layer in step (32) specifically includes the following structure:
(321) Multi-scale processing is performed based on spatial pyramid pooling;
(322) The spatial pyramid pooling model is optimized: receptive-field features at different scales are extracted using a 1 × 1 convolution and atrous convolutions with different sampling rates (6, 12, 18);
(323) The image pooling features are fused into the module, all resulting feature maps are concatenated (Concat) after a 1 × 1 convolution to obtain the final features, which are then fed to a Softmax layer for per-pixel semantic classification.
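A minimal PyTorch sketch of this multi-scale head follows, assuming 256 intermediate channels and a generic class count; the patent fixes only the branch structure, the sampling rates (6, 12, 18), the Concat fusion and the Softmax classifier.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleHead(nn.Module):
    """1x1 conv + atrous convs at rates 6/12/18 + global image pooling,
    concatenated (Concat), fused by a 1x1 conv, then per-pixel Softmax."""
    def __init__(self, in_ch=1024, mid_ch=256, num_classes=21):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, mid_ch, 1)
        self.atrous = nn.ModuleList(
            nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r) for r in (6, 12, 18))
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, mid_ch, 1))
        self.fuse = nn.Conv2d(5 * mid_ch, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [self.conv1x1(x)] + [branch(x) for branch in self.atrous]
        feats.append(F.interpolate(self.image_pool(x), size=(h, w),
                                   mode='bilinear', align_corners=False))
        return self.fuse(torch.cat(feats, dim=1)).softmax(dim=1)
```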
In the above example, the manifold clustering in step (4) specifically includes the following steps:
(41) Compute the tangent-plane normal vector of every spatial point, and set the current cluster class c = 0;
(42) Search for a point x_i not yet assigned a class; if all points have been clustered, execute step (45); otherwise, assign x_i the class c = c + 1 and create an empty queue q;
(43) Compute the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01; if α_ij < σ or α_ij > 175°, classify x_j and x_i together, with x_j receiving class c, and push each qualifying x_j into queue q;
(44) If queue q is non-empty, let x_i = q_1 and continue with step (43); otherwise return to step (42);
(45) Extract the k classes containing the most points, and assign the remaining points to the nearest class.
Wherein, the tangent-plane normal vector in step (41) is computed as follows:
Let the n spatial points form the matrix X ∈ R^(3×n), with covariance matrix Σ = E[(X − μ)(X − μ)^T].
Let w ∈ R^(3×1) be the unit normal vector of the plane; Z = w^T X is the projection length of the n points on this unit normal. The model is:
min_w w^T Σ w
s.t. w^T w = 1
Solving with Lagrange multipliers gives L(w, a) = w^T Σ w − a(w^T w − 1);
setting the partial derivative with respect to w to zero yields Σ w = a w.
Since w is unit-length, a is the corresponding eigenvalue: w^T Σ w = a w^T w = a. And since the covariance matrix is positive semidefinite, the normal w is the unit eigenvector of Σ corresponding to the smallest eigenvalue.
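The derivation reduces to an eigen-decomposition of the local 3 × 3 covariance matrix. Below is a NumPy sketch of steps (41) and (43); the angle threshold σ is not fixed by the patent, so sigma_deg = 10 is an assumed value.

```python
import numpy as np

def tangent_plane_normal(neighbors):
    """Step (41): unit normal of the local tangent plane, i.e. the eigenvector
    of the neighborhood covariance matrix with the smallest eigenvalue.
    neighbors: (n, 3) array of spatial points around the query point."""
    X = neighbors - neighbors.mean(axis=0)   # center the points (X - mu)
    cov = X.T @ X / len(neighbors)           # 3x3 covariance matrix Sigma
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    return eigvecs[:, 0]                     # smallest-eigenvalue eigenvector w

def same_cluster(v_i, v_j, sigma_deg=10.0):
    """Angle test of step (43): normals that are nearly parallel (alpha < sigma)
    or nearly anti-parallel (alpha > 175 deg, a flipped normal) are grouped."""
    cos_a = np.clip(v_i @ v_j / (np.linalg.norm(v_i) * np.linalg.norm(v_j)),
                    -1.0, 1.0)
    alpha = np.degrees(np.arccos(cos_a))
    return alpha < sigma_deg or alpha > 175.0
```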
In the above example, the mapping algorithm in step (5) specifically includes the following steps:
(51) When each frame's point cloud is generated, remove points with excessive or invalid depth values according to the precision characteristics of the RGB-D camera;
(52) Remove isolated spatial points by statistical filtering: compute the mean distance from each spatial point to its N nearest spatial points, and remove points whose mean distance is too large, eliminating isolated noise points while retaining the dense spatial points;
(53) Using a voxel grid, fill all spatial point clouds into the grid so that each grid cell retains only one spatial point; this is equivalent to downsampling the point cloud and saves considerable memory.
Wherein, the spatial grid map is built with an octree data structure.
A spatial cuboid is divided into eight regions; likewise, each subregion continues to be divided into eight regions, and an octree map is thus created dynamically.
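A NumPy sketch of the three filtering stages of steps (51) to (53) follows; the depth limit, neighbor count N, outlier threshold and voxel size are illustrative values the patent does not fix, and the brute-force neighbor search is written for clarity rather than speed.

```python
import numpy as np

def filter_cloud(points, depth_max=4.0, n_neighbors=10, std_ratio=2.0, voxel=0.02):
    """points: (n, 3) array in meters, with z as the depth axis."""
    # (51) drop points with invalid or overly large depth values
    points = points[(points[:, 2] > 0) & (points[:, 2] < depth_max)]

    # (52) statistical filtering: remove points whose mean distance to their
    # N nearest neighbors is far above the average
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    mean_knn = np.sort(d, axis=1)[:, :n_neighbors].mean(axis=1)
    points = points[mean_knn < mean_knn.mean() + std_ratio * mean_knn.std()]

    # (53) voxel grid: keep one representative point per occupied grid cell
    _, idx = np.unique(np.floor(points / voxel).astype(np.int64),
                       axis=0, return_index=True)
    return points[idx]
```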
The visual SLAM semantic mapping method of the invention based on atrous convolutional neural networks is described in detail below with reference to the drawings and specific embodiments.
The flow of the visual SLAM semantic mapping method and system based on atrous convolutional neural networks is shown in Fig. 1:
From the image data collected by the RGB-D camera, frames with low similarity are selected as keyframes; a keyframe contains the color image, the depth image and the current pose. Semantic segmentation is first performed on the color image through the feature extraction layer and the multi-scale layer of GoogLeNet improved with atrous convolution, producing a raw semantic point cloud. The raw semantic point cloud is filtered, manifold clustering is performed in combination with the depth image, and finally octree mapping is performed together with the pose information. The network improvements raise the real-time processing capability of the system, which runs in real time on the NVIDIA Jetson TX2 embedded platform.
In the flow of the visual SLAM semantic mapping method and system based on atrous convolutional neural networks, the semantic information of the image is obtained through a deep learning semantic segmentation network. Its flow, shown in Fig. 2, is divided into three parts: feature extraction, multi-scale extraction and classification.
The atrous convolution used in the visual SLAM semantic mapping method and system based on atrous convolutional neural networks is shown in Fig. 3:
Convolution and pooling are treated as the same kind of operation. Take the middle violet points as the input; the green part of the figure is the ordinary convolution pipeline, where features are obtained after convolution (or pooling) operations with strides 2, 1, 2 and 1. The receptive field of the top-layer feature point is the entire input layer.
To expand the feature size, atrous convolution changes all strides to 1 (the pink part of the figure). After the stride of the first convolution layer is changed, with a dilation rate of 1, the number of features obtained doubles. For the second convolution operation, the dilation rate is set to 2, i.e., the kernel is applied to points spaced one apart, again yielding twice the features of the original ordinary convolution while keeping the receptive field of the feature points unchanged. For the third convolution operation, whose stride is also changed to 1, the dilation rate should likewise be 2 to keep the same receptive field. For the fourth convolution operation, the dilation rate must be 4 to keep the receptive field unchanged.
The following must be noted when using atrous convolution:
S1. When the stride of a preceding convolution layer changes from s_old to s_new, all subsequent convolution layers must perform atrous convolution with dilation rate s_old / s_new in order to keep the receptive field unchanged;
S2. The dilation rate of the current layer's atrous convolution is given by:
dilation = ∏_{n=1}^{N} (s_old^(n) / s_new^(n))
where N is the number of stride changes in the preceding layers and s_old^(n) / s_new^(n) is the n-th stride change.
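Rule S2 can be checked against the Fig. 3 walkthrough, in which the strides 2, 1, 2, 1 are all changed to 1 and the dilation rates come out as 1, 2, 2 and 4; a small Python sketch:

```python
def dilation_schedule(old_strides, new_strides):
    """Rule S2: the dilation of each layer is the running product of the
    stride ratios s_old / s_new over all preceding changed layers."""
    dilations, factor = [], 1
    for s_old, s_new in zip(old_strides, new_strides):
        dilations.append(factor)      # dilation applied at this layer
        factor *= s_old // s_new      # accumulate this layer's stride change
    return dilations

print(dilation_schedule([2, 1, 2, 1], [1, 1, 1, 1]))  # [1, 2, 2, 4]
```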
The visual SLAM semantic mapping results based on atrous convolutional neural networks are shown in Fig. 4. The images were tested in two scenes: an office scene on the left and a laboratory scene on the right. The first row shows the semantic mapping output of this system, in which chairs, people and plants are marked in red, pink and green respectively; the second row shows the map built by conventional visual SLAM, which carries no semantic information. The experimental results show that the invention enables the robot to understand the main targets in the current scene well. The software and algorithms of the invention run on the NVIDIA Jetson TX2 embedded platform, whose processor is shown schematically in Fig. 5.
Using the method for the invention for building figure function based on empty convolution deep neural network realization vision SLAM semanteme And system, system use embedded development processor, by the collected color data of RGB-D camera and depth data, benefit With vision SLAM technology, image characteristic point is extracted, carries out characteristic matching, the method for recycling Bundle Adjustment obtains More accurate robot pose estimation, the cumulative errors for eliminating interframe are detected using winding.Letter is positioned in real time obtaining robot It is refreshing using depth is improved using a kind of empty convolution design method for GoogLeNet deep neural network while breath Semantic segmentation result combination vision SLAM system is obtained building for semantic class by the feature extraction through the real-time semantic segmentation of network implementations Figure.And clustered by manifold and eliminate error brought by optimization semantic segmentation, after building figure by Octree, spatial network map tool There is more advanced semantic information, and the semantic map constructed is more accurate.The improvement of network improves the real-time place of system Time loss of the semantic segmentation network of reason ability, this method and system on NVIDIA Jetson TX2 platform is 0.099s/ Width meets use demand during building figure in real time.
In this specification, the invention has been described with reference to specific embodiments. It will, however, be evident that various modifications and changes may be made without departing from the spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded as illustrative rather than restrictive.

Claims (18)

1. A method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network, characterized in that the method comprises the following steps:
(1) an embedded development processor acquires the color and depth information of the current environment through an RGB-D camera;
(2) feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(3) pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(4) errors introduced by semantic segmentation are reduced through manifold clustering;
(5) semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points.
2. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that the embedded processor in step (1) comprises an NVIDIA Jetson TX2 system.
3. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that step (2) comprises the following steps:
(2.1) extracting image feature points using visual SLAM techniques and performing feature matching to obtain feature point matching pairs;
(2.2) solving the current camera pose from 3D point pairs;
(2.3) refining the pose estimate using graph optimization (bundle adjustment);
(2.4) eliminating inter-frame accumulated error through loop closure detection, and obtaining scene spatial point cloud data.
4. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that the pixel-level semantic segmentation of the images in step (3) specifically comprises the following steps:
(3.1) passing through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.2) passing through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.3) classifying the image according to the extraction results.
5. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 4, characterized in that step (3.1) further includes the design of the feature extraction layer, specifically comprising the following steps:
(3.1.1) changing the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) partially replacing Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) changing the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1.
6. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 4, characterized in that step (3.2) further includes the design of the multi-scale extraction layer, specifically comprising the following steps:
(3.2.1) performing multi-scale processing based on spatial pyramid pooling;
(3.2.2) extracting feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) fusing the image pooling features into the module, merging the feature maps into a single feature through a 1 × 1 convolution, and feeding the result to a Softmax layer for per-pixel semantic classification.
7. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that step (4) specifically comprises the following steps:
(4.1) computing the tangent-plane normal vector of each spatial point;
(4.2) searching for a point x_i not yet assigned a class, and judging whether all points have been clustered; if so, continuing with step (4.5); otherwise, assigning x_i the class c = c + 1 and creating an empty queue q;
(4.3) computing the angle α_ij between the tangent-plane normal v_i of spatial point x_i and the normal v_j of every point x_j within a distance of 0.01, and judging whether α_ij < σ or α_ij > 175°; if so, classifying x_j and x_i together, with x_j receiving class c, and pushing each qualifying x_j into queue q; otherwise, continuing with step (4.4);
(4.4) judging whether queue q is non-empty; if so, letting x_i = q_1 and continuing with step (4.3); otherwise returning to step (4.2);
(4.5) extracting the k classes containing the most points, and assigning the remaining points to the nearest class.
8. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that the computation of the tangent-plane normal vector in step (4.1) is specifically:
the tangent-plane normal vector of a spatial point is computed according to the following formula:
min_w w^T Σ w, s.t. w^T w = 1, whose solution satisfies Σ w = a w;
where w ∈ R^(3×1) is the unit normal vector of the plane, Σ is the covariance matrix of the neighboring points, and a is the eigenvalue.
9. The method for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 1, characterized in that step (5) comprises the following steps:
(5.1) according to the precision characteristics of the RGB-D camera, removing points with excessive or invalid depth values;
(5.2) removing isolated spatial points by statistical filtering: computing the mean distance from each spatial point to its N nearest spatial points, and removing points whose mean distance is too large;
(5.3) filling all spatial points into a voxel grid so that each grid cell retains only one spatial point.
10. A system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network, characterized in that the system comprises:
an embedded development processor, for constructing the visual SLAM semantic map;
an RGB-D camera, connected to the embedded development processor, for acquiring color data and depth data;
a map builder, which at runtime combines deep learning with visual SLAM and realizes visual SLAM semantic mapping through the embedded development processor and the RGB-D camera, specifically performing the following steps:
(1) the embedded development processor acquires the color and depth information of the current environment through the RGB-D camera;
(2) feature point matching pairs are obtained from the acquired images, pose estimation is performed, and scene spatial point cloud data are obtained;
(3) pixel-level semantic segmentation is performed on the images using deep learning, the results are mapped between the image coordinate system and the world coordinate system, and spatial points are given semantic label information;
(4) errors introduced by semantic segmentation are reduced through manifold clustering;
(5) semantic mapping is performed: the spatial point clouds are stitched together to obtain a point cloud semantic map composed of dense discrete points.
11. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 10, characterized in that the embedded processor in step (1) comprises an NVIDIA Jetson TX2 system.
12. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 10, characterized in that step (2) comprises the following steps:
(2.1) extracting image feature points using visual SLAM techniques and performing feature matching to obtain feature point matching pairs;
(2.2) solving the current camera pose from 3D point pairs;
(2.3) refining the pose estimate using graph optimization (bundle adjustment);
(2.4) eliminating inter-frame accumulated error through loop closure detection, and obtaining scene spatial point cloud data.
13. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 10, characterized in that the pixel-level semantic segmentation of the images in step (3) specifically comprises the following steps:
(3.1) passing through a feature extraction layer based on GoogLeNet improved with atrous convolution;
(3.2) passing through a multi-scale extraction layer based on GoogLeNet improved with atrous convolution;
(3.3) classifying the image according to the extraction results.
14. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 13, characterized in that step (3.1) further includes the design of the feature extraction layer, specifically comprising the following steps:
(3.1.1) changing the stride of the max pooling layer after Inception (3b) in the GoogLeNet network to 1;
(3.1.2) partially replacing Inception (4a), Inception (4b), Inception (4c), Inception (4d) and Inception (4e) in the GoogLeNet network with atrous convolutions, setting the dilation rate to 2 and the pooling to 5 × 5;
(3.1.3) changing the stride of the max pooling layer after Inception (4e) in the GoogLeNet network to 1.
15. The system for realizing visual SLAM semantic mapping based on an atrous convolution deep neural network according to claim 13, characterized in that step (3.2) further includes the design of the multi-scale extraction layer, specifically comprising the following steps:
(3.2.1) performing multi-scale processing based on spatial pyramid pooling;
(3.2.2) extracting feature maps at different scales with 1 × 1 convolutions and atrous convolutions at different sampling rates;
(3.2.3) fusing the image pooling features into the module, merging the feature maps into a single feature through a 1 × 1 convolution, and feeding the result to a Softmax layer for per-pixel semantic classification.
16. The system for realizing a visual SLAM semantic mapping function based on a dilated convolution deep neural network according to claim 10, characterized in that step (4) specifically comprises the following steps:
(4.1) calculating the tangent-plane normal vector of each spatial point;
(4.2) searching for a point xi with no assigned class, and judging whether all points have been clustered; if so, continuing to step (4.5); otherwise, assigning xi the class c = c + 1 and creating an empty queue q;
(4.3) calculating the angle αij between the tangent-plane normal vector vi of the spatial point xi and the normal vector vj of every point xj within a distance of 0.01 from it, and judging whether αij < σ or αij > 175°; if so, classifying xj with xi, assigning xj the class c, and pushing every qualifying xj into the queue q; otherwise, continuing to step (4.4);
(4.4) judging whether the queue q is non-empty; if so, letting xi = q1 and continuing to step (4.3); otherwise, continuing to step (4.1);
(4.5) extracting the k classes with the most points from the clustering result, and assigning the remaining points to classes according to the nearest-neighbor principle.
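A minimal numpy/scipy sketch of one region-growing reading of steps (4.2)-(4.4): a cluster grows over neighbors within a 0.01 radius whose normals are nearly parallel (αij < σ) or nearly anti-parallel (αij > 175°). The value σ = 5° and the simplified queue handling are assumptions; step (4.5)'s top-k reassignment is omitted.

```python
import numpy as np
from collections import deque
from scipy.spatial import cKDTree

def cluster_by_normals(points, normals, radius=0.01, sigma_deg=5.0):
    """Group points whose tangent-plane normals are nearly (anti-)parallel."""
    tree = cKDTree(points)
    labels = np.full(len(points), -1)    # -1 = no class assigned yet (step 4.2)
    c = -1
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        c += 1                            # new class c = c + 1
        labels[seed] = c
        q = deque([seed])                 # queue q, seeded with xi
        while q:                          # step (4.4): process queue until empty
            i = q.popleft()
            for j in tree.query_ball_point(points[i], r=radius):   # step (4.3)
                if labels[j] != -1:
                    continue
                cos_a = np.clip(np.dot(normals[i], normals[j]), -1.0, 1.0)
                ang = np.degrees(np.arccos(cos_a))
                if ang < sigma_deg or ang > 175.0:   # parallel or anti-parallel normals
                    labels[j] = c
                    q.append(j)
    return labels
```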
17. The system for realizing a visual SLAM semantic mapping function based on a dilated convolution deep neural network according to claim 10, characterized in that the calculation of the tangent-plane normal vector of a spatial point in step (4.1) is specifically:
calculating the tangent-plane normal vector of the spatial point according to the following formula [formula not reproduced in the source text]:
wherein w ∈ R3×1 is the unit normal vector of the plane, and a is the eigenvalue.
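A common reading consistent with the symbols stated in claim 17 (w the unit plane normal, a an eigenvalue) is PCA plane fitting: w is the eigenvector of the neighborhood scatter matrix C belonging to its smallest eigenvalue a, i.e. Cw = aw. A minimal numpy sketch under that assumption, not the patent's verified formula:

```python
import numpy as np

def tangent_plane_normal(neighborhood):
    """Estimate the tangent-plane unit normal w of a point from its neighbors (N x 3)."""
    centered = neighborhood - neighborhood.mean(axis=0)
    C = centered.T @ centered                # 3x3 scatter (covariance) matrix
    eigvals, eigvecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    w = eigvecs[:, 0]                        # eigenvector of the smallest eigenvalue a
    return w / np.linalg.norm(w)             # unit normal, w in R^{3x1}
```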
18. The system for realizing a visual SLAM semantic mapping function based on a dilated convolution deep neural network according to claim 10, characterized in that step (5) comprises the following steps:
(5.1) removing point clouds whose depth values are too large or invalid, according to the precision characteristics of the RGB-D camera;
(5.2) removing isolated spatial points by a statistical filtering method: calculating, for each spatial point, the mean distance to its N nearest spatial points, and removing the spatial points whose mean distance is too large;
(5.3) filling all spatial point clouds into a spatial grid according to the spatial-grid principle, so that each grid cell retains only one spatial point.
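A minimal numpy/scipy sketch of the three filters in step (5). The thresholds (maximum depth, N, the outlier cutoff, and the grid/voxel size) are illustrative assumptions; the claim leaves them unspecified.

```python
import numpy as np
from scipy.spatial import cKDTree

def filter_cloud(points, max_depth=4.0, n_neighbors=8, std_mul=1.0, voxel=0.01):
    # (5.1) drop points whose depth is invalid or too large for the RGB-D camera
    z = points[:, 2]
    points = points[(z > 0) & (z < max_depth)]

    # (5.2) statistical filtering: mean distance to the N nearest neighbors
    tree = cKDTree(points)
    d, _ = tree.query(points, k=n_neighbors + 1)   # nearest result is the point itself
    mean_d = d[:, 1:].mean(axis=1)
    points = points[mean_d < mean_d.mean() + std_mul * mean_d.std()]

    # (5.3) spatial grid: keep a single point per occupied cell
    keys = np.floor(points / voxel).astype(np.int64)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[idx]
```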
CN201811388678.6A 2018-09-18 2018-11-21 Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network Active CN109559320B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2018110885315 2018-09-18
CN201811088531 2018-09-18

Publications (2)

Publication Number Publication Date
CN109559320A true CN109559320A (en) 2019-04-02
CN109559320B CN109559320B (en) 2022-11-18

Family

ID=65866933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811388678.6A Active CN109559320B (en) 2018-09-18 2018-11-21 Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network

Country Status (1)

Country Link
CN (1) CN109559320B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102024262A (en) * 2011-01-06 2011-04-20 西安电子科技大学 Method for performing image segmentation by using manifold spectral clustering
CN105787510A (en) * 2016-02-26 2016-07-20 华东理工大学 System and method for realizing subway scene classification based on deep learning
CN107358189A (en) * 2017-07-07 2017-11-17 北京大学深圳研究生院 Multi-object detection method for indoor environments based on object extraction
CN107480603A (en) * 2017-07-27 2017-12-15 大连和创懒人科技有限公司 Synchronous mapping and object segmentation method based on SLAM and a depth camera
CN108230337A (en) * 2017-12-31 2018-06-29 厦门大学 Method for realizing a semantic SLAM system based on a mobile terminal
CN109636905A (en) * 2018-12-07 2019-04-16 东北大学 Environment semantic mapping method based on deep convolutional neural networks
WO2021018690A1 (en) * 2019-07-31 2021-02-04 Continental Automotive Gmbh Method for determining an environmental model of a scene
CN111462135A (en) * 2020-03-31 2020-07-28 华东理工大学 Semantic mapping method based on visual SLAM and two-dimensional semantic segmentation

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
ANESTIS ZAGANIDIS et al.: "Integrating Deep Semantic Segmentation Into 3-D Point Cloud Registration", IEEE Robotics and Automation Letters *
SUPERVAN: "A compilation of research ideas and results on combining deep learning with SLAM", https://www.cnblogs.com/chaofn/p/9334685.html *
YU ZHU et al.: "Real-Time Semantic Mapping of Visual SLAM Based on DCNN", Communications in Computer and Information Science *
LIN Zhipeng et al.: "Manifold dimensionality-reduction least-squares regression subspace segmentation", Information Technology and Network Security *
PAN Zhuojin et al.: "Research on semantic SLAM incorporating dilated convolutional neural networks", Modern Electronics Technique *
BAI Yunhan: "Research on semantic map construction based on the SLAM algorithm and deep neural networks", Computer Applications and Software *
Intel China Research Institute: "Column | The importance of semantic SLAM, do you know?", https://zhidx.com/p/92828.html *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097553A (en) * 2019-04-10 2019-08-06 东南大学 Semantic mapping system based on simultaneous localization and mapping and three-dimensional semantic segmentation
CN110046677A (en) * 2019-04-26 2019-07-23 山东大学 Data preprocessing method, map construction method, loop-closure detection method and system
CN110146098A (en) * 2019-05-06 2019-08-20 北京猎户星空科技有限公司 Robot map extension method and device, control equipment and storage medium
CN110146098B (en) * 2019-05-06 2021-08-20 北京猎户星空科技有限公司 Robot map extension method and device, control equipment and storage medium
CN110197215A (en) * 2019-05-22 2019-09-03 深圳市牧月科技有限公司 Ground-perception point cloud semantic segmentation method for autonomous driving
CN110146099A (en) * 2019-05-31 2019-08-20 西安工程大学 Simultaneous localization and mapping method based on deep learning
CN110378345B (en) * 2019-06-04 2022-10-04 广东工业大学 Dynamic scene SLAM method based on YOLACT instance segmentation model
CN110378345A (en) * 2019-06-04 2019-10-25 广东工业大学 Dynamic scene SLAM method based on YOLACT instance segmentation model
CN110276286B (en) * 2019-06-13 2022-03-04 中国电子科技集团公司第二十八研究所 Embedded panoramic video stitching system based on TX2
CN110276286A (en) * 2019-06-13 2019-09-24 中国电子科技集团公司第二十八研究所 Embedded panoramic video stitching system based on TX2
CN110264572A (en) * 2019-06-21 2019-09-20 哈尔滨工业大学 Terrain modeling method and system integrating geometric characteristics and mechanical characteristics
CN110264572B (en) * 2019-06-21 2021-07-30 哈尔滨工业大学 Terrain modeling method and system integrating geometric characteristics and mechanical characteristics
CN110297491A (en) * 2019-07-02 2019-10-01 湖南海森格诺信息技术有限公司 Semantic navigation method and system based on multiple structured-light binocular IR cameras
CN111670417A (en) * 2019-07-05 2020-09-15 深圳市大疆创新科技有限公司 Semantic map construction method, semantic map construction system, mobile platform and storage medium
WO2021003587A1 (en) * 2019-07-05 2021-01-14 深圳市大疆创新科技有限公司 Semantic map building method and system, and movable platforms and storage medium
CN110363178A (en) * 2019-07-23 2019-10-22 上海黑塞智能科技有限公司 Airborne laser point cloud classification method based on local and global deep feature embedding
CN110363178B (en) * 2019-07-23 2021-10-15 上海黑塞智能科技有限公司 Airborne laser point cloud classification method based on local and global depth feature embedding
CN110533716A (en) * 2019-08-20 2019-12-03 西安电子科技大学 Semantic SLAM system and method based on 3D constraints
CN110533716B (en) * 2019-08-20 2022-12-02 西安电子科技大学 Semantic SLAM system and method based on 3D constraint
CN110544307A (en) * 2019-08-29 2019-12-06 广州高新兴机器人有限公司 Semantic map construction method based on convolutional neural network and computer storage medium
CN110619299A (en) * 2019-09-12 2019-12-27 北京影谱科技股份有限公司 Object recognition SLAM method and device based on grid
CN110781262A (en) * 2019-10-21 2020-02-11 中国科学院计算技术研究所 Semantic map construction method based on visual SLAM
CN110827305A (en) * 2019-10-30 2020-02-21 中山大学 Semantic segmentation and visual SLAM tight coupling method oriented to dynamic environment
CN110910405B (en) * 2019-11-20 2023-04-18 湖南师范大学 Brain tumor segmentation method and system based on multi-scale cavity convolutional neural network
CN110910405A (en) * 2019-11-20 2020-03-24 湖南师范大学 Brain tumor segmentation method and system based on multi-scale cavity convolutional neural network
CN110956651A (en) * 2019-12-16 2020-04-03 哈尔滨工业大学 Terrain semantic perception method based on fusion of vision and vibrotactile sense
WO2021249575A1 (en) * 2020-06-09 2021-12-16 全球能源互联网研究院有限公司 Area semantic learning and map point identification method for power transformation operation scene
CN111797938B (en) * 2020-07-15 2022-03-15 燕山大学 Semantic information and VSLAM fusion method for sweeping robot
CN111797938A (en) * 2020-07-15 2020-10-20 燕山大学 Semantic information and VSLAM fusion method for sweeping robot
CN113191367A (en) * 2021-05-25 2021-07-30 华东师范大学 Semantic segmentation method based on dense scale dynamic network
CN115240115A (en) * 2022-07-27 2022-10-25 河南工业大学 Visual SLAM loop detection method combining semantic features and bag-of-words model
CN116657348A (en) * 2023-06-02 2023-08-29 浙江正源丝绸科技有限公司 Silk pretreatment method and system
CN116657348B (en) * 2023-06-02 2023-11-21 浙江正源丝绸科技有限公司 Silk pretreatment method and system

Also Published As

Publication number Publication date
CN109559320B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN109559320A Method and system for realizing visual SLAM semantic mapping function based on hole convolution deep neural network
JP6830707B1 (en) Person re-identification method that combines random batch mask and multi-scale expression learning
CN106127204B Multi-directional meter-reading region detection algorithm based on fully convolutional neural networks
CN108052896B (en) Human body behavior identification method based on convolutional neural network and support vector machine
CN107392964B Indoor SLAM method combining indoor feature points and structural lines
CN109740665A Method and system for detecting ship targets in occluded images based on expert-knowledge constraints
CN110378281A Group activity recognition method based on pseudo-3D convolutional neural networks
CN108734143A Binocular-vision-based online detection method for power transmission lines by an inspection robot
CN110097553A Semantic mapping system based on simultaneous localization and mapping and three-dimensional semantic segmentation
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN108960184A Pedestrian re-identification method based on a heterogeneous-parts deep neural network
CN109341703A Visual SLAM algorithm using CNN feature detection over the full cycle
CN110378997A Dynamic scene mapping and localization method based on ORB-SLAM2
CN114972418A Maneuvering multi-target tracking method based on combination of kernel adaptive filtering and YOLOX detection
CN109035329A Camera pose estimation optimization method based on deep features
CN113221625A (en) Method for re-identifying pedestrians by utilizing local features of deep learning
US11361534B2 (en) Method for glass detection in real scenes
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN109816714A Point-cloud object type recognition method based on three-dimensional convolutional neural networks
CN110334584A Gesture recognition method based on region-based fully convolutional networks
CN114241226A (en) Three-dimensional point cloud semantic segmentation method based on multi-neighborhood characteristics of hybrid model
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
CN117197676A (en) Target detection and identification method based on feature fusion
CN114358133B (en) Method for detecting looped frames based on semantic-assisted binocular vision SLAM
CN111339967A (en) Pedestrian detection method based on multi-view graph convolution network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant