CN112288857A - Robot semantic map object recognition method based on deep learning - Google Patents

Robot semantic map object recognition method based on deep learning

Info

Publication number
CN112288857A
Authority
CN
China
Prior art keywords
voxel
point cloud
segmentation
map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011189866.3A
Other languages
Chinese (zh)
Inventor
王晓华
李耀光
王文杰
张蕾
苏泽斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Polytechnic University
Original Assignee
Xian Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Polytechnic University filed Critical Xian Polytechnic University
Priority to CN202011189866.3A priority Critical patent/CN112288857A/en
Publication of CN112288857A publication Critical patent/CN112288857A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005Tree description, e.g. octree, quadtree
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention discloses a deep-learning-based robot semantic map object recognition method, which comprises the following specific steps. Step 1: acquiring image information of each frame of the surrounding environment in real time, and selecting image key frames; step 2: performing target detection on the selected image key frames through a convolutional neural network, and extracting semantic information; step 3: completing object-level segmentation of the three-dimensional point cloud by applying a three-dimensional point cloud segmentation method to the detected key frame images output in step 2; step 4: fusing the segmentation module of step 3 and the detection module of step 2 into the ORB_SLAM2 visual SLAM framework to obtain a three-dimensional semantic map containing object information. The invention solves the problem that robots in the prior art cannot understand the high-level semantic information of objects in the environment or recognize objects in the map.

Description

Robot semantic map object recognition method based on deep learning
Technical Field
The invention belongs to the technical field of artificial intelligence, and relates to a semantic map object recognition method of a robot based on deep learning.
Background
With the development of science and technology and the improvement of living standards, robots are gradually entering people's daily lives. However, current robots usually employ Simultaneous Localization and Mapping (SLAM) to construct a sparse landmark map containing only geometric information, which can be used solely for navigation and localization tasks. In this case, the robot cannot understand the high-level semantic information of objects in the environment or recognize objects in the map. Therefore, combining SLAM with object detection and recognition technology to construct a semantic map containing object semantic information becomes an effective solution. Efficient semantic SLAM is the basis for an intelligent robot system to accomplish high-level tasks such as autonomous positioning, environmental navigation and intelligent search in unknown environments.
Disclosure of Invention
The invention aims to provide a deep-learning-based robot semantic map object recognition method, which solves the problem that robots in the prior art cannot understand the high-level semantic information of objects in the environment or recognize objects in the map.
The technical scheme adopted by the invention is that a robot semantic map object recognition method based on deep learning specifically comprises the following steps:
step 1: acquiring image information of each frame of the surrounding environment in real time, and selecting image key frames;
step 2: performing target detection on the selected image key frame through a convolutional neural network, and extracting semantic information;
step 3: completing object-level segmentation of the three-dimensional point cloud by applying a three-dimensional point cloud segmentation method to the detected key frame images output in step 2;
step 4: fusing the segmentation module of step 3 and the detection module of step 2 into the ORB_SLAM2 visual SLAM framework to obtain a three-dimensional semantic map containing object information.
The invention is also characterized in that:
in step 1, an RGB-D vision sensor is used to collect environmental image information in real time; the specific process of selecting image key frames in step 1 is as follows: within every 10-15 frames, one frame with good image quality, a sufficient number of feature points and a uniform feature point distribution is taken as a key frame.
The specific process of target detection in step 2 is as follows:
step 2.1: making a data set with object annotations; labeling the image samples with the image annotation tool LabelImg; marking different objects with bounding boxes to generate annotation files conforming to the network format;
step 2.2: setting the network hyper-parameters and training the improved YOLOv3-tiny network to obtain a suitable network model;
step 2.3: sending the selected image key frames into the YOLO model for object detection to obtain object positions and classification probabilities.
The specific process of the point cloud segmentation method in the step 3 is as follows:
step 3.1: performing two-dimensional target segmentation on the key frame images output in step 2 with the GrabCut algorithm during the SLAM process;
step 3.2: performing three-dimensional point cloud segmentation by using an improved VCCS algorithm;
step 3.3: and eliminating pixel points which do not belong to the target in the point cloud by using the point cloud segmentation result, and completing the three-dimensional point cloud segmentation of the object level.
The specific process of step 3.1 is as follows:
step 3.1.1: defining one or more rectangles containing objects in the key frame picture, wherein the areas outside the rectangles are automatically considered as backgrounds;
step 3.1.2: modeling the background and foreground with a gaussian mixture model and labeling undefined pixels as possible foreground or background;
for each pixel in the image, a model is built with a GMM containing K Gaussian components, and the final segmentation is represented by a vector k = {k_1, k_2, ..., k_n}, where each element k_n ∈ {1, 2, ..., K} indicates which Gaussian component the corresponding pixel belongs to, while the foreground and background are distinguished;
for each pixel, a gaussian component from the target GMM or a gaussian component from the background GMM is calculated by substituting the RGB three channel values into the GMM model, and the energy function of the whole image is expressed as:
E(α,k,θ,z)=U(α,k,θ,z)+V(α,z) (1)
U() is the region term, representing the penalty for classifying a pixel as target or background, and V() is the boundary term, representing the penalty for discontinuity between adjacent pixels; the Gaussian mixture model has the form:
P(z) = Σ_{k=1}^{K} π_k N(z | μ_k, Σ_k)    (2)
wherein π represents the weight of each Gaussian component, μ represents the mean vector of each Gaussian component (a three-element vector for a three-channel image), Σ represents the covariance matrix, and N(· | μ, Σ) is the Gaussian density;
step 3.1.3: each pixel (i.e. a node in the algorithm) is connected with a foreground or background node, and after the nodes are connected, if edges between the nodes belong to different terminals, the edges between the nodes are cut off, so that each part of the image can be segmented to construct an initial semantic point cloud map.
The specific process of the step 3.2 is as follows: for the three-dimensional point cloud obtained by the visual SLAM system, point cloud segmentation is carried out by using a Super Voxel-based point cloud segmentation method, and Voxel clustering is carried out by using point cloud Voxel connectivity, so that a three-dimensional point cloud clustering segmentation algorithm for target segmentation is realized.
Step 3.2.1, constructing the adjacency graph of the voxelized point cloud: in a voxelized three-dimensional space there are three adjacency definitions, namely 6-, 18- and 26-adjacency, in which voxels share a face, an edge or a vertex respectively; VCCS selects 26-adjacency and implements the adjacency graph through a Kd-tree; denoting the voxel resolution used for segmentation as R_voxel, the centers of all 26 adjacent voxels lie within a distance of √3·R_voxel of the central voxel;
step 3.2.2, generating and filtering seed point cloud after the adjacency graph is established;
firstly, initializing the supervoxels with a number of seed points and dividing the spatial point cloud into a voxel grid at a chosen seed resolution R_seed, wherein R_seed determines the spacing between supervoxels and a search radius R_search is used to determine whether a sufficient number of seeds occupy a voxel; selecting the point in the point cloud closest to the center of each seed voxel as a seed candidate point, and deleting noisy seeds according to R_search after the candidate points are determined;
step 3.2.3, after the seed voxel is selected, the super voxel characteristic vector can be initialized, then iterative clustering is carried out based on constraint, voxel points are assigned to the super voxel in an iterative manner, and the distance from each voxel to the center of the super voxel is calculated by using the formula (3):
D = √( w_c·D_c² + w_n·D_n² + w_s·D_s² )    (3)
wherein D_c, D_n and D_s respectively represent the color difference, normal difference and spatial distance between a voxel and the supervoxel center, and w_c, w_n and w_s are the corresponding weights, which determine the shape of the supervoxels;
the iterative process for each supervoxel is: flowing outward from the center voxel of the point cloud cluster → calculating the distance from each voxel in the center's neighborhood to the supervoxel center → if it is the minimum distance for that voxel → setting the voxel label and searching farther neighborhood voxels → proceeding to the next supervoxel iteration;
when iterating to the next supervoxel, all supervoxels are considered simultaneously, layer by layer from the center outward; the search proceeds outward iteratively until the edge of each supervoxel is found; then, before searching deeper into the graph, the same level of all supervoxels is checked; when all leaf nodes have been searched, or the searched nodes carry no label, the label assignment of the current supervoxel ends;
step 3.2.4, after all supervoxels have been searched and labeled, the center of each supervoxel cluster is updated as the mean of all its members, and this operation is performed iteratively until convergence.
The specific process of step 3.3 is as follows: the three-dimensional point cloud processed by the VCCS algorithm yields a set of different surface patches, and the result of the algorithm is expressed as an adjacency graph G = {V, E}, where V is the set of surface patches V_i and E is the set of edges e_i connecting patches V_i and V_j; each patch corresponds to a centroid c_i and a normal vector n_i, so the segmentation of the three-dimensional scene is formulated as a graph partition;
fused support planes are adopted to eliminate the influence of noise; assuming there are K support planes {s_1, s_2, ..., s_K} in the point cloud, all surface patches are contained in these support planes; a variable b_i ∈ [0, K] is defined for each patch, where b_i = k indicates that the patch belongs to support plane s_k;
for all surface patches, candidate support planes of the point cloud are generated from the patch centroids and normal vectors, and after merging nearly duplicate planes to remove redundancy, M preliminarily generated candidate planes are obtained;
firstly, the final set of point cloud support planes is determined from the M candidate planes, and P support planes satisfying the geometric relationships are obtained from it; all surface patches are assigned to the P support planes, and each patch corresponds to a label l_p, where l_p ∈ {0, 1, 2, ..., P}; if a patch does not belong to any plane, its label is 0; if two connected patches belong to the same plane, the corresponding edge weight is assigned 0, otherwise 1, as shown in formula (4):
w_e(l_i, l_j) = 0 if l_i = l_j;  w_e(l_i, l_j) = 1 if l_i ≠ l_j    (4)
where l_i and l_j are the labels of adjacent surface patches and w_e(l_i, l_j) is the weight of the edge connecting them; a graph optimization problem is constructed from the surface patches and their connecting edges, and the labeling with the minimum sum of edge weights is taken as the optimal segmentation, thereby completing object segmentation of the point cloud scene.
The specific process of the step 4 is as follows:
step 4.1: data association and object model updating of the three-dimensional semantic map;
step 4.2: and constructing a three-dimensional semantic map based on Octomap.
The specific process of step 4.1 is as follows: firstly, for each detection in a key frame, the targets after segmentation are selected as the object candidate set; before the current target is added to the map, the neighborhood of each of its three-dimensional points is searched, the closest three-dimensional point is found in the point cloud data of the object candidate set, and the Euclidean distance between the two points is calculated;
if the Euclidean distance between the two points is less than a certain threshold, they are regarded as the same point, and the similarity between the current target and the candidate object is calculated, the calculation formula being expressed as:
s = M / N    (5)
wherein M represents the number of three-dimensional points in the current target whose Euclidean distance is smaller than the threshold, and N represents the total number of three-dimensional points of each candidate object in the candidate set; the similarity between the current target and each candidate object is calculated, and if it meets the threshold they are considered the same target, their information is associated and the object model is maintained jointly; otherwise a new object model is added;
the specific process of step 4.2 is as follows: in the Octomap-based octree map, a probability p is used in the octree to express whether a leaf node is occupied, expressed as:
α = log( p / (1 − p) ),  i.e.  p = 1 / (1 + e^(−α))    (6)
where α represents the state of the corresponding node and the probability p takes values from 0 to 1; let n be a node in the octree and z the observation data; the node information of node n from the beginning up to time t−1 is represented as L(n | z_{1:t−1}); the node information at time t is represented as:
L(n|z1:t)=L(n|z1:t-1)+L(n|zt) (7)
if the depth of a certain pixel in the depth map is d, occupied data exists on a space node corresponding to the value d, no occupied information exists in an image area with the depth value smaller than d, and node information in the octree map is updated rapidly according to the above formula.
The beneficial effect of the method is that the deep-learning-based robot semantic map object recognition method combines current deep-learning-based target detection algorithms with the visual SLAM algorithm, so that the robot can see and understand the real world and perform autonomous decision-making and action planning.
Drawings
FIG. 1 is a flow chart of a semantic map object recognition method of a robot based on deep learning according to the invention;
FIG. 2 is a network structure diagram of the improved YOLOv3-tiny network in the deep-learning-based robot semantic map object recognition method of the invention;
FIG. 3 is a flow chart of an improved three-dimensional point cloud target segmentation method in the deep learning based robot semantic map object recognition method of the present invention;
FIG. 4 is a graph segmentation model of a curved surface block in the semantic map object recognition method of the robot based on deep learning according to the invention;
FIG. 5 is a flow chart of three-dimensional semantic map construction in the robot semantic map object recognition method based on deep learning.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The deep-learning-based robot semantic map object recognition method of the invention, whose flow chart is shown in FIG. 1, specifically comprises the following steps:
step 1: acquiring image information of each frame of the surrounding environment in real time, and selecting image key frames;
in the step 1, an RGB-D vision sensor is used for collecting environmental image information in real time;
the specific process of selecting image key frames in step 1 is as follows: within every 10-15 frames, one frame with good image quality, a sufficient number of feature points and a uniform feature point distribution is taken as a key frame; a key frame should have a small amount of co-visibility with the other key frames in the local map, while most of its feature points are new, so that constraints exist between key frames while information redundancy is kept to a minimum.
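The key-frame criterion above (a well-textured frame with enough, evenly spread feature points roughly every 10-15 frames) can be prototyped with ORB features from OpenCV. The sketch below is illustrative only: the feature-count threshold, the 8×8 coverage grid and the 0.5 coverage ratio are assumed values, not parameters given by the patent.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(1000)                # shared ORB detector

def is_keyframe(gray, frame_gap, min_gap=10, max_gap=15,
                min_features=150, grid=(8, 8)):
    """Hypothetical key-frame test: enough well-distributed ORB features,
    taken roughly every 10-15 frames (all thresholds are illustrative)."""
    if frame_gap >= max_gap:
        return True                       # force a key frame after ~15 frames
    if frame_gap < min_gap:
        return False                      # too soon since the previous key frame
    keypoints = orb.detect(gray, None)
    if len(keypoints) < min_features:
        return False                      # too few features: poor image quality
    # require the features to cover a reasonable fraction of a coarse grid
    h, w = gray.shape
    occupied = np.zeros(grid, dtype=bool)
    for kp in keypoints:
        x, y = kp.pt
        occupied[min(int(y * grid[0] / h), grid[0] - 1),
                 min(int(x * grid[1] / w), grid[1] - 1)] = True
    return occupied.mean() > 0.5          # uniform-distribution heuristic
```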
Step 2: performing target detection on the selected image key frame through a convolutional neural network, and extracting semantic information;
the specific process of target detection in step 2 is as follows:
step 2.1: making a data set with object annotations; labeling the image samples with the image annotation tool LabelImg; marking different objects with bounding boxes to generate annotation files conforming to the network format, each containing the class and position of the target object;
step 2.2: setting the network hyper-parameters and training the improved YOLOv3-tiny network, shown in FIG. 2, to obtain a suitable network model;
step 2.3: sending the selected image key frames into the YOLO model for object detection to obtain object positions and classification probabilities (a sketch of such an inference call is given below).
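As a sketch of how a trained detector of this kind might be invoked on a key frame, the snippet below loads Darknet-format YOLOv3-tiny weights with OpenCV's dnn module and returns boxes with class probabilities. The file names yolov3-tiny.cfg / yolov3-tiny.weights, the 416×416 input size and the thresholds are assumptions; the patent's improved network would use its own configuration and weights.

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect(image, conf_threshold=0.5, nms_threshold=0.4):
    """Run YOLO on one key frame; return (class_id, confidence, box) tuples."""
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, confidences, class_ids = [], [], []
    for output in net.forward(layer_names):
        for det in output:                # det = [cx, cy, bw, bh, objness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence < conf_threshold:
                continue
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)
    if not boxes:
        return []
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
    return [(class_ids[i], confidences[i], boxes[i]) for i in np.array(keep).flatten()]
```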
Step 3: completing object-level segmentation of the three-dimensional point cloud by applying a three-dimensional point cloud segmentation method to the detected key frame images output in step 2;
as shown in fig. 3, the specific process of the point cloud segmentation method in step 3 is as follows:
step 3.1: performing two-dimensional target segmentation on the key frame images output in step 2 with the GrabCut algorithm during the SLAM process;
the specific process of step 3.1 is as follows:
step 3.1.1: one or more rectangles containing objects are defined in the key frame picture. The area outside the rectangle is automatically considered as background;
step 3.1.2: the background and foreground are modeled with a Gaussian Mixture Model (GMM) and undefined pixels are labeled as possible foreground or background.
For each pixel in the image, the model is built with a GMM containing K Gaussian components, and the final segmentation can be represented by a vector k = {k_1, k_2, ..., k_n}, where each element k_n ∈ {1, 2, ..., K} indicates which Gaussian component the corresponding pixel belongs to, while the foreground and background are distinguished.
For each pixel, a gaussian component from the target GMM or a gaussian component from the background GMM is calculated by substituting the RGB three channel values into the GMM model, and the energy function of the whole image is expressed as:
E(α,k,θ,z)=U(α,k,θ,z)+V(α,z) (1)
u () is a region term representing a penalty of a pixel being classified as a target or background, and V () is a boundary term representing a penalty of discontinuity between neighboring pixels. The gaussian mixture model is of the form:
P(z) = Σ_{k=1}^{K} π_k N(z | μ_k, Σ_k)    (2)
where π represents the weight of each Gaussian component, μ represents the mean vector of each Gaussian component (a three-element vector for a three-channel image), Σ represents the covariance matrix, and N(· | μ, Σ) is the Gaussian density.
Step 3.1.3: each pixel (i.e., node in the algorithm) will be connected to a foreground or background node. After the nodes are connected (possibly connected with the background or the foreground), if edges between the nodes belong to different terminals (namely one node belongs to the foreground and the other node belongs to the background), the edges between the nodes are cut off, so that all parts of the image can be segmented, and the initial semantic point cloud map is constructed.
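Step 3.1 corresponds closely to the GrabCut implementation built into OpenCV. The sketch below runs it inside one detection box from step 2; reusing the YOLO box as the initial rectangle and the iteration count of 5 are assumptions made for illustration.

```python
import cv2
import numpy as np

def grabcut_mask(image_bgr, box, iterations=5):
    """Segment the object inside a detection box (x, y, w, h) with GrabCut.
    Returns a binary mask: 1 for (probable) foreground, 0 for background."""
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    bgd_model = np.zeros((1, 65), dtype=np.float64)   # background GMM parameters
    fgd_model = np.zeros((1, 65), dtype=np.float64)   # foreground GMM parameters
    cv2.grabCut(image_bgr, mask, tuple(box), bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)
    # keep definite and probable foreground pixels, discard the rest
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return fg.astype(np.uint8)

# usage: fg = grabcut_mask(frame, (x, y, w, h)); pixels outside fg are removed
# from the object's point cloud in step 3.3
```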
Step 3.2: performing three-dimensional point cloud segmentation by using an improved VCCS algorithm;
the specific process of step 3.2 is as follows: for the three-dimensional point cloud obtained by the visual SLAM system, point cloud segmentation is carried out with a supervoxel-based method, and voxel clustering using point cloud voxel connectivity realizes a three-dimensional point cloud clustering segmentation algorithm for target segmentation; the specific process is as follows:
and 3.2.1, constructing an adjacency graph of the voxel point cloud, wherein three adjacent definitions are respectively 6, 18 and 26 adjacent in a voxelized three-dimensional space, and voxels share a surface, an edge and a fixed point.
VCCS selects 26 adjacent voxels, realizes an adjacency graph through a Kd-tree, records the resolution of the voxels for segmentation as Rvoxel, and keeps the centers of all 26 adjacent voxels at
Figure BDA0002752466270000101
In (1). The kd-Tree is used to organize a set of points representing k-dimensional space, and is a binary search tree with other constraints.
And 3.2.2, generating and filtering seed point clouds after the adjacency graph is established.
First, the supervoxels are initialized with a number of seed point clouds, and the spatial point cloud is divided into a voxel grid at a chosen seed resolution R_seed, where R_seed determines the spacing between supervoxels and the search radius R_search is used to determine whether a sufficient number of seeds occupy a voxel. The point in the point cloud closest to the center of each seed voxel is selected as a seed candidate point; after the candidate points are determined, noisy seeds are deleted according to R_search.
Step 3.2.3: after the seed voxels are selected, the supervoxel feature vectors are initialized; constrained iterative clustering is then performed, assigning voxel points to supervoxels iteratively. The distance from each voxel to a supervoxel center is calculated using equation (3):
D = √( w_c·D_c² + w_n·D_n² + w_s·D_s² )    (3)
wherein D_c, D_n and D_s respectively represent the color difference, normal difference and spatial distance between a voxel and the supervoxel center, and w_c, w_n and w_s are the corresponding weights, which determine the shape of the supervoxels. The iterative process for each supervoxel is: flowing outward from the center voxel of the point cloud cluster → calculating the distance from each voxel in the center's neighborhood to the supervoxel center → if it is the minimum distance for that voxel → setting the voxel label and searching farther neighborhood voxels → proceeding to the next supervoxel iteration.
When iterating to the next supervoxel, all supervoxels are considered simultaneously, layer by layer from the center outward. The outward search is repeated until the edge of each supervoxel is found; then the same level of all supervoxels is examined before searching deeper into the graph. When all leaf nodes have been searched, or the searched nodes carry no label, the label assignment of the current supervoxel ends.
Step 3.2.4: after all supervoxels have been searched and labeled, the center of each supervoxel cluster is updated as the mean of all its members. This operation is performed iteratively until convergence (a sketch of this clustering loop is given below).
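The weighted distance of equation (3) and the iterative center update of step 3.2.4 can be sketched in a few lines of NumPy. This is a simplified stand-in for VCCS (in practice PCL's SupervoxelClustering is typically used): the weight values and the plain nearest-center assignment pass are assumptions, not the full flow-constrained neighborhood search described above.

```python
import numpy as np

def supervoxel_distance(voxels, centers, w_c=0.2, w_n=1.0, w_s=0.4):
    """Equation (3): weighted color / normal / spatial distance from every voxel
    to every supervoxel center; voxels and centers are dicts of (N,3)/(M,3) arrays."""
    d_c = np.linalg.norm(voxels["color"][:, None] - centers["color"][None], axis=2)
    d_n = 1.0 - np.abs(voxels["normal"] @ centers["normal"].T)     # normal disagreement
    d_s = np.linalg.norm(voxels["xyz"][:, None] - centers["xyz"][None], axis=2)
    return np.sqrt(w_c * d_c**2 + w_n * d_n**2 + w_s * d_s**2)

def cluster(voxels, centers, iterations=10):
    """Simplified clustering: assign each voxel to its nearest supervoxel center
    under equation (3), then recompute the centers as in step 3.2.4."""
    labels = None
    for _ in range(iterations):
        labels = supervoxel_distance(voxels, centers).argmin(axis=1)
        for k in range(len(centers["xyz"])):
            members = labels == k
            if members.any():
                for key in ("xyz", "color", "normal"):
                    centers[key][k] = voxels[key][members].mean(axis=0)
                n = centers["normal"][k]
                centers["normal"][k] = n / (np.linalg.norm(n) + 1e-9)  # renormalize
    return labels
```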
Step 3.3: and eliminating pixel points which do not belong to the target in the point cloud by using the point cloud segmentation result, and completing the three-dimensional point cloud segmentation of the object level.
The specific process of step 3.3 is as follows: the three-dimensional point cloud processed by the VCCS algorithm yields a set of different surface patches, and the result of the algorithm can be represented as an adjacency graph G = {V, E}, where V is the set of surface patches V_i and E is the set of edges e_i connecting patches V_i and V_j. Each patch corresponds to a centroid c_i and a normal vector n_i, so the segmentation of the three-dimensional scene is formulated as a graph partition.
Because of noise, classifying the connecting edges by normal vectors alone gives insufficient segmentation precision. For this reason, as shown in FIG. 4, fused support planes are used to eliminate the influence of noise. Suppose there are K support planes {s_1, s_2, ..., s_K} in the point cloud; all surface patches are contained in these support planes, and a variable b_i ∈ [0, K] is defined for each patch, where b_i = k indicates that the patch belongs to support plane s_k.
For all surface patches, candidate support planes of the point cloud are generated from the patch centroids and normal vectors; after merging nearly duplicate planes to remove redundancy, M preliminarily generated candidate planes are obtained.
Firstly, the final set of point cloud support planes is determined from the M candidate planes, and P support planes satisfying the geometric relationships are obtained from it. All surface patches are assigned to the P support planes, and each patch corresponds to a label l_p, where l_p ∈ {0, 1, 2, ..., P}; if a patch does not belong to any plane, its label is 0. If two connected patches belong to the same plane, the corresponding edge weight is assigned 0, otherwise 1, as shown in formula (4):
w_e(l_i, l_j) = 0 if l_i = l_j;  w_e(l_i, l_j) = 1 if l_i ≠ l_j    (4)
where l_i and l_j are the labels of adjacent surface patches and w_e(l_i, l_j) is the weight of the edge connecting them. A graph optimization problem is constructed from the surface patches and their connecting edges, and the labeling with the minimum sum of edge weights is taken as the optimal segmentation, thereby completing object segmentation of the point cloud scene.
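The Potts-style edge weight of equation (4) and the cost of a candidate labeling can be written down directly; a minimal sketch is given below. The greedy point-to-plane assignment is only an assumed stand-in for the graph optimization described above, and the 5 cm distance threshold is illustrative.

```python
import numpy as np

def edge_weight(l_i, l_j):
    """Equation (4): weight 0 if two adjacent patches share a support plane, else 1."""
    return 0 if l_i == l_j else 1

def labeling_cost(labels, edges):
    """Sum of edge weights over the patch adjacency graph G = {V, E}; the labeling
    with the smallest cost is taken as the optimal segmentation."""
    return sum(edge_weight(labels[i], labels[j]) for i, j in edges)

def assign_patches(centroids, planes, dist_thresh=0.05):
    """Greedy stand-in for the support-plane assignment: each patch gets the label of
    the nearest plane (point-to-plane distance), or 0 if none is close enough.
    planes: list of (point_on_plane, unit_normal) tuples; labels run from 1 to P."""
    labels = np.zeros(len(centroids), dtype=int)
    for i, c in enumerate(centroids):
        dists = [abs(np.dot(c - p, n)) for p, n in planes]
        if dists and min(dists) < dist_thresh:
            labels[i] = int(np.argmin(dists)) + 1
    return labels
```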
Step 4: fusing the segmentation module of step 3 and the detection module of step 2 into the ORB_SLAM2 visual SLAM framework to obtain a three-dimensional semantic map containing object information.
As shown in fig. 5, the specific process of step 4 is:
step 4.1: data association and object model updating of the three-dimensional semantic map;
the specific process of step 4.1 is as follows: firstly, for each detection in a key frame, the targets after segmentation are selected as the object candidate set. Before the current target is added to the map, the neighborhood of each of its three-dimensional points is searched, the closest three-dimensional point is found in the point cloud data of the object candidate set, and the Euclidean distance between the two points is calculated.
If the Euclidean distance between the two points is less than a certain threshold, they can be regarded as the same point, and the similarity between the current target and a candidate object can be calculated as:
s = M / N    (5)
where M is the number of three-dimensional points in the current target whose Euclidean distance is smaller than the threshold, and N is the total number of three-dimensional points of each candidate object in the candidate set. The similarity between the current target and each candidate object is calculated; if it meets the threshold, they are considered the same target, their information is associated and the object model is maintained jointly; otherwise a new object model is added.
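A sketch of this association test, using a k-d tree for the nearest-neighbour search, is given below; the 2 cm distance threshold and the 0.5 similarity threshold are placeholders, not values specified by the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def similarity(target_pts, candidate_pts, dist_thresh=0.02):
    """Equation (5): s = M / N, where M counts points of the current target whose
    nearest candidate point lies within dist_thresh (Euclidean distance) and N is
    the total number of points of the candidate object."""
    tree = cKDTree(candidate_pts)
    dists, _ = tree.query(target_pts, k=1)
    m = int(np.sum(dists < dist_thresh))
    return m / len(candidate_pts)

def associate(target_pts, candidates, sim_thresh=0.5):
    """Return the index of the matching candidate object, or None to add a new model."""
    scores = [similarity(target_pts, c) for c in candidates]
    if scores and max(scores) > sim_thresh:
        return int(np.argmax(scores))
    return None
```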
Step 4.2: and constructing a three-dimensional semantic map based on Octomap.
The specific process of step 4.2 is as follows: the Octomap-based octree map has the advantages of flexible map storage, a small memory footprint and support for real-time updates. A probability p is used in the octree to express whether a leaf node is occupied, expressed as:
α = log( p / (1 − p) ),  i.e.  p = 1 / (1 + e^(−α))    (6)
where α represents the state of the corresponding node and the probability p takes values from 0 to 1. Let n be a node in the octree and z be the observation data; the node information of node n from the beginning up to time t−1 is denoted L(n | z_{1:t−1}). The node information at time t can then be expressed as:
L(n|z1:t)=L(n|z1:t-1)+L(n|zt) (7)
if the depth of a certain pixel in the depth map is d, there is occupied data on the spatial node corresponding to the value d, and there is no occupied information in the image area where the depth value is smaller than d. The node information in the octree map can be updated quickly according to the above formula.
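A minimal sketch of the log-odds bookkeeping of equations (6)-(7) for a single depth-map ray is shown below; the per-hit and per-miss increments are typical occupancy-mapping values, not ones given in the patent.

```python
import numpy as np

L_HIT, L_MISS = 0.85, -0.4        # assumed log-odds increments per observation

def logodds(p):
    """Equation (6): node state alpha as the log-odds of occupancy probability p."""
    return np.log(p / (1.0 - p))

def probability(alpha):
    """Inverse of equation (6): recover the occupancy probability from alpha."""
    return 1.0 / (1.0 + np.exp(-alpha))

def update_ray(log_odds_map, voxels_along_ray, hit_index):
    """Equation (7): additive update L(n|z_1:t) = L(n|z_1:t-1) + L(n|z_t).
    Voxels in front of the measured depth d are observed free; the voxel at d is occupied."""
    for i, v in enumerate(voxels_along_ray):
        log_odds_map[v] = log_odds_map.get(v, 0.0) + (L_HIT if i == hit_index else L_MISS)
    return log_odds_map

# usage: the occupancy probability of a voxel v is probability(log_odds_map[v])
```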
Octree maps based on Octomap store not only the voxel occupancy probabilities but also the color information of objects. To mark the semantic information of objects, different colors are used to represent different object categories, and the target objects are colored according to the correspondence set by the user. A three-dimensional color semantic map containing both the geometric information of the environment and the semantic information of objects is thus obtained, realizing the robot's recognition of the objects in the map.
According to the deep-learning-based robot semantic map object recognition method, the detailed semantic information acquired by the target detection module is fused into the visual SLAM according to the visual SLAM principle of the mobile robot, the objects in the map are segmented with a three-dimensional target segmentation algorithm to obtain an environment semantic segmentation result, and a three-dimensional environment semantic map is established. An accurate semantic map is built and the object information in the map is stored, so that the mobile robot can recognize the objects in the map and guide its localization by matching against those objects.
The deep-learning-based robot semantic map object recognition method of the invention integrates a target detection module and a three-dimensional object segmentation module according to the visual SLAM principle of a mobile robot; an octree map based on the Octomap library stores the color information of objects, a three-dimensional environment semantic map is established, and the robot's recognition of objects in the map is completed. In the deep-learning-based target detection method, thirteen 3×3 convolution layers are added on the basis of the YOLO-tiny network to increase the network depth, so that the network can learn deeper information from the input image and the detection accuracy is improved. The target detection results are added to the SLAM thread so that semantic information is fused into the point cloud map, and the environmental objects in the three-dimensional map are segmented with the target segmentation algorithm to construct the three-dimensional environmental point cloud map. The mobile robot can thus recognize objects in the map, guide its localization by matching against those objects, and complete high-level human-computer interaction tasks.

Claims (10)

1. A semantic map object recognition method of a robot based on deep learning is characterized by comprising the following steps:
step 1: acquiring image information of each frame of the surrounding environment in real time, and selecting image key frames;
step 2: performing target detection on the selected image key frame through a convolutional neural network, and extracting semantic information;
step 3: completing object-level segmentation of the three-dimensional point cloud by applying a three-dimensional point cloud segmentation method to the detected key frame images output in step 2;
step 4: fusing the segmentation module of step 3 and the detection module of step 2 into the ORB_SLAM2 visual SLAM framework to obtain a three-dimensional semantic map containing object information.
2. The deep learning based semantic map object recognition method for the robot as claimed in claim 1, wherein in step 1 an RGB-D visual sensor is used for collecting environmental image information in real time; the specific process of selecting the image key frames in step 1 is as follows: within every 10-15 frames, one frame with good image quality, a sufficient number of feature points and a uniform feature point distribution is taken as a key frame.
3. The deep learning-based semantic robot map object recognition method according to claim 1, wherein the specific process of target detection in the step 2 is as follows:
step 2.1: making a data set with object annotations; labeling the image samples with the image annotation tool LabelImg; marking different objects with bounding boxes to generate annotation files conforming to the network format;
step 2.2: setting the network hyper-parameters and training the improved YOLOv3-tiny network to obtain a suitable network model;
step 2.3: and sending the selected image key frame into a YOLO model for object detection to obtain the position and classification probability of the object.
4. The deep learning-based semantic robot map object recognition method according to claim 1, wherein the point cloud segmentation method in step 3 specifically comprises the following steps:
step 3.1: performing two-dimensional target segmentation on the key frame images output in step 2 with the GrabCut algorithm during the SLAM process;
step 3.2: performing three-dimensional point cloud segmentation by using an improved VCCS algorithm;
step 3.3: and eliminating pixel points which do not belong to the target in the point cloud by using the point cloud segmentation result, and completing the three-dimensional point cloud segmentation of the object level.
5. The deep learning-based semantic robot map object recognition method according to claim 4, wherein the specific process of the step 3.1 is as follows:
step 3.1.1: defining one or more rectangles containing objects in the key frame picture, wherein the areas outside the rectangles are automatically considered as backgrounds;
step 3.1.2: modeling the background and foreground with a gaussian mixture model and labeling undefined pixels as possible foreground or background;
for each pixel in the image, the model is built with a GMM containing K Gaussian components, and the final segmentation is performed using a vector k = {k_1, k_2, ..., k_n}, where each element k_n ∈ {1, 2, ..., K} indicates which Gaussian component the corresponding pixel belongs to, while the foreground and background are distinguished;
for each pixel, a gaussian component from the target GMM or a gaussian component from the background GMM is calculated by substituting the RGB three channel values into the GMM model, and the energy function of the whole image is expressed as:
E(α,k,θ,z)=U(α,k,θ,z)+V(α,z) (1)
U() is the region term, representing the penalty for classifying a pixel as target or background, and V() is the boundary term, representing the penalty for discontinuity between adjacent pixels; the Gaussian mixture model has the form:
P(z) = Σ_{k=1}^{K} π_k N(z | μ_k, Σ_k)    (2)
wherein π represents the weight of each Gaussian component, μ represents the mean vector of each Gaussian component (a three-element vector for a three-channel image), Σ represents the covariance matrix, and N(· | μ, Σ) is the Gaussian density;
step 3.1.3: each pixel (i.e. a node in the algorithm) is connected with a foreground or background node, and after the nodes are connected, if edges between the nodes belong to different terminals, the edges between the nodes are cut off, so that each part of the image can be segmented to construct an initial semantic point cloud map.
6. The deep learning-based semantic map object recognition method for the robot according to claim 5, wherein the step 3.2 comprises the following specific processes: for the three-dimensional point cloud obtained by the visual SLAM system, point cloud segmentation is carried out by using a Super Voxel-based point cloud segmentation method, and Voxel clustering is carried out by using point cloud Voxel connectivity, so that a three-dimensional point cloud clustering segmentation algorithm for target segmentation is realized.
7. The deep learning-based robot semantic map object recognition method according to claim 6, wherein in step 3.2.1 an adjacency graph of the voxelized point cloud is constructed; in a voxelized three-dimensional space there are three adjacency definitions, namely 6-, 18- and 26-adjacency, in which voxels share a face, an edge or a vertex respectively; VCCS selects 26-adjacency and implements the adjacency graph with a Kd-tree; denoting the voxel resolution used for segmentation as R_voxel, the centers of all 26 adjacent voxels lie within a distance of √3·R_voxel of the central voxel;
step 3.2.2, generating and filtering seed point cloud after the adjacency graph is established;
firstly, initializing the supervoxels with a number of seed points and dividing the spatial point cloud into a voxel grid at a chosen seed resolution R_seed, wherein R_seed determines the spacing between supervoxels and a search radius R_search is used to determine whether a sufficient number of seeds occupy a voxel; selecting the point in the point cloud closest to the center of each seed voxel as a seed candidate point, and deleting noisy seeds according to R_search after the candidate points are determined;
step 3.2.3, after the seed voxel is selected, the super voxel characteristic vector can be initialized, then iterative clustering is carried out based on constraint, voxel points are assigned to the super voxel in an iterative manner, and the distance from each voxel to the center of the super voxel is calculated by using the formula (3):
D = √( w_c·D_c² + w_n·D_n² + w_s·D_s² )    (3)
wherein D_c, D_n and D_s respectively represent the color difference, normal difference and spatial distance between a voxel and the supervoxel center, and w_c, w_n and w_s are the corresponding weights, which determine the shape of the supervoxels;
the iterative process for each supervoxel is: flowing outward from the center voxel of the point cloud cluster → calculating the distance from each voxel in the center's neighborhood to the supervoxel center → if it is the minimum distance for that voxel → setting the voxel label and searching farther neighborhood voxels → proceeding to the next supervoxel iteration;
when iterating to the next supervoxel, all supervoxels are considered simultaneously, layer by layer from the center outward; the search proceeds outward iteratively until the edge of each supervoxel is found; then, before searching deeper into the graph, the same level of all supervoxels is checked; when all leaf nodes have been searched, or the searched nodes carry no label, the label assignment of the current supervoxel ends;
step 3.2.4, after all supervoxels have been searched and labeled, the center of each supervoxel cluster is updated as the mean of all its members, and this operation is performed iteratively until convergence.
8. The deep learning-based semantic map object recognition method for the robot according to claim 7, wherein step 3.3 specifically comprises the following steps: the three-dimensional point cloud processed by the VCCS algorithm yields a set of different surface patches, and the result of the algorithm is expressed as an adjacency graph G = {V, E}, where V is the set of surface patches V_i and E is the set of edges e_i connecting patches V_i and V_j; each patch corresponds to a centroid c_i and a normal vector n_i, so the segmentation of the three-dimensional scene is formulated as a graph partition;
fused support planes are adopted to eliminate the influence of noise; assuming there are K support planes {s_1, s_2, ..., s_K} in the point cloud, all surface patches are contained in these support planes; a variable b_i ∈ [0, K] is defined for each patch, where b_i = k indicates that the patch belongs to support plane s_k;
for all surface patches, candidate support planes of the point cloud are generated from the patch centroids and normal vectors, and after merging nearly duplicate planes to remove redundancy, M preliminarily generated candidate planes are obtained;
firstly, the final set of point cloud support planes is determined from the M candidate planes, and P support planes satisfying the geometric relationships are obtained from it; all surface patches are assigned to the P support planes, and each patch corresponds to a label l_p, where l_p ∈ {0, 1, 2, ..., P}; if a patch does not belong to any plane, its label is 0; if two connected patches belong to the same plane, the corresponding edge weight is assigned 0, otherwise 1, as shown in formula (4):
w_e(l_i, l_j) = 0 if l_i = l_j;  w_e(l_i, l_j) = 1 if l_i ≠ l_j    (4)
where l_i and l_j are the labels of adjacent surface patches and w_e(l_i, l_j) is the weight of the edge connecting them; a graph optimization problem is constructed from the surface patches and their connecting edges, and the labeling with the minimum sum of edge weights is taken as the optimal segmentation, thereby completing object segmentation of the point cloud scene.
9. The deep learning-based semantic robot map object recognition method according to claim 1, wherein the step 4 specifically comprises the following steps:
step 4.1: data association and object model updating of the three-dimensional semantic map;
step 4.2: and constructing a three-dimensional semantic map based on Octomap.
10. The deep learning-based semantic map object recognition method for the robot according to claim 9, wherein step 4.1 specifically comprises the following steps: firstly, for each detection in a key frame, selecting the targets after segmentation as the object candidate set; before the current target is added to the map, searching in the neighborhood of each of its three-dimensional points, finding the closest three-dimensional point in the point cloud data of the object candidate set, and calculating the Euclidean distance between the two points;
if the Euclidean distance between the two points is less than a certain threshold, they are regarded as the same point, and the similarity between the current target and the candidate object is calculated, the calculation formula being expressed as:
s = M / N    (5)
wherein M represents the number of three-dimensional points in the current target whose Euclidean distance is smaller than the threshold, and N represents the total number of three-dimensional points of each candidate object in the candidate set; the similarity between the current target and each candidate object is calculated, and if it meets the threshold they are considered the same target, their information is associated and the object model is maintained jointly; otherwise a new object model is added;
the specific process of step 4.2 is as follows: in the Octomap-based octree map, a probability p is used in the octree to express whether a leaf node is occupied, expressed as:
α = log( p / (1 − p) ),  i.e.  p = 1 / (1 + e^(−α))    (6)
where α represents the state of the corresponding node and the probability p takes values from 0 to 1; let n be a node in the octree and z the observation data; the node information of node n from the beginning up to time t−1 is represented as L(n | z_{1:t−1}); the node information at time t is represented as:
L(n|z1:t)=L(n|z1:t-1)+L(n|zt) (7)
if the depth of a certain pixel in the depth map is d, occupied data exists on a space node corresponding to the value d, no occupied information exists in an image area with the depth value smaller than d, and node information in the octree map is updated rapidly according to the above formula.
CN202011189866.3A 2020-10-30 2020-10-30 Robot semantic map object recognition method based on deep learning Pending CN112288857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011189866.3A CN112288857A (en) 2020-10-30 2020-10-30 Robot semantic map object recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011189866.3A CN112288857A (en) 2020-10-30 2020-10-30 Robot semantic map object recognition method based on deep learning

Publications (1)

Publication Number Publication Date
CN112288857A true CN112288857A (en) 2021-01-29

Family

ID=74354114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011189866.3A Pending CN112288857A (en) 2020-10-30 2020-10-30 Robot semantic map object recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN112288857A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150009214A1 (en) * 2013-07-08 2015-01-08 Vangogh Imaging, Inc. Real-time 3d computer vision processing engine for object recognition, reconstruction, and analysis
CN108304798A (en) * 2018-01-30 2018-07-20 北京同方软件股份有限公司 The event video detecting method of order in the street based on deep learning and Movement consistency

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李佳俊: "Design and Implementation of an RGBD-based UAV Semantic SLAM System Prototype", China Master's Theses Full-text Database, pages 1-54 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160292B (en) * 2021-04-27 2023-10-31 武汉理工大学 Laser radar point cloud data three-dimensional modeling device and method based on intelligent mobile terminal
CN113160292A (en) * 2021-04-27 2021-07-23 武汉理工大学 Laser radar point cloud data three-dimensional modeling device and method based on intelligent mobile terminal
CN113160293A (en) * 2021-05-13 2021-07-23 南京信息工程大学 Complex scene ground station point cloud automatic registration method based on feature probability
CN113160293B (en) * 2021-05-13 2023-06-20 南京信息工程大学 Automatic registration method for complex scene ground site cloud based on feature probability
CN113405547B (en) * 2021-05-21 2022-03-22 杭州电子科技大学 Unmanned aerial vehicle navigation method based on semantic VSLAM
CN113405547A (en) * 2021-05-21 2021-09-17 杭州电子科技大学 Unmanned aerial vehicle navigation method based on semantic VSLAM
CN113222961A (en) * 2021-05-27 2021-08-06 大连海事大学 Intelligent ship body detection system and method
CN113554701A (en) * 2021-07-16 2021-10-26 杭州派珞特智能技术有限公司 PDS tray intelligent identification and positioning system and working method thereof
CN113547501A (en) * 2021-07-29 2021-10-26 中国科学技术大学 SLAM-based mobile mechanical arm cart task planning and control method
CN113744397A (en) * 2021-07-30 2021-12-03 中南大学 Real-time object-level semantic map construction and updating method and device
CN113744397B (en) * 2021-07-30 2023-10-24 中南大学 Real-time object-level semantic map construction and updating method and device
CN113485373B (en) * 2021-08-12 2022-12-06 苏州大学 Robot real-time motion planning method based on Gaussian mixture model
CN113485373A (en) * 2021-08-12 2021-10-08 苏州大学 Robot real-time motion planning method based on Gaussian mixture model
CN113703462A (en) * 2021-09-02 2021-11-26 东北大学 Unknown space autonomous exploration system based on quadruped robot
CN113888611A (en) * 2021-09-03 2022-01-04 北京三快在线科技有限公司 Method and device for determining image depth and storage medium
CN114359493A (en) * 2021-12-20 2022-04-15 中国船舶重工集团公司第七0九研究所 Method and system for generating three-dimensional semantic map for unmanned ship
CN115937451A (en) * 2022-12-16 2023-04-07 武汉大学 Dynamic scene multi-semantic map construction method and device based on visual SLAM
CN115937451B (en) * 2022-12-16 2023-08-25 武汉大学 Dynamic scene multi-semantic map construction method and device based on visual SLAM
CN116704137A (en) * 2023-07-27 2023-09-05 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform
CN116704137B (en) * 2023-07-27 2023-10-24 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform

Similar Documents

Publication Publication Date Title
CN112288857A (en) Robot semantic map object recognition method based on deep learning
CN108648233B (en) Target identification and capture positioning method based on deep learning
CN108898605B (en) Grid map segmentation method based on map
Lim et al. 3D terrestrial LIDAR classifications with super-voxels and multi-scale Conditional Random Fields
Sun et al. Aerial 3D building detection and modeling from airborne LiDAR point clouds
Xu et al. Voxel-and graph-based point cloud segmentation of 3D scenes using perceptual grouping laws
CN107273905B (en) Target active contour tracking method combined with motion information
JP2008217706A (en) Labeling device, labeling method and program
CN110188763B (en) Image significance detection method based on improved graph model
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
CN107194929B (en) Method for tracking region of interest of lung CT image
Kohli et al. Dynamic graph cuts and their applications in computer vision
CN113362341B (en) Air-ground infrared target tracking data set labeling method based on super-pixel structure constraint
Lin et al. Temporally coherent 3D point cloud video segmentation in generic scenes
CN112396655B (en) Point cloud data-based ship target 6D pose estimation method
CN112200248A (en) Point cloud semantic segmentation method, system and storage medium under urban road environment based on DBSCAN clustering
Triebel et al. Parsing outdoor scenes from streamed 3d laser data using online clustering and incremental belief updates
Li A super voxel-based Riemannian graph for multi scale segmentation of LiDAR point clouds
CN111986223B (en) Method for extracting trees in outdoor point cloud scene based on energy function
Goswami et al. Multi-faceted hierarchical image segmentation taxonomy (MFHIST)
Huang et al. Semantic labeling and refinement of LiDAR point clouds using deep neural network in urban areas
Kaleci et al. Plane segmentation of point cloud data using split and merge based method
Rong et al. 3D semantic labeling of photogrammetry meshes based on active learning
CN113837215B (en) Point cloud semantic and instance segmentation method based on conditional random field
CN111462181B (en) Video single-target tracking method based on rectangular asymmetric inverse layout model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination