CN112288857A - Robot semantic map object recognition method based on deep learning - Google Patents

Robot semantic map object recognition method based on deep learning

Info

Publication number
CN112288857A
Authority
CN
China
Prior art keywords
voxel
point cloud
segmentation
map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011189866.3A
Other languages
Chinese (zh)
Inventor
王晓华
李耀光
王文杰
张蕾
苏泽斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Polytechnic University
Original Assignee
Xian Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Polytechnic University filed Critical Xian Polytechnic University
Priority to CN202011189866.3A priority Critical patent/CN112288857A/en
Publication of CN112288857A publication Critical patent/CN112288857A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005Tree description, e.g. octree, quadtree
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The invention discloses a deep-learning-based robot semantic map object recognition method, which comprises the following specific steps. Step 1: acquiring image information of each frame of the surrounding environment in real time, and selecting image key frames; step 2: performing target detection on the selected image key frames through a convolutional neural network, and extracting semantic information; step 3: completing object-level segmentation of the three-dimensional point cloud by applying a three-dimensional point cloud segmentation method to the detected key frame images output in step 2; step 4: fusing the segmentation module of step 3 and the detection module of step 2 into the ORB_SLAM2 visual SLAM framework to obtain a three-dimensional semantic map containing object information. The invention solves the problem that robots in the prior art cannot understand the high-level semantic information of objects in the environment or recognize objects in the map.

Description

Robot semantic map object recognition method based on deep learning
Technical Field
The invention belongs to the technical field of artificial intelligence, and relates to a semantic map object recognition method of a robot based on deep learning.
Background
With the development of science and technology and the improvement of living standards, robots are gradually entering people's daily lives. However, current robots usually employ Simultaneous Localization and Mapping (SLAM) to construct a sparse landmark map containing only geometric information, which can be used solely for navigation and localization tasks. In this case, the robot cannot understand the high-level semantic information of objects in the environment or recognize objects in the map. Therefore, combining SLAM with object detection and recognition technology to construct a semantic map containing object semantic information becomes an effective solution. Efficient semantic SLAM is the basis for an intelligent robot system to accomplish high-level tasks such as autonomous positioning, environmental navigation and intelligent search in unknown environments.
Disclosure of Invention
The invention aims to provide a deep-learning-based robot semantic map object recognition method, which solves the problem that robots in the prior art cannot understand the high-level semantic information of objects in the environment or recognize objects in the map.
The technical scheme adopted by the invention is that a robot semantic map object recognition method based on deep learning specifically comprises the following steps:
step 1: acquiring image information of each frame of the surrounding environment in real time, and selecting image key frames;
step 2: performing target detection on the selected image key frame through a convolutional neural network, and extracting semantic information;
step 3: completing object-level segmentation of the three-dimensional point cloud by applying a three-dimensional point cloud segmentation method to the detected key frame images output in step 2;
step 4: fusing the segmentation module of step 3 and the detection module of step 2 into the ORB_SLAM2 visual SLAM framework to obtain a three-dimensional semantic map containing object information.
The invention is also characterized in that:
in step 1, an RGB-D vision sensor is used to collect environmental image information in real time; the specific process of selecting image key frames in step 1 is as follows: within every 10-15 frames, one frame with good image quality, a sufficient number of feature points and a uniform feature point distribution is taken as a key frame.
The specific process of target detection in step 2 is as follows:
step 2.1: making a data set with object annotations; labeling the image samples with the image annotation tool LabelImg; marking different objects with bounding boxes to generate annotation files conforming to the network format;
step 2.2: setting the network hyper-parameters and training the improved YOLOv3-tiny network to obtain a suitable network model;
step 2.3: sending the selected image key frames into the YOLO model for object detection to obtain object positions and classification probabilities.
The specific process of the point cloud segmentation method in the step 3 is as follows:
step 3.1: performing two-dimensional target segmentation on the key frame images output in step 2 with the GrabCut algorithm during the SLAM process;
step 3.2: performing three-dimensional point cloud segmentation by using an improved VCCS algorithm;
step 3.3: and eliminating pixel points which do not belong to the target in the point cloud by using the point cloud segmentation result, and completing the three-dimensional point cloud segmentation of the object level.
The specific process of step 3.1 is as follows:
step 3.1.1: defining one or more rectangles containing objects in the key frame picture, wherein the areas outside the rectangles are automatically considered as backgrounds;
step 3.1.2: modeling the background and foreground with a gaussian mixture model and labeling undefined pixels as possible foreground or background;
for each pixel in the image, a model is built with a GMM containing K Gaussian components, and the final segmentation is represented by a vector k = {k_1, k_2, ..., k_n}, where each element k_n ∈ {1, 2, ..., K} indicates which Gaussian component the corresponding pixel belongs to, while the foreground and background are distinguished;
for each pixel, a gaussian component from the target GMM or a gaussian component from the background GMM is calculated by substituting the RGB three channel values into the GMM model, and the energy function of the whole image is expressed as:
E(α,k,θ,z)=U(α,k,θ,z)+V(α,z) (1)
U() is the region term, representing the penalty for classifying a pixel as target or background, and V() is the boundary term, representing the penalty for discontinuity between adjacent pixels; the Gaussian mixture model has the form:
P(z) = Σ_{k=1}^{K} π_k N(z | μ_k, Σ_k)    (2)
wherein π represents the weight of each Gaussian component, μ represents the mean vector of each Gaussian component (a three-element vector for a three-channel image), Σ represents the covariance matrix, and N(· | μ, Σ) is the Gaussian density;
step 3.1.3: each pixel (i.e. a node in the algorithm) is connected with a foreground or background node, and after the nodes are connected, if edges between the nodes belong to different terminals, the edges between the nodes are cut off, so that each part of the image can be segmented to construct an initial semantic point cloud map.
The specific process of the step 3.2 is as follows: for the three-dimensional point cloud obtained by the visual SLAM system, point cloud segmentation is carried out by using a Super Voxel-based point cloud segmentation method, and Voxel clustering is carried out by using point cloud Voxel connectivity, so that a three-dimensional point cloud clustering segmentation algorithm for target segmentation is realized.
Step 3.2.1, constructing the adjacency graph of the voxelized point cloud: in a voxelized three-dimensional space there are three adjacency definitions, namely 6-, 18- and 26-adjacency, in which voxels share a face, an edge or a vertex respectively; VCCS selects 26-adjacency and implements the adjacency graph through a Kd-tree; denoting the voxel resolution used for segmentation as R_voxel, the centers of all 26 adjacent voxels lie within a distance of √3·R_voxel of the central voxel;
step 3.2.2, generating and filtering seed point cloud after the adjacency graph is established;
firstly, initializing the supervoxels with a number of seed points and dividing the spatial point cloud into a voxel grid at a chosen seed resolution R_seed, wherein R_seed determines the spacing between supervoxels and a search radius R_search is used to determine whether a sufficient number of seeds occupy a voxel; selecting the point in the point cloud closest to the center of each seed voxel as a seed candidate point, and deleting noisy seeds according to R_search after the candidate points are determined;
step 3.2.3, after the seed voxel is selected, the super voxel characteristic vector can be initialized, then iterative clustering is carried out based on constraint, voxel points are assigned to the super voxel in an iterative manner, and the distance from each voxel to the center of the super voxel is calculated by using the formula (3):
D = √( w_c·D_c² + w_n·D_n² + w_s·D_s² )    (3)
wherein D_c, D_n and D_s respectively represent the color difference, normal difference and spatial distance between a voxel and the supervoxel center, and w_c, w_n and w_s are the corresponding weights, which determine the shape of the supervoxels;
the iterative process for each supervoxel is: flowing outward from the center voxel of the point cloud cluster → calculating the distance from each voxel in the center's neighborhood to the supervoxel center → if it is the minimum distance for that voxel → setting the voxel label and searching farther neighborhood voxels → proceeding to the next supervoxel iteration;
when iterating to the next supervoxel, all supervoxels are considered simultaneously, layer by layer from the center outward; the search proceeds outward iteratively until the edge of each supervoxel is found; then, before searching deeper into the graph, the same level of all supervoxels is checked; when all leaf nodes have been searched, or the searched nodes carry no label, the label assignment of the current supervoxel ends;
step 3.2.4, after all supervoxels have been searched and labeled, the center of each supervoxel cluster is updated as the mean of all its members, and this operation is performed iteratively until convergence.
The specific process of step 3.3 is as follows: the three-dimensional point cloud processed by the VCCS algorithm yields a set of different surface patches, and the result of the algorithm is expressed as an adjacency graph G = {V, E}, where V is the set of surface patches V_i and E is the set of edges e_i connecting patches V_i and V_j; each patch corresponds to a centroid c_i and a normal vector n_i, so the segmentation of the three-dimensional scene is formulated as a graph partition;
fused support planes are adopted to eliminate the influence of noise; assuming there are K support planes {s_1, s_2, ..., s_K} in the point cloud, all surface patches are contained in these support planes; a variable b_i ∈ [0, K] is defined for each patch, where b_i = k indicates that the patch belongs to support plane s_k;
for all surface patches, candidate support planes of the point cloud are generated from the patch centroids and normal vectors, and after merging nearly duplicate planes to remove redundancy, M preliminarily generated candidate planes are obtained;
firstly, the final set of point cloud support planes is determined from the M candidate planes, and P support planes satisfying the geometric relationships are obtained from it; all surface patches are assigned to the P support planes, and each patch corresponds to a label l_p, where l_p ∈ {0, 1, 2, ..., P}; if a patch does not belong to any plane, its label is 0; if two connected patches belong to the same plane, the corresponding edge weight is assigned 0, otherwise 1, as shown in formula (4):
w_e(l_i, l_j) = 0 if l_i = l_j;  w_e(l_i, l_j) = 1 if l_i ≠ l_j    (4)
where l_i and l_j are the labels of adjacent surface patches and w_e(l_i, l_j) is the weight of the edge connecting them; a graph optimization problem is constructed from the surface patches and their connecting edges, and the labeling with the minimum sum of edge weights is taken as the optimal segmentation, thereby completing object segmentation of the point cloud scene.
The specific process of the step 4 is as follows:
step 4.1: data association and object model updating of the three-dimensional semantic map;
step 4.2: and constructing a three-dimensional semantic map based on Octomap.
The specific process of step 4.1 is as follows: firstly, for each detection in a key frame, the targets after segmentation are selected as the object candidate set; before the current target is added to the map, the neighborhood of each of its three-dimensional points is searched, the closest three-dimensional point is found in the point cloud data of the object candidate set, and the Euclidean distance between the two points is calculated;
if the Euclidean distance between the two points is less than a certain threshold, they are regarded as the same point, and the similarity between the current target and the candidate object is calculated, the calculation formula being expressed as:
s = M / N    (5)
wherein M represents the number of three-dimensional points in the current target whose Euclidean distance is smaller than the threshold, and N represents the total number of three-dimensional points of each candidate object in the candidate set; the similarity between the current target and each candidate object is calculated, and if it meets the threshold they are considered the same target, their information is associated and the object model is maintained jointly; otherwise a new object model is added;
the specific process of step 4.2 is as follows: in the Octomap-based octree map, a probability p is used in the octree to express whether a leaf node is occupied, expressed as:
α = log( p / (1 − p) ),  i.e.  p = 1 / (1 + e^(−α))    (6)
where α represents the state of the corresponding node and the probability p takes values from 0 to 1; let n be a node in the octree and z the observation data; the node information of node n from the beginning up to time t−1 is represented as L(n | z_{1:t−1}); the node information at time t is represented as:
L(n|z1:t)=L(n|z1:t-1)+L(n|zt) (7)
if the depth of a certain pixel in the depth map is d, occupied data exists on a space node corresponding to the value d, no occupied information exists in an image area with the depth value smaller than d, and node information in the octree map is updated rapidly according to the above formula.
The beneficial effect of the method is that the deep-learning-based robot semantic map object recognition method combines current deep-learning-based target detection algorithms with the visual SLAM algorithm, so that the robot can see and understand the real world and perform autonomous decision-making and action planning.
Drawings
FIG. 1 is a flow chart of a semantic map object recognition method of a robot based on deep learning according to the invention;
FIG. 2 is a network structure diagram of the improved YOLOv3-tiny network in the deep-learning-based robot semantic map object recognition method of the invention;
FIG. 3 is a flow chart of an improved three-dimensional point cloud target segmentation method in the deep learning based robot semantic map object recognition method of the present invention;
FIG. 4 is a graph segmentation model of a curved surface block in the semantic map object recognition method of the robot based on deep learning according to the invention;
FIG. 5 is a flow chart of three-dimensional semantic map construction in the robot semantic map object recognition method based on deep learning.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The deep-learning-based robot semantic map object recognition method of the invention, whose flow chart is shown in FIG. 1, specifically comprises the following steps:
step 1: acquiring image information of each frame of the surrounding environment in real time, and selecting image key frames;
in the step 1, an RGB-D vision sensor is used for collecting environmental image information in real time;
the specific process of selecting image key frames in step 1 is as follows: within every 10-15 frames, one frame with good image quality, a sufficient number of feature points and a uniform feature point distribution is taken as a key frame; a key frame should have a small amount of co-visibility with the other key frames in the local map, while most of its feature points are new, so that constraints exist between key frames while information redundancy is kept to a minimum.
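The key-frame criterion above (a well-textured frame with enough, evenly spread feature points roughly every 10-15 frames) can be prototyped with ORB features from OpenCV. The sketch below is illustrative only: the feature-count threshold, the 8×8 coverage grid and the 0.5 coverage ratio are assumed values, not parameters given by the patent.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(1000)                # shared ORB detector

def is_keyframe(gray, frame_gap, min_gap=10, max_gap=15,
                min_features=150, grid=(8, 8)):
    """Hypothetical key-frame test: enough well-distributed ORB features,
    taken roughly every 10-15 frames (all thresholds are illustrative)."""
    if frame_gap >= max_gap:
        return True                       # force a key frame after ~15 frames
    if frame_gap < min_gap:
        return False                      # too soon since the previous key frame
    keypoints = orb.detect(gray, None)
    if len(keypoints) < min_features:
        return False                      # too few features: poor image quality
    # require the features to cover a reasonable fraction of a coarse grid
    h, w = gray.shape
    occupied = np.zeros(grid, dtype=bool)
    for kp in keypoints:
        x, y = kp.pt
        occupied[min(int(y * grid[0] / h), grid[0] - 1),
                 min(int(x * grid[1] / w), grid[1] - 1)] = True
    return occupied.mean() > 0.5          # uniform-distribution heuristic
```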
Step 2: performing target detection on the selected image key frame through a convolutional neural network, and extracting semantic information;
the specific process of target detection in step 2 is as follows:
step 2.1: making a data set with object annotations; labeling the image samples with the image annotation tool LabelImg; marking different objects with bounding boxes to generate annotation files conforming to the network format, each containing the class and position of the target object;
step 2.2: setting the network hyper-parameters and training the improved YOLOv3-tiny network, shown in FIG. 2, to obtain a suitable network model;
step 2.3: sending the selected image key frames into the YOLO model for object detection to obtain object positions and classification probabilities (a sketch of such an inference call is given below).
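As a sketch of how a trained detector of this kind might be invoked on a key frame, the snippet below loads Darknet-format YOLOv3-tiny weights with OpenCV's dnn module and returns boxes with class probabilities. The file names yolov3-tiny.cfg / yolov3-tiny.weights, the 416×416 input size and the thresholds are assumptions; the patent's improved network would use its own configuration and weights.

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect(image, conf_threshold=0.5, nms_threshold=0.4):
    """Run YOLO on one key frame; return (class_id, confidence, box) tuples."""
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes, confidences, class_ids = [], [], []
    for output in net.forward(layer_names):
        for det in output:                # det = [cx, cy, bw, bh, objness, class scores...]
            scores = det[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence < conf_threshold:
                continue
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(confidence)
            class_ids.append(class_id)
    if not boxes:
        return []
    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
    return [(class_ids[i], confidences[i], boxes[i]) for i in np.array(keep).flatten()]
```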
Step 3: completing object-level segmentation of the three-dimensional point cloud by applying a three-dimensional point cloud segmentation method to the detected key frame images output in step 2;
as shown in fig. 3, the specific process of the point cloud segmentation method in step 3 is as follows:
step 3.1: performing two-dimensional target segmentation on the key frame images output in step 2 with the GrabCut algorithm during the SLAM process;
the specific process of step 3.1 is as follows:
step 3.1.1: one or more rectangles containing objects are defined in the key frame picture. The area outside the rectangle is automatically considered as background;
step 3.1.2: the background and foreground are modeled with a Gaussian Mixture Model (GMM) and undefined pixels are labeled as possible foreground or background.
For each pixel in the image, the model is built with a GMM containing K Gaussian components, and the final segmentation can be represented by a vector k = {k_1, k_2, ..., k_n}, where each element k_n ∈ {1, 2, ..., K} indicates which Gaussian component the corresponding pixel belongs to, while the foreground and background are distinguished.
For each pixel, a gaussian component from the target GMM or a gaussian component from the background GMM is calculated by substituting the RGB three channel values into the GMM model, and the energy function of the whole image is expressed as:
E(α,k,θ,z)=U(α,k,θ,z)+V(α,z) (1)
u () is a region term representing a penalty of a pixel being classified as a target or background, and V () is a boundary term representing a penalty of discontinuity between neighboring pixels. The gaussian mixture model is of the form:
P(z) = Σ_{k=1}^{K} π_k N(z | μ_k, Σ_k)    (2)
where π represents the weight of each Gaussian component, μ represents the mean vector of each Gaussian component (a three-element vector for a three-channel image), Σ represents the covariance matrix, and N(· | μ, Σ) is the Gaussian density.
Step 3.1.3: each pixel (i.e., node in the algorithm) will be connected to a foreground or background node. After the nodes are connected (possibly connected with the background or the foreground), if edges between the nodes belong to different terminals (namely one node belongs to the foreground and the other node belongs to the background), the edges between the nodes are cut off, so that all parts of the image can be segmented, and the initial semantic point cloud map is constructed.
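Step 3.1 corresponds closely to the GrabCut implementation built into OpenCV. The sketch below runs it inside one detection box from step 2; reusing the YOLO box as the initial rectangle and the iteration count of 5 are assumptions made for illustration.

```python
import cv2
import numpy as np

def grabcut_mask(image_bgr, box, iterations=5):
    """Segment the object inside a detection box (x, y, w, h) with GrabCut.
    Returns a binary mask: 1 for (probable) foreground, 0 for background."""
    mask = np.zeros(image_bgr.shape[:2], dtype=np.uint8)
    bgd_model = np.zeros((1, 65), dtype=np.float64)   # background GMM parameters
    fgd_model = np.zeros((1, 65), dtype=np.float64)   # foreground GMM parameters
    cv2.grabCut(image_bgr, mask, tuple(box), bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_RECT)
    # keep definite and probable foreground pixels, discard the rest
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return fg.astype(np.uint8)

# usage: fg = grabcut_mask(frame, (x, y, w, h)); pixels outside fg are removed
# from the object's point cloud in step 3.3
```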
Step 3.2: performing three-dimensional point cloud segmentation by using an improved VCCS algorithm;
the specific process of step 3.2 is as follows: for the three-dimensional point cloud obtained by the visual SLAM system, point cloud segmentation is carried out with a supervoxel-based method, and voxel clustering using point cloud voxel connectivity realizes a three-dimensional point cloud clustering segmentation algorithm for target segmentation; the specific process is as follows:
and 3.2.1, constructing an adjacency graph of the voxel point cloud, wherein three adjacent definitions are respectively 6, 18 and 26 adjacent in a voxelized three-dimensional space, and voxels share a surface, an edge and a fixed point.
VCCS selects 26 adjacent voxels, realizes an adjacency graph through a Kd-tree, records the resolution of the voxels for segmentation as Rvoxel, and keeps the centers of all 26 adjacent voxels at
Figure BDA0002752466270000101
In (1). The kd-Tree is used to organize a set of points representing k-dimensional space, and is a binary search tree with other constraints.
And 3.2.2, generating and filtering seed point clouds after the adjacency graph is established.
First, the supervoxels are initialized with a number of seed point clouds, and the spatial point cloud is divided into a voxel grid at a chosen seed resolution R_seed, where R_seed determines the spacing between supervoxels and the search radius R_search is used to determine whether a sufficient number of seeds occupy a voxel. The point in the point cloud closest to the center of each seed voxel is selected as a seed candidate point; after the candidate points are determined, noisy seeds are deleted according to R_search.
Step 3.2.3: after the seed voxels are selected, the supervoxel feature vectors are initialized; constrained iterative clustering is then performed, assigning voxel points to supervoxels iteratively. The distance from each voxel to a supervoxel center is calculated using equation (3):
D = √( w_c·D_c² + w_n·D_n² + w_s·D_s² )    (3)
wherein D_c, D_n and D_s respectively represent the color difference, normal difference and spatial distance between a voxel and the supervoxel center, and w_c, w_n and w_s are the corresponding weights, which determine the shape of the supervoxels. The iterative process for each supervoxel is: flowing outward from the center voxel of the point cloud cluster → calculating the distance from each voxel in the center's neighborhood to the supervoxel center → if it is the minimum distance for that voxel → setting the voxel label and searching farther neighborhood voxels → proceeding to the next supervoxel iteration.
When iterating to the next supervoxel, all supervoxels are considered simultaneously, layer by layer from the center outward. The outward search is repeated until the edge of each supervoxel is found; then the same level of all supervoxels is examined before searching deeper into the graph. When all leaf nodes have been searched, or the searched nodes carry no label, the label assignment of the current supervoxel ends.
Step 3.2.4: after all supervoxels have been searched and labeled, the center of each supervoxel cluster is updated as the mean of all its members. This operation is performed iteratively until convergence (a sketch of this clustering loop is given below).
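The weighted distance of equation (3) and the iterative center update of step 3.2.4 can be sketched in a few lines of NumPy. This is a simplified stand-in for VCCS (in practice PCL's SupervoxelClustering is typically used): the weight values and the plain nearest-center assignment pass are assumptions, not the full flow-constrained neighborhood search described above.

```python
import numpy as np

def supervoxel_distance(voxels, centers, w_c=0.2, w_n=1.0, w_s=0.4):
    """Equation (3): weighted color / normal / spatial distance from every voxel
    to every supervoxel center; voxels and centers are dicts of (N,3)/(M,3) arrays."""
    d_c = np.linalg.norm(voxels["color"][:, None] - centers["color"][None], axis=2)
    d_n = 1.0 - np.abs(voxels["normal"] @ centers["normal"].T)     # normal disagreement
    d_s = np.linalg.norm(voxels["xyz"][:, None] - centers["xyz"][None], axis=2)
    return np.sqrt(w_c * d_c**2 + w_n * d_n**2 + w_s * d_s**2)

def cluster(voxels, centers, iterations=10):
    """Simplified clustering: assign each voxel to its nearest supervoxel center
    under equation (3), then recompute the centers as in step 3.2.4."""
    labels = None
    for _ in range(iterations):
        labels = supervoxel_distance(voxels, centers).argmin(axis=1)
        for k in range(len(centers["xyz"])):
            members = labels == k
            if members.any():
                for key in ("xyz", "color", "normal"):
                    centers[key][k] = voxels[key][members].mean(axis=0)
                n = centers["normal"][k]
                centers["normal"][k] = n / (np.linalg.norm(n) + 1e-9)  # renormalize
    return labels
```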
Step 3.3: and eliminating pixel points which do not belong to the target in the point cloud by using the point cloud segmentation result, and completing the three-dimensional point cloud segmentation of the object level.
The specific process of step 3.3 is as follows: the three-dimensional point cloud processed by the VCCS algorithm yields a set of different surface patches, and the result of the algorithm can be represented as an adjacency graph G = {V, E}, where V is the set of surface patches V_i and E is the set of edges e_i connecting patches V_i and V_j. Each patch corresponds to a centroid c_i and a normal vector n_i, so the segmentation of the three-dimensional scene is formulated as a graph partition.
Because of noise, classifying the connecting edges by normal vectors alone gives insufficient segmentation precision. For this reason, as shown in FIG. 4, fused support planes are used to eliminate the influence of noise. Suppose there are K support planes {s_1, s_2, ..., s_K} in the point cloud; all surface patches are contained in these support planes, and a variable b_i ∈ [0, K] is defined for each patch, where b_i = k indicates that the patch belongs to support plane s_k.
For all surface patches, candidate support planes of the point cloud are generated from the patch centroids and normal vectors; after merging nearly duplicate planes to remove redundancy, M preliminarily generated candidate planes are obtained.
Firstly, the final set of point cloud support planes is determined from the M candidate planes, and P support planes satisfying the geometric relationships are obtained from it. All surface patches are assigned to the P support planes, and each patch corresponds to a label l_p, where l_p ∈ {0, 1, 2, ..., P}; if a patch does not belong to any plane, its label is 0. If two connected patches belong to the same plane, the corresponding edge weight is assigned 0, otherwise 1, as shown in formula (4):
w_e(l_i, l_j) = 0 if l_i = l_j;  w_e(l_i, l_j) = 1 if l_i ≠ l_j    (4)
where l_i and l_j are the labels of adjacent surface patches and w_e(l_i, l_j) is the weight of the edge connecting them. A graph optimization problem is constructed from the surface patches and their connecting edges, and the labeling with the minimum sum of edge weights is taken as the optimal segmentation, thereby completing object segmentation of the point cloud scene.
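The Potts-style edge weight of equation (4) and the cost of a candidate labeling can be written down directly; a minimal sketch is given below. The greedy point-to-plane assignment is only an assumed stand-in for the graph optimization described above, and the 5 cm distance threshold is illustrative.

```python
import numpy as np

def edge_weight(l_i, l_j):
    """Equation (4): weight 0 if two adjacent patches share a support plane, else 1."""
    return 0 if l_i == l_j else 1

def labeling_cost(labels, edges):
    """Sum of edge weights over the patch adjacency graph G = {V, E}; the labeling
    with the smallest cost is taken as the optimal segmentation."""
    return sum(edge_weight(labels[i], labels[j]) for i, j in edges)

def assign_patches(centroids, planes, dist_thresh=0.05):
    """Greedy stand-in for the support-plane assignment: each patch gets the label of
    the nearest plane (point-to-plane distance), or 0 if none is close enough.
    planes: list of (point_on_plane, unit_normal) tuples; labels run from 1 to P."""
    labels = np.zeros(len(centroids), dtype=int)
    for i, c in enumerate(centroids):
        dists = [abs(np.dot(c - p, n)) for p, n in planes]
        if dists and min(dists) < dist_thresh:
            labels[i] = int(np.argmin(dists)) + 1
    return labels
```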
Step 4: fusing the segmentation module of step 3 and the detection module of step 2 into the ORB_SLAM2 visual SLAM framework to obtain a three-dimensional semantic map containing object information.
As shown in fig. 5, the specific process of step 4 is:
step 4.1: data association and object model updating of the three-dimensional semantic map;
the specific process of step 4.1 is as follows: firstly, for each detection in a key frame, the targets after segmentation are selected as the object candidate set. Before the current target is added to the map, the neighborhood of each of its three-dimensional points is searched, the closest three-dimensional point is found in the point cloud data of the object candidate set, and the Euclidean distance between the two points is calculated.
If the Euclidean distance between the two points is less than a certain threshold, they can be regarded as the same point, and the similarity between the current target and a candidate object can be calculated as:
s = M / N    (5)
where M is the number of three-dimensional points in the current target whose Euclidean distance is smaller than the threshold, and N is the total number of three-dimensional points of each candidate object in the candidate set. The similarity between the current target and each candidate object is calculated; if it meets the threshold, they are considered the same target, their information is associated and the object model is maintained jointly; otherwise a new object model is added.
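A sketch of this association test, using a k-d tree for the nearest-neighbour search, is given below; the 2 cm distance threshold and the 0.5 similarity threshold are placeholders, not values specified by the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def similarity(target_pts, candidate_pts, dist_thresh=0.02):
    """Equation (5): s = M / N, where M counts points of the current target whose
    nearest candidate point lies within dist_thresh (Euclidean distance) and N is
    the total number of points of the candidate object."""
    tree = cKDTree(candidate_pts)
    dists, _ = tree.query(target_pts, k=1)
    m = int(np.sum(dists < dist_thresh))
    return m / len(candidate_pts)

def associate(target_pts, candidates, sim_thresh=0.5):
    """Return the index of the matching candidate object, or None to add a new model."""
    scores = [similarity(target_pts, c) for c in candidates]
    if scores and max(scores) > sim_thresh:
        return int(np.argmax(scores))
    return None
```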
Step 4.2: and constructing a three-dimensional semantic map based on Octomap.
The specific process of step 4.2 is as follows: the Octomap-based octree map has the advantages of flexible map storage, a small memory footprint and support for real-time updates. A probability p is used in the octree to express whether a leaf node is occupied, expressed as:
α = log( p / (1 − p) ),  i.e.  p = 1 / (1 + e^(−α))    (6)
where α represents the state of the corresponding node and the probability p takes values from 0 to 1. Let n be a node in the octree and z be the observation data; the node information of node n from the beginning up to time t−1 is denoted L(n | z_{1:t−1}). The node information at time t can then be expressed as:
L(n|z1:t)=L(n|z1:t-1)+L(n|zt) (7)
if the depth of a certain pixel in the depth map is d, there is occupied data on the spatial node corresponding to the value d, and there is no occupied information in the image area where the depth value is smaller than d. The node information in the octree map can be updated quickly according to the above formula.
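A minimal sketch of the log-odds bookkeeping of equations (6)-(7) for a single depth-map ray is shown below; the per-hit and per-miss increments are typical occupancy-mapping values, not ones given in the patent.

```python
import numpy as np

L_HIT, L_MISS = 0.85, -0.4        # assumed log-odds increments per observation

def logodds(p):
    """Equation (6): node state alpha as the log-odds of occupancy probability p."""
    return np.log(p / (1.0 - p))

def probability(alpha):
    """Inverse of equation (6): recover the occupancy probability from alpha."""
    return 1.0 / (1.0 + np.exp(-alpha))

def update_ray(log_odds_map, voxels_along_ray, hit_index):
    """Equation (7): additive update L(n|z_1:t) = L(n|z_1:t-1) + L(n|z_t).
    Voxels in front of the measured depth d are observed free; the voxel at d is occupied."""
    for i, v in enumerate(voxels_along_ray):
        log_odds_map[v] = log_odds_map.get(v, 0.0) + (L_HIT if i == hit_index else L_MISS)
    return log_odds_map

# usage: the occupancy probability of a voxel v is probability(log_odds_map[v])
```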
Octree maps based on Octomap store not only the voxel occupancy probabilities but also the color information of objects. To mark the semantic information of objects, different colors are used to represent different object categories, and the target objects are colored according to the correspondence set by the user. A three-dimensional color semantic map containing both the geometric information of the environment and the semantic information of objects is thus obtained, realizing the robot's recognition of the objects in the map.
According to the deep-learning-based robot semantic map object recognition method, the detailed semantic information acquired by the target detection module is fused into the visual SLAM according to the visual SLAM principle of the mobile robot, the objects in the map are segmented with a three-dimensional target segmentation algorithm to obtain an environment semantic segmentation result, and a three-dimensional environment semantic map is established. An accurate semantic map is built and the object information in the map is stored, so that the mobile robot can recognize the objects in the map and guide its localization by matching against those objects.
The deep-learning-based robot semantic map object recognition method of the invention integrates a target detection module and a three-dimensional object segmentation module according to the visual SLAM principle of a mobile robot; an octree map based on the Octomap library stores the color information of objects, a three-dimensional environment semantic map is established, and the robot's recognition of objects in the map is completed. In the deep-learning-based target detection method, thirteen 3×3 convolution layers are added on the basis of the YOLO-tiny network to increase the network depth, so that the network can learn deeper information from the input image and the detection accuracy is improved. The target detection results are added to the SLAM thread so that semantic information is fused into the point cloud map, and the environmental objects in the three-dimensional map are segmented with the target segmentation algorithm to construct the three-dimensional environmental point cloud map. The mobile robot can thus recognize objects in the map, guide its localization by matching against those objects, and complete high-level human-computer interaction tasks.

Claims (10)

1. A semantic map object recognition method of a robot based on deep learning is characterized by comprising the following steps:
step 1: acquiring image information of each frame of the surrounding environment in real time, and selecting image key frames;
step 2: performing target detection on the selected image key frame through a convolutional neural network, and extracting semantic information;
step 3: completing object-level segmentation of the three-dimensional point cloud by applying a three-dimensional point cloud segmentation method to the detected key frame images output in step 2;
step 4: fusing the segmentation module of step 3 and the detection module of step 2 into the ORB_SLAM2 visual SLAM framework to obtain a three-dimensional semantic map containing object information.
2. The deep learning based semantic map object recognition method for the robot as claimed in claim 1, wherein in step 1 an RGB-D visual sensor is used for collecting environmental image information in real time; the specific process of selecting the image key frames in step 1 is as follows: within every 10-15 frames, one frame with good image quality, a sufficient number of feature points and a uniform feature point distribution is taken as a key frame.
3. The deep learning-based semantic robot map object recognition method according to claim 1, wherein the specific process of target detection in the step 2 is as follows:
step 2.1: making a data set with object annotations; labeling the image samples with the image annotation tool LabelImg; marking different objects with bounding boxes to generate annotation files conforming to the network format;
step 2.2: setting the network hyper-parameters and training the improved YOLOv3-tiny network to obtain a suitable network model;
step 2.3: and sending the selected image key frame into a YOLO model for object detection to obtain the position and classification probability of the object.
4. The deep learning-based semantic robot map object recognition method according to claim 1, wherein the point cloud segmentation method in step 3 specifically comprises the following steps:
step 3.1: performing two-dimensional target segmentation on the key frame images output in step 2 with the GrabCut algorithm during the SLAM process;
step 3.2: performing three-dimensional point cloud segmentation by using an improved VCCS algorithm;
step 3.3: and eliminating pixel points which do not belong to the target in the point cloud by using the point cloud segmentation result, and completing the three-dimensional point cloud segmentation of the object level.
5. The deep learning-based semantic robot map object recognition method according to claim 4, wherein the specific process of the step 3.1 is as follows:
step 3.1.1: defining one or more rectangles containing objects in the key frame picture, wherein the areas outside the rectangles are automatically considered as backgrounds;
step 3.1.2: modeling the background and foreground with a gaussian mixture model and labeling undefined pixels as possible foreground or background;
for each pixel in the image, the model is built with a GMM containing K Gaussian components, and the final segmentation is performed using a vector k = {k_1, k_2, ..., k_n}, where each element k_n ∈ {1, 2, ..., K} indicates which Gaussian component the corresponding pixel belongs to, while the foreground and background are distinguished;
for each pixel, a gaussian component from the target GMM or a gaussian component from the background GMM is calculated by substituting the RGB three channel values into the GMM model, and the energy function of the whole image is expressed as:
E(α,k,θ,z)=U(α,k,θ,z)+V(α,z) (1)
U() is the region term, representing the penalty for classifying a pixel as target or background, and V() is the boundary term, representing the penalty for discontinuity between adjacent pixels; the Gaussian mixture model has the form:
P(z) = Σ_{k=1}^{K} π_k N(z | μ_k, Σ_k)    (2)
wherein π represents the weight of each Gaussian component, μ represents the mean vector of each Gaussian component (a three-element vector for a three-channel image), Σ represents the covariance matrix, and N(· | μ, Σ) is the Gaussian density;
step 3.1.3: each pixel (i.e. a node in the algorithm) is connected with a foreground or background node, and after the nodes are connected, if edges between the nodes belong to different terminals, the edges between the nodes are cut off, so that each part of the image can be segmented to construct an initial semantic point cloud map.
6. The deep learning-based semantic map object recognition method for the robot according to claim 5, wherein the step 3.2 comprises the following specific processes: for the three-dimensional point cloud obtained by the visual SLAM system, point cloud segmentation is carried out by using a Super Voxel-based point cloud segmentation method, and Voxel clustering is carried out by using point cloud Voxel connectivity, so that a three-dimensional point cloud clustering segmentation algorithm for target segmentation is realized.
7. The deep learning-based robot semantic map object recognition method according to claim 6, wherein in step 3.2.1 an adjacency graph of the voxelized point cloud is constructed; in a voxelized three-dimensional space there are three adjacency definitions, namely 6-, 18- and 26-adjacency, in which voxels share a face, an edge or a vertex respectively; VCCS selects 26-adjacency and implements the adjacency graph with a Kd-tree; denoting the voxel resolution used for segmentation as R_voxel, the centers of all 26 adjacent voxels lie within a distance of √3·R_voxel of the central voxel;
step 3.2.2, generating and filtering seed point cloud after the adjacency graph is established;
firstly, initializing the supervoxels with a number of seed points and dividing the spatial point cloud into a voxel grid at a chosen seed resolution R_seed, wherein R_seed determines the spacing between supervoxels and a search radius R_search is used to determine whether a sufficient number of seeds occupy a voxel; selecting the point in the point cloud closest to the center of each seed voxel as a seed candidate point, and deleting noisy seeds according to R_search after the candidate points are determined;
step 3.2.3, after the seed voxel is selected, the super voxel characteristic vector can be initialized, then iterative clustering is carried out based on constraint, voxel points are assigned to the super voxel in an iterative manner, and the distance from each voxel to the center of the super voxel is calculated by using the formula (3):
D = √( w_c·D_c² + w_n·D_n² + w_s·D_s² )    (3)
wherein D_c, D_n and D_s respectively represent the color difference, normal difference and spatial distance between a voxel and the supervoxel center, and w_c, w_n and w_s are the corresponding weights, which determine the shape of the supervoxels;
the iterative process for each supervoxel is: flowing outward from the center voxel of the point cloud cluster → calculating the distance from each voxel in the center's neighborhood to the supervoxel center → if it is the minimum distance for that voxel → setting the voxel label and searching farther neighborhood voxels → proceeding to the next supervoxel iteration;
when iterating to the next supervoxel, all supervoxels are considered simultaneously, layer by layer from the center outward; the search proceeds outward iteratively until the edge of each supervoxel is found; then, before searching deeper into the graph, the same level of all supervoxels is checked; when all leaf nodes have been searched, or the searched nodes carry no label, the label assignment of the current supervoxel ends;
step 3.2.4, after all supervoxels have been searched and labeled, the center of each supervoxel cluster is updated as the mean of all its members, and this operation is performed iteratively until convergence.
8. The deep learning-based semantic map object recognition method for the robot according to claim 7, wherein step 3.3 specifically comprises the following steps: the three-dimensional point cloud processed by the VCCS algorithm yields a set of different surface patches, and the result of the algorithm is expressed as an adjacency graph G = {V, E}, where V is the set of surface patches V_i and E is the set of edges e_i connecting patches V_i and V_j; each patch corresponds to a centroid c_i and a normal vector n_i, so the segmentation of the three-dimensional scene is formulated as a graph partition;
fused support planes are adopted to eliminate the influence of noise; assuming there are K support planes {s_1, s_2, ..., s_K} in the point cloud, all surface patches are contained in these support planes; a variable b_i ∈ [0, K] is defined for each patch, where b_i = k indicates that the patch belongs to support plane s_k;
for all surface patches, candidate support planes of the point cloud are generated from the patch centroids and normal vectors, and after merging nearly duplicate planes to remove redundancy, M preliminarily generated candidate planes are obtained;
firstly, the final set of point cloud support planes is determined from the M candidate planes, and P support planes satisfying the geometric relationships are obtained from it; all surface patches are assigned to the P support planes, and each patch corresponds to a label l_p, where l_p ∈ {0, 1, 2, ..., P}; if a patch does not belong to any plane, its label is 0; if two connected patches belong to the same plane, the corresponding edge weight is assigned 0, otherwise 1, as shown in formula (4):
w_e(l_i, l_j) = 0 if l_i = l_j;  w_e(l_i, l_j) = 1 if l_i ≠ l_j    (4)
where l_i and l_j are the labels of adjacent surface patches and w_e(l_i, l_j) is the weight of the edge connecting them; a graph optimization problem is constructed from the surface patches and their connecting edges, and the labeling with the minimum sum of edge weights is taken as the optimal segmentation, thereby completing object segmentation of the point cloud scene.
9. The deep learning-based semantic robot map object recognition method according to claim 1, wherein the step 4 specifically comprises the following steps:
step 4.1: data association and object model updating of the three-dimensional semantic map;
step 4.2: and constructing a three-dimensional semantic map based on Octomap.
10. The deep learning-based semantic map object recognition method for the robot according to claim 9, wherein step 4.1 specifically comprises the following steps: firstly, for each detection in a key frame, selecting the targets after segmentation as the object candidate set; before the current target is added to the map, searching in the neighborhood of each of its three-dimensional points, finding the closest three-dimensional point in the point cloud data of the object candidate set, and calculating the Euclidean distance between the two points;
if the Euclidean distance between the two points is less than a certain threshold, they are regarded as the same point, and the similarity between the current target and the candidate object is calculated, the calculation formula being expressed as:
s = M / N    (5)
wherein M represents the number of three-dimensional points in the current target whose Euclidean distance is smaller than the threshold, and N represents the total number of three-dimensional points of each candidate object in the candidate set; the similarity between the current target and each candidate object is calculated, and if it meets the threshold they are considered the same target, their information is associated and the object model is maintained jointly; otherwise a new object model is added;
the specific process of step 4.2 is as follows: in the Octomap-based octree map, a probability p is used in the octree to express whether a leaf node is occupied, expressed as:
α = log( p / (1 − p) ),  i.e.  p = 1 / (1 + e^(−α))    (6)
where α represents the state of the corresponding node and the probability p takes values from 0 to 1; let n be a node in the octree and z the observation data; the node information of node n from the beginning up to time t−1 is represented as L(n | z_{1:t−1}); the node information at time t is represented as:
L(n|z1:t)=L(n|z1:t-1)+L(n|zt) (7)
if the depth of a certain pixel in the depth map is d, occupied data exists on a space node corresponding to the value d, no occupied information exists in an image area with the depth value smaller than d, and node information in the octree map is updated rapidly according to the above formula.
CN202011189866.3A 2020-10-30 2020-10-30 Robot semantic map object recognition method based on deep learning Pending CN112288857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011189866.3A CN112288857A (en) 2020-10-30 2020-10-30 Robot semantic map object recognition method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011189866.3A CN112288857A (en) 2020-10-30 2020-10-30 Robot semantic map object recognition method based on deep learning

Publications (1)

Publication Number Publication Date
CN112288857A true CN112288857A (en) 2021-01-29

Family

ID=74354114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011189866.3A Pending CN112288857A (en) 2020-10-30 2020-10-30 Robot semantic map object recognition method based on deep learning

Country Status (1)

Country Link
CN (1) CN112288857A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150009214A1 (en) * 2013-07-08 2015-01-08 Vangogh Imaging, Inc. Real-time 3d computer vision processing engine for object recognition, reconstruction, and analysis
CN108304798A (en) * 2018-01-30 2018-07-20 北京同方软件股份有限公司 The event video detecting method of order in the street based on deep learning and Movement consistency

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李佳俊: "Design and Implementation of an RGBD-based UAV Semantic SLAM System Prototype", China Master's Theses Full-text Database, pages 1-54 *

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160292B (en) * 2021-04-27 2023-10-31 武汉理工大学 Laser radar point cloud data three-dimensional modeling device and method based on intelligent mobile terminal
CN113160292A (en) * 2021-04-27 2021-07-23 武汉理工大学 Laser radar point cloud data three-dimensional modeling device and method based on intelligent mobile terminal
CN113160293A (en) * 2021-05-13 2021-07-23 南京信息工程大学 Complex scene ground station point cloud automatic registration method based on feature probability
CN113160293B (en) * 2021-05-13 2023-06-20 南京信息工程大学 Automatic registration method for complex scene ground site cloud based on feature probability
CN113405547B (en) * 2021-05-21 2022-03-22 杭州电子科技大学 Unmanned aerial vehicle navigation method based on semantic VSLAM
CN113405547A (en) * 2021-05-21 2021-09-17 杭州电子科技大学 Unmanned aerial vehicle navigation method based on semantic VSLAM
CN113222961A (en) * 2021-05-27 2021-08-06 大连海事大学 Intelligent ship body detection system and method
CN113554701A (en) * 2021-07-16 2021-10-26 杭州派珞特智能技术有限公司 PDS tray intelligent identification and positioning system and working method thereof
CN113547501A (en) * 2021-07-29 2021-10-26 中国科学技术大学 SLAM-based mobile mechanical arm cart task planning and control method
CN113744397A (en) * 2021-07-30 2021-12-03 中南大学 Real-time object-level semantic map construction and updating method and device
CN113744397B (en) * 2021-07-30 2023-10-24 中南大学 Real-time object-level semantic map construction and updating method and device
CN113485373B (en) * 2021-08-12 2022-12-06 苏州大学 Robot real-time motion planning method based on Gaussian mixture model
CN113485373A (en) * 2021-08-12 2021-10-08 苏州大学 Robot real-time motion planning method based on Gaussian mixture model
CN113703462A (en) * 2021-09-02 2021-11-26 东北大学 Unknown space autonomous exploration system based on quadruped robot
CN113888611A (en) * 2021-09-03 2022-01-04 北京三快在线科技有限公司 Method and device for determining image depth and storage medium
CN114359493A (en) * 2021-12-20 2022-04-15 中国船舶重工集团公司第七0九研究所 Method and system for generating three-dimensional semantic map for unmanned ship
CN115937451A (en) * 2022-12-16 2023-04-07 武汉大学 Dynamic scene multi-semantic map construction method and device based on visual SLAM
CN115937451B (en) * 2022-12-16 2023-08-25 武汉大学 Dynamic scene multi-semantic map construction method and device based on visual SLAM
CN116704137A (en) * 2023-07-27 2023-09-05 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform
CN116704137B (en) * 2023-07-27 2023-10-24 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform

Similar Documents

Publication Publication Date Title
CN112288857A (en) Robot semantic map object recognition method based on deep learning
CN108648233B (en) Target identification and capture positioning method based on deep learning
CN108898605B (en) Grid map segmentation method based on map
Lim et al. 3D terrestrial LIDAR classifications with super-voxels and multi-scale Conditional Random Fields
Sun et al. Aerial 3D building detection and modeling from airborne LiDAR point clouds
Xu et al. Voxel-and graph-based point cloud segmentation of 3D scenes using perceptual grouping laws
CN107273905B (en) Target active contour tracking method combined with motion information
JP2008217706A (en) Labeling device, labeling method and program
CN110188763B (en) Image significance detection method based on improved graph model
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
CN107194929B (en) Method for tracking region of interest of lung CT image
Kohli et al. Dynamic graph cuts and their applications in computer vision
CN113362341B (en) Air-ground infrared target tracking data set labeling method based on super-pixel structure constraint
Lin et al. Temporally coherent 3D point cloud video segmentation in generic scenes
CN112396655B (en) Point cloud data-based ship target 6D pose estimation method
CN112200248A (en) Point cloud semantic segmentation method, system and storage medium under urban road environment based on DBSCAN clustering
Triebel et al. Parsing outdoor scenes from streamed 3d laser data using online clustering and incremental belief updates
Li A super voxel-based Riemannian graph for multi scale segmentation of LiDAR point clouds
CN111986223B (en) Method for extracting trees in outdoor point cloud scene based on energy function
Goswami et al. Multi-faceted hierarchical image segmentation taxonomy (MFHIST)
Huang et al. Semantic labeling and refinement of LiDAR point clouds using deep neural network in urban areas
Kaleci et al. Plane segmentation of point cloud data using split and merge based method
Rong et al. 3D semantic labeling of photogrammetry meshes based on active learning
CN113837215B (en) Point cloud semantic and instance segmentation method based on conditional random field
CN111462181B (en) Video single-target tracking method based on rectangular asymmetric inverse layout model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination