CN109816686A - Robot semantic SLAM method, processor and robot based on object instance matching - Google Patents

- Publication number: CN109816686A (application CN201910037102.3A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Landscapes: Image Analysis
Abstract
The present disclosure provides a robot semantic SLAM method, processor, and robot based on object instance matching. The robot semantic SLAM method includes: obtaining the image sequence captured while the robot operates, and performing feature point extraction, matching, and tracking on each frame to estimate the camera motion; extracting key frames and performing instance segmentation on them to obtain all object instances in each key frame; extracting feature points from each key frame and computing their descriptors, and performing feature extraction and encoding on all object instances in the key frame to compute instance feature description vectors, while also obtaining the instance 3D point cloud; performing feature point matching and instance matching, respectively, on the feature points and object instances between adjacent key frames; fusing the feature point matches and instance matches to apply local nonlinear optimization to the SLAM pose estimate, obtaining key frames that carry object-instance semantic annotations; and mapping these key frames into the instance 3D point cloud to construct a 3D semantic map.
Description
Technical field
The disclosure belongs to the technical field of robot navigation, and in particular relates to a robot semantic SLAM method, processor, and robot based on object instance matching.
Background technique
The statements in this section merely provide background information related to the disclosure and do not necessarily constitute prior art.
In the field of robot navigation, Simultaneous Localization and Mapping (SLAM) means that a robot, starting from an unknown place in an unknown environment, localizes its own position and attitude through repeatedly observed environmental features during motion, and builds a map of the environment according to its own pose, thereby achieving simultaneous localization and mapping. SLAM mainly solves the two problems of localization and map construction; since it was proposed, it has quickly attracted the attention and study of numerous scholars and has always been regarded as a key technology for fully autonomous mobile robots. For certain robots (such as drones) that cannot carry an odometer for location estimation, and given the high cost of lidar, visual SLAM (vSLAM) has received widespread attention and study in recent years. vSLAM mainly comprises modules such as visual odometry, back-end optimization, map construction, and loop closure detection; its main implementations are the feature point method and the direct method, and its data sources are mainly monocular, stereo, and RGB-D video streams.
However, traditional vSLAM algorithms rely on feature extraction and matching based on low-level semantic primitives such as points, lines, and surfaces to estimate camera motion; they not only lack semantic information, but the robustness of their feature matching is also low, easily causing large estimation errors. Meanwhile, the loop closure detection algorithms of traditional vSLAM have a strong viewpoint dependency and are prone to high error rates in complex or repetitive environments. Semantic SLAM, by effectively integrating semantic information with vSLAM, enables the robot to perceive the environment in terms of both geometry and content, improving the service ability of the robot and the intelligence of human-computer interaction. However, the inventors have found that most current research on semantic SLAM requires known 3D models as prior knowledge, or only performs semantic segmentation on a limited number of object categories without distinguishing individual objects.
Summary of the invention
According to one aspect of one or more embodiments of the present disclosure, a robot semantic SLAM method based on object instance matching is provided, which can identify individual objects in a scene and construct a 3D semantic map, while also optimizing the SLAM pose estimate based on instance matching between different key frames.
A robot semantic SLAM method based on object instance matching of the disclosure comprises:
obtaining the image sequence captured while the robot operates, and performing feature point extraction, matching, and tracking on each frame to estimate the camera motion;
extracting key frames and performing instance segmentation on them to obtain all object instances in each key frame;
extracting feature points from each key frame and computing their descriptors, and performing feature extraction and encoding on all object instances in the key frame to compute the instance feature description vectors, while also obtaining the instance 3D point cloud;
according to the feature point descriptors and the feature description vectors, performing feature point matching and instance matching, respectively, on the feature points and object instances between adjacent key frames;
fusing the feature point matches and instance matches to apply local nonlinear optimization to the SLAM pose estimate, obtaining key frames that carry object-instance semantic annotations;
mapping the key frames that carry object-instance semantic annotations into the instance 3D point cloud to construct a 3D semantic map.
In one or more embodiments, in estimating the camera motion, the camera motion between adjacent frames is solved using bundle adjustment:
first, ORB feature points are extracted from and matched between adjacent frame images, yielding several pairs of ORB feature points;
then a nonlinear least-squares problem is constructed from these feature point pairs, and solving it yields the camera pose.
In one or more embodiments, in extracting key frames, the magnitude of the inter-frame relative motion distance is used as the criterion for selecting key frames from the image sequence.
In one or more embodiments, if the inter-frame relative motion distance lies between the allowed minimum and maximum inter-frame relative motion distances, the current frame is a key frame.
In one or more embodiments, a deep-learning-based instance segmentation framework, the Mask R-CNN network, is used to perform instance segmentation on the key frame images to obtain all instances in each key frame image. Mask R-CNN adds a fully convolutional network branch on top of Faster R-CNN to output instance masks, thereby performing pixel-level segmentation of the instance contours within the detection boxes.
In one or more embodiments, the process of computing the instance feature description vectors includes:
building a visual vocabulary from a training set based on the VLAD algorithm: each image in the training set is divided into a grid, and a dense SIFT feature and RGB color value are extracted at each grid center to obtain the feature description vector of each grid cell;
clustering the resulting grid feature description vectors into a preset number of classes using the k-means algorithm, computing the residual vector between each grid feature description vector and its cluster center, applying power normalization and L2-norm normalization to all residual vectors, and then feature-encoding the detection box image of each instance with the normalized residual vectors to obtain the feature description vector of the instance.
According to another aspect of one or more embodiments of the present disclosure, a robot semantic SLAM processor based on object instance matching is provided, which can identify individual objects in a scene and construct a 3D semantic map, while also optimizing the SLAM pose estimate based on instance matching between different key frames.
A robot semantic SLAM processor based on object instance matching of the disclosure comprises:
a camera motion estimation module, configured to obtain the image sequence captured while the robot operates and to perform feature point extraction, matching, and tracking on each frame to estimate the camera motion;
an instance acquisition module, configured to extract key frames and perform instance segmentation on them to obtain all object instances in each key frame;
a feature description module, configured to extract feature points from each key frame and compute their descriptors, and to perform feature extraction and encoding on all object instances in the key frame to compute the instance feature description vectors, while also obtaining the instance 3D point cloud;
a feature point and instance matching module, configured to perform, according to the feature point descriptors and the feature description vectors, feature point matching and instance matching on the feature points and object instances between adjacent key frames;
a pose estimation optimization module, configured to fuse the feature point matches and instance matches to apply local nonlinear optimization to the SLAM pose estimate, obtaining key frames that carry object-instance semantic annotations;
a 3D semantic map construction module, configured to map the key frames that carry object-instance semantic annotations into the instance 3D point cloud to construct the 3D semantic map.
In one or more embodiments, in the camera motion estimation module, the camera motion between adjacent frames is solved using bundle adjustment:
first, ORB feature points are extracted from and matched between adjacent frame images, yielding several pairs of ORB feature points;
then a nonlinear least-squares problem is constructed from these feature point pairs, and solving it yields the camera pose.
In one or more embodiments, in the instance acquisition module, the magnitude of the inter-frame relative motion distance is used as the criterion for selecting key frames from the image sequence.
In one or more embodiments, in the instance acquisition module, a deep-learning-based instance segmentation framework, the Mask R-CNN network, is used to perform instance segmentation on the key frame images to obtain all instances in each key frame image; Mask R-CNN adds a fully convolutional network branch on top of Faster R-CNN to output instance masks, thereby performing pixel-level segmentation of the instance contours within the detection boxes.
In one or more embodiments, in the feature description module, the process of computing the instance feature description vectors includes:
building a visual vocabulary from a training set based on the VLAD algorithm: each image in the training set is divided into a grid, and a dense SIFT feature and RGB color value are extracted at each grid center to obtain the feature description vector of each grid cell;
clustering the resulting grid feature description vectors into a preset number of classes using the k-means algorithm, computing the residual vector between each grid feature description vector and its cluster center, applying power normalization and L2-norm normalization to all residual vectors, and then feature-encoding the detection box image of each instance with the normalized residual vectors to obtain the feature description vector of the instance.
In one or more embodiments, in the instance acquisition module, if the inter-frame relative motion distance lies between the allowed minimum and maximum inter-frame relative motion distances, the current frame is a key frame.
According to another aspect of one or more embodiments of the present disclosure, a semantic SLAM robot based on object instance matching is provided, which can identify individual objects in a scene and construct a 3D semantic map, while also optimizing the SLAM pose estimate based on instance matching between different key frames.
A semantic SLAM robot based on object instance matching of the disclosure comprises the robot semantic SLAM processor based on object instance matching described above.
The beneficial effects of the disclosure are:
(1) The robot semantic SLAM method based on object instance matching provided by the disclosure is a semantic SLAM method for RGB-D video streams in indoor environments. By combining a currently advanced deep-learning-based instance segmentation algorithm with a vSLAM algorithm, it can detect and identify individual objects of various kinds in a scene and build them into a 3D semantic map, while at the same time using object instance matching to optimize the SLAM pose estimate, improving the localization accuracy of vSLAM.
(2) The semantic SLAM method realized by the instance segmentation technique and vSLAM technique of the disclosure processes the key frames with instance segmentation to obtain all instances in them, and maps the instances into the 3D point cloud through vSLAM to construct an object-instance-oriented 3D semantic map.
(3) Unlike traditional semantic segmentation techniques, which only distinguish object categories in an image, instance segmentation can distinguish different individuals of the same object category while also removing the background pixels of each instance. At the same time, the disclosure improves the localization accuracy of vSLAM by fusing the feature point matching results and instance matching results between different key frames for local nonlinear optimization.
Description of the drawings
The accompanying drawings, which constitute a part of this disclosure, are provided for further understanding of the disclosure; the illustrative embodiments of the disclosure and their descriptions are used to explain the disclosure and do not constitute an improper limitation of it.
Fig. 1 is a flow diagram of an embodiment of a semantic SLAM method based on object instance matching of the disclosure.
Fig. 2 is a schematic diagram of an embodiment of the local nonlinear optimization method for vSLAM of the disclosure.
Fig. 3 is a schematic structural diagram of an embodiment of a semantic SLAM processor based on object instance matching of the disclosure.
Specific embodiments
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the disclosure. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by those of ordinary skill in the technical field to which the disclosure belongs.
It should be noted that the terms used herein are merely for describing specific embodiments and are not intended to limit the illustrative embodiments of the disclosure. As used herein, unless the context clearly indicates otherwise, the singular forms are also intended to include the plural forms; in addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Explanation of terms:
(1) ORB: the Oriented FAST and Rotated BRIEF algorithm, currently one of the fastest and most stable feature point detection and extraction algorithms; many image stitching and target tracking techniques are implemented with ORB features.
(2) RGB-D = RGB + Depth Map.
RGB: the RGB color model is an industry color standard that obtains a wide range of colors through the variation of the red (R), green (G), and blue (B) color channels and their superposition on each other; RGB stands for the colors of the red, green, and blue channels. This standard covers almost all colors that human vision can perceive and is one of the most widely used color systems.
Depth Map: in 3D computer graphics, a depth map is an image or image channel containing information about the distance from the viewpoint to the surfaces of scene objects. A depth map is similar to a grayscale image, except that each of its pixel values is the actual distance from the sensor to the object. Usually the RGB image and the depth image are registered, so there is a one-to-one correspondence between their pixels.
(3) ICP (Iterative Closest Point): an algorithm for registering one point set to another. The essence of ICP is least-squares optimal matching: it repeats the process of "determining the corresponding point sets → computing the optimal rigid transformation" until some convergence criterion indicating a correct match is satisfied.
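The "optimal rigid transformation" step of each ICP iteration has a closed-form solution via SVD (the Kabsch algorithm). A minimal sketch in NumPy, assuming the point correspondences for the current iteration have already been established; the function name and interface are illustrative, not taken from the patent:

```python
import numpy as np

def icp_step(P, Q):
    """One point-to-point ICP alignment step (Kabsch/SVD), assuming the
    correspondences P[k] <-> Q[k] are already fixed. Returns the rotation R
    and translation t minimizing sum_k ||Q_k - (R @ P_k + t)||^2."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

A full ICP loop would alternate this step with re-estimating nearest-neighbor correspondences until the convergence criterion is met.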
(4) MS COCO: a data set built by Microsoft that covers tasks such as detection, segmentation, and keypoints. MS COCO mainly addresses three problems: detecting non-iconic views of objects (what is commonly called detection), contextual reasoning between objects, and precise 2D localization of objects (what is commonly called segmentation).
As shown in Fig. 1, an embodiment of the disclosure provides a semantic SLAM method based on object instance matching, which processes the RGB-D video sequence of a depth camera to estimate the camera motion and simultaneously construct a 3D semantic map of the environment, and at the same time uses the object instance matching results to optimize the vSLAM pose estimate and improve localization accuracy. It specifically includes the following steps.
Step 1: obtain the image sequence captured while the robot operates, and perform feature point extraction, matching, and tracking on each frame to estimate the camera motion.
Specifically, feature points are extracted from the image data and matched and tracked to estimate the camera motion, realizing the function of visual odometry. The image data is the RGB-D image sequence captured by the depth camera, i.e. the RGB-D image set I = {I_1, ..., I_N} captured during the robot's motion, where I_n is the n-th RGB-D image; the intrinsic matrix of the depth camera carried by the robot is K.
The camera motion between adjacent frames is solved using bundle adjustment (BA): first, ORB feature points are extracted from and matched between adjacent RGB-D frames, yielding n pairs of ORB feature points; then a nonlinear least-squares problem is constructed from these n pairs, as shown below, and solving this optimization problem yields the camera pose ξ*. The cost function used to solve the adjacent-frame camera motion with BA is:

ξ* = argmin_ξ (1/2) Σ_{i=1}^{n} ‖ u_i − (1/s_i) K exp(ξ^) P_i ‖²

where ξ* is the Lie algebra representation of the camera pose obtained after BA optimization, argmin_ξ denotes minimizing the cost function over the optimization variable ξ, u_i is the observed pixel coordinate of the i-th point, exp(ξ^) is the Lie group representation of the camera pose (a 4 × 4 matrix), P_i is the 3D coordinate of the i-th point, and s_i is the depth value of the i-th point.
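The per-point residual of this BA cost, the observed pixel u_i minus the projection (1/s_i) K exp(ξ^) P_i, can be sketched for a single point as follows. This is an illustration only; a real BA backend (e.g. g2o or Ceres) would optimize ξ jointly over all points, and the interface here is an assumption:

```python
import numpy as np

def reprojection_residual(xi_T, P, u, s, K):
    """Residual u - (1/s) * K * exp(xi^) * P for one 3D point.
    xi_T : 4x4 camera pose matrix (the Lie-group form exp(xi^)),
    P    : 3D point (3,), u : observed pixel (2,), s : depth of the point,
    K    : 3x3 camera intrinsic matrix."""
    P_cam = xi_T[:3, :3] @ P + xi_T[:3, 3]   # transform the point into the camera frame
    proj = (K @ P_cam) / s                   # project and divide by depth
    return u - proj[:2]                      # 2D reprojection error
```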
Step 2: extract key frames, perform instance segmentation on them, and obtain all object instances in each key frame.
In a specific implementation, the magnitude of the inter-frame relative motion distance is used as the criterion for selecting key frames from the RGB-D image sequence. The inter-frame rotation vector R and translation vector t are computed first, and then the inter-frame relative motion distance D is computed as follows:

D = ‖Δt‖ + min(2π − ‖R‖, ‖R‖)

where Δt is the translation vector difference and ‖·‖ denotes the length of a vector.
Key frames are selected according to the inter-frame relative motion distance D by the following rules:
1) if D_min ≤ D ≤ D_max, then Frame_curr = Frame_key;
2) if D < D_min or D > D_max, then Frame_curr ≠ Frame_key;
where D_min and D_max are the allowed minimum and maximum inter-frame relative motion distances, respectively, Frame_curr is the current frame, and Frame_key is a key frame.
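The selection rule above can be sketched directly; the threshold values D_min and D_max are tuning parameters that the text leaves open, and the interface is illustrative:

```python
import numpy as np

def is_key_frame(R_vec, delta_t, d_min, d_max):
    """Key-frame test from the rules above: the inter-frame motion distance
    D = ||delta_t|| + min(2*pi - ||R_vec||, ||R_vec||)
    must lie in [d_min, d_max]. R_vec is the inter-frame rotation vector
    (axis-angle) and delta_t the translation vector difference."""
    r = np.linalg.norm(R_vec)
    D = np.linalg.norm(delta_t) + min(2.0 * np.pi - r, r)
    return d_min <= D <= d_max
```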
In a specific implementation, the deep-learning-based instance segmentation framework Mask R-CNN is used to perform instance segmentation on the key frame images, obtaining all instances in each key frame image, including the target detection box of each individual object and the pixel-level instance mask with the background removed.
Mask R-CNN is an improvement on the Faster R-CNN network: Faster R-CNN can perform target detection and obtain the detection box of each target in an image, but cannot accurately segment the contour of the target within the detection box. To perform instance segmentation, Mask R-CNN adds a fully convolutional network branch on top of Faster R-CNN to output instance masks, thereby performing pixel-level segmentation of the instance contours within the detection boxes.
A shortcoming of R-CNN: even with preprocessing steps such as selective search to extract potential bounding boxes as input, R-CNN still has a serious speed bottleneck, for an obvious reason: the feature extraction is computed repeatedly for every region. Fast R-CNN was created precisely to solve this problem.
In Fast R-CNN, bounding box regression is placed inside the neural network and combined with region classification into a multi-task model; experiments also show that these two tasks can share convolutional features and promote each other. An important contribution of Fast R-CNN is showing that multi-class detection can genuinely improve processing speed while maintaining accuracy.
Instance segmentation is performed on the key frames with the Mask R-CNN network. First, the MS COCO data set, with 80 different target classes, is selected as the training set for Mask R-CNN; the loss function used is L = L_c + L_b + L_m, where L_c is the classification error, L_b is the target detection error, and L_m is the pixel segmentation error, defined as the average binary cross-entropy loss. The trained Mask R-CNN weight model is then used to predict on each key frame, extracting all object instances in the key frame and obtaining the target detection box of each instance and the pixel-level instance mask with the background removed.
Step 3: extract feature points from each key frame and compute their descriptors, perform feature extraction and encoding on all object instances in the key frame to compute the instance feature description vectors, and at the same time obtain the instance 3D point cloud.
Specifically, a visual vocabulary is first built from a training set based on the VLAD algorithm: each image in the training set is divided into a grid, and a dense SIFT feature and RGB color value are extracted at each grid center to obtain the feature description vector of each grid cell. The N grid feature description vectors are then clustered into 64 classes using the k-means algorithm, the residual vector between each grid feature description vector and its cluster center is computed, and power normalization and L2-norm normalization are applied to all residual vectors. Feature encoding is then applied to the detection box image of each instance using the normalized residual vectors:
each instance detection box image is first divided into a grid and dense SIFT features are extracted; the image in each grid cell is encoded based on the visual vocabulary built above, giving the feature description vector ψ of the instance image. A 3-level image spatial pyramid structure is then used to count the distribution of the feature points of the instance image and obtain its spatial information: the i-th level of the instance image is divided into 4^i sub-regions, a histogram feature is counted in each sub-region, and finally the histograms of the 3 levels are concatenated into the feature description vector of the instance.
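The 3-level spatial pyramid pooling described above (level i split into 4^i sub-regions, per-region histograms concatenated) can be sketched as follows. The interface and the use of normalized feature coordinates are assumptions made for illustration:

```python
import numpy as np

def spatial_pyramid_histogram(points, labels, n_words, levels=3):
    """Spatial pyramid over an image in [0,1) x [0,1): level i is split into
    4**i cells (a 2**i x 2**i grid); the visual-word histogram of each cell
    is concatenated across all levels. `points` are normalized (x, y)
    feature locations, `labels` their visual-word ids in [0, n_words)."""
    parts = []
    for i in range(levels):
        g = 2 ** i
        # cell index of each point at this level, clamped to the grid
        cell = np.minimum((points * g).astype(int), g - 1)
        for cy in range(g):
            for cx in range(g):
                m = (cell[:, 0] == cx) & (cell[:, 1] == cy)
                parts.append(np.bincount(labels[m], minlength=n_words))
    return np.concatenate(parts)
```

With 3 levels this yields 1 + 4 + 16 = 21 regions, so the concatenated vector has 21 · n_words entries.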
VLAD is the abbreviation of Vector of Locally Aggregated Descriptors, proposed by Jégou et al. in 2010; its core idea is aggregation (accumulation), and it is mainly used in the field of image retrieval. The VLAD algorithm can be regarded as a simplified Fisher Vector. Its main method is to train a small codebook by clustering, find the nearest codebook cluster center for each feature in an image, and accumulate the differences between all the features and their cluster centers, yielding a k × d VLAD matrix, where k is the number of cluster centers and d is the feature dimension (e.g. 128 for SIFT). The matrix is then flattened into a (k · d)-dimensional vector and L2-normalized; the resulting vector is the VLAD descriptor.
VLAD algorithm flow:
(1) read the image file paths and extract features;
(2) train the codebook by clustering;
(3) accumulate, for each image, the residuals between its features and their nearest cluster centers;
*(4) apply PCA dimensionality reduction to the accumulated VLAD and normalize it;
*(5) after obtaining the VLAD, use the ADC method to further reduce storage space and improve retrieval speed.
Steps *(4) and *(5) are optional: after step (3) yields the accumulated residual vector, L2 normalization already allows the Euclidean distance or similar measures to be used to compute the similarity of two images for image retrieval.
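Steps (1)–(3) plus the L2 normalization amount to the following sketch (power normalization and the optional PCA/ADC steps are omitted for brevity; the function name is illustrative):

```python
import numpy as np

def vlad_encode(descriptors, centers):
    """VLAD encoding as outlined above: assign each local descriptor to its
    nearest codebook center, accumulate the residuals per center, then
    flatten the k x d matrix and L2-normalize it."""
    k, d = centers.shape
    # nearest codebook center for each descriptor
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    V = np.zeros((k, d))
    for i, c in enumerate(assign):
        V[c] += descriptors[i] - centers[c]   # residual accumulation
    v = V.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```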
Step 4: according to the feature point descriptors and the feature description vectors, perform feature point matching and instance matching, respectively, on the feature points and object instances between adjacent key frames.
Step 5: fuse the feature point matches and instance matches to apply local nonlinear optimization to the SLAM pose estimate, obtaining key frames that carry object-instance semantic annotations.
Fig. 2 is a schematic diagram of the method in the disclosure that fuses instance matching and feature point matching to optimize the vSLAM pose estimate. As shown in Fig. 2, on top of the geometric constraints of traditional feature point matching, the disclosed method adds geometric constraints from instance matching and performs a further nonlinear optimization on the pose estimate, improving localization accuracy. For the feature point matching results, let z_ij be the data generated by observing landmark feature point p_j from pose ξ_i; the cost function is then:

min_{ξ, p} (1/2) Σ_{i=1}^{m} Σ_{j=1}^{n} ‖ e_ij ‖²,  with e_ij = z_ij − h(ξ_i, p_j)

where m and n are the numbers of poses and feature points participating in the optimization, respectively, i is the pose index, j is the feature point index, e_ij is the reprojection error of the j-th feature point at pose ξ_i, and h(ξ_i, p_j) is the projection of the j-th feature point p_j at the i-th pose ξ_i.
For the instance matching results, the ICP algorithm is used to register the instance point clouds to optimize the pose. Let z_ij^k be the data generated by observing the k-th point of the j-th instance point cloud from pose ξ_i; the cost function can then be written as:

min_ξ (1/2) Σ_i Σ_j Σ_k ‖ z_ij^k − exp(ξ_i^) P_j^k ‖²

where P_j^k is the 3D coordinate of the k-th point of the j-th instance (in homogeneous coordinates when multiplied by the 4 × 4 matrix exp(ξ_i^)).
The pose optimized by feature point matching and the pose optimized by instance matching are fused by taking a weighted average to obtain the fused pose estimate.
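A minimal sketch of this weighted-average fusion, under the assumption that both poses are expressed as se(3) twists (6-vectors in the Lie algebra) and that equal weights are used; the text fixes neither the parameterization nor the weighting, and averaging twists is only a first-order approximation to averaging on SE(3):

```python
import numpy as np

def fuse_pose_estimates(xi_feat, xi_inst, w_feat=0.5):
    """Weighted fusion of the feature-point-optimized pose xi_feat and the
    instance-match-optimized pose xi_inst, both 6-vector se(3) twists.
    w_feat is the weight given to the feature-point estimate (assumed 0.5)."""
    xi_feat = np.asarray(xi_feat, dtype=float)
    xi_inst = np.asarray(xi_inst, dtype=float)
    return w_feat * xi_feat + (1.0 - w_feat) * xi_inst
```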
Step 6: map the key frames that carry object-instance semantic annotations into the 3D point cloud to construct the 3D semantic map.
The intrinsic matrix of the depth camera carried by the robot is known to be K, and the pose of the i-th key frame is ξ_i. Each pixel of a key frame is represented by a 3-dimensional vector p = [u, v, l], where u and v are the horizontal and vertical coordinates and l is the instance label. The key frame is mapped into the 3D point cloud by:

[X_j, Y_j, Z_j]^T = exp(ξ_i^) · (d_j · K^{-1} [u_j, v_j, 1]^T)

where [u_j, v_j, 1] corresponds to the j-th pixel of the key frame, d_j is the depth value of the j-th feature point, [X_j, Y_j, Z_j]^T is the coordinate vector of the j-th feature point projected into 3D space, and exp(ξ^) is the Lie group representation of the camera pose, a 4 × 4 matrix (applied in homogeneous coordinates).
Each point of the 3D point cloud is then represented as P = [X, Y, Z, l].
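The mapping above for a single labelled pixel can be sketched as follows; the interface is illustrative, with T_wc standing for the 4 × 4 pose matrix exp(ξ_i^):

```python
import numpy as np

def backproject_keyframe_pixel(u, v, depth, K, T_wc, label):
    """Map one labelled key-frame pixel into the world point cloud:
    X_w = T_wc * (depth * K^{-1} [u, v, 1]^T), carrying the instance
    label l along as the fourth component P = [X, Y, Z, l]."""
    p_cam = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])   # back-project to camera frame
    p_world = T_wc[:3, :3] @ p_cam + T_wc[:3, 3]               # transform to world frame
    return np.array([p_world[0], p_world[1], p_world[2], label])
```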
In conclusion the perfect vSLAM technology based on RGB-D data of disclosure combination example partitioning algorithm, so that
VSLAM also obtains the environment semantic information towards object example while obtaining environment geological information, and utilizes object
Example match carries out further geometrical constraint to vSLAM, improves the pose estimated accuracy of vSLAM.
Fig. 3 is a kind of semantic SLAM processor structure schematic diagram based on object example match of the disclosure.
A kind of robot semanteme SLAM processor based on object example match of the disclosure, comprising:
(1) camera motion estimation module is used to obtain the image sequence shot in robot operational process, to every frame figure
Camera motion is estimated as carrying out feature point extraction, matching and tracking;
Specifically, in the camera motion estimation module, the camera motion of consecutive frame is solved using light-stream adjustment:
First to consecutive frame image carry out ORB feature point extraction with match, obtain several pairs of ORB characteristic points of consecutive frame;
Then non-linear least square problem is constructed according to this several pairs of ORB characteristic points, solution obtains the pose of camera.
(2) case-based system module is used to extract key frame, carries out example segmentation to key frame, obtains every frame key frame
In all object examples;
Specifically, in the case-based system module, using the size of interframe relative motion distance as extraction image sequence
In key frame foundation.
Specifically, in the instance extraction module, a deep-learning instance segmentation framework, the Mask R-CNN network, is used to perform instance segmentation on the key frame images, obtaining all instances in every key frame image. The Mask R-CNN network adds a fully convolutional network branch to Faster R-CNN to output an instance mask, thereby segmenting the contour of each instance within its detection box at the pixel level.
In the instance extraction module, if the inter-frame relative motion distance lies between the permitted minimum and maximum inter-frame relative motion distances, the current frame is taken as a key frame.
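The key-frame criterion can be sketched as follows. The thresholds are hypothetical values (the disclosure only states that a minimum and a maximum distance are used), and for simplicity only the translational component of the relative motion is measured here:

```python
import numpy as np

# Hypothetical thresholds; the disclosure does not give numeric values.
D_MIN, D_MAX = 0.05, 0.5  # metres

def relative_motion_distance(T_prev, T_curr):
    """Translation distance between two 4x4 camera poses."""
    return float(np.linalg.norm(T_curr[:3, 3] - T_prev[:3, 3]))

def is_key_frame(T_last_key, T_curr, d_min=D_MIN, d_max=D_MAX):
    """Select the current frame as a key frame when the relative motion
    since the last key frame falls inside the allowed band: too little
    motion adds redundant frames, too much motion breaks matching."""
    d = relative_motion_distance(T_last_key, T_curr)
    return d_min <= d <= d_max

T_key = np.eye(4)                      # last key-frame pose
T_small = np.eye(4); T_small[:3, 3] = [0.01, 0.0, 0.0]  # barely moved
T_good = np.eye(4);  T_good[:3, 3] = [0.10, 0.0, 0.0]   # inside the band
T_big = np.eye(4);   T_big[:3, 3] = [1.00, 0.0, 0.0]    # moved too far
```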
(3) A feature description module, which performs feature point extraction on the key frames and computes the feature point descriptors, and performs feature extraction and encoding on all object instances in the key frames to compute the instance feature description vectors, while also obtaining the instance three-dimensional point cloud.
In the feature description module, the instance feature description vectors are computed as follows:
A visual vocabulary is built from a training set using the VLAD algorithm: every training image is divided into a grid, and a dense SIFT descriptor and RGB color value are extracted at each grid center, giving a feature description vector for each grid cell. The grid-cell feature description vectors are clustered into a preset number of classes with the k-means algorithm, and the residual vector between each grid-cell feature description vector and its cluster center is computed. Power normalization and L2-norm normalization are applied to all residual vectors, and the normalized residual vectors are then used to encode the detection-box image of each instance, yielding the instance feature description vector.
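The residual-based encoding can be sketched as a standard VLAD encoder. The cluster count, power-normalization exponent and descriptor layout below are assumptions for illustration, not parameters taken from the disclosure:

```python
import numpy as np

def vlad_encode(descriptors, centers, alpha=0.5):
    """VLAD encoding of a set of local descriptors (e.g. dense SIFT + RGB
    values from an instance's detection box) against a k-means vocabulary,
    followed by power normalization and global L2 normalization."""
    k, d = centers.shape
    # Assign each descriptor to its nearest vocabulary word
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assign = np.argmin(dists, axis=1)
    vlad = np.zeros((k, d))
    for i, desc in enumerate(descriptors):
        vlad[assign[i]] += desc - centers[assign[i]]  # accumulate residuals
    vlad = vlad.ravel()
    # Power normalization, then L2 normalization of the whole vector
    vlad = np.sign(vlad) * np.abs(vlad) ** alpha
    norm = np.linalg.norm(vlad)
    return vlad / norm if norm > 0 else vlad

# The vocabulary would be built offline with k-means on grid-cell features,
# e.g.: centers = KMeans(n_clusters=64).fit(X).cluster_centers_
rng = np.random.default_rng(1)
desc = rng.random((20, 8))     # toy stand-in: 20 local descriptors, dim 8
cents = rng.random((4, 8))     # toy 4-word vocabulary
v = vlad_encode(desc, cents)   # one fixed-length instance descriptor
```

The resulting fixed-length vector is what gets compared between key frames during instance matching.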
(4) A feature point and instance matching module, which, according to the feature point descriptors and the instance feature description vectors, performs feature point matching and instance matching, respectively, between the feature points and object instances of adjacent key frames.
(5) A pose estimation optimization module, which fuses the feature point matches and instance matches to apply local nonlinear optimization to the SLAM pose estimates, obtaining key frames that carry object-instance semantic annotation information.
(6) A three-dimensional semantic map construction module, which maps the key frames carrying object-instance semantic annotation information into the instance three-dimensional point cloud to construct the three-dimensional semantic map.
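The mapping of a semantically annotated key frame into the instance point cloud can be sketched as pinhole back-projection of masked RGB-D pixels. The intrinsics, the camera-to-world pose convention and the depth units below are assumptions for illustration:

```python
import numpy as np

def backproject_labeled(depth, mask, label, K, T_wc):
    """Back-project the pixels of one instance mask in an RGB-D key frame
    into world coordinates, attaching the instance label to every point.
    depth: HxW depth map; mask: HxW boolean instance mask; K: 3x3
    intrinsics; T_wc: assumed 4x4 camera-to-world pose."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    v, u = np.nonzero(mask & (depth > 0))       # valid pixels inside the mask
    z = depth[v, u]
    x = (u - cx) * z / fx                        # pinhole back-projection
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)])   # homogeneous points
    pts_world = (T_wc @ pts_cam)[:3].T
    labels = np.full(len(z), label)
    return pts_world, labels

# Toy example: a flat depth map fully covered by one instance mask
depth = np.ones((4, 4))
mask = np.ones((4, 4), dtype=bool)
K = np.array([[1., 0., 2.], [0., 1., 2.], [0., 0., 1.]])
pts, labs = backproject_labeled(depth, mask, label=5, K=K, T_wc=np.eye(4))
```

Accumulating the labeled points of every instance over all key frames yields the object-instance-oriented semantic point cloud.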
The robot semantic SLAM processor based on object instance matching provided by the disclosure implements a semantic SLAM method for RGB-D video streams in indoor environments. By combining a state-of-the-art deep-learning instance segmentation algorithm with a vSLAM algorithm, it detects and identifies individual objects of various kinds in the scene and builds them into a three-dimensional semantic map, while using object instance matching to optimize the SLAM pose estimates, thereby improving the localization accuracy of vSLAM.
In the semantic SLAM method realized by combining instance segmentation with vSLAM, instance segmentation is applied to the key frames to obtain all instances they contain, and vSLAM maps these instances into the three-dimensional point cloud to construct an object-instance-oriented three-dimensional semantic map.
A semantic SLAM robot based on object instance matching according to the disclosure includes the robot semantic SLAM processor based on object instance matching shown in Fig. 3.
Unlike traditional semantic segmentation techniques, which only distinguish object categories in an image, the instance segmentation technique of the disclosure can distinguish different individuals of the same object category while also eliminating the background pixels of each instance. Meanwhile, the disclosure improves the localization accuracy of vSLAM by fusing the feature point matching results and instance matching results across different key frames and performing local nonlinear optimization.
It should be understood by those skilled in the art that embodiments of the disclosure may be provided as a method, a system or a computer program product. Accordingly, the disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
Although the specific embodiments of the disclosure have been described above with reference to the accompanying drawings, they do not limit the protection scope of the disclosure. Those skilled in the art should understand that various modifications or variations that can be made on the basis of the technical solution of the disclosure without creative effort still fall within the protection scope of the disclosure.
Claims (10)
1. A robot semantic SLAM method based on object instance matching, characterized by comprising:
obtaining an image sequence captured during robot operation, and performing feature point extraction, matching and tracking on every frame to estimate the camera motion;
extracting key frames, and performing instance segmentation on the key frames to obtain all object instances in every key frame;
performing feature point extraction on the key frames and computing feature point descriptors, and performing feature extraction and encoding on all object instances in the key frames to compute instance feature description vectors, while obtaining an instance three-dimensional point cloud;
according to the feature point descriptors and the feature description vectors, performing feature point matching and instance matching, respectively, between the feature points and object instances of adjacent key frames;
fusing the feature point matches and instance matches to apply local nonlinear optimization to the SLAM pose estimates, obtaining key frames carrying object-instance semantic annotation information;
mapping the key frames carrying object-instance semantic annotation information into the instance three-dimensional point cloud to construct a three-dimensional semantic map.
2. The robot semantic SLAM method based on object instance matching of claim 1, characterized in that, in estimating the camera motion, the camera motion between adjacent frames is solved by bundle adjustment: first, ORB feature points are extracted from the adjacent frame images and matched, yielding several pairs of ORB feature points; a nonlinear least-squares problem is then constructed from these pairs and solved to obtain the camera pose.
3. The robot semantic SLAM method based on object instance matching of claim 1, characterized in that, in extracting the key frames, the magnitude of the inter-frame relative motion distance is used as the criterion for selecting the key frames from the image sequence.
4. The robot semantic SLAM method based on object instance matching of claim 3, characterized in that, if the inter-frame relative motion distance lies between the permitted minimum and maximum inter-frame relative motion distances, the current frame is a key frame.
5. The robot semantic SLAM method based on object instance matching of claim 1, characterized in that a deep-learning instance segmentation framework, the Mask R-CNN network, is used to perform instance segmentation on the key frame images to obtain all instances in every key frame image; wherein the Mask R-CNN network adds a fully convolutional network branch to Faster R-CNN to output an instance mask, thereby segmenting the contour of each instance within its detection box at the pixel level.
6. The robot semantic SLAM method based on object instance matching of claim 1, characterized in that the instance feature description vectors are computed as follows:
a visual vocabulary is built from a training set using the VLAD algorithm: every training image is divided into a grid, and a dense SIFT descriptor and RGB color value are extracted at each grid center to obtain a feature description vector for each grid cell;
the grid-cell feature description vectors are clustered into a preset number of classes with the k-means algorithm, the residual vector between each grid-cell feature description vector and its cluster center is computed, power normalization and L2-norm normalization are applied to all residual vectors, and the normalized residual vectors are then used to encode the detection-box image of each instance, yielding the instance feature description vector.
7. A robot semantic SLAM processor based on object instance matching, characterized by comprising:
a camera motion estimation module, which obtains an image sequence captured during robot operation and performs feature point extraction, matching and tracking on every frame to estimate the camera motion;
an instance extraction module, which extracts key frames and performs instance segmentation on them to obtain all object instances in every key frame;
a feature description module, which performs feature point extraction on the key frames and computes feature point descriptors, and performs feature extraction and encoding on all object instances in the key frames to compute instance feature description vectors, while obtaining an instance three-dimensional point cloud;
a feature point and instance matching module, which, according to the feature point descriptors and feature description vectors, performs feature point matching and instance matching, respectively, between the feature points and object instances of adjacent key frames;
a pose estimation optimization module, which fuses the feature point matches and instance matches to apply local nonlinear optimization to the SLAM pose estimates, obtaining key frames carrying object-instance semantic annotation information;
a three-dimensional semantic map construction module, which maps the key frames carrying object-instance semantic annotation information into the instance three-dimensional point cloud to construct a three-dimensional semantic map.
8. The robot semantic SLAM processor based on object instance matching of claim 7, characterized in that, in the camera motion estimation module, the camera motion between adjacent frames is solved by bundle adjustment: first, ORB feature points are extracted from the adjacent frame images and matched, yielding several pairs of ORB feature points; a nonlinear least-squares problem is then constructed from these pairs and solved to obtain the camera pose;
or, in the instance extraction module, the magnitude of the inter-frame relative motion distance is used as the criterion for selecting the key frames from the image sequence;
or, in the instance extraction module, a deep-learning instance segmentation framework, the Mask R-CNN network, is used to perform instance segmentation on the key frame images to obtain all instances in every key frame image, wherein the Mask R-CNN network adds a fully convolutional network branch to Faster R-CNN to output an instance mask, thereby segmenting the contour of each instance within its detection box at the pixel level;
or, in the feature description module, the instance feature description vectors are computed as follows: a visual vocabulary is built from a training set using the VLAD algorithm, every training image is divided into a grid, and a dense SIFT descriptor and RGB color value are extracted at each grid center to obtain a feature description vector for each grid cell; the grid-cell feature description vectors are clustered into a preset number of classes with the k-means algorithm, the residual vector between each grid-cell feature description vector and its cluster center is computed, power normalization and L2-norm normalization are applied to all residual vectors, and the normalized residual vectors are then used to encode the detection-box image of each instance, yielding the instance feature description vector.
9. The robot semantic SLAM processor based on object instance matching of claim 8, characterized in that, in the instance extraction module, if the inter-frame relative motion distance lies between the permitted minimum and maximum inter-frame relative motion distances, the current frame is a key frame.
10. A robot, characterized by comprising the robot semantic SLAM processor based on object instance matching of any one of claims 7-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910037102.3A CN109816686A (en) | 2019-01-15 | 2019-01-15 | Robot semanteme SLAM method, processor and robot based on object example match |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109816686A true CN109816686A (en) | 2019-05-28 |
Family
ID=66603838
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910037102.3A Pending CN109816686A (en) | 2019-01-15 | 2019-01-15 | Robot semanteme SLAM method, processor and robot based on object example match |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109816686A (en) |
Cited By (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110146099A (en) * | 2019-05-31 | 2019-08-20 | 西安工程大学 | A kind of synchronous superposition method based on deep learning |
CN110335317A (en) * | 2019-07-02 | 2019-10-15 | 百度在线网络技术(北京)有限公司 | Image processing method, device, equipment and medium based on terminal device positioning |
CN110335319A (en) * | 2019-06-26 | 2019-10-15 | 华中科技大学 | Camera positioning and the map reconstruction method and system of a kind of semantics-driven |
CN110390724A (en) * | 2019-07-12 | 2019-10-29 | 杭州凌像科技有限公司 | A kind of SLAM method with example segmentation |
CN110516527A (en) * | 2019-07-08 | 2019-11-29 | 广东工业大学 | A kind of vision SLAM winding detection improvement method of Case-based Reasoning segmentation |
CN110599522A (en) * | 2019-09-18 | 2019-12-20 | 成都信息工程大学 | Method for detecting and removing dynamic target in video sequence |
CN110602393A (en) * | 2019-09-04 | 2019-12-20 | 南京博润智能科技有限公司 | Video anti-shake method based on image content understanding |
CN110599542A (en) * | 2019-08-30 | 2019-12-20 | 北京影谱科技股份有限公司 | Method and device for local mapping of adaptive VSLAM (virtual local area model) facing to geometric area |
CN110610650A (en) * | 2019-08-27 | 2019-12-24 | 杭州电子科技大学 | Point cloud semantic map construction method based on deep learning and depth camera |
CN110610198A (en) * | 2019-08-22 | 2019-12-24 | 浙江工业大学 | Mask RCNN-based automatic oral CBCT image mandibular neural tube identification method |
CN110717052A (en) * | 2019-10-15 | 2020-01-21 | 山东大学 | Environment characterization method in service robot intelligent service |
CN110728751A (en) * | 2019-06-19 | 2020-01-24 | 武汉科技大学 | Construction method of indoor 3D point cloud semantic map |
CN110738673A (en) * | 2019-10-21 | 2020-01-31 | 哈尔滨理工大学 | Visual SLAM method based on example segmentation |
CN110766024A (en) * | 2019-10-08 | 2020-02-07 | 湖北工业大学 | Visual odometer feature point extraction method based on deep learning and visual odometer |
CN110866953A (en) * | 2019-10-31 | 2020-03-06 | Oppo广东移动通信有限公司 | Map construction method and device, and positioning method and device |
CN110986945A (en) * | 2019-11-14 | 2020-04-10 | 上海交通大学 | Local navigation method and system based on semantic height map |
CN111105695A (en) * | 2019-12-31 | 2020-05-05 | 智车优行科技(上海)有限公司 | Map making method and device, electronic equipment and computer readable storage medium |
CN111160210A (en) * | 2019-12-24 | 2020-05-15 | 天津天地伟业机器人技术有限公司 | Video-based water flow velocity detection method and system |
CN111239763A (en) * | 2020-03-06 | 2020-06-05 | 广州视源电子科技股份有限公司 | Object positioning method and device, storage medium and processor |
CN111273772A (en) * | 2020-01-17 | 2020-06-12 | 江苏艾佳家居用品有限公司 | Augmented reality interaction method and device based on slam mapping method |
CN111275026A (en) * | 2020-03-23 | 2020-06-12 | 复旦大学 | Three-dimensional point cloud combined semantic and instance segmentation method |
CN111325842A (en) * | 2020-03-04 | 2020-06-23 | Oppo广东移动通信有限公司 | Map construction method, repositioning method and device, storage medium and electronic equipment |
CN111496784A (en) * | 2020-03-27 | 2020-08-07 | 山东大学 | Space environment identification method and system for robot intelligent service |
CN111563442A (en) * | 2020-04-29 | 2020-08-21 | 上海交通大学 | Slam method and system for fusing point cloud and camera image data based on laser radar |
CN111581313A (en) * | 2020-04-25 | 2020-08-25 | 华南理工大学 | Semantic SLAM robustness improvement method based on instance segmentation |
CN111665842A (en) * | 2020-06-09 | 2020-09-15 | 山东大学 | Indoor SLAM mapping method and system based on semantic information fusion |
CN111693047A (en) * | 2020-05-08 | 2020-09-22 | 中国航空工业集团公司西安航空计算技术研究所 | Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene |
CN111709328A (en) * | 2020-05-29 | 2020-09-25 | 北京百度网讯科技有限公司 | Vehicle tracking method and device and electronic equipment |
CN111797938A (en) * | 2020-07-15 | 2020-10-20 | 燕山大学 | Semantic information and VSLAM fusion method for sweeping robot |
CN111882663A (en) * | 2020-07-03 | 2020-11-03 | 广州万维创新科技有限公司 | Visual SLAM closed-loop detection method achieved by fusing semantic information |
CN111882613A (en) * | 2020-07-24 | 2020-11-03 | 中国科学院上海微系统与信息技术研究所 | Visual odometry method and device based on edge semantics, storage medium and equipment |
CN112148817A (en) * | 2019-06-28 | 2020-12-29 | 理光软件研究所(北京)有限公司 | Panoramic-map-based SLAM optimization method, device and system |
CN112258575A (en) * | 2020-10-13 | 2021-01-22 | 浙江大学 | Method for quickly identifying object in synchronous positioning and map construction |
CN112418250A (en) * | 2020-12-01 | 2021-02-26 | 怀化学院 | Optimized matching method for complex 3D point cloud |
CN112560648A (en) * | 2020-12-09 | 2021-03-26 | 长安大学 | SLAM method based on RGB-D image |
CN112734845A (en) * | 2021-01-08 | 2021-04-30 | 浙江大学 | Outdoor monocular synchronous mapping and positioning method fusing scene semantics |
CN112785714A (en) * | 2021-01-29 | 2021-05-11 | 北京百度网讯科技有限公司 | Point cloud instance labeling method and device, electronic equipment and medium |
CN112967341A (en) * | 2021-02-23 | 2021-06-15 | 湖北枫丹白露智慧标识科技有限公司 | Indoor visual positioning method, system, equipment and storage medium based on live-action image |
CN113724299A (en) * | 2021-08-30 | 2021-11-30 | 上海大学 | Method for tracking three-dimensional track of target by mobile robot based on electro-hydraulic adjustable focus lens |
CN113916245A (en) * | 2021-10-09 | 2022-01-11 | 上海大学 | Semantic map construction method based on instance segmentation and VSLAM |
WO2022021739A1 (en) * | 2020-07-30 | 2022-02-03 | 国网智能科技股份有限公司 | Humanoid inspection operation method and system for semantic intelligent substation robot |
CN114092388A (en) * | 2021-08-30 | 2022-02-25 | 河南笛卡尔机器人科技有限公司 | Obstacle detection method based on monocular camera and odometer |
CN114216461A (en) * | 2021-09-29 | 2022-03-22 | 杭州图灵视频科技有限公司 | Panoramic camera-based indoor positioning method and system for mobile robot |
CN114359493A (en) * | 2021-12-20 | 2022-04-15 | 中国船舶重工集团公司第七0九研究所 | Method and system for generating three-dimensional semantic map for unmanned ship |
CN114494825A (en) * | 2021-12-31 | 2022-05-13 | 重庆特斯联智慧科技股份有限公司 | Robot positioning method and device |
CN115496977A (en) * | 2022-09-14 | 2022-12-20 | 北京化工大学 | Target detection method and device based on multi-mode sequence data fusion |
CN116128734A (en) * | 2023-04-17 | 2023-05-16 | 湖南大学 | Image stitching method, device, equipment and medium based on deep learning |
CN116168393A (en) * | 2023-01-17 | 2023-05-26 | 浙江大学 | Automatic semantic annotation data generation method and device based on point cloud neural radiation field |
CN117132648A (en) * | 2023-04-28 | 2023-11-28 | 荣耀终端有限公司 | Visual positioning method, electronic equipment and computer readable storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104298971A (en) * | 2014-09-28 | 2015-01-21 | 北京理工大学 | Method for identifying objects in 3D point cloud data |
CN105856230A (en) * | 2016-05-06 | 2016-08-17 | 简燕梅 | ORB key frame closed-loop detection SLAM method capable of improving consistency of position and pose of robot |
CN107063258A (en) * | 2017-03-07 | 2017-08-18 | 重庆邮电大学 | A kind of mobile robot indoor navigation method based on semantic information |
CN107066507A (en) * | 2017-01-10 | 2017-08-18 | 中国人民解放军国防科学技术大学 | A kind of semantic map constructing method that cloud framework is mixed based on cloud robot |
CN107741234A (en) * | 2017-10-11 | 2018-02-27 | 深圳勇艺达机器人有限公司 | The offline map structuring and localization method of a kind of view-based access control model |
CN108230337A (en) * | 2017-12-31 | 2018-06-29 | 厦门大学 | A kind of method that semantic SLAM systems based on mobile terminal are realized |
US20180253856A1 (en) * | 2017-03-01 | 2018-09-06 | Microsoft Technology Licensing, Llc | Multi-Spectrum Illumination-and-Sensor Module for Head Tracking, Gesture Recognition and Spatial Mapping |
CN108648274A (en) * | 2018-05-10 | 2018-10-12 | 华南理工大学 | A kind of cognition point cloud map creation system of vision SLAM |
CN108830220A (en) * | 2018-06-15 | 2018-11-16 | 山东大学 | The building of vision semantic base and global localization method based on deep learning |
CN109186606A (en) * | 2018-09-07 | 2019-01-11 | 南京理工大学 | A kind of robot composition and air navigation aid based on SLAM and image information |
Non-Patent Citations (4)
Title |
---|
S. Yang et al., "Robust RGB-D SLAM in dynamic environment using Faster R-CNN", IEEE International Conference on Computer & Communications *
Liu Shirong et al., "RGB-D SLAM algorithm based on improved key frame selection", Journal of Dalian University of Technology *
Lin Hui, "Loop closure detection based on the fusion of CNN and VLAD", Modern Computer (Professional Edition) *
Zhao Yang, "Semantic simultaneous localization and mapping based on deep learning", Wanfang Database *
Cited By (77)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110146099A (en) * | 2019-05-31 | 2019-08-20 | 西安工程大学 | A kind of synchronous superposition method based on deep learning |
CN110728751A (en) * | 2019-06-19 | 2020-01-24 | 武汉科技大学 | Construction method of indoor 3D point cloud semantic map |
CN110335319B (en) * | 2019-06-26 | 2022-03-18 | 华中科技大学 | Semantic-driven camera positioning and map reconstruction method and system |
CN110335319A (en) * | 2019-06-26 | 2019-10-15 | 华中科技大学 | Camera positioning and the map reconstruction method and system of a kind of semantics-driven |
CN112148817B (en) * | 2019-06-28 | 2023-09-29 | 理光软件研究所(北京)有限公司 | SLAM optimization method, device and system based on panorama |
CN112148817A (en) * | 2019-06-28 | 2020-12-29 | 理光软件研究所(北京)有限公司 | Panoramic-map-based SLAM optimization method, device and system |
CN110335317A (en) * | 2019-07-02 | 2019-10-15 | 百度在线网络技术(北京)有限公司 | Image processing method, device, equipment and medium based on terminal device positioning |
CN110335317B (en) * | 2019-07-02 | 2022-03-25 | 百度在线网络技术(北京)有限公司 | Image processing method, device, equipment and medium based on terminal equipment positioning |
CN110516527A (en) * | 2019-07-08 | 2019-11-29 | 广东工业大学 | A kind of vision SLAM winding detection improvement method of Case-based Reasoning segmentation |
CN110390724A (en) * | 2019-07-12 | 2019-10-29 | 杭州凌像科技有限公司 | A kind of SLAM method with example segmentation |
CN110610198A (en) * | 2019-08-22 | 2019-12-24 | 浙江工业大学 | Mask RCNN-based automatic oral CBCT image mandibular neural tube identification method |
CN110610650A (en) * | 2019-08-27 | 2019-12-24 | 杭州电子科技大学 | Point cloud semantic map construction method based on deep learning and depth camera |
CN110599542A (en) * | 2019-08-30 | 2019-12-20 | 北京影谱科技股份有限公司 | Method and device for local mapping of adaptive VSLAM (virtual local area model) facing to geometric area |
CN110602393A (en) * | 2019-09-04 | 2019-12-20 | 南京博润智能科技有限公司 | Video anti-shake method based on image content understanding |
CN110599522A (en) * | 2019-09-18 | 2019-12-20 | 成都信息工程大学 | Method for detecting and removing dynamic target in video sequence |
CN110599522B (en) * | 2019-09-18 | 2023-04-11 | 成都信息工程大学 | Method for detecting and removing dynamic target in video sequence |
CN110766024B (en) * | 2019-10-08 | 2023-05-23 | 湖北工业大学 | Deep learning-based visual odometer feature point extraction method and visual odometer |
CN110766024A (en) * | 2019-10-08 | 2020-02-07 | 湖北工业大学 | Visual odometer feature point extraction method based on deep learning and visual odometer |
CN110717052A (en) * | 2019-10-15 | 2020-01-21 | 山东大学 | Environment characterization method in service robot intelligent service |
CN110738673A (en) * | 2019-10-21 | 2020-01-31 | 哈尔滨理工大学 | Visual SLAM method based on example segmentation |
CN110866953A (en) * | 2019-10-31 | 2020-03-06 | Oppo广东移动通信有限公司 | Map construction method and device, and positioning method and device |
WO2021083242A1 (en) * | 2019-10-31 | 2021-05-06 | Oppo广东移动通信有限公司 | Map constructing method, positioning method and system, wireless communication terminal, and computer-readable medium |
CN110866953B (en) * | 2019-10-31 | 2023-12-29 | Oppo广东移动通信有限公司 | Map construction method and device, and positioning method and device |
CN110986945A (en) * | 2019-11-14 | 2020-04-10 | 上海交通大学 | Local navigation method and system based on semantic height map |
CN110986945B (en) * | 2019-11-14 | 2023-06-27 | 上海交通大学 | Local navigation method and system based on semantic altitude map |
CN111160210A (en) * | 2019-12-24 | 2020-05-15 | 天津天地伟业机器人技术有限公司 | Video-based water flow velocity detection method and system |
CN111160210B (en) * | 2019-12-24 | 2023-09-26 | 天地伟业技术有限公司 | Video-based water flow rate detection method and system |
CN111105695A (en) * | 2019-12-31 | 2020-05-05 | 智车优行科技(上海)有限公司 | Map making method and device, electronic equipment and computer readable storage medium |
CN111273772A (en) * | 2020-01-17 | 2020-06-12 | 江苏艾佳家居用品有限公司 | Augmented reality interaction method and device based on slam mapping method |
CN111325842B (en) * | 2020-03-04 | 2023-07-28 | Oppo广东移动通信有限公司 | Map construction method, repositioning method and device, storage medium and electronic equipment |
EP4113451A4 (en) * | 2020-03-04 | 2023-07-19 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Map construction method and apparatus, repositioning method and apparatus, storage medium, and electronic device |
WO2021175022A1 (en) * | 2020-03-04 | 2021-09-10 | Oppo广东移动通信有限公司 | Map construction method and apparatus, repositioning method and apparatus, storage medium, and electronic device |
CN111325842A (en) * | 2020-03-04 | 2020-06-23 | Oppo广东移动通信有限公司 | Map construction method, repositioning method and device, storage medium and electronic equipment |
CN111239763A (en) * | 2020-03-06 | 2020-06-05 | 广州视源电子科技股份有限公司 | Object positioning method and device, storage medium and processor |
CN111275026B (en) * | 2020-03-23 | 2022-09-13 | 复旦大学 | Three-dimensional point cloud combined semantic and instance segmentation method |
CN111275026A (en) * | 2020-03-23 | 2020-06-12 | 复旦大学 | Three-dimensional point cloud combined semantic and instance segmentation method |
CN111496784A (en) * | 2020-03-27 | 2020-08-07 | 山东大学 | Space environment identification method and system for robot intelligent service |
CN111581313B (en) * | 2020-04-25 | 2023-05-23 | 华南理工大学 | Semantic SLAM robustness improvement method based on instance segmentation |
CN111581313A (en) * | 2020-04-25 | 2020-08-25 | 华南理工大学 | Semantic SLAM robustness improvement method based on instance segmentation |
CN111563442B (en) * | 2020-04-29 | 2023-05-02 | 上海交通大学 | SLAM method and system for fusing point cloud and camera image data based on laser radar |
CN111563442A (en) * | 2020-04-29 | 2020-08-21 | 上海交通大学 | SLAM method and system for fusing point cloud and camera image data based on laser radar |
CN111693047B (en) * | 2020-05-08 | 2022-07-05 | 中国航空工业集团公司西安航空计算技术研究所 | Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene |
CN111693047A (en) * | 2020-05-08 | 2020-09-22 | 中国航空工业集团公司西安航空计算技术研究所 | Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene |
CN111709328B (en) * | 2020-05-29 | 2023-08-04 | 北京百度网讯科技有限公司 | Vehicle tracking method and device and electronic equipment |
CN111709328A (en) * | 2020-05-29 | 2020-09-25 | 北京百度网讯科技有限公司 | Vehicle tracking method and device and electronic equipment |
WO2021238062A1 (en) * | 2020-05-29 | 2021-12-02 | 北京百度网讯科技有限公司 | Vehicle tracking method and apparatus, and electronic device |
CN111665842A (en) * | 2020-06-09 | 2020-09-15 | 山东大学 | Indoor SLAM mapping method and system based on semantic information fusion |
CN111665842B (en) * | 2020-06-09 | 2021-09-28 | 山东大学 | Indoor SLAM mapping method and system based on semantic information fusion |
CN111882663A (en) * | 2020-07-03 | 2020-11-03 | 广州万维创新科技有限公司 | Visual SLAM closed-loop detection method achieved by fusing semantic information |
CN111797938A (en) * | 2020-07-15 | 2020-10-20 | 燕山大学 | Semantic information and VSLAM fusion method for sweeping robot |
CN111797938B (en) * | 2020-07-15 | 2022-03-15 | 燕山大学 | Semantic information and VSLAM fusion method for sweeping robot |
CN111882613A (en) * | 2020-07-24 | 2020-11-03 | 中国科学院上海微系统与信息技术研究所 | Visual odometry method and device based on edge semantics, storage medium and equipment |
WO2022021739A1 (en) * | 2020-07-30 | 2022-02-03 | 国网智能科技股份有限公司 | Humanoid inspection operation method and system for semantic intelligent substation robot |
CN112258575A (en) * | 2020-10-13 | 2021-01-22 | 浙江大学 | Method for quickly identifying object in synchronous positioning and map construction |
CN112418250B (en) * | 2020-12-01 | 2024-05-10 | 怀化学院 | Optimized matching method for complex 3D point cloud |
CN112418250A (en) * | 2020-12-01 | 2021-02-26 | 怀化学院 | Optimized matching method for complex 3D point cloud |
CN112560648B (en) * | 2020-12-09 | 2023-04-07 | 长安大学 | SLAM method based on RGB-D image |
CN112560648A (en) * | 2020-12-09 | 2021-03-26 | 长安大学 | SLAM method based on RGB-D image |
CN112734845A (en) * | 2021-01-08 | 2021-04-30 | 浙江大学 | Outdoor monocular synchronous mapping and positioning method fusing scene semantics |
CN112785714A (en) * | 2021-01-29 | 2021-05-11 | 北京百度网讯科技有限公司 | Point cloud instance labeling method and device, electronic equipment and medium |
CN112967341B (en) * | 2021-02-23 | 2023-04-25 | 湖北枫丹白露智慧标识科技有限公司 | Indoor visual positioning method, system, equipment and storage medium based on live-action image |
CN112967341A (en) * | 2021-02-23 | 2021-06-15 | 湖北枫丹白露智慧标识科技有限公司 | Indoor visual positioning method, system, equipment and storage medium based on live-action image |
CN113724299B (en) * | 2021-08-30 | 2023-09-19 | 上海大学 | Method for tracking three-dimensional track of target by mobile robot based on electrohydraulic adjustable focus lens |
CN114092388B (en) * | 2021-08-30 | 2024-08-13 | 河南笛卡尔机器人科技有限公司 | Obstacle detection method based on monocular camera and odometer |
CN113724299A (en) * | 2021-08-30 | 2021-11-30 | 上海大学 | Method for tracking three-dimensional track of target by mobile robot based on electro-hydraulic adjustable focus lens |
CN114092388A (en) * | 2021-08-30 | 2022-02-25 | 河南笛卡尔机器人科技有限公司 | Obstacle detection method based on monocular camera and odometer |
CN114216461A (en) * | 2021-09-29 | 2022-03-22 | 杭州图灵视频科技有限公司 | Panoramic camera-based indoor positioning method and system for mobile robot |
CN113916245A (en) * | 2021-10-09 | 2022-01-11 | 上海大学 | Semantic map construction method based on instance segmentation and VSLAM |
CN113916245B (en) * | 2021-10-09 | 2024-07-19 | 上海大学 | Semantic map construction method based on instance segmentation and VSLAM |
CN114359493A (en) * | 2021-12-20 | 2022-04-15 | 中国船舶重工集团公司第七0九研究所 | Method and system for generating three-dimensional semantic map for unmanned ship |
CN114494825A (en) * | 2021-12-31 | 2022-05-13 | 重庆特斯联智慧科技股份有限公司 | Robot positioning method and device |
CN114494825B (en) * | 2021-12-31 | 2024-04-19 | 重庆特斯联智慧科技股份有限公司 | Robot positioning method and device |
CN115496977A (en) * | 2022-09-14 | 2022-12-20 | 北京化工大学 | Target detection method and device based on multi-mode sequence data fusion |
CN116168393B (en) * | 2023-01-17 | 2023-08-25 | 浙江大学 | Automatic semantic annotation data generation method and device based on point cloud neural radiation field |
CN116168393A (en) * | 2023-01-17 | 2023-05-26 | 浙江大学 | Automatic semantic annotation data generation method and device based on point cloud neural radiation field |
CN116128734A (en) * | 2023-04-17 | 2023-05-16 | 湖南大学 | Image stitching method, device, equipment and medium based on deep learning |
CN117132648A (en) * | 2023-04-28 | 2023-11-28 | 荣耀终端有限公司 | Visual positioning method, electronic equipment and computer readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109816686A (en) | Robot semantic SLAM method, processor and robot based on object instance matching | |
Han et al. | Dynamic scene semantics SLAM based on semantic segmentation | |
Wirges et al. | Object detection and classification in occupancy grid maps using deep convolutional networks | |
Caraffi et al. | Off-road path and obstacle detection using decision networks and stereo vision | |
WO2020170014A1 (en) | Object counting and instance segmentation using neural network architectures with image-level supervision | |
CN111814683A (en) | Robust visual SLAM method based on semantic prior and deep learning features | |
CN110688905B (en) | Three-dimensional object detection and tracking method based on key frame | |
CN106778856A (en) | Object recognition method and device | |
Chakravarty et al. | GEN-SLAM: Generative modeling for monocular simultaneous localization and mapping | |
Bera et al. | Online parameter learning for data-driven crowd simulation and content generation | |
CN109063549A (en) | High-resolution aerial video moving object detection method based on deep neural network | |
Lu et al. | A CNN-Transformer hybrid model based on CSWin Transformer for UAV image object detection |
Dwibedi et al. | Deep cuboid detection: Beyond 2d bounding boxes | |
Yang et al. | Visual SLAM based on semantic segmentation and geometric constraints for dynamic indoor environments | |
CN109508686A (en) | Human action recognition method based on hierarchical feature subspace learning | |
Yang et al. | [Retracted] A Method of Image Semantic Segmentation Based on PSPNet | |
Fei et al. | Self-supervised learning for pre-training 3d point clouds: A survey | |
Pham et al. | Pencilnet: Zero-shot sim-to-real transfer learning for robust gate perception in autonomous drone racing | |
Ning et al. | Point-voxel and bird-eye-view representation aggregation network for single stage 3D object detection | |
Gelen et al. | An artificial neural slam framework for event-based vision | |
Liao | SLAMORE: SLAM with object recognition for 3D radio environment reconstruction | |
Liu et al. | VL-MFL: UAV Visual Localization Based on Multi-Source Image Feature Learning | |
Xiong et al. | MLP-Pose: Human pose estimation by MLP-mixer | |
Tao et al. | 3D semantic VSLAM of indoor environment based on mask scoring RCNN |
Chen et al. | Novel learning framework for optimal multi-object video trajectory tracking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20190528 |