CN111797938B - Semantic information and VSLAM fusion method for sweeping robot - Google Patents


Info

Publication number
CN111797938B
Authority
CN
China
Prior art keywords
semantic information
dictionary
indoor
information
semantic
Prior art date
Legal status
Active
Application number
CN202010681784.4A
Other languages
Chinese (zh)
Other versions
CN111797938A (en)
Inventor
金梅
张少阔
张立国
张子豪
孙胜春
刘博
张勇
郎梦园
王娜
Current Assignee
Yanshan University
Original Assignee
Yanshan University
Priority date
Filing date
Publication date
Application filed by Yanshan University filed Critical Yanshan University
Priority to CN202010681784.4A priority Critical patent/CN111797938B/en
Publication of CN111797938A publication Critical patent/CN111797938A/en
Application granted granted Critical
Publication of CN111797938B publication Critical patent/CN111797938B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47L DOMESTIC WASHING OR CLEANING; SUCTION CLEANERS IN GENERAL
    • A47L 11/00 Machines for cleaning floors, carpets, furniture, walls, or wall coverings
    • A47L 11/24 Floor-sweeping machines, motor-driven
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47L DOMESTIC WASHING OR CLEANING; SUCTION CLEANERS IN GENERAL
    • A47L 11/00 Machines for cleaning floors, carpets, furniture, walls, or wall coverings
    • A47L 11/40 Parts or details of machines not provided for in groups A47L11/02 - A47L11/38, or not restricted to one of these groups, e.g. handles, arrangements of switches, skirts, buffers, levers
    • A HUMAN NECESSITIES
    • A47 FURNITURE; DOMESTIC ARTICLES OR APPLIANCES; COFFEE MILLS; SPICE MILLS; SUCTION CLEANERS IN GENERAL
    • A47L DOMESTIC WASHING OR CLEANING; SUCTION CLEANERS IN GENERAL
    • A47L 11/00 Machines for cleaning floors, carpets, furniture, walls, or wall coverings
    • A47L 11/40 Parts or details of machines not provided for in groups A47L11/02 - A47L11/38, or not restricted to one of these groups, e.g. handles, arrangements of switches, skirts, buffers, levers
    • A47L 11/4011 Regulation of the cleaning machine by electric means; Control systems and remote control systems therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29 Geographical information databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/757 Matching configurations of points or features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Mathematics (AREA)
  • Multimedia (AREA)
  • Pure & Applied Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a semantic information and VSLAM fusion method for a sweeping robot. A vector containing semantic information from a semantic dictionary is prepended to the vector of a traditional dictionary to generate a fusion dictionary combining traditional and semantic information. This enriches the information source of the VSLAM system, overcomes the inability of traditional VSLAM to acquire prior information about the environment, and uses the semantic information to improve the precision with which the VSLAM system solves the essential matrix. In loop detection, semantic information is matched first; if it cannot be matched, the point is considered a false match and no search of the bag of words is needed, which improves the robustness of the system and the accuracy of the constructed indoor map.

Description

Semantic information and VSLAM fusion method for sweeping robot
Technical Field
The invention belongs to the technical field of simultaneous localization and mapping (SLAM), and particularly relates to a method for fusing semantic information with VSLAM for a sweeping robot.
Background
Sweeping robots are increasingly common in daily life. Their core technologies cover sweeping, mopping, obstacle avoidance, map building, and human-machine interaction. Apart from sweeping, functions such as mopping still exhibit problems to varying degrees and remain at an exploratory stage.
Positioning and mapping are closely interdependent: positioning cannot be defined without a map, yet an indoor robot must both construct a map suitable for its own use and know its position within it. Because indoor objects such as tables, chairs, boxes, and cabinets are often moved, and because engineering blueprints of indoor scenes rarely match reality exactly, the robot cannot directly use a given man-made map and must build one by perceiving the environment itself. Positioning and mapping for indoor robots are mainly realized through SLAM technology, whose mainstream variants are visual SLAM and laser SLAM. Visual SLAM is the current research hotspot and offers low cost and rich information, but it is less stable and less accurate, and more complex, than laser SLAM. Laser SLAM performs better for indoor positioning and mapping, but laser data is limited in variety, so closed-loop detection cannot be realized well; moreover, with low-cost lidars the laser point density is low and occlusion occurs, so the constructed map often fails to close. The semantic information of images can provide SLAM with more information and more accurate semantics in subsequent processing, so combining semantic information with SLAM is a clear trend.
Disclosure of Invention
The invention aims to provide a method for combining visual SLAM (VSLAM) with semantic information and to improve the accuracy of positioning and mapping for a sweeping robot.
In order to solve this technical problem, the invention provides a method for fusing semantic information and VSLAM for a sweeping robot, which comprises the following steps:
S1, fusing semantic information into the VSLAM system and establishing an indoor map fused with semantic information, with the following specific steps:
S11, extracting and identifying semantic information;
S111, on the basis of an existing classification model, using ResNet-18 as the base network with the last fully connected layer removed, to extract and identify semantic information;
S112, establishing a public data set for indoor objects, dividing each indoor object into n regions ordered from left to right and top to bottom, and establishing an offline semantic information dictionary: the n components $c_1$ to $c_n$ of an n-dimensional vector c represent the n regions in turn, a component being 1 if the region it represents is present and 0 otherwise;
S12, performing I-shaped cleaning indoors;
S121, determining the maximum region, identifying and segmenting indoor objects in the process of determining the maximum region, and returning to the initial position;
S122, starting the I-shaped cleaning;
S13, generating a fusion dictionary that fuses traditional information and semantic information;
S131, generating a semantic dictionary:
identifying each object and its position during cleaning through the established offline semantic dictionary, obtaining a vector containing semantic information, and generating a semantic information dictionary;
S132, generating a traditional dictionary:
obtaining the feature points of each frame of image through feature-point extraction and matching, obtaining the pose of each feature point through motion estimation, and thereby determining the attributes of each feature point; the feature points of the image are placed into a dictionary to form the traditional dictionary;
S133, prepending the vector containing semantic information from the semantic dictionary to the vector of the traditional dictionary, generating a fusion dictionary that fuses traditional and semantic information;
S14, generating a map: constructing a point cloud map from the obtained pose of each point in space;
S15, performing loop detection:
semantic information is matched first; if it cannot be matched, the point is considered a false match and no search of the bag of words is needed; otherwise the positional relationships of objects in the traditional dictionary and the semantic information dictionary are compared, and when the matching degree of the semantic information exceeds a certain threshold, loop detection is performed and the accumulated error is optimized to obtain an optimized map;
S2, establishing a self-learning model of the sweeping robot: on the established indoor map fused with semantic information, the robot self-learns the degree of dirtiness of different areas during indoor cleaning and divides the indoor space to obtain a primary area, a secondary area, and a tertiary area;
S3, establishing a multi-mode cleaning mechanism:
a multi-mode cleaning mechanism is established on the basis of the sweeping robot's self-learning model.
Preferably, in step S132, feature points are extracted and matched by brute-force matching; the method selects ORB features and uses a fast approximate nearest-neighbor algorithm for feature-point matching.
Preferably, in step S132, the EPnP algorithm is used to obtain the poses of the feature points.
Preferably, in step S2, indoor area division is performed using an improved K-means clustering algorithm based on semantic information, comprising the following steps:
S21, selecting semantically recognized indoor objects from the indoor map and setting the k cluster centroids $\mu_1, \mu_2, \ldots, \mu_k$;
S22, using the formula $C^{(i)} = \arg\min_j \| x^{(i)} - \mu_j \|^2$ to cluster the indoor-space sample data set, assigning each data individual $i$ to the nearest class by Euclidean distance, where $C^{(i)}$ is the class among the $k$ classes closest to sample $i$, $\mu_j$ is the centroid of each cluster, and $x^{(i)}$ is the coordinate of each point in the room;
S23, grading the cluster area of each cluster $\mu_j$ according to the amount of dirt swept up by the sweeping robot;
S24, repeatedly executing steps S22 and S23 so that the area grades are continually updated.
Preferably, the multi-mode cleaning mechanism in step S3 comprises a power-saving mode, an intelligent mode, and a functional mode;
the power-saving mode: according to the divided area grades, mainly the primary area is cleaned;
the intelligent mode: according to the divided area grades, the primary area is cleaned intensively and the other areas are cleaned lightly;
the functional mode: according to the divided area grades, the indoor areas are cleaned uniformly, with no distinction or emphasis.
Compared with the prior art, the invention has the following beneficial effects:
by adding a vector containing semantic information to the vector of traditional information, a dictionary fusing semantic information is generated. This enriches the information source of the VSLAM system, overcomes the inability of traditional VSLAM to obtain prior information about the environment, uses semantic information to improve the precision with which the VSLAM system solves the essential matrix, and improves the robustness of the system and the accuracy of the constructed indoor map.
Drawings
FIG. 1 is a schematic diagram of a map object of an embodiment of the present invention;
FIG. 2 is a flow chart of a mapping algorithm based on semantic information according to an embodiment of the present invention;
FIG. 3 is a schematic view of an I-cleaning mechanism according to an embodiment of the present invention;
FIG. 4 is a flow chart of the improved K-means algorithm based on semantic information according to the embodiment of the present invention; and
FIG. 5 is a schematic diagram of a multi-mode cleaning mechanism according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
the embodiment provides a semantic information and VSLAM fusion method for a sweeping robot, which includes the following steps:
S1, fusing semantic information into the VSLAM system and establishing an indoor map fused with semantic information; the map-construction algorithm based on semantic information is shown in FIG. 2 and comprises the following specific steps:
S11, extracting and identifying semantic information;
S111, since existing classification models can extract and identify semantic information well, the embodiment of the invention uses ResNet-18 as the base network with the last fully connected layer removed, realizing the extraction and identification of semantic information;
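For illustration only (not the patented implementation itself), a feature extractor of this kind can be sketched in Python with PyTorch/torchvision; the library calls are standard, while the input tensor is a placeholder:

```python
import torch
import torch.nn as nn
import torchvision.models as models

# Minimal sketch: ResNet-18 backbone with the final fully connected
# layer removed, leaving a 512-dimensional feature per image.
backbone = models.resnet18(weights=None)  # load pretrained weights in practice
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
feature_extractor.eval()

with torch.no_grad():
    image = torch.randn(1, 3, 224, 224)   # placeholder for a camera frame
    features = feature_extractor(image)    # shape: (1, 512, 1, 1)
    features = features.flatten(1)         # shape: (1, 512)
print(features.shape)
```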
S112, establishing a public data set for common indoor objects such as beds and sofas, dividing each object into 9 regions, and establishing an offline semantic information dictionary: a vector c indicates whether each region is present, with its components corresponding to regions 1-9 ordered from left to right and top to bottom; a component is 1 if its region is present and 0 otherwise.
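A minimal sketch of this encoding, assuming each detected object reports which of the 9 regions it occupies (the helper name region_vector is illustrative):

```python
import numpy as np

N_REGIONS = 9  # ordered left-to-right, top-to-bottom

def region_vector(occupied_regions):
    """Encode an object's footprint as a binary 9-dimensional vector c:
    c[k] = 1 if region k+1 is present, else 0."""
    c = np.zeros(N_REGIONS, dtype=np.uint8)
    for r in occupied_regions:
        c[r - 1] = 1
    return c

# Example from the description: an object at the left bedside position
# occupies only region 1, giving [1, 0, 0, 0, 0, 0, 0, 0, 0].
print(region_vector([1]))
```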
S12, performing I-shaped cleaning of an indoor environment, such as a bedroom;
S121, determining the maximum area of the room as shown in FIG. 3, identifying and segmenting indoor objects in the process of determining the maximum area, and returning to the initial position;
S122, starting the I-shaped cleaning, the cleaning path being shown by the dotted line in FIG. 3;
S13, during cleaning, recognizing the objects passed while the traditional SLAM generates its dictionary, obtaining a semantic information dictionary; combining the two yields the dictionary fused with semantic information;
S131, generating a semantic dictionary: identifying each object and its position during the I-shaped cleaning through the established offline semantic dictionary to obtain the semantic information dictionary. For example, at the left bedside position the semantic vector is [1,0,0,0,0,0,0,0,0]; if other objects are present, further vectors are appended. The indoor objects in the semantic information dictionary have a fixed order: the first 9 bits describe the bed, the next 9 bits describe the sofa, and so on in a fixed sequence;
S132, generating a traditional dictionary: obtaining the feature points of each frame of image through feature-point extraction and matching, and obtaining the pose of each feature point through motion estimation, thereby determining the attributes of each feature point; the feature points of a given frame are placed into a dictionary to form the traditional dictionary. For example, a frame of image is input, feature detection and description are performed, and each feature point is processed through a dictionary (provided by the traditional SLAM) to obtain a vector v (typically on the order of one million dimensions).
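A toy illustration of how a traditional dictionary maps a frame's descriptors to a vector v; here a tiny vocabulary and random float descriptors stand in for the roughly million-word binary-descriptor dictionary of a real system, so all sizes and names are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Offline: build a toy visual vocabulary by clustering training descriptors.
train_desc = rng.random((500, 32))                 # stand-in for descriptors
vocab = KMeans(n_clusters=50, n_init=10).fit(train_desc)  # 50 "visual words"

def bow_vector(frame_desc, vocab):
    """Quantize each descriptor to its nearest visual word and
    histogram the word counts: the frame's dictionary vector v."""
    words = vocab.predict(frame_desc)
    v = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
    return v / max(v.sum(), 1.0)                   # normalized term frequency

frame_desc = rng.random((120, 32))                 # descriptors of one frame
v = bow_vector(frame_desc, vocab)
print(v.shape)                                     # (50,)
```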
S1321, extracting and matching image features;
Edges, corners, points, regions, colors, and the like can serve as features representing elements of an image; good image features should exhibit scale invariance, rotation invariance, repeatability, and a degree of robustness to illumination. Because the embodiment of the invention targets real-time positioning and mapping, ORB features are selected: their matching precision is lower, but they require the least computing resources;
the feature point matching is to find a feature matching relationship between images or between images and maps. The simplest scheme is violence matching, and the basic principle is that all feature points are matched
Figure BDA0002586109880000061
And
Figure BDA0002586109880000062
calculating the hamming distance of the BRIEF descriptor in the ORB features, and leaving the nearest neighbor as a matching point, which becomes cumbersome as the number of feature points increases, in this embodiment, FLANN (fast approximate nearest neighbor) is used for feature point matching;
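As a generic OpenCV usage sketch (not the patent's own code; the image file names are placeholders), ORB extraction with brute-force Hamming matching and FLANN-based approximate matching looks like this:

```python
import cv2

img1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # placeholder frames
img2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)   # binary BRIEF descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching: Hamming distance, keep the nearest neighbour.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
bf_matches = bf.match(des1, des2)

# FLANN with an LSH index for binary descriptors: faster for many points.
index_params = dict(algorithm=6,  # FLANN_INDEX_LSH
                    table_number=6, key_size=12, multi_probe_level=1)
flann = cv2.FlannBasedMatcher(index_params, dict(checks=50))
flann_matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test to reject ambiguous matches.
good = [m for pair in flann_matches if len(pair) == 2
        for m, n in [pair] if m.distance < 0.7 * n.distance]
print(len(bf_matches), len(good))
```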
S1322, estimating the pose of the robot;
the embodiment of the invention selects an EPnP algorithm to estimate the pose; the EPnP algorithm has the core idea that the weighting is solved by adopting four virtual points which are not in the same plane, other points are represented by the four virtual points, and n three-dimensional points are represented as the weighted sum of four virtual control points, so that the problem to be solved is to solve the camera coordinates of the four points; the method comprises the following specific steps:
Denote the camera coordinate system by $F^c$ and the world coordinate system by $F^w$. For each three-dimensional reference point $p_i^w$ ($i = 1, \ldots, n$) expressed in the world coordinate system, four non-coplanar control points $c_j^w$ ($j = 1, 2, 3, 4$) can be found such that

$$p_i^w = \sum_{j=1}^{4} \alpha_{ij} c_j^w, \qquad \sum_{j=1}^{4} \alpha_{ij} = 1 \tag{1}$$

where the $\alpha_{ij}$ are homogeneous (barycentric) coordinates representing the weights of the control points, and $c_j^w$ are the coordinates of the four non-coplanar virtual points in the world coordinate system. The same weights hold in the camera coordinate system:

$$p_i^c = \sum_{j=1}^{4} \alpha_{ij} c_j^c \tag{2}$$

where $p_i^c$ ($i = 1, \ldots, n$) are the three-dimensional coordinates of the reference points and $c_j^c$ ($j = 1, \ldots, 4$) the coordinates of the four non-coplanar control points in the camera coordinate system.
Let $K$ be the camera intrinsic matrix and $\{u_i\}_{i=1,\ldots,n}$ the 2D projections of the reference points $\{p_i\}_{i=1,\ldots,n}$; then

$$w_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = K\, p_i^c = K \sum_{j=1}^{4} \alpha_{ij} c_j^c \tag{3}$$

Written in matrix form:

$$w_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \sum_{j=1}^{4} \alpha_{ij} \begin{bmatrix} x_j^c \\ y_j^c \\ z_j^c \end{bmatrix} \tag{4}$$

where the 12 coordinates of the control points in the camera coordinate system, $\{c_j^c\}_{j=1,\ldots,4}$, and the $n$ projective parameters $\{w_i\}_{i=1,\ldots,n}$ are the unknowns of this linear system. The pixel coordinate system differs from the imaging plane by a scaling and a translation of the origin, and the intrinsic matrix $K$ expresses this relationship: $f_x = \alpha f$ scales the pixel coordinates by $\alpha$ on the $u$-axis, $f_y = \beta f$ scales them by $\beta$ on the $v$-axis, and $(c_x, c_y)$ is the translation of the origin. Expanding the last row of (4) gives

$$w_i = \sum_{j=1}^{4} \alpha_{ij} z_j^c \tag{5}$$

Substituting (5) into (4) and expanding the first and second rows yields

$$\sum_{j=1}^{4} \alpha_{ij} f_x x_j^c + \alpha_{ij} (c_x - u_i) z_j^c = 0 \tag{6}$$

$$\sum_{j=1}^{4} \alpha_{ij} f_y y_j^c + \alpha_{ij} (c_y - v_i) z_j^c = 0 \tag{7}$$

In (6) and (7) only the control points are unknown. Considering all $n$ reference points yields

$$M x = 0 \tag{8}$$

whose expansion is

$$\begin{bmatrix}
\alpha_{11} f_x & 0 & \alpha_{11}(c_x - u_1) & \cdots & \alpha_{14} f_x & 0 & \alpha_{14}(c_x - u_1) \\
0 & \alpha_{11} f_y & \alpha_{11}(c_y - v_1) & \cdots & 0 & \alpha_{14} f_y & \alpha_{14}(c_y - v_1) \\
\vdots & & & & & & \vdots \\
\alpha_{n1} f_x & 0 & \alpha_{n1}(c_x - u_n) & \cdots & \alpha_{n4} f_x & 0 & \alpha_{n4}(c_x - u_n) \\
0 & \alpha_{n1} f_y & \alpha_{n1}(c_y - v_n) & \cdots & 0 & \alpha_{n4} f_y & \alpha_{n4}(c_y - v_n)
\end{bmatrix} x = 0 \tag{9}$$

where the second factor on the left side, the vector of control points to be solved, is written as

$$x = \begin{bmatrix} c_1^{cT} & c_2^{cT} & c_3^{cT} & c_4^{cT} \end{bmatrix}^T \tag{10}$$

$x$ has 12 unknown variables and $M$ is a $2n \times 12$ matrix, so $x$ can be obtained as

$$x = \sum_{i=1}^{N} \beta_i v_i \tag{11}$$

where $v_i$ is a right singular vector of $M$, the integer $N \in \{1, \ldots, 4\}$ is the effective dimension of the null space of $M^T M$, and $\{\beta_i\}_{i=1,\ldots,N}$ are the linear-combination coefficients used in computing $x$.
Direct solution (a full SVD of $M$) has complexity $O(n^3)$; instead, the null-space eigenvectors of the $12 \times 12$ matrix $M^T M$ can be computed, reducing the complexity to $O(n)$. Since the four control points have the same pairwise distances in the world coordinate system and in the camera coordinate system, each pair contributes a constraint:

$$\| c_i^c - c_j^c \|^2 = \| c_i^w - c_j^w \|^2 \tag{12}$$

where $c_i^c$, $c_j^c$ are the coordinates of the points in the camera coordinate system and $c_i^w$, $c_j^w$ are the coordinates of the points in the world coordinate system.
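A numerical sketch of the null-space computation described above (illustrative NumPy code under these definitions, not the patented implementation; the input matrix is a random placeholder):

```python
import numpy as np

def null_space_basis(M, N=4):
    """Return the N eigenvectors of the 12x12 matrix M^T M with the
    smallest eigenvalues; building M^T M costs O(n) in the number of
    reference points instead of the O(n^3) of a direct SVD of M."""
    MtM = M.T @ M                           # 12 x 12 regardless of n
    eigvals, eigvecs = np.linalg.eigh(MtM)  # eigenvalues in ascending order
    return eigvecs[:, :N]                   # columns v_1 ... v_N

M = np.random.randn(2 * 50, 12)             # placeholder 2n x 12 system matrix
V = null_space_basis(M)
print(V.shape)                               # (12, 4)
```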
An appropriate linear combination $\{\beta_i\}_{i=1,\ldots,N}$ satisfying these constraints is found and used as the initial value; $\{\beta_i\}_{i=1,\ldots,N}$ is then refined by Gauss-Newton optimization, minimizing the residual of the control-point distance constraints.
Once the coordinates of the control points in the camera coordinate system are known, the coordinates of the reference points in the camera coordinate system follow from equation (2). Given the two sets of 3D coordinates $\{p_i^w\}$ and $\{p_i^c\}$, the pose is solved via SVD as follows:
(1) compute the centroids of the two sets of reference points, $\mu_w = \frac{1}{n} \sum_{i=1}^{n} p_i^w$ and $\mu_c = \frac{1}{n} \sum_{i=1}^{n} p_i^c$;
(2) compute the de-centroided coordinates $q_i^w = p_i^w - \mu_w$ and $q_i^c = p_i^c - \mu_c$;
(3) compute $S = \sum_{i=1}^{n} q_i^w q_i^{cT}$;
(4) take the singular value decomposition $S = U \Sigma V^T$;
(5) recover the rotation matrix $R = V U^T$;
(6) compute the translation vector $t = \mu_c - R \mu_w$. This completes the pose calculation.
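Steps (1)-(6) correspond to the standard closed-form alignment of two 3D point sets; a NumPy sketch under that reading, with a self-check on synthetic data:

```python
import numpy as np

def solve_pose_svd(p_w, p_c):
    """Recover R, t with p_c ~= R @ p_w + t from matched 3D points.
    p_w, p_c: (n, 3) arrays in world and camera coordinates."""
    mu_w = p_w.mean(axis=0)                 # step (1): centroids
    mu_c = p_c.mean(axis=0)
    q_w = p_w - mu_w                        # step (2): de-centroided coords
    q_c = p_c - mu_c
    S = q_w.T @ q_c                         # step (3): S = sum q_w q_c^T
    U, _, Vt = np.linalg.svd(S)             # step (4): S = U Sigma V^T
    R = Vt.T @ U.T                          # step (5): R = V U^T
    if np.linalg.det(R) < 0:                # guard against reflections
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = mu_c - R @ mu_w                     # step (6): translation
    return R, t

# Self-check with a random rigid transform.
rng = np.random.default_rng(0)
p_w = rng.standard_normal((10, 3))
R_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R_true *= np.sign(np.linalg.det(R_true))    # ensure a proper rotation
t_true = np.array([0.5, -1.0, 2.0])
p_c = p_w @ R_true.T + t_true
R, t = solve_pose_svd(p_w, p_c)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```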
S133, prepending the vector containing semantic information from the semantic dictionary to the vector of the traditional dictionary generates the fusion dictionary that fuses traditional and semantic information.
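A minimal sketch of this fusion step, assuming both vectors are already available as NumPy arrays (the dimensions and names are illustrative):

```python
import numpy as np

def fuse_vectors(semantic_vec, traditional_vec):
    """Prepend the semantic-information vector to the traditional
    dictionary vector to form the fused dictionary vector (S133)."""
    return np.concatenate([semantic_vec, traditional_vec])

semantic_vec = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=np.float32)  # bed, region 1
traditional_vec = np.zeros(1_000_000, dtype=np.float32)  # BoW vector v (~10^6 dims)
fused = fuse_vectors(semantic_vec, traditional_vec)
print(fused.shape)  # (1000009,)
```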
S14, generating a map: a point cloud map is constructed from the obtained pose of each point in space. A point cloud is a set of discrete points in three-dimensional space; besides the basic three-dimensional coordinates x, y, z, each point may also carry r, g, b color information, and the resulting metric map clearly expresses the relationships between objects in the environment;
S15, performing loop detection;
When the robot runs for a long time, errors inevitably accumulate over every time period. Loop detection associates current data with earlier data: because the same place is observed twice, a constraint can be established between the two observations, eliminating the accumulated error and producing a globally consistent map.
Loop detection in this embodiment matches semantic information first: since semantic information requires no search, it can be matched directly. If it cannot be matched, the point is considered a false match and no search of the bag of words is needed; otherwise the positional relationships of objects in the traditional dictionary and the semantic information dictionary are compared, and when the matching degree of the semantic information exceeds a certain threshold, traditional loop detection is carried out and the accumulated error is optimized to obtain an optimized map.
Current loop detection mainly relies on the Bag of Words (BoW) algorithm, an efficient image retrieval and matching algorithm. To enable fast retrieval, the BoW algorithm constructs a dictionary, usually as a K-ary tree. Traditional loop detection merely compares the geometric features of each node; the invention introduces semantic-information matching into loop detection as described above, so that mismatched candidates are rejected before the bag-of-words search. Once loop detection is completed, the map construction is complete.
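For illustration, the semantic pre-filter can be sketched as follows, under the assumption that each keyframe stores its fused vector with the semantic part in front; SEM_DIM, the threshold, and the helper names are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

SEM_DIM = 9 * 20          # assumed: 9 regions x 20 object categories
SEM_THRESHOLD = 0.8       # assumed matching-degree threshold

def semantic_match_degree(fused_a, fused_b):
    """Compare only the semantic prefix of two fused dictionary vectors
    (intersection-over-union of the binary region bits)."""
    a, b = fused_a[:SEM_DIM], fused_b[:SEM_DIM]
    union = np.count_nonzero(np.logical_or(a, b))
    if union == 0:
        return 0.0
    return np.count_nonzero(np.logical_and(a, b)) / union

def detect_loop(query, keyframes, bow_search):
    """Semantic information is matched first; only candidates passing the
    threshold are handed to the (expensive) bag-of-words search."""
    candidates = [kf for kf in keyframes
                  if semantic_match_degree(query, kf) >= SEM_THRESHOLD]
    if not candidates:
        return None            # false match: no need to search the word bag
    return bow_search(query, candidates)
```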
S2, establishing a self-learning model of the sweeping robot: on the established indoor map fused with semantic information, the robot self-learns the degree of dirtiness of different areas during indoor cleaning and divides them into a primary area (the dirtiest), a secondary area, and a tertiary area;
the traditional K-means algorithm is very sensitive to the initial point center, the clustering result is unstable, and aiming at the indoor environment, based on the established map fused with the semantic information, each dirty area is set as the initial center of the K-means algorithm according to the semantic information, and then the final clustering result is carried out according to the K-means algorithm, so that the classification of the dirty and dirty areas in the room is realized.
The K-means clustering algorithm is a typical unsupervised learning method that automatically groups similar samples: samples are divided into different categories according to their mutual similarity. The invention divides indoor areas into different grades according to their degree of dirtiness. The basic idea of traditional K-means is to preset k, initialize the cluster centers, and iterate, recomputing the centers until they no longer change and the sum of squared distance errors reaches a local minimum; the resulting compact, mutually independent classes are the algorithm's final goal. To obtain the best clustering effect, the threshold on the iteration count can be tuned using function-extremum methods.
The embodiment of the invention provides an improved K-means clustering algorithm based on semantic information, shown in FIG. 4; this dirty-area grading algorithm has environment self-learning capability and realizes area grading of the indoor map through the following steps:
S21, according to the indoor map, selecting semantically identified indoor objects such as the balcony, tea table, and toilet, and setting the k cluster centers $\mu_1, \mu_2, \ldots, \mu_k$;
S22, using the formula $C^{(i)} = \arg\min_j \| x^{(i)} - \mu_j \|^2$ to cluster the indoor-space sample data set, assigning each data individual $i$ to the nearest class by Euclidean distance, where $C^{(i)}$ is the class among the $k$ classes closest to sample $i$;
S23, grading the cluster area of each cluster $\mu_j$ according to the amount of dirt swept up by the sweeping robot;
S24, repeatedly executing steps S22 and S23 so that the area grades are continually updated.
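A sketch of the semantic initialization using scikit-learn, where the initial centroids are the coordinates of semantically recognized dirt-prone objects; the object coordinates and dirt samples here are placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed: 2D coordinates of semantically recognized objects (balcony,
# tea table, toilet, ...) used as the k initial cluster centres.
semantic_centres = np.array([[0.5, 4.0],   # balcony
                             [2.5, 1.5],   # tea table
                             [4.0, 3.0]])  # toilet

# Indoor sample points where the robot recorded swept-up dirt.
dirt_points = np.random.default_rng(1).uniform(0, 5, size=(200, 2))

km = KMeans(n_clusters=len(semantic_centres),
            init=semantic_centres, n_init=1).fit(dirt_points)

# Grade each cluster by the amount of dirt it contains (S23).
counts = np.bincount(km.labels_, minlength=len(semantic_centres))
order = np.argsort(-counts)               # clusters ordered by dirt, most first
grade = {int(c): rank + 1 for rank, c in enumerate(order)}  # 1 = primary area
print(counts, grade)
```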
S3, establishing a multi-mode cleaning mechanism;
A multi-mode cleaning mechanism is established on top of the sweeping robot's self-learning model: in power-saving mode only the primary area is cleaned; in intelligent mode the primary area is cleaned intensively and the other areas are cleaned once; in functional mode the indoor space is cleaned uniformly.
The schematic diagram of the multi-mode cleaning mechanism of the embodiment of the invention is shown in FIG. 5; it comprises the following modes:
S31, power-saving mode: according to the divided area grades, the primary area is cleaned with emphasis (multiple passes);
S32, intelligent mode: according to the divided area grades, the primary area is cleaned intensively and the other areas are cleaned lightly;
S33, functional mode: according to the divided area grades, the indoor areas are cleaned uniformly, with no distinction or emphasis.
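The mode logic can be summarized in a small sketch mapping region grades to cleaning passes; the pass counts are illustrative assumptions, not values fixed by the patent:

```python
# Illustrative pass counts per region grade (1 = primary/dirtiest).
MODES = {
    "power_saving": {1: 2, 2: 0, 3: 0},  # clean only the primary area
    "intelligent":  {1: 2, 2: 1, 3: 1},  # emphasis on primary, light elsewhere
    "functional":   {1: 1, 2: 1, 3: 1},  # uniform, no emphasis
}

def plan_cleaning(mode, region_grades):
    """Return the number of passes for each region id given its grade."""
    passes = MODES[mode]
    return {region: passes[grade] for region, grade in region_grades.items()}

print(plan_cleaning("intelligent", {"A": 1, "B": 3, "C": 2}))
```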
According to the embodiment of the invention, semantic information is integrated into the original VSLAM system, enriching its information source, overcoming the inability of traditional VSLAM to obtain prior information about the environment, using semantic information to improve the precision with which the VSLAM system solves the essential matrix, and improving the robustness of the system and the accuracy of the indoor map. Based on the established indoor map, a self-learning model is built, and a dirty-area grading algorithm with environment self-learning capability based on a vector Taylor series compensation algorithm is proposed, realizing area grading of the indoor map. The embodiment also establishes a multi-mode conversion mechanism, realizing multi-mode switching of the sweeping robot, with advantages such as lower power consumption, higher efficiency, and higher indoor cleanliness.
The above embodiments merely illustrate preferred embodiments of the present invention and do not limit its scope; various modifications and improvements made to the technical solution of the invention by those skilled in the art without departing from its spirit shall fall within the scope of protection defined by the claims.

Claims (5)

1. A semantic information and VSLAM fusion method for a sweeping robot is characterized by comprising the following steps:
S1, fusing semantic information into the VSLAM system and establishing an indoor map fused with semantic information, with the following specific steps:
S11, extracting and identifying semantic information;
S111, on the basis of an existing classification model, using ResNet-18 as the base network with the last fully connected layer removed, to extract and identify semantic information;
S112, establishing a public data set for indoor objects, dividing each indoor object into n regions ordered from left to right and top to bottom, establishing an offline semantic information dictionary, and using the n components $c_1$ to $c_n$ of an n-dimensional vector c to represent the n regions in turn, a component being 1 if the region it represents is present and 0 otherwise;
S12, performing I-shaped cleaning indoors;
S121, determining the maximum region, identifying and segmenting indoor objects in the process of determining the maximum region, and returning to the initial position;
S122, starting the I-shaped cleaning;
S13, generating a fusion dictionary fusing the traditional information and the semantic information;
S131, generating a semantic dictionary:
identifying each object and its position during cleaning through the established offline semantic dictionary, obtaining a vector containing semantic information, and generating a semantic information dictionary;
S132, generating a traditional dictionary:
obtaining the feature points of each frame of image through feature-point extraction and matching, obtaining the pose of each feature point through motion estimation, and determining the attributes of each feature point; placing the feature points of the image into a dictionary to form the traditional dictionary;
S133, prepending the vector containing semantic information from the semantic dictionary to the vector of the traditional dictionary to generate a fusion dictionary fusing the traditional information and the semantic information;
S14, generating a map: constructing a point cloud map from the obtained pose of each point in space;
S15, performing loop detection:
firstly matching semantic information; if the semantic information cannot be matched, the point is considered a false match and need not be searched in the word bag; otherwise, comparing the positional relationships of objects in the traditional dictionary and the semantic information dictionary, and when the matching degree of the semantic information exceeds a certain threshold, carrying out loop detection and optimizing the accumulated error to obtain an optimized map;
S2, establishing a self-learning model of the sweeping robot: for the established indoor map fused with semantic information, self-learning the degree of dirtiness of different areas during indoor cleaning and dividing the indoor space to obtain a primary area, a secondary area, and a tertiary area;
S3, establishing a multi-mode cleaning mechanism:
establishing a multi-mode cleaning mechanism on the basis of the established self-learning model of the sweeping robot.
2. The semantic information and VSLAM fusion method for a sweeping robot according to claim 1, wherein in step S132 feature points are extracted and matched by brute-force matching; the method selects ORB features and uses a fast approximate nearest-neighbor algorithm for feature-point matching.
3. The semantic information and VSLAM fusion method for a sweeping robot according to claim 1, wherein the EPnP algorithm is used to obtain the poses of the feature points in step S132.
4. The semantic information and VSLAM fusion method for a sweeping robot according to claim 1, wherein in step S2 the indoor area division is performed using the improved K-means clustering algorithm based on semantic information, comprising the following steps:
S21, selecting semantically recognized indoor objects according to the indoor map, and setting the k cluster centroids $\mu_1, \mu_2, \ldots, \mu_k$;
S22, using the formula $C^{(i)} = \arg\min_j \| x^{(i)} - \mu_j \|^2$ to cluster the indoor-space sample data set, assigning each data individual $i$ to the nearest class by Euclidean distance, where $C^{(i)}$ is the class among the $k$ classes closest to sample $i$, $\mu_j$ is the centroid of each cluster, and $x^{(i)}$ is the coordinate of each point in the room;
S23, grading the cluster area of each cluster $\mu_j$ according to the amount of dirt swept up by the sweeping robot;
S24, repeatedly executing steps S22 and S23 so that the area grades are continually updated.
5. The semantic information and VSLAM fusion method for a sweeping robot according to claim 1, wherein the multi-mode cleaning mechanism in step S3 comprises a power-saving mode, an intelligent mode, and a functional mode;
the power-saving mode: according to the divided area grades, mainly the primary area is cleaned;
the intelligent mode: according to the divided area grades, the primary area is cleaned intensively and the other areas are cleaned lightly;
the functional mode: according to the divided area grades, the indoor areas are cleaned uniformly, with no distinction or emphasis.
CN202010681784.4A 2020-07-15 2020-07-15 Semantic information and VSLAM fusion method for sweeping robot Active CN111797938B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010681784.4A CN111797938B (en) 2020-07-15 2020-07-15 Semantic information and VSLAM fusion method for sweeping robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010681784.4A CN111797938B (en) 2020-07-15 2020-07-15 Semantic information and VSLAM fusion method for sweeping robot

Publications (2)

Publication Number Publication Date
CN111797938A CN111797938A (en) 2020-10-20
CN111797938B true CN111797938B (en) 2022-03-15

Family

ID=72807203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010681784.4A Active CN111797938B (en) 2020-07-15 2020-07-15 Semantic information and VSLAM fusion method for sweeping robot

Country Status (1)

Country Link
CN (1) CN111797938B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112435296B (en) * 2020-12-01 2024-04-19 南京工程学院 Image matching method for VSLAM indoor high-precision positioning
CN115191866A (en) * 2021-04-09 2022-10-18 美智纵横科技有限责任公司 Recharging method and device, cleaning robot and storage medium
CN113405547B (en) * 2021-05-21 2022-03-22 杭州电子科技大学 Unmanned aerial vehicle navigation method based on semantic VSLAM

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107063258A (en) * 2017-03-07 2017-08-18 重庆邮电大学 A kind of mobile robot indoor navigation method based on semantic information
CN108171796A (en) * 2017-12-25 2018-06-15 燕山大学 A kind of inspection machine human visual system and control method based on three-dimensional point cloud
CN108230337A (en) * 2017-12-31 2018-06-29 厦门大学 A kind of method that semantic SLAM systems based on mobile terminal are realized
CN108596974A (en) * 2018-04-04 2018-09-28 清华大学 Dynamic scene robot localization builds drawing system and method
CN109559320A (en) * 2018-09-18 2019-04-02 华东理工大学 Realize that vision SLAM semanteme builds the method and system of figure function based on empty convolution deep neural network
CN109816686A (en) * 2019-01-15 2019-05-28 山东大学 Robot semanteme SLAM method, processor and robot based on object example match
CN110738673A (en) * 2019-10-21 2020-01-31 哈尔滨理工大学 Visual SLAM method based on example segmentation
CN110956651A (en) * 2019-12-16 2020-04-03 哈尔滨工业大学 Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN111179426A (en) * 2019-12-23 2020-05-19 南京理工大学 Deep learning-based robot indoor environment three-dimensional semantic map construction method
CN111260661A (en) * 2020-01-15 2020-06-09 江苏大学 Visual semantic SLAM system and method based on neural network technology

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107063258A (en) * 2017-03-07 2017-08-18 重庆邮电大学 A kind of mobile robot indoor navigation method based on semantic information
CN108171796A (en) * 2017-12-25 2018-06-15 燕山大学 A kind of inspection machine human visual system and control method based on three-dimensional point cloud
CN108230337A (en) * 2017-12-31 2018-06-29 厦门大学 A kind of method that semantic SLAM systems based on mobile terminal are realized
CN108596974A (en) * 2018-04-04 2018-09-28 清华大学 Dynamic scene robot localization builds drawing system and method
CN109559320A (en) * 2018-09-18 2019-04-02 华东理工大学 Realize that vision SLAM semanteme builds the method and system of figure function based on empty convolution deep neural network
CN109816686A (en) * 2019-01-15 2019-05-28 山东大学 Robot semanteme SLAM method, processor and robot based on object example match
CN110738673A (en) * 2019-10-21 2020-01-31 哈尔滨理工大学 Visual SLAM method based on example segmentation
CN110956651A (en) * 2019-12-16 2020-04-03 哈尔滨工业大学 Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN111179426A (en) * 2019-12-23 2020-05-19 南京理工大学 Deep learning-based robot indoor environment three-dimensional semantic map construction method
CN111260661A (en) * 2020-01-15 2020-06-09 江苏大学 Visual semantic SLAM system and method based on neural network technology

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
3D Semantic Mapping Based on Convolutional Neural Networks;Jing Li等;《Proceedings of the 37th Chinese Control Conference》;20180727;第9303-9308页 *
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution,and Fully Connected CRFs;Liang-Chieh Chen等;《arXiv》;20170512;第1-14页 *
ORB-SLAM: A Versatile and Accurate Monocular SLAM System;Raúl Mur-Artal等;《IEEE TRANSACTIONS ON ROBOTICS》;20151031;第31卷(第5期);第1147-1163页 *
VSLAM的研究与发展;吴家伟等;《单片机与嵌入式系统应用》;20191231(第9期);第4-7页 *
基于深度学习的视觉SLAM综述;刘瑞军等;《系统仿真学报》;20200731;第32卷(第7期);第1244-1256页 *
基于特征点法和直接法VSLAM的研究;邹雄等;《计算机应用研究》;20200531;第37卷(第5期);第1281-1291页 *
基于视觉和IMU融合的定位算法研究;施振宇;《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》;20200215(第2期);I138-1082 *
基于视觉的同时定位与地图构建的研究进展;陈常等;《计算机应用研究》;20180331;第35卷(第3期);第641-647页 *
基于语义信息与多视图几何的动态SLAM方法研究;仲星光;《中国优秀博硕士学位论文全文数据库(硕士) 信息科技辑》;20200215(第2期);I138-1668 *
无人系统视觉SLAM技术发展现状简析;李云天等;《控制与决策》;20200114;第1-10页 *
融合语义激光与地标信息的SLAM技术研究;杨爽等;《计算机工程与应用》;20190911;第56卷(第18期);第262-271页 *
视觉同时定位与地图创建综述;周彦等;《智能系统学报》;20180228;第13卷(第1期);第97-106页 *

Also Published As

Publication number Publication date
CN111797938A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN111797938B (en) Semantic information and VSLAM fusion method for sweeping robot
CN110427877B (en) Human body three-dimensional posture estimation method based on structural information
Pal et al. Learning hierarchical relationships for object-goal navigation
Eade et al. Monocular graph SLAM with complexity reduction
CN103413347B (en) Based on the extraction method of monocular image depth map that prospect background merges
CN109559320A (en) Realize that vision SLAM semanteme builds the method and system of figure function based on empty convolution deep neural network
CN110827398B (en) Automatic semantic segmentation method for indoor three-dimensional point cloud based on deep neural network
CN110781262B (en) Semantic map construction method based on visual SLAM
CN110110694B (en) Visual SLAM closed-loop detection method based on target detection
CN110555408B (en) Single-camera real-time three-dimensional human body posture detection method based on self-adaptive mapping relation
CN109101864A (en) The upper half of human body action identification method returned based on key frame and random forest
CN109766790B (en) Pedestrian detection method based on self-adaptive characteristic channel
Langer et al. Robust and efficient object change detection by combining global semantic information and local geometric verification
CN111709317B (en) Pedestrian re-identification method based on multi-scale features under saliency model
Hu et al. Loop closure detection for visual SLAM fusing semantic information
Zhang et al. Joint segmentation of images and scanned point cloud in large-scale street scenes with low-annotation cost
CN109740405B (en) Method for detecting front window difference information of non-aligned similar vehicles
CN116503654A (en) Multimode feature fusion method for carrying out character interaction detection based on bipartite graph structure
Rituerto et al. Label propagation in videos indoors with an incremental non-parametric model update
Zhang et al. Scale-aware insertion of virtual objects in monocular videos
Liu et al. Building semantic maps for blind people to navigate at home
Villaverde et al. Morphological neural networks and vision based simultaneous localization and mapping
CN114973305A (en) Accurate human body analysis method for crowded people
Qiao et al. Objects matter: learning object relation graph for robust camera relocalization
Liu et al. Detection based object labeling of 3D point cloud for indoor scenes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant