CN110930519B - Semantic ORB-SLAM perception method and device based on environment understanding

Publication number: CN110930519B (application CN201911113708.7A)
Authority: CN (China)
Legal status: Active (granted)
Application number: CN201911113708.7A
Other versions: CN110930519A (application publication; original language: Chinese)
Inventors: 柯晶晶, 周广兵, 蒙仕格, 郑辉, 林飞堞, 陈惠纲, 王珏
Current and original assignee: South China Robotics Innovation Research Institute
Application filed by South China Robotics Innovation Research Institute

Classifications

    • G - Physics
    • G06 - Computing; calculating or counting
    • G06T - Image data processing or generation, in general
    • G06T 19/00 - Manipulating 3D models or images for computer graphics
    • G06T 19/003 - Navigation within 3D models or images
    • Y02T 10/00 to Y02T 10/40 - Climate change mitigation technologies related to transportation (road transport; ICE-based vehicles; engine management systems)

Abstract

The invention discloses a semantic ORB-SLAM perception method and device based on environment understanding, wherein the method comprises the following steps: inputting sequence frames into the ORB-SLAM front-end Tracking thread for keyframe extraction to obtain keyframe data; inputting the keyframe data into an adjacent-keyframe graph-optimization thread for optimization to obtain graph-optimized keyframe data; calculating error values between the graph-optimized keyframe data and generating a candidate set based on the error values; and performing closed-loop correction on the candidate set based on global map optimization and loop fusion, and performing simultaneous localization and mapping based on the correction result. In the embodiment of the invention, the robot's perception of the environment is markedly improved and higher-level cognitive information about the scene can be obtained, providing a more natural mode of application for fields including robot navigation, augmented reality and autonomous driving.

Description

Semantic ORB-SLAM perception method and device based on environment understanding
Technical Field
The invention relates to the technical field of intelligent robot perception, in particular to a semantic ORB-SLAM perception method and device based on environment understanding.
Background
Simultaneous localization and mapping (SLAM) is the basis for a mobile robot to navigate autonomously in an unknown environment, and one of the preconditions for autonomy and intelligence. At present, visual SLAM can achieve real-time localization and three-dimensional map construction in a static environment within a certain range. However, the map generated by conventional visual SLAM contains only simple geometric information (points, lines, etc.) or low-level pixel information (color, brightness, etc.) and no semantic information. While such simple geometric and pixel-level information may be sufficient for autonomous navigation of the robot in a single environment, it is not sufficient for a mobile robot to accomplish higher-level tasks.
Patent CN201811514700 discloses a visual SLAM method based on ORB features. In the front-end stage it merely replaces traditional SIFT feature extraction with ORB features and uses the Hamming distance for feature-matching decisions, which reduces the amount of computation to a certain extent and improves the real-time performance of visual SLAM; in the back-end module it adopts the idea of graph optimization, and a point-cloud fusion strategy combining local and global loops can improve the accuracy of loop detection.
However, although replacing traditional SIFT feature extraction with ORB features effectively improves the computation speed, such a visual SLAM method only works in static scenes or scenes with few dynamic objects. If a large number of feature points fall on dynamic objects, the SLAM tracking and localization results drift with the motion of those objects, the mapping and localization accuracy of the robot is greatly degraded, and pose computation may even fail. Moreover, the feature-point generation process of an ORB-feature-based visual SLAM method discards most of the pixel information in the original image and lacks effective semantic information, which seriously limits the robot's further understanding of its environment.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a semantic ORB-SLAM perception method and device based on environment understanding, which markedly improve the robot's perception of the environment, obtain higher-level cognitive information about the scene, and provide a more natural mode of application for fields including robot navigation, augmented reality and autonomous driving.
In order to solve the above technical problems, an embodiment of the present invention provides a semantic ORB-SLAM perception method based on environment understanding, the method comprising:
inputting sequence frames into the ORB-SLAM front-end Tracking thread for keyframe extraction to obtain keyframe data;
inputting the keyframe data into an adjacent-keyframe graph-optimization thread for keyframe data optimization to obtain graph-optimized keyframe data;
calculating error values between the graph-optimized keyframe data and generating a candidate set based on the error values;
and performing closed-loop correction on the candidate set based on global map optimization and loop fusion, and performing simultaneous localization and mapping based on the correction result.
Optionally, the inputting of sequence frames into the ORB-SLAM front-end Tracking thread for keyframe extraction to obtain keyframe data comprises:
the ORB-SLAM front-end Tracking thread performing dynamic-background removal on the input sequence frames using the inter-frame difference method to obtain background-removed sequence frames;
establishing a mapping relation between the background-removed sequence frames and object feature points to obtain sequence frames with an object-feature-point mapping relation;
performing ORB feature extraction on the sequence frames with the object-feature-point mapping relation to obtain sequence-frame ORB features;
matching the ORB features of the current sequence frame with the ORB features of the previous sequence frame to obtain matched feature point pairs;
performing pose estimation and relocalization based on the matched feature point pairs to obtain pose estimation and relocalization results;
and optimizing the pose estimation and relocalization results over matched adjacent sequence frames to obtain adjacent-frame pose optimization, and obtaining a keyframe sequence based on the adjacent-frame pose optimization.
Optionally, the ORB-SLAM front-end Tracking thread performing dynamic-background removal on the input sequence frames using the inter-frame difference method to obtain background-removed sequence frames comprises:
performing a difference operation on adjacent frames at consecutive time intervals in the sequence, and detecting changes using the strong correlation between adjacent sequence frames to obtain the moving target;
and removing the moving target as dynamic background from the sequence frames based on a selected threshold to obtain the background-removed sequence frames.
Optionally, the establishing of a mapping relation between the background-removed sequence frames and object feature points to obtain sequence frames with an object-feature-point mapping relation comprises:
taking the map points observed by the current background-removed sequence frame, finding the other background-removed sequence frames that observe those map points, and using them as the adjacent sequence frames of the current background-removed frame;
generating a node tree with the current background-removed sequence frame as the root node and the adjacent sequence frames as child nodes;
and constructing the mapping relation between the background-removed sequence frames and the object feature points based on the node tree, to obtain the sequence frames with an object-feature-point mapping relation.
Optionally, the pose estimation and relocalization based on the matched feature point pairs comprises:
calculating the relative displacement between the current sequence frame and the previous sequence frame from the matched feature point pairs by minimizing the reprojection error.
Optionally, the method further comprises:
when pose estimation and relocalization based on the matched feature point pairs fail, obtaining the sequence frame most similar to the current frame based on the mapping relation with the object feature points;
obtaining the ORB features of that most similar sequence frame, and matching the ORB features of the current frame against them to obtain first matched feature point pairs;
and recomputing pose estimation and relocalization using the first matched feature point pairs to obtain the pose estimation and relocalization results.
Optionally, the obtaining of a keyframe sequence based on the adjacent-frame pose optimization comprises:
calculating the minimum reprojection error between adjacent frames and establishing a covisibility view based on the minimum reprojection error;
and extracting the sequence frames in the covisibility view as key sequence frames.
Optionally, the inputting of the keyframe data into the adjacent-keyframe graph-optimization thread for keyframe data optimization, obtaining the graph-optimized keyframe data, comprises:
inputting the keyframe data into the adjacent-keyframe graph-optimization thread and then sequentially performing redundant-point elimination, semantic extraction, new-map-point creation and adjacent-frame optimization on the keyframe data to obtain the graph-optimized keyframe data.
Optionally, the semantic extraction performed on the keyframe data after redundant-point elimination comprises:
performing object detection on the keyframe data after redundant-point elimination using the YOLO-v3 algorithm to obtain object detection results;
performing semantic association on the object detection results using a conditional random field, combining object-category probabilities with scene context information;
correcting and optimizing the combined object-category probabilities and scene context information to generate a candidate set of temporary object information;
judging whether each item of temporary object information in the candidate set is a new object or an existing object by searching, for each point of the temporary object, the corresponding neighbourhood to obtain the nearest three-dimensional point;
and calculating the Euclidean distance between the point and that three-dimensional point; if the Euclidean distance is smaller than a preset threshold, the two are considered the same point.
In addition, an embodiment of the invention also provides a semantic ORB-SLAM perception device based on environment understanding, the device comprising:
a keyframe extraction module, for inputting sequence frames into the ORB-SLAM front-end Tracking thread for keyframe extraction to obtain keyframe data;
a keyframe optimization module, for inputting the keyframe data into the adjacent-keyframe graph-optimization thread for keyframe data optimization to obtain graph-optimized keyframe data;
an error calculation module, for calculating error values between the graph-optimized keyframe data and generating a candidate set based on the error values;
and a simultaneous localization and mapping module, for performing closed-loop correction on the candidate set based on global map optimization and loop fusion, and performing simultaneous localization and mapping based on the correction result.
In the embodiment of the invention, the following defects of conventional visual ORB-SLAM are addressed: its feature extraction process is easily disturbed by dynamic targets, and the extracted feature points contain only color, brightness and geometric information, lacking semantic information about objects in the environment. In the ORB-SLAM front-end Tracking thread, a difference operation is first applied to adjacent sequence frames by the inter-frame difference method and a threshold is set to remove dynamic objects; the mapping relation between sequence frames and object feature points is then rebuilt and ORB feature extraction is performed; finally, object and environment information extracted by deep-learning-based semantic detection is integrated into the ORB-SLAM system. The resulting semantic ORB-SLAM perception method, which "understands" the environment, offers stable performance, resistance to environmental interference, accurate matching and a deeper understanding of the environment. The robot's perception of the environment is markedly improved, higher-level cognitive information about the scene can be obtained, and a more natural mode of application is provided for fields including robot navigation, augmented reality and autonomous driving.
Drawings
In order to explain the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the invention, and that a person skilled in the art could obtain other drawings from them without inventive effort.
FIG. 1 is a flow diagram of the semantic ORB-SLAM perception method based on environment understanding in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of the semantic ORB-SLAM perception device based on environment understanding in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Examples
Referring to fig. 1, fig. 1 is a flow chart of a semantic ORB-SLAM perception method based on environmental understanding in an embodiment of the present invention.
As shown in fig. 1, a semantic ORB-SLAM perception method based on environment understanding comprises:
s11, inputting the sequence frame into an ORB-SLAM front end Tracking thread to perform key frame extraction processing, and acquiring key frame data;
In the implementation of the invention, inputting sequence frames into the ORB-SLAM front-end Tracking thread for keyframe extraction to obtain keyframe data comprises the following steps: the ORB-SLAM front-end Tracking thread performs dynamic-background removal on the input sequence frames using the inter-frame difference method to obtain background-removed sequence frames; a mapping relation between the background-removed sequence frames and object feature points is established to obtain sequence frames with an object-feature-point mapping relation; ORB feature extraction is performed on these sequence frames to obtain sequence-frame ORB features; the ORB features of the current sequence frame are matched with those of the previous sequence frame to obtain matched feature point pairs; pose estimation and relocalization are performed based on the matched feature point pairs to obtain pose estimation and relocalization results; and the pose estimation and relocalization results are optimized over matched adjacent sequence frames to obtain adjacent-frame pose optimization, from which a keyframe sequence is obtained.
Further, the ORB-SLAM front-end Tracking thread performing dynamic-background removal on the input sequence frames by the inter-frame difference method to obtain background-removed sequence frames comprises: performing a difference operation on adjacent frames at consecutive time intervals in the sequence, and detecting changes using the strong correlation between adjacent sequence frames to obtain the moving target; and removing the moving target as dynamic background from the sequence frames based on a selected threshold to obtain the background-removed sequence frames.
Further, establishing the mapping relation between the background-removed sequence frames and object feature points to obtain sequence frames with an object-feature-point mapping relation comprises: taking the map points observed by the current background-removed sequence frame, finding the other background-removed sequence frames that observe those map points, and using them as the adjacent sequence frames of the current background-removed frame; generating a node tree with the current background-removed sequence frame as the root node and the adjacent sequence frames as child nodes; and constructing the mapping relation between the background-removed sequence frames and the object feature points based on the node tree, to obtain the sequence frames with an object-feature-point mapping relation.
Further, the pose estimation and relocalization based on the matched feature point pairs comprises: calculating the relative displacement between the current sequence frame and the previous sequence frame from the matched feature point pairs by minimizing the reprojection error.
Further, the method further comprises: when pose estimation and relocalization based on the matched feature point pairs fail, obtaining the sequence frame most similar to the current frame based on the mapping relation with the object feature points; obtaining the ORB features of that most similar sequence frame, and matching the ORB features of the current frame against them to obtain first matched feature point pairs; and recomputing pose estimation and relocalization using the first matched feature point pairs to obtain the pose estimation and relocalization results.
Further, obtaining a keyframe sequence based on the adjacent-frame pose optimization comprises: calculating the minimum reprojection error between adjacent frames and establishing a covisibility view based on the minimum reprojection error; and extracting the sequence frames in the covisibility view as key sequence frames.
Specifically, sequence frames are input into the ORB-SLAM front-end Tracking thread, where dynamic-background removal is performed first to eliminate noise interference and the influence of dynamic objects on the subsequent feature extraction and matching. Using the inter-frame difference method, adjacent frames at consecutive time intervals are extracted from the sequence and a difference operation is applied; changes are detected using the strong correlation between adjacent sequence frames, so that the moving target is detected, and the moving region is then removed from the sequence frames by choosing a threshold. Within the sequence, the change between the k-th frame f_k(x, y) and the (k+1)-th frame f_{k+1}(x, y) can be represented by the binarized difference value D(x, y) as follows:
$$D(x,y)=\begin{cases}1, & \left|f_{k+1}(x,y)-f_{k}(x,y)\right|>T\\0, & \left|f_{k+1}(x,y)-f_{k}(x,y)\right|\le T\end{cases}$$
where T is the chosen binarization threshold. The '1' regions of the binary difference are the pixels whose gray values change between the two frames, and generally contain the moving target together with noise; the '0' regions are the pixels whose gray values are unchanged between the two frames.
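Purely as an illustration (not part of the claimed method), the inter-frame difference above can be sketched in Python with OpenCV; the function name, the default threshold T = 25 and the 3x3 opening kernel are assumptions of the sketch:

```python
import cv2
import numpy as np

def frame_difference_mask(frame_k, frame_k1, T=25):
    """Binarized inter-frame difference D(x, y) of two consecutive frames.

    Returns a uint8 mask that is 255 where |f_{k+1} - f_k| > T (moving
    target plus noise) and 0 where the pixel gray values are unchanged.
    """
    gray_k = cv2.cvtColor(frame_k, cv2.COLOR_BGR2GRAY)
    gray_k1 = cv2.cvtColor(frame_k1, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray_k1, gray_k)
    _, mask = cv2.threshold(diff, T, 255, cv2.THRESH_BINARY)
    # A small morphological opening suppresses isolated noise pixels
    # before the mask is used to discard features on moving regions.
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
```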
In the front-end Tracking thread, in order to integrate the extracted semantic information into the ORB-SLAM framework, a mapping relation between the background-removed sequence frames and object feature points must be established. In ORB-SLAM, each background-removed sequence frame stores the map points it observes, and each map point in turn stores the background-removed sequence frames that observe it. From this relation between frames and map points, the ORB-SLAM spanning tree is built. To construct it, the map points observed by the current background-removed sequence frame are used to find the other background-removed sequence frames observing those points; these are the adjacent sequence frames of the current frame, and they share a large number of map points with it. Since each map point carries its associated background-removed sequence frames, a spanning tree can be generated with the current background-removed sequence frame as the root node and the adjacent sequence frames as child nodes. Within the spanning tree, the child-parent relation is determined by the number of shared map points. Using the spanning tree, the adjacent sequence frames of the current background-removed frame can be found conveniently, and through them more associated map points. The mapping relation between background-removed sequence frames and objects is established as follows:
Each object O_i comprises:
    • the point cloud data contained in the object in the world coordinate system, obtained by projection through the camera;
    • the number of object categories and the probability of each category, iteratively updated by a Bayesian process;
    • the set of keyframes observing the object;
    • the category with the highest probability, to which the object is taken to belong;
    • the number of times the object has been observed.
The color image corresponding to a background-removed sequence frame is used for object detection; the depth image corresponding to the frame is used to generate the object point cloud data; and the background-removed sequence frame records the object information it observes. Once the relation between objects and background-removed sequence frames has been constructed on top of the mapping between map points and frames, the associated objects can be found from a given background-removed sequence frame, and the associated background-removed sequence frames can likewise be found from a given object.
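A minimal data-structure sketch of this two-way mapping and of the spanning tree is given below; the class and attribute names (Frame, MapPoint, observe, attach_to_spanning_tree) are illustrative, not the invention's actual identifiers:

```python
class MapPoint:
    def __init__(self, pid, xyz):
        self.pid = pid
        self.xyz = xyz            # 3-D position in the world coordinate system
        self.observers = set()    # background-removed frames observing this point

class Frame:
    def __init__(self, fid):
        self.fid = fid
        self.points = set()       # map points observed by this frame
        self.parent = None        # parent node in the spanning tree
        self.children = []

    def observe(self, mp):
        # The mapping is stored in both directions, as described above.
        self.points.add(mp)
        mp.observers.add(self)

def shared_points(a, b):
    return len(a.points & b.points)

def attach_to_spanning_tree(current, neighbours):
    # The adjacent frame sharing the most map points becomes the parent,
    # so the child-parent relation is decided by the shared-point count.
    best = max(neighbours, key=lambda f: shared_points(current, f), default=None)
    if best is not None and shared_points(current, best) > 0:
        current.parent = best
        best.children.append(current)
```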
ORB feature extraction is then performed on the sequence frames with the object-feature-point mapping relation; extracting ORB feature points instead of SIFT feature points effectively reduces the amount of computation and improves efficiency.
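For illustration, ORB extraction and Hamming-distance matching can be sketched with OpenCV as follows; the feature count and the 0.75 ratio test are assumed tuning values, not values prescribed by the invention:

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)

def extract_orb(gray, static_mask=None):
    # static_mask should be 255 on static regions and 0 on the moving
    # regions flagged by the inter-frame difference, so that no ORB
    # keypoints are detected on dynamic objects.
    return orb.detectAndCompute(gray, static_mask)

def match_orb(desc_prev, desc_cur, ratio=0.75):
    # Hamming distance is the natural metric for binary ORB descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(desc_prev, desc_cur, k=2)
    return [m for m, n in pairs if m.distance < ratio * n.distance]
```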
In the ORB-SLAM front-end Tracking thread, pose estimation and relocalization are carried out with respect to the previous background-removed sequence frame: the ORB features of the current sequence frame are matched with those of the previous sequence frame to obtain matched feature point pairs, and the relative displacement between the current frame and the previous frame is then computed from these pairs by minimizing the reprojection error. If tracking and localization fail, relocalization is performed: the sequence frame most similar to the current frame is found, the current frame is matched against it to obtain matched map points, and the pose of the current frame is recomputed from the matched points.
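A hedged sketch of this pose step: the matched pairs are fed to a RANSAC PnP solver, which minimizes the reprojection error over inliers; the 3-pixel error bound and the 15-inlier floor are assumptions, and a None return stands for the tracking failure that triggers relocalization:

```python
import cv2
import numpy as np

def estimate_pose(pts3d_prev, pts2d_cur, K):
    """Relative pose of the current frame from matched feature point pairs.

    pts3d_prev: Nx3 map points observed in the previous frame;
    pts2d_cur:  Nx2 matched pixel locations in the current frame;
    K:          3x3 camera intrinsic matrix.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        np.asarray(pts3d_prev, dtype=np.float32),
        np.asarray(pts2d_cur, dtype=np.float32),
        K, None, reprojectionError=3.0)
    if not ok or inliers is None or len(inliers) < 15:
        return None                # tracking failed: fall back to relocalization
    R, _ = cv2.Rodrigues(rvec)     # rotation vector -> rotation matrix
    return R, tvec
```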
In general, two adjacent frames observe a portion of the same map points at the same time. The minimum reprojection error between two adjacent frames is computed; the smaller the reprojection error, the stronger the association between the two adjacent sequence frames. A corresponding preset threshold is therefore set, the projection error is compared against it, and adjacent sequence frames whose error is smaller than or equal to the threshold are retained. In this way a covisibility view can be established once pose optimization between adjacent frames has been formed, and the sequence frames in the covisibility view are taken as the key sequence frames.
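The covisibility test can be sketched as below; the 2-pixel bound is an assumed value standing in for the preset threshold mentioned above:

```python
import numpy as np

def mean_reprojection_error(pts3d, pts2d, R, t, K):
    """Mean pixel error of shared map points pts3d projected into the
    neighbouring frame with pose (R, t) against the measured pixels pts2d.
    t is a 3x1 translation column vector."""
    cam = (K @ (R @ pts3d.T + t)).T        # project into the image plane
    proj = cam[:, :2] / cam[:, 2:3]
    return float(np.linalg.norm(proj - pts2d, axis=1).mean())

def link_if_covisible(graph, fid_a, fid_b, err, max_err=2.0):
    # Adjacent frames whose mutual reprojection error stays at or below
    # the preset threshold are joined in the covisibility view; frames
    # that end up in the view are kept as key sequence frames.
    if err <= max_err:
        graph.setdefault(fid_a, set()).add(fid_b)
        graph.setdefault(fid_b, set()).add(fid_a)
```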
S12: inputting the keyframe data into the adjacent-keyframe graph-optimization thread for keyframe data optimization, and obtaining the graph-optimized keyframe data;
In the implementation of the invention, inputting the keyframe data into the adjacent-keyframe graph-optimization thread for optimization and obtaining the graph-optimized keyframe data comprises: inputting the keyframe data into the adjacent-keyframe graph-optimization thread and then sequentially performing redundant-point elimination, semantic extraction, new-map-point creation and adjacent-frame optimization on the keyframe data to obtain the graph-optimized keyframe data.
Further, the semantic extraction performed on the keyframe data after redundant-point elimination comprises: performing object detection on the keyframe data after redundant-point elimination using the YOLO-v3 algorithm to obtain object detection results; performing semantic association on the object detection results using a conditional random field, combining object-category probabilities with scene context information; correcting and optimizing the combined object-category probabilities and scene context information to generate a candidate set of temporary object information; judging whether each item of temporary object information in the candidate set is a new object or an existing one by searching, for each point of the temporary object, the corresponding neighbourhood to obtain the nearest three-dimensional point; and calculating the Euclidean distance between the point and that three-dimensional point; if the Euclidean distance is smaller than a preset threshold, the two are considered the same point.
Specifically, after the keyframes are obtained, they are input into the adjacent-keyframe graph-optimization thread and redundant points are eliminated; a semantic extraction algorithm is then designed to realize the graph-optimization process over adjacent keyframes. The designed semantic extraction algorithm covers object detection, object semantic association, temporary object generation, object association and object model updating. Object detection is responsible for extracting object information from the image using a deep-learning network; the extracted object information undergoes semantic association and is corrected and optimized, so that the detected objects are more accurate and reliable, and they are stored in the temporary object information set. Object association and updating is responsible for associating the temporary object information with the object information already in the object database, according to the mapping relation among keyframes, object information and map points, and for merging the updated temporary object information into the corresponding object entries.
Here, object detection uses a YOLO-v3-based algorithm: each picture is divided into an N×N grid, a single detection operation is performed over the grid cells, and the detection results are finally fused together.
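As a sketch only, a YOLO-v3 forward pass over a single image can be written with OpenCV's DNN module as below; the config/weight file paths, the 416x416 input size and the confidence/NMS thresholds are assumptions of the sketch:

```python
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # placeholder paths

def detect_objects(image, conf_thr=0.5, nms_thr=0.4):
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    boxes, confs, classes = [], [], []
    for out in outputs:            # one output tensor per detection scale
        for det in out:            # det = [cx, cy, bw, bh, obj, class scores...]
            scores = det[5:]
            cls = int(np.argmax(scores))
            conf = float(scores[cls])
            if conf > conf_thr:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confs.append(conf)
                classes.append(cls)
    keep = cv2.dnn.NMSBoxes(boxes, confs, conf_thr, nms_thr)
    # Each kept detection: (class id, class probability, [x, y, w, h]).
    return [(classes[i], confs[i], boxes[i]) for i in np.array(keep).flatten()]
```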
Semantic detection is carried out on the keyframes with the YOLO-v3 algorithm, and the objects extracted by deep-learning detection are further semantically associated using a conditional random field; detection and classification accuracy is improved by combining object-category probabilities with scene context information. The conditional random field designed to combine object-category probabilities with context information has the following probability and energy equations:
$$P(X=x)=\frac{1}{Z}\exp\bigl(-E(x)\bigr)$$

$$E(x)=\sum_{i}\psi_{u}(x_{i})+\sum_{i<j}\psi_{P}(x_{i},x_{j})$$
where x denotes the random variables of the object classes; i and j range from 1 to k, with k the number of objects detected in the image; Z is a normalization factor ensuring that the result is a probability; E(x) is the energy function of the conditional random field; the unary potential function ψ_u scores the class label of a node of the random-field graph, and the binary potential function ψ_P characterizes the correlation between nodes of the random-field graph.
The unary potential function ψ_u is:

$$\psi_{u}(x_{i})=-\log p(x_{i})$$
The binary potential function ψ_P is:

$$\psi_{P}(x_{i},x_{j})=\mu(x_{i},x_{j})\sum_{m}\omega_{m}\,k^{(m)}(f_{i},f_{j})$$

where k^{(m)}(f_i, f_j) are kernel functions over the feature vectors f_i and f_j of the corresponding nodes.
where p(x_i) is the probability distribution over categories for the i-th object given by the YOLO-v3 model, ω_m are linear combination weights, and μ is a label compatibility function expressing the likelihood that different classes occur together within a neighbourhood.
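To make the energy concrete, the sketch below evaluates E(x) for one label assignment over the k detected objects; the kernel matrices and compatibility matrix are assumed to be precomputed, and all names are illustrative:

```python
import numpy as np

def crf_energy(labels, probs, mu, kernels, weights):
    """E(x) = sum_i psi_u(x_i) + sum_{i<j} psi_P(x_i, x_j).

    labels:  length-k array, the class assigned to each detected object
    probs:   k x C matrix; row i is the YOLO-v3 distribution p(x_i)
    mu:      C x C label compatibility matrix (mu in the text)
    kernels: list of k x k kernel matrices k^(m)(f_i, f_j)
    weights: linear combination weights omega_m, one per kernel
    """
    k = len(labels)
    # Unary term: psi_u(x_i) = -log p(x_i).
    unary = float(-np.log(probs[np.arange(k), labels]).sum())
    # Pairwise term: mu(x_i, x_j) * sum_m omega_m * k^(m)(f_i, f_j), i < j.
    pair = sum(w * km for w, km in zip(weights, kernels))
    binary = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            binary += mu[labels[i], labels[j]] * pair[i, j]
    return unary + binary
```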
Semantic association of the detected objects is thus realized by the conditional random field; the detection results are corrected and optimized, and a candidate set of temporary object information is generated. Each temporary object is then judged to decide whether it is a new object or an object already in the candidate set: for the data of each candidate object, every point of the temporary object is searched in its neighbourhood, the three-dimensional point closest to it is found in the candidate object's point cloud, and the Euclidean distance between the two points is computed; if this distance is smaller than a set threshold, the two points are considered the same point.
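The new-vs-existing decision can be sketched with a KD-tree nearest-neighbour query as below; the 2 cm distance threshold and the majority-vote merge rule are assumptions added for the sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

def is_same_object(temp_points, candidate_points, dist_thr=0.02):
    """Decide whether a temporary object matches a candidate object.

    For every 3-D point of the temporary object, the nearest point in the
    candidate object's point cloud is found; pairs whose Euclidean
    distance is below dist_thr are treated as the same physical point.
    """
    tree = cKDTree(candidate_points)          # candidate cloud, Nx3
    dists, _ = tree.query(temp_points)        # nearest-neighbour distances
    matched = np.count_nonzero(dists < dist_thr)
    # If most points coincide, the temporary object is the existing one;
    # otherwise it is added to the map as a new object.
    return matched / len(temp_points) > 0.5
```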
S13: calculating error values between the graph-optimized keyframe data, and generating a candidate set based on the error values;
In the implementation of the invention, the error values between the graph-optimized keyframe data are calculated, and the candidate set is generated from these error values.
S14: performing closed-loop correction on the candidate set based on global map optimization and loop fusion, and performing simultaneous localization and mapping based on the correction result.
In the implementation of the invention, closed-loop correction is applied to the candidate set through global map optimization and loop fusion; closed-loop detection is thereby realized, localization accuracy is improved and error is reduced; simultaneous localization and mapping are then performed based on the correction result.
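A minimal stand-in for the closed-loop correction is sketched below: the drift revealed by one accepted loop candidate is applied rigidly to the keyframes after the loop start, whereas the actual system would distribute it through global map optimization and loop fusion; the function name and the rigid-correction simplification are assumptions:

```python
import numpy as np

def correct_loop_drift(poses, i, j, T_ij_measured):
    """Apply a loop-closure correction to a list of 4x4 keyframe poses.

    poses: world-from-camera matrices; loop detection found that keyframe
    j revisits keyframe i with relative pose T_ij_measured. The gap
    between the measured and the estimated relative pose is the
    accumulated drift, here propagated rigidly to every keyframe after i.
    """
    T_ij_est = np.linalg.inv(poses[i]) @ poses[j]
    correction = T_ij_measured @ np.linalg.inv(T_ij_est)
    for k in range(i + 1, len(poses)):
        poses[k] = poses[i] @ correction @ np.linalg.inv(poses[i]) @ poses[k]
    return poses   # poses[j] now equals poses[i] @ T_ij_measured
```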
In the embodiment of the invention, the following defects of conventional visual ORB-SLAM are addressed: its feature extraction process is easily disturbed by dynamic targets, and the extracted feature points contain only color, brightness and geometric information, lacking semantic information about objects in the environment. In the ORB-SLAM front-end Tracking thread, a difference operation is first applied to adjacent sequence frames by the inter-frame difference method and a threshold is set to remove dynamic objects; the mapping relation between sequence frames and object feature points is then rebuilt and ORB feature extraction is performed; finally, object and environment information extracted by deep-learning-based semantic detection is integrated into the ORB-SLAM system. The resulting semantic ORB-SLAM perception method, which "understands" the environment, offers stable performance, resistance to environmental interference, accurate matching and a deeper understanding of the environment. The robot's perception of the environment is markedly improved, higher-level cognitive information about the scene can be obtained, and a more natural mode of application is provided for fields including robot navigation, augmented reality and autonomous driving.
Examples
Referring to fig. 2, fig. 2 is a schematic structural diagram of a semantic ORB-SLAM sensing device based on environmental understanding according to an embodiment of the present invention.
As shown in fig. 2, a semantic ORB-SLAM perception device based on environment understanding comprises:
key frame extraction module 21: the method comprises the steps of inputting a sequence frame into an ORB-SLAM front end Tracking thread to perform key frame extraction processing, and obtaining key frame data;
In the implementation of the invention, inputting sequence frames into the ORB-SLAM front-end Tracking thread for keyframe extraction to obtain keyframe data comprises the following steps: the ORB-SLAM front-end Tracking thread performs dynamic-background removal on the input sequence frames using the inter-frame difference method to obtain background-removed sequence frames; a mapping relation between the background-removed sequence frames and object feature points is established to obtain sequence frames with an object-feature-point mapping relation; ORB feature extraction is performed on these sequence frames to obtain sequence-frame ORB features; the ORB features of the current sequence frame are matched with those of the previous sequence frame to obtain matched feature point pairs; pose estimation and relocalization are performed based on the matched feature point pairs to obtain pose estimation and relocalization results; and the pose estimation and relocalization results are optimized over matched adjacent sequence frames to obtain adjacent-frame pose optimization, from which a keyframe sequence is obtained.
Further, the ORB-SLAM front-end Tracking thread performing dynamic-background removal on the input sequence frames by the inter-frame difference method to obtain background-removed sequence frames comprises: performing a difference operation on adjacent frames at consecutive time intervals in the sequence, and detecting changes using the strong correlation between adjacent sequence frames to obtain the moving target; and removing the moving target as dynamic background from the sequence frames based on a selected threshold to obtain the background-removed sequence frames.
Further, establishing the mapping relation between the background-removed sequence frames and object feature points to obtain sequence frames with an object-feature-point mapping relation comprises: taking the map points observed by the current background-removed sequence frame, finding the other background-removed sequence frames that observe those map points, and using them as the adjacent sequence frames of the current background-removed frame; generating a node tree with the current background-removed sequence frame as the root node and the adjacent sequence frames as child nodes; and constructing the mapping relation between the background-removed sequence frames and the object feature points based on the node tree, to obtain the sequence frames with an object-feature-point mapping relation.
Further, the pose estimation and relocalization based on the matched feature point pairs comprises: calculating the relative displacement between the current sequence frame and the previous sequence frame from the matched feature point pairs by minimizing the reprojection error.
Further, the method further comprises: when pose estimation and relocalization based on the matched feature point pairs fail, obtaining the sequence frame most similar to the current frame based on the mapping relation with the object feature points; obtaining the ORB features of that most similar sequence frame, and matching the ORB features of the current frame against them to obtain first matched feature point pairs; and recomputing pose estimation and relocalization using the first matched feature point pairs to obtain the pose estimation and relocalization results.
Further, obtaining a keyframe sequence based on the adjacent-frame pose optimization comprises: calculating the minimum reprojection error between adjacent frames and establishing a covisibility view based on the minimum reprojection error; and extracting the sequence frames in the covisibility view as key sequence frames.
Specifically, sequence frames are input into the ORB-SLAM front-end Tracking thread, where dynamic-background removal is performed first to eliminate noise interference and the influence of dynamic objects on the subsequent feature extraction and matching. Using the inter-frame difference method, adjacent frames at consecutive time intervals are extracted from the sequence and a difference operation is applied; changes are detected using the strong correlation between adjacent sequence frames, so that the moving target is detected, and the moving region is then removed from the sequence frames by choosing a threshold. Within the sequence, the change between the k-th frame f_k(x, y) and the (k+1)-th frame f_{k+1}(x, y) can be represented by the binarized difference value D(x, y) as follows:

$$D(x,y)=\begin{cases}1, & \left|f_{k+1}(x,y)-f_{k}(x,y)\right|>T\\0, & \left|f_{k+1}(x,y)-f_{k}(x,y)\right|\le T\end{cases}$$

where T is the chosen binarization threshold. The '1' regions of the binary difference are the pixels whose gray values change between the two frames, and generally contain the moving target together with noise; the '0' regions are the pixels whose gray values are unchanged between the two frames.
In the front-end Tracking thread, in order to integrate the extracted semantic information into the ORB-SLAM framework, a mapping relation between the background-removed sequence frames and object feature points must be established. In ORB-SLAM, each background-removed sequence frame stores the map points it observes, and each map point in turn stores the background-removed sequence frames that observe it. From this relation between frames and map points, the ORB-SLAM spanning tree is built. To construct it, the map points observed by the current background-removed sequence frame are used to find the other background-removed sequence frames observing those points; these are the adjacent sequence frames of the current frame, and they share a large number of map points with it. Since each map point carries its associated background-removed sequence frames, a spanning tree can be generated with the current background-removed sequence frame as the root node and the adjacent sequence frames as child nodes. Within the spanning tree, the child-parent relation is determined by the number of shared map points. Using the spanning tree, the adjacent sequence frames of the current background-removed frame can be found conveniently, and through them more associated map points. The mapping relation between background-removed sequence frames and objects is established as follows:
Each object O_i comprises:
    • the point cloud data contained in the object in the world coordinate system, obtained by projection through the camera;
    • the number of object categories and the probability of each category, iteratively updated by a Bayesian process;
    • the set of keyframes observing the object;
    • the category with the highest probability, to which the object is taken to belong;
    • the number of times the object has been observed.
The color image corresponding to a background-removed sequence frame is used for object detection; the depth image corresponding to the frame is used to generate the object point cloud data; and the background-removed sequence frame records the object information it observes. Once the relation between objects and background-removed sequence frames has been constructed on top of the mapping between map points and frames, the associated objects can be found from a given background-removed sequence frame, and the associated background-removed sequence frames can likewise be found from a given object.
ORB feature extraction is then performed on the sequence frames with the object-feature-point mapping relation; extracting ORB feature points instead of SIFT feature points effectively reduces the amount of computation and improves efficiency.
In the ORB-SLAM front-end Tracking thread, pose estimation and relocalization are carried out with respect to the previous background-removed sequence frame: the ORB features of the current sequence frame are matched with those of the previous sequence frame to obtain matched feature point pairs, and the relative displacement between the current frame and the previous frame is then computed from these pairs by minimizing the reprojection error. If tracking and localization fail, relocalization is performed: the sequence frame most similar to the current frame is found, the current frame is matched against it to obtain matched map points, and the pose of the current frame is recomputed from the matched points.
In general, two adjacent frames observe a portion of the same map points at the same time. The minimum reprojection error between two adjacent frames is computed; the smaller the reprojection error, the stronger the association between the two adjacent sequence frames. A corresponding preset threshold is therefore set, the projection error is compared against it, and adjacent sequence frames whose error is smaller than or equal to the threshold are retained. In this way a covisibility view can be established once pose optimization between adjacent frames has been formed, and the sequence frames in the covisibility view are taken as the key sequence frames.
Keyframe optimization module 22: for inputting the keyframe data into the adjacent-keyframe graph-optimization thread for keyframe data optimization to obtain graph-optimized keyframe data;
In the implementation of the invention, inputting the keyframe data into the adjacent-keyframe graph-optimization thread for optimization and obtaining the graph-optimized keyframe data comprises: inputting the keyframe data into the adjacent-keyframe graph-optimization thread and then sequentially performing redundant-point elimination, semantic extraction, new-map-point creation and adjacent-frame optimization on the keyframe data to obtain the graph-optimized keyframe data.
Further, the semantic extraction performed on the keyframe data after redundant-point elimination comprises: performing object detection on the keyframe data after redundant-point elimination using the YOLO-v3 algorithm to obtain object detection results; performing semantic association on the object detection results using a conditional random field, combining object-category probabilities with scene context information; correcting and optimizing the combined object-category probabilities and scene context information to generate a candidate set of temporary object information; judging whether each item of temporary object information in the candidate set is a new object or an existing one by searching, for each point of the temporary object, the corresponding neighbourhood to obtain the nearest three-dimensional point; and calculating the Euclidean distance between the point and that three-dimensional point; if the Euclidean distance is smaller than a preset threshold, the two are considered the same point.
Specifically, after the keyframes are obtained, they are input into the adjacent-keyframe graph-optimization thread and redundant points are eliminated; a semantic extraction algorithm is then designed to realize the graph-optimization process over adjacent keyframes. The designed semantic extraction algorithm covers object detection, object semantic association, temporary object generation, object association and object model updating. Object detection is responsible for extracting object information from the image using a deep-learning network and attaching semantic labels to it; objects are associated with the corresponding semantics and then corrected and optimized through semantic association, so that the detected objects are more accurate and reliable, and they are stored in the temporary object information set. Object association and updating is responsible for associating the temporary object information with the object information already in the object database, according to the mapping relation among keyframes, object information and map points, and for merging the updated temporary object information into the corresponding object entries.
Object detection is based on the YOLO algorithm: each picture is divided into an N×N grid, a single detection operation is performed over the grid cells, and the detection results are finally fused together; the design of YOLO avoids duplicate detections.
Semantic detection is carried out on the keyframes with the YOLO algorithm, and the objects extracted by deep-learning detection are further semantically associated using a conditional random field; detection and classification accuracy is improved by combining object-category probabilities with scene context information. The conditional random field designed to combine object-category probabilities with context information has the following probability and energy equations:
$$P(X=x)=\frac{1}{Z}\exp\bigl(-E(x)\bigr)$$

$$E(x)=\sum_{i}\psi_{u}(x_{i})+\sum_{i<j}\psi_{P}(x_{i},x_{j})$$
where x denotes the random variables of the object classes; i and j range from 1 to k, with k the number of objects detected in the image; Z is a normalization factor ensuring that the result is a probability; E(x) is the energy function of the conditional random field; the unary potential function ψ_u scores the class label of a node of the random-field graph, and the binary potential function ψ_P characterizes the correlation between nodes of the random-field graph.
The unary potential function ψ_u is:

$$\psi_{u}(x_{i})=-\log p(x_{i})$$
The binary potential function ψ_P is:

$$\psi_{P}(x_{i},x_{j})=\mu(x_{i},x_{j})\sum_{m}\omega_{m}\,k^{(m)}(f_{i},f_{j})$$

where k^{(m)}(f_i, f_j) are kernel functions over the feature vectors f_i and f_j of the corresponding nodes.
where p(x_i) is the probability distribution over categories for the i-th object given by the YOLO model, ω_m are linear combination weights, and μ is a label compatibility function expressing the likelihood that different classes occur together within a neighbourhood.
Semantic association of the detected objects is thus realized by the conditional random field; the detection results are corrected and optimized, and a candidate set of temporary object information is generated. Each temporary object is then judged to decide whether it is a new object or an object already in the candidate set: for the data of each candidate object, every point of the temporary object is searched in its neighbourhood, the three-dimensional point closest to it is found in the candidate object's point cloud, and the Euclidean distance between the two points is computed; if this distance is smaller than a set threshold, the two points are considered the same point.
Error calculation module 23: for calculating error values between the graph-optimized keyframe data and generating a candidate set based on the error values;
In the implementation of the invention, the error values between the graph-optimized keyframe data are calculated, and the candidate set is generated from these error values.
Simultaneous localization and mapping module 24: for performing closed-loop correction on the candidate set based on global map optimization and loop fusion, and performing simultaneous localization and mapping based on the correction result.
In the implementation of the invention, closed-loop correction is applied to the candidate set through global map optimization and loop fusion; closed-loop detection is thereby realized, localization accuracy is improved and error is reduced; simultaneous localization and mapping are then performed based on the correction result.
In the embodiment of the invention, the following defects of conventional visual ORB-SLAM are addressed: its feature extraction process is easily disturbed by dynamic targets, and the extracted feature points contain only color, brightness and geometric information, lacking semantic information about objects in the environment. In the ORB-SLAM front-end Tracking thread, a difference operation is first applied to adjacent sequence frames by the inter-frame difference method and a threshold is set to remove dynamic objects; the mapping relation between sequence frames and object feature points is then rebuilt and ORB feature extraction is performed; finally, object and environment information extracted by deep-learning-based semantic detection is integrated into the ORB-SLAM system. The resulting semantic ORB-SLAM perception method, which "understands" the environment, offers stable performance, resistance to environmental interference, accurate matching and a deeper understanding of the environment. The robot's perception of the environment is markedly improved, higher-level cognitive information about the scene can be obtained, and a more natural mode of application is provided for fields including robot navigation, augmented reality and autonomous driving.
Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium; the storage medium may include: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk, and the like.
In addition, the semantic ORB-SLAM sensing method and device based on environmental understanding provided by the embodiments of the present invention have been described in detail above. Specific examples have been used herein to illustrate the principles and embodiments of the present invention, and the description of the above embodiments is intended only to help understand the method and its core ideas. Meanwhile, those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the ideas of the present invention; in view of the above, the contents of this description should not be construed as limiting the present invention.

Claims (8)

1. A semantic ORB-SLAM sensing method based on environmental understanding, the method comprising:
inputting sequence frames into an ORB-SLAM front-end Tracking thread for key frame extraction processing to obtain key frame data;
inputting the key frame data into an adjacent key frame graph optimization thread for key frame data optimization processing to obtain graph-optimized key frame data;
calculating error values among the key frame data after the graph optimization, and generating a candidate set based on the error values;
performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on correction results;
the step of inputting the sequence frame into the ORB-SLAM front end Tracking thread for key frame extraction processing to obtain key frame data comprises the following steps:
the ORB-SLAM front end Tracking thread adopts an inter-frame difference method to carry out dynamic background removal processing on the input sequence frames, and obtains sequence frames with the dynamic background removed;
establishing a mapping relation between the sequence frames with the dynamic background removed and object feature points, and obtaining a sequence frame with the mapping relation with the object feature points;
performing ORB feature extraction processing on the sequence frames with the object feature point mapping relation to obtain sequence frame ORB features;
matching the ORB characteristics of the sequence frame of the current frame with the ORB characteristics of the sequence frame of the previous frame to obtain matching characteristic point pairs;
performing pose estimation and repositioning processing based on the matched feature point pairs to obtain pose estimation and repositioning results;
optimizing the pose estimation and repositioning results according to the matched adjacent sequence frames to obtain pose optimization of the adjacent frames, and acquiring a key frame sequence based on the pose optimization of the adjacent frames;
the ORB-SLAM front end Tracking thread adopts an inter-frame difference method to carry out dynamic background removal processing on an input sequence frame to obtain a sequence frame with a dynamic background removed, and the method comprises the following steps:
performing differential operation on adjacent frames in continuous time intervals in the sequence frames, and performing change detection by using strong correlation of the adjacent frames in the sequence frames to obtain a moving target;
removing a dynamic background of a moving object in the sequence frames based on a selected threshold value, and acquiring sequence frames from which the dynamic background is removed;
in the sequence frames, the change between the k-th frame f_k(x, y) and the (k+1)-th frame f_{k+1}(x, y) is represented by a binarized difference value D_{k+1}(x, y), expressed as follows:

D_{k+1}(x, y) = 1, if |f_{k+1}(x, y) − f_k(x, y)| > T; otherwise 0

wherein T is the set binarization difference threshold; the '1' part of the binary difference consists of the pixels whose gray values change between the front and rear frames, and generally comprises moving targets and noise; the '0' part consists of the pixels whose gray values are unchanged between the two frames.
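An illustrative sketch of the ORB extraction and frame-to-frame matching recited above, using OpenCV; the feature budget and the brute-force Hamming matcher are assumptions about one possible realization, not the patent's prescribed implementation:

```python
import cv2

orb = cv2.ORB_create(nfeatures=1000)  # feature budget is an assumption

def match_orb(prev_img, curr_img):
    """Extract ORB features from two consecutive frames and return the
    matched feature point pairs, best (smallest distance) first."""
    kp1, des1 = orb.detectAndCompute(prev_img, None)
    kp2, des2 = orb.detectAndCompute(curr_img, None)
    if des1 is None or des2 is None:
        return []  # one of the frames yielded no features
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)
    return sorted(matches, key=lambda m: m.distance)
```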
2. The semantic ORB-SLAM sensing method of claim 1, wherein the creating a mapping relationship between the sequence frame with the dynamic background removed and the object feature points to obtain a sequence frame with the object feature point mapping relationship comprises:
according to the image points observed by the background-removed sequence frame of the current frame, taking the background-removed sequence frames of the next frame that observe the same image points as adjacent sequence frames of the background-removed sequence frame of the current frame;
generating a node tree by taking a sequence frame of the current frame with the dynamic background removed as a root node and taking an adjacent sequence frame as a child node;
and constructing a mapping relation between the sequence frames with the dynamic background removed and the object feature points based on the node tree, and acquiring the sequence frames with the mapping relation with the object feature points.
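One possible data-structure sketch for the node tree described above; all class and field names are hypothetical, chosen only to illustrate the root/child relationship:

```python
class FrameNode:
    """Node of the mapping tree: the background-removed current frame is
    the root; frames observing common image points become its children."""
    def __init__(self, frame_id, observed_points):
        self.frame_id = frame_id
        self.observed_points = set(observed_points)
        self.children = []

def build_node_tree(current_frame, other_frames):
    """current_frame and other_frames are (frame_id, observed_points) pairs."""
    root = FrameNode(*current_frame)
    for frame_id, points in other_frames:
        child = FrameNode(frame_id, points)
        # a frame observing any of the root's image points is adjacent
        if root.observed_points & child.observed_points:
            root.children.append(child)
    return root
```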
3. The semantic ORB-SLAM sensing method of claim 1, wherein the performing pose estimation and repositioning processing based on the matching feature point pairs comprises:
and calculating the relative displacement between the sequence frame of the current frame and the sequence frame of the previous frame by minimizing the re-projection error over the matched feature point pairs.
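A minimal numeric sketch of the re-projection error being minimized; K, R, t and all names are illustrative, and a real system would minimize this cost with a nonlinear solver rather than evaluate it once:

```python
import numpy as np

def reprojection_error(points_3d, points_2d, K, R, t):
    """Sum of squared re-projection errors of the matched pairs under a
    candidate relative pose (R, t); K is the camera intrinsic matrix.
    points_3d: (N, 3) map points, points_2d: (N, 2) observations."""
    cam = R @ points_3d.T + t.reshape(3, 1)   # camera coordinates, 3xN
    pix = (K @ cam).T                         # homogeneous pixel coords
    pix = pix[:, :2] / pix[:, 2:3]            # perspective divide
    return float(np.sum((pix - points_2d) ** 2))
```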
4. The semantic ORB-SLAM sensing method of claim 1, further comprising:
after pose estimation and repositioning processing based on the matched feature point pairs fails, obtaining the sequence frame most similar to the sequence frame of the current frame based on the mapping relation with the object feature points;
obtaining the ORB features of the most similar sequence frame, and matching the ORB features of the sequence frame of the current frame with the ORB features of the most similar sequence frame to obtain first matching feature point pairs;
and performing pose estimation and repositioning calculation using the first matching feature point pairs to obtain pose estimation and repositioning results.
5. The semantic ORB-SLAM sensing method of claim 1, wherein the acquiring a key frame sequence based on the pose optimization of the adjacent frames comprises:
calculating the minimum re-projection error between the adjacent frames, and establishing a common view based on the minimum re-projection error;
and extracting the sequence frames in the common view as key sequence frames.
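A hypothetical sketch of how the common view could be represented: frames become graph nodes, with an edge added when two frames share enough observed points (both the adjacency structure and the sharing threshold are assumptions):

```python
def build_common_view(frames, min_shared_points=15):
    """frames: list of (frame_id, set_of_observed_point_ids).
    Returns an adjacency dict; the frames appearing in this common
    view are then extracted as the key frame sequence."""
    graph = {fid: set() for fid, _ in frames}
    for i, (fid_a, pts_a) in enumerate(frames):
        for fid_b, pts_b in frames[i + 1:]:
            if len(pts_a & pts_b) >= min_shared_points:
                graph[fid_a].add(fid_b)
                graph[fid_b].add(fid_a)
    return graph
```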
6. The semantic ORB-SLAM sensing method of claim 1, wherein the inputting the key frame data into an adjacent key frame graph optimization thread for key frame data optimization processing to obtain graph-optimized key frame data comprises:
inputting the key frame data into the adjacent key frame graph optimization thread, and then sequentially performing redundant point elimination processing, semantic extraction processing, new map point creation processing and adjacent frame optimization processing on the key frame data to obtain the graph-optimized key frame data.
7. The semantic ORB-SLAM sensing method of claim 6, wherein the performing semantic extraction processing on the key frame data after redundant point elimination comprises:
performing object detection on the key frame data subjected to redundant point elimination processing based on a YOLO-v3 algorithm to obtain an object detection result;
carrying out semantic association processing on the object detection result by using a conditional random field to obtain combined object category probability and scene context information;
correcting and optimizing the combined object category probability and scene context information to generate a temporary object information candidate set;
judging whether each temporary object information item in the temporary object information candidate set is a new object or an existing object, and, for each point of the temporary object information item, searching the corresponding neighborhood to acquire the three-dimensional point nearest to that point;
and calculating the Euclidean distance between the point and the three-dimensional point, and if the Euclidean distance is smaller than a preset threshold, considering the point and the three-dimensional point to be the same point.
8. A semantic ORB-SLAM sensing apparatus based on environmental understanding, the apparatus comprising:
a key frame extraction module: used for inputting sequence frames into an ORB-SLAM front-end Tracking thread for key frame extraction processing to obtain key frame data;
a key frame optimization module: used for inputting the key frame data into an adjacent key frame graph optimization thread for key frame data optimization processing to obtain graph-optimized key frame data;
an error calculation module: used for calculating error values between the graph-optimized key frame data and generating a candidate set based on the error values;
a synchronous positioning and map construction module: used for performing closed-loop correction processing on the candidate set based on global map optimization and loop fusion, and performing synchronous positioning and map construction based on the correction results;
the step of inputting the sequence frame into the ORB-SLAM front end Tracking thread for key frame extraction processing to obtain key frame data comprises the following steps:
the ORB-SLAM front end Tracking thread adopts an inter-frame difference method to carry out dynamic background removal processing on the input sequence frames, and obtains sequence frames with the dynamic background removed;
establishing a mapping relation between the sequence frames with the dynamic background removed and object feature points, and obtaining a sequence frame with the mapping relation with the object feature points;
performing ORB feature extraction processing on the sequence frames with the object feature point mapping relation to obtain sequence frame ORB features;
Matching the ORB characteristics of the sequence frame of the current frame with the ORB characteristics of the sequence frame of the previous frame to obtain matching characteristic point pairs;
performing pose estimation and repositioning processing based on the matched feature point pairs to obtain pose estimation and repositioning results;
optimizing the pose estimation and repositioning results according to the matched adjacent sequence frames to obtain pose optimization of the adjacent frames, and acquiring a key frame sequence based on the pose optimization of the adjacent frames;
the ORB-SLAM front end Tracking thread adopts an inter-frame difference method to carry out dynamic background removal processing on an input sequence frame to obtain a sequence frame with a dynamic background removed, and the method comprises the following steps:
performing differential operation on adjacent frames in continuous time intervals in the sequence frames, and performing change detection by using strong correlation of the adjacent frames in the sequence frames to obtain a moving target;
removing a dynamic background of a moving object in the sequence frames based on a selected threshold value, and acquiring sequence frames from which the dynamic background is removed;
in the sequence frames, the change between the k-th frame f_k(x, y) and the (k+1)-th frame f_{k+1}(x, y) is represented by a binarized difference value D_{k+1}(x, y), expressed as follows:

D_{k+1}(x, y) = 1, if |f_{k+1}(x, y) − f_k(x, y)| > T; otherwise 0

wherein T is the set binarization difference threshold; the '1' part of the binary difference consists of the pixels whose gray values change between the front and rear frames, and generally comprises moving targets and noise; the '0' part consists of the pixels whose gray values are unchanged between the two frames.
CN201911113708.7A 2019-11-14 2019-11-14 Semantic ORB-SLAM sensing method and device based on environment understanding Active CN110930519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911113708.7A CN110930519B (en) 2019-11-14 2019-11-14 Semantic ORB-SLAM sensing method and device based on environment understanding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911113708.7A CN110930519B (en) 2019-11-14 2019-11-14 Semantic ORB-SLAM sensing method and device based on environment understanding

Publications (2)

Publication Number Publication Date
CN110930519A CN110930519A (en) 2020-03-27
CN110930519B true CN110930519B (en) 2023-06-20

Family

ID=69852948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911113708.7A Active CN110930519B (en) 2019-11-14 2019-11-14 Semantic ORB-SLAM sensing method and device based on environment understanding

Country Status (1)

Country Link
CN (1) CN110930519B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115375869B (en) * 2022-10-25 2023-02-10 杭州华橙软件技术有限公司 Robot repositioning method, robot and computer-readable storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019057179A1 (en) * 2017-09-22 2019-03-28 华为技术有限公司 Visual slam method and apparatus based on point and line characteristic

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106373141A (en) * 2016-09-14 2017-02-01 上海航天控制技术研究所 Tracking system and tracking method of relative movement angle and angular velocity of slowly rotating space fragment
CN110125928A (en) * 2019-03-27 2019-08-16 浙江工业大学 A kind of binocular inertial navigation SLAM system carrying out characteristic matching based on before and after frames
CN110378345A (en) * 2019-06-04 2019-10-25 广东工业大学 Dynamic scene SLAM method based on YOLACT example parted pattern
CN110378997A (en) * 2019-06-04 2019-10-25 广东工业大学 A kind of dynamic scene based on ORB-SLAM2 builds figure and localization method
CN110363816A (en) * 2019-06-25 2019-10-22 广东工业大学 A kind of mobile robot environment semanteme based on deep learning builds drawing method

Also Published As

Publication number Publication date
CN110930519A (en) 2020-03-27


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A semantic ORB-SLAM perception method and device based on environmental understanding

Effective date of registration: 20231130

Granted publication date: 20230620

Pledgee: Guangdong Shunde Rural Commercial Bank Co.,Ltd. science and technology innovation sub branch

Pledgor: SOUTH CHINA ROBOTICS INNOVATION Research Institute

Registration number: Y2023980068232