CN110738667A - RGB-D SLAM method and system based on dynamic scene - Google Patents

RGB-D SLAM method and system based on dynamic scene

Info

Publication number
CN110738667A
Authority
CN
China
Prior art keywords
region
image
potential dynamic
dynamic
optical flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910913318.1A
Other languages
Chinese (zh)
Inventor
吉长江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yingpu Technology Co Ltd
Original Assignee
Beijing Yingpu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yingpu Technology Co Ltd filed Critical Beijing Yingpu Technology Co Ltd
Priority to CN201910913318.1A
Publication of CN110738667A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/579Depth or shape recovery from multiple images from motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an RGB-D SLAM method and system based on dynamic scenes. The method comprises the steps of: adopting a semantic segmentation network model based on deep learning to determine a potential dynamic region of an image; adopting a motion consistency method to identify the potential dynamic region as a dynamic region; extracting ORB feature points in the potential dynamic region and the background region; and adopting an ICP algorithm to match the ORB feature points to obtain pose information of the robot, so as to initially optimize the pose of the robot.

Description

RGB-D SLAM method and system based on dynamic scene
Technical Field
The application relates to the technical field of robot positioning and mapping, and in particular to an RGB-D SLAM method and system based on dynamic scenes.
Background
Current SLAM (simultaneous localization and mapping) technology enables a robot to start from an unknown place in an unknown environment, localize its own position and posture through repeatedly observed map features (such as corners and columns) during motion, and incrementally build a map according to its own position, thereby achieving simultaneous localization and mapping.
A conventional SLAM system usually assumes that the surrounding environment of the robot is static. In practical application scenarios, however, the surrounding environment of the robot is often in dynamic change, which reduces the accuracy with which the robot locates its own position: objects in a dynamic environment may corrupt the mapping of the environment, causing the robot to produce an erroneous position estimate.
Existing positioning and mapping methods include an end-to-end visual odometry method based on a deep recurrent convolutional neural network, which combines a recurrent neural network with VO (visual odometry). Video clips or monocular image sequences are taken as input. In each time interval, the RGB image frames are preprocessed by subtracting the mean RGB value of the training set, and the image size is optionally adjusted to a multiple of 64. Two consecutive images are stacked to form a tensor, from which the deep RCNN learns how to extract motion information and estimate poses. The image tensor is fed into the CNN to obtain effective features for monocular VO and then passed to the RNN for sequence learning. In each time interval, each image pair passing through the network produces a pose estimate, and the VO system advances and estimates new poses as images are captured.
After the CNN, an RNN is adopted to perform learning on the key-frame sequence; that is, the motion model and the data-association model are implicitly modeled over the CNN feature sequence. An LSTM determines which previous hidden states to discard or retain when updating the current state, so that motion can be learned during pose estimation and associations between temporally distant images can be found and exploited. Two LSTM layers are stacked to construct a deep RNN, and based on the visual features obtained from the CNN, a pose estimate is output at each time interval. This prior-art positioning and mapping process proceeds along with the motion of the camera and the acquisition of images.
However, most existing VO (visual odometry) or SLAM systems assume that the environment is static; once a SLAM system operates in a complex dynamic environment, its performance may degrade, so that the robot cannot locate its own position accurately enough. Methods and systems capable of accurate localization in dynamic scenes are therefore urgently needed.
Disclosure of Invention
It is an object of the present application to overcome the above problems or to at least partially solve or mitigate the above problems.
According to one aspect of the present application, there is provided an RGB-D SLAM method based on dynamic scenes, the method comprising the following steps:
adopting a semantic segmentation network model based on deep learning to determine a potential dynamic region of the image;
determining, by a motion consistency method, whether the feature points of two continuous frame images correspond to each other, so as to judge whether the object to be identified in the potential dynamic region and in the background region of the images is consistent, and if it is consistent, identifying the potential dynamic region as a dynamic region;
respectively extracting ORB feature points in the potential dynamic region and the background region, if the region where the ORB feature points are located is the dynamic region, deleting the ORB feature points in the current image frame and the reference frame, and otherwise, keeping the ORB feature points in the potential dynamic region and the background region;
and matching the ORB feature points by adopting an ICP (Iterative Closest Point) algorithm to obtain pose information of the robot.
Optionally, the step of adopting the semantic segmentation network model based on deep learning to determine the potential dynamic region of the image includes the following sub-steps:
extracting the outline edge of the object to be recognized, and inputting the RGB image containing the outline edge into a Mask-RCNN model for semantic segmentation to obtain a Mask image of the object to be recognized;
carrying out contour restoration on a mask image of an object to be identified: constructing a contour feature of the object to be identified, wherein the contour feature comprises an edge centroid; obtaining coordinate values of the edge centroid, and calculating the distance between the contour edge point of the object to be recognized and the edge centroid; and if the distance is greater than a preset distance threshold value, removing the contour edge point to determine a potential dynamic area in the image.
Optionally, a canny edge algorithm is adopted to repair the mask image of the object to be identified.
Optionally, the identifying of the potential dynamic region as a dynamic region includes the following sub-steps:
calculating the optical flow value of each pixel in the image of the potential dynamic region, and obtaining the optical flow field of each point according to the optical flow values;
tracking sparse points inside and outside an object to be identified in a potential dynamic area by adopting a Lucas-Kanade optical flow method, and dividing a background area of an image according to an optical flow field of each point;
constructing a standardized histogram of the potential dynamic region and the background region, determining the range of each interval of the standardized histogram, and distributing all optical flow vectors to different clusters to form a plurality of bins; constructing motion vectors of the potential dynamic region and the background region according to the optical flow vectors in each bin;
and calculating cosine similarity of the motion vectors of the potential dynamic area and the background area, if the cosine similarity is greater than the measured motion state tolerance, determining that the potential dynamic area moves, and identifying the potential dynamic area as the dynamic area.
Optionally, the method further comprises the step of: performing global optimization on the pose information based on the closed-loop detection mode and the constraints among all frames of the image.
According to another aspect of the present application, there is provided an RGB-D SLAM system based on dynamic scenes, the system comprising a determination module, a recognition module, an extraction module, and an initial optimization module, wherein:
the determination module determines potential dynamic regions of the image based on the deep-learning semantic segmentation network model;
the identification module determines, by a motion consistency method, whether the points of two continuous frame images correspond to each other, judges whether the object to be identified in the potential dynamic region and in the background region of the images is consistent, and identifies the potential dynamic region as a dynamic region if it is consistent;
the extraction module performs the following operations: respectively extracting ORB feature points in the potential dynamic region and the background region, if the region where the ORB feature points are located is the dynamic region, deleting the ORB feature points in the current image frame and the reference frame, and otherwise, keeping the ORB feature points in the potential dynamic region and the background region;
and the initial optimization module adopts an ICP algorithm to match the ORB characteristic points so as to obtain the pose information of the robot.
Optionally, the determining module includes a semantic segmentation unit and a repair unit:
the semantic segmentation unit performs the following operations: extracting the outline edge of the object to be recognized, and inputting the RGB image containing the outline edge into a Mask-RCNN model for semantic segmentation to obtain a Mask image of the object to be recognized;
the repair unit performs the following operations: constructing a contour feature of the object to be identified, wherein the contour feature comprises an edge centroid; obtaining coordinate values of the edge centroid, and calculating the distance between the contour edge point of the object to be recognized and the edge centroid; and if the distance is greater than a preset distance threshold value, removing the contour edge point to determine a potential dynamic area in the image.
Optionally, the canny edge algorithm is adopted to repair the mask image of the object to be identified.
Optionally, the identification module includes an optical flow field obtaining unit, a background region obtaining unit, a construction unit, and a cosine similarity obtaining unit;
the optical flow field acquisition unit is used for calculating the optical flow value of each pixel in the image of the potential dynamic region and obtaining the optical flow field of each point according to the optical flow values;
the background area acquisition unit tracks sparse points inside and outside an object to be identified in a potential dynamic area by adopting a Lucas-Kanade optical flow method, and divides a background area of an image according to an optical flow field of each point;
the construction unit is used for constructing a standardized histogram of the potential dynamic region and the background region, determining the range of each interval of the standardized histogram, and distributing all optical flow vectors to different clusters to form a plurality of bins; and constructing motion vectors of the potential dynamic region and the background region according to the optical flow vectors in each bin;
the cosine similarity obtaining unit is used for calculating the cosine similarity of the motion vectors of the potential dynamic area and the background area, if the cosine similarity is larger than the measured motion state tolerance, the potential dynamic area is determined to move, and the potential dynamic area is identified as the dynamic area.
Optionally, the system further includes a global optimization module:
the global optimization module is used for carrying out global optimization on the pose information based on a closed loop detection mode and constraints among all frames of the image.
According to yet another aspect of the application, there is provided a computer electronic device comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, the computer program being stored in a space for program code in the memory and, when executed by the processor, implementing the steps of any one of the above methods.
According to yet another aspect of the application, there is provided a computer-readable storage medium comprising a storage unit for program code, the storage unit being provided with a program for performing the steps of any one of the above methods, the program being executed by a processor.
According to yet another aspect of the application, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps of any one of the above methods.
According to the present application, the potential dynamic region of the image is determined by the deep-learning semantic segmentation network model, so the image can be segmented accurately, and the dynamic region is identified among the potential dynamic regions by a motion consistency method. Even when the SLAM system is in a dynamic environment, the 3D motion track of each input image of the camera can be accurately estimated from the ORB feature points in the potential dynamic region and the background region, and the pose information of the robot can be accurately acquired.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
The detailed description of the specific embodiments of the present application will be presented by way of example and not limitation with reference to the accompanying figures in which like references indicate similar or analogous elements or parts.
FIG. 1 is a schematic flow chart of an RGB-D SLAM method based on dynamic scenes according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an RGB-D SLAM system based on dynamic scenes according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a computer-readable storage medium according to an embodiment of the application.
Detailed Description
Definition of Terms
To facilitate the description of the details of the present embodiment, the following terms are first defined.
1. ORB: Oriented FAST and Rotated BRIEF, currently one of the fastest and most stable feature point detection and extraction algorithms; many image stitching and target tracking techniques are implemented using ORB features.
2. RGB-D = RGB + Depth Map;
3. The RGB color model is an industry color standard: a wide range of colors is obtained by varying the three color channels of red (R), green (G) and blue (B) and superimposing them on one another, where RGB represents the colors of the red, green and blue channels. The standard covers almost all colors perceivable by human vision and is currently the most widely used color system.
4. The COCO dataset is a large image dataset for object detection, segmentation, human keypoint detection, semantic segmentation, subtitle generation, and the like.
5. Mask-RCNN model: Mask RCNN is a network framework based on Faster RCNN; a fully convolutional mask segmentation sub-network is added after the basic feature network, changing the original classification and regression detection tasks into classification, regression and segmentation detection tasks.
6. The Canny edge algorithm is a multi-stage edge detection algorithm; the contour edges of an image can be obtained with it, and the gradient limit of edge detection can be changed by setting a threshold.
7. Optical flow is the instantaneous velocity of the pixel motion of a spatially moving object on the observation imaging plane. Optical flow methods use the change of pixels in an image sequence in the time domain and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby calculate the motion information of objects between adjacent frames. Generally speaking, optical flow is generated by the movement of foreground objects in the scene, the motion of the camera, or the joint motion of both.
8. The Lucas-Kanade optical flow algorithm is a two-frame differential optical flow estimation algorithm based on the following three assumptions: the gray level of pixels in a region remains unchanged when the external light source is stable and the time interval Δt is small; the continuous change of time does not cause drastic change of the motion position of an object; and pixels adjacent on a surface in the scene have similar motion changes.
9. The ICP (Iterative Closest Point) algorithm consists of two parts, searching for corresponding points and solving for the pose; its aim is to find the matching relation between point sets, and the solved result is the translation and rotation between the two point sets.
RGB-D SLAM method based on dynamic scene
The experimental dataset used in this embodiment is from the TUM (Technical University of Munich, Germany) dataset, a large dataset containing RGB-D data and ground-truth data, whose aim is to establish a new benchmark for the evaluation of visual odometry and visual SLAM systems.
The TUM dataset contains color and depth images from a Microsoft Kinect sensor along the sensor's ground-truth trajectory, recorded at full frame rate (30 Hz) and sensor resolution (640×480). The ground-truth trajectory is obtained from a high-precision motion capture system with eight high-speed tracking cameras (100 Hz). The workflow of the RGB-D SLAM method based on dynamic scenes is described below.
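By way of illustration, the TUM sequences list the timestamped color and depth frames in plain-text index files (rgb.txt and depth.txt), and color and depth frames must be associated by timestamp before use. A minimal sketch, assuming that standard layout (function names here are illustrative, not part of the patent):

```python
# Minimal sketch: associate TUM RGB-D color and depth frames by timestamp.
# Assumes the standard TUM layout: rgb.txt / depth.txt, one
# "timestamp filename" pair per line, comment lines starting with '#'.

def read_file_list(path):
    """Read (timestamp, filename) pairs from a TUM index file."""
    pairs = []
    with open(path) as f:
        for line in f:
            if line.strip() and not line.startswith("#"):
                ts, name = line.split()[:2]
                pairs.append((float(ts), name))
    return pairs

def associate(rgb_list, depth_list, max_dt=0.02):
    """Pair each RGB frame with the depth frame closest in time (within max_dt seconds)."""
    matches = []
    for ts, rgb in rgb_list:
        dt, depth = min((abs(ts - td), d) for td, d in depth_list)
        if dt <= max_dt:
            matches.append((rgb, depth))
    return matches

# Example usage:
# matches = associate(read_file_list("rgb.txt"), read_file_list("depth.txt"))
```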
FIG. 1 is a flow chart of an RGB-D SLAM method based on dynamic scenes according to an embodiment of the present application. As shown in FIG. 1, the method comprises the following steps:
Step 100: determining potential dynamic regions of an image based on a deep-learning semantic segmentation network model;
Specifically, taking an indoor environment where a person is the main dynamic object as an example, images containing human activity in the TUM dataset may first be adopted, the person is then defined as the object to be identified, and each frame of the image is input into the deep-learning semantic segmentation network model to determine the dynamic or potentially dynamic regions in the image;
Preferably, the training samples of the deep-learning semantic segmentation network are from the COCO dataset, which is used to detect and classify images so as to determine the dynamic or potentially dynamic regions in the images.
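The patent does not prescribe a particular implementation of this segmentation step. A minimal sketch, assuming torchvision's COCO-pretrained Mask R-CNN and taking "person" as the only potentially dynamic class (both are assumptions, not requirements of the patent):

```python
# Sketch of step 100: per-frame potential dynamic (person) mask from a
# COCO-pretrained Mask R-CNN. torchvision is an assumption; the patent
# only specifies a Mask-RCNN model trained on the COCO dataset.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

# Newer torchvision versions prefer weights="DEFAULT" over pretrained=True.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

PERSON_CLASS_ID = 1  # COCO label id for "person"

def potential_dynamic_mask(rgb_image, score_thresh=0.5):
    """Return a boolean mask marking potentially dynamic (person) pixels, or None."""
    with torch.no_grad():
        pred = model([to_tensor(rgb_image)])[0]
    keep = (pred["labels"] == PERSON_CLASS_ID) & (pred["scores"] > score_thresh)
    if keep.sum() == 0:
        return None
    masks = pred["masks"][keep, 0] > 0.5   # binarize per-instance soft masks
    return masks.any(dim=0).numpy()        # union over all detected persons
```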
Specifically, step 100 includes the following sub-steps 110 and 120.
step 110, extracting the outline edge of the object to be recognized, inputting the RGB image containing the outline edge into a Mask-RCNN model for semantic segmentation to obtain a Mask image of the object to be recognized;
since useless points may still be detected around the object to be recognized after the RGB image including the contour edge is input to the Mask-RCNN model for semantic segmentation, contour refinement needs to be performed on the Mask image of the object to be recognized by a further step.
Step 120: performing contour restoration on the mask image of the object to be identified: constructing a contour feature of the object to be identified, wherein the contour feature comprises an edge centroid; obtaining the coordinate values of the edge centroid, and calculating the distance between each contour edge point of the object to be recognized and the edge centroid; if the distance is larger than a preset distance threshold, removing that contour edge point, so as to determine the potential dynamic region in the image;
Preferably, a Canny edge algorithm is adopted to detect the contour edges of the image and repair the mask image of the object to be identified.
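A minimal sketch of the contour repair in step 120, assuming OpenCV's Canny detector; the centroid computation and the way the distance threshold is applied are illustrative choices, since the patent only specifies an edge centroid and a preset distance threshold:

```python
# Sketch of step 120: repair the Mask-RCNN mask contour by discarding
# edge points that lie too far from the edge centroid.
import cv2
import numpy as np

def repair_mask_contour(mask, dist_thresh):
    """Keep only contour edge points within dist_thresh of the edge centroid."""
    edges = cv2.Canny(mask.astype(np.uint8) * 255, 100, 200)
    ys, xs = np.nonzero(edges)
    if xs.size == 0:
        return edges
    cx, cy = xs.mean(), ys.mean()          # edge centroid
    dist = np.hypot(xs - cx, ys - cy)      # distance of each edge point to centroid
    keep = dist <= dist_thresh
    repaired = np.zeros_like(edges)
    repaired[ys[keep], xs[keep]] = 255
    return repaired
```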
Step 200: determining, by a motion consistency method, whether the feature points of two continuous frame images correspond to each other, so as to judge whether the object to be identified in the potential dynamic region and in the background region of the images is consistent; if it is consistent, the potential dynamic region is identified as a dynamic region.
Preferably, after the potential dynamic region is determined in step 100, the motion consistency method adopted in this step, such as an optical flow method, can further be used to determine whether the object to be identified in the potential dynamic region and in the background region of the image is consistent; that is, spatial-temporal consistency of the object to be identified is assumed first, and then whether the points of two continuous frame images correspond to each other is determined, so as to judge whether the potential dynamic region and the background region are consistent.
Specifically, step 200 includes the following substeps:
Step 210: calculating the optical flow value of each pixel in the image of the potential dynamic region, and obtaining the optical flow field of each point according to the optical flow values;
Step 220: tracking sparse points inside and outside the object to be identified in the potential dynamic region by the Lucas-Kanade optical flow method, and dividing out the background region of the image according to the optical flow field of each point;
Step 230: constructing a standardized histogram of the potential dynamic region and the background region, determining the range of each interval of the standardized histogram, and distributing all optical flow vectors to different clusters to form a plurality of bins; and constructing motion vectors of the potential dynamic region and the background region according to the optical flow vectors in each bin;
The motion vectors of the potential dynamic region and the background region are constructed from the optical flow vectors in each bin through the following sub-steps 231 to 233:
Step 231: calculating the motion vector of each bin in the potential dynamic region:
H[R] = Σ_i m_i(R),
where m_i(R) is the magnitude of the i-th optical flow vector in the R-th bin, that is, the magnitude of an optical flow vector assigned to the R-th bin of the potential dynamic region;
Step 232: calculating the motion vector of each bin in the background region:
H[R'] = Σ_i m_i(R'),
where m_i(R') is the magnitude of the i-th optical flow vector in the R'-th bin, that is, the magnitude of an optical flow vector assigned to the R'-th bin of the background region;
Step 233: constructing the motion vector of the potential dynamic region from the motion vectors of the bins in the potential dynamic region, and constructing the motion vector of the background region from the motion vectors of the bins in the background region:
V_D = (H[1], H[2], H[3], ..., H[R]);
V_B = (H[1'], H[2'], H[3'], ..., H[R']);
where V_D and V_B are the motion vectors of the potential dynamic region and the background region, respectively; R is the bin index; H[R] is the motion vector of the R-th bin in the potential dynamic region, and H[R'] is the motion vector of the R'-th bin in the background region.
Step 240: calculating the cosine similarity of the motion vectors of the potential dynamic region and the background region; if the cosine similarity is greater than the measured motion-state tolerance γ, it is determined that the potential dynamic region moves, and the potential dynamic region is identified as a dynamic region.
cos Δ = (V_D · V_B) / (|V_D| |V_B|),
where cos Δ is the cosine similarity between the motion vectors of the potential dynamic region and the background region; D denotes the potential dynamic region and B the background region; V_D is the motion vector of the potential dynamic region and V_B that of the background region. In this embodiment, the measured motion-state tolerance γ is a preset threshold.
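A minimal sketch of steps 210 to 240, assuming OpenCV's pyramidal Lucas-Kanade tracker and binning of flow vectors by orientation; the orientation-based binning is an assumption, since the patent only says that all optical flow vectors are distributed to different clusters:

```python
# Sketch of steps 210-240: histogram motion vectors and cosine-similarity test.
import cv2
import numpy as np

def sparse_flow(prev_gray, gray, points):
    """Lucas-Kanade tracking; points is a float32 (N, 1, 2) array
    (e.g. from cv2.goodFeaturesToTrack). Returns (M, 2) flow vectors."""
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
    ok = status.ravel() == 1
    return (nxt - points).reshape(-1, 2)[ok]

def motion_vector(flow, n_bins=8):
    """Normalized histogram: H[r] sums the flow magnitudes falling in bin r."""
    mag = np.hypot(flow[:, 0], flow[:, 1])
    ang = np.arctan2(flow[:, 1], flow[:, 0])                  # flow orientation
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    H = np.bincount(bins, weights=mag, minlength=n_bins)
    return H / (H.sum() + 1e-12)

def region_is_dynamic(flow_d, flow_b, gamma, n_bins=8):
    """Step 240: compare V_D and V_B; per the patent, cos Δ > γ flags motion."""
    vd, vb = motion_vector(flow_d, n_bins), motion_vector(flow_b, n_bins)
    cos_delta = vd @ vb / (np.linalg.norm(vd) * np.linalg.norm(vb) + 1e-12)
    return cos_delta > gamma
```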
Step 300: respectively extracting ORB feature points in the potential dynamic region and the background region; if the region where an ORB feature point is located is a dynamic region, the ORB feature point is ignored in the current image frame and the reference frame; otherwise, the ORB feature points in the potential dynamic region and the background region are kept.
Specifically, in this embodiment, a Kinect sensor may be used to obtain ORB feature points of the potential dynamic region and the background region for feature extraction, so as to determine whether the ORB feature points are damaged;
Moreover, the 3D motion track of each input image of the camera can be accurately estimated from the ORB feature points in the potential dynamic region and the background region; here, the reference frame refers to the image adjacent to the current frame.
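A minimal sketch of step 300, assuming OpenCV's ORB implementation; masking out the pixels of regions identified as dynamic is one illustrative way to realize the deletion of their feature points:

```python
# Sketch of step 300: extract ORB features only outside dynamic regions.
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)

def extract_static_orb(gray, dynamic_mask):
    """Detect ORB keypoints everywhere except pixels flagged as dynamic.
    dynamic_mask: boolean array, True where the region was identified as dynamic."""
    detect_mask = np.where(dynamic_mask, 0, 255).astype(np.uint8)
    return orb.detectAndCompute(gray, detect_mask)
```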
Step 400: matching the ORB feature points by an ICP (Iterative Closest Point) algorithm to obtain the pose information of the robot;
In this embodiment, the error between the matched 3D points in world coordinates is minimized: key-point matching is performed on the ORB feature points, and the resulting errors between the 3D points are used to initially optimize the pose of the robot;
Specifically, first, the positions of the feature points in the potential dynamic region and the background region are determined;
then, carrying out fusion matching on the positions of the feature points, namely carrying out feature matching on the common feature points, reserving different feature points and removing noise points;
and measuring and minimizing the errors of the positions of the feature points so as to initially optimize the pose of the robot.
In another embodiment, after step 400 is completed, the RGB-D SLAM method based on dynamic scenes further includes step 500:
performing global optimization on the pose information based on a closed loop detection mode and constraints among all frames of the image;
that is, the pose information is globally optimized using the closed-loop detection mode and the constraints between all frames of the image, which are input to the back end of the SLAM system.
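A heavily simplified sketch of step 500, reducing the problem to an SE(2) pose graph solved with scipy for brevity (an assumption: a real RGB-D back end optimizes SE(3) poses, typically with a dedicated graph optimizer); loop-closure detections enter simply as extra relative-pose constraints:

```python
# Sketch of step 500: global pose-graph optimization with odometry and
# loop-closure constraints, in SE(2) (x, y, theta) for brevity.
import numpy as np
from scipy.optimize import least_squares

def relative(xi, xj):
    """Pose of xj expressed in the frame of xi."""
    c, s = np.cos(xi[2]), np.sin(xi[2])
    dx, dy = xj[0] - xi[0], xj[1] - xi[1]
    return np.array([c * dx + s * dy, -s * dx + c * dy, xj[2] - xi[2]])

def residuals(flat, edges):
    poses = flat.reshape(-1, 3)
    res = [poses[0]]                       # anchor the first pose (gauge fix)
    for i, j, z in edges:                  # z: measured relative pose i -> j
        r = relative(poses[i], poses[j]) - z
        r[2] = (r[2] + np.pi) % (2 * np.pi) - np.pi   # wrap the angle error
        res.append(r)
    return np.concatenate(res)

# edges: consecutive-frame odometry plus loop closures, e.g.
#   edges = [(0, 1, z01), (1, 2, z12), (2, 3, z23), (0, 3, z03_loop)]
# initial_poses: (N, 3) front-end estimates.
# result = least_squares(residuals, initial_poses.ravel(), args=(edges,))
# optimized = result.x.reshape(-1, 3)
```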
The method of this embodiment identifies potential dynamic regions of an image by utilizing the semantic information of the image, then verifies the consistency of the optical flow in the potential dynamic regions and the background region by a motion consistency method so as to identify dynamic regions among the potential dynamic regions, and applies the identified dynamic regions at the front end of the RGB-D SLAM system, so that the trajectory of the robot and the map can be tracked simultaneously.
RGB-D SLAM system based on dynamic scene
FIG. 2 is a schematic structural diagram of an RGB-D SLAM system based on dynamic scenes according to an embodiment of the present application. Referring to FIG. 2, the system includes a determination module, a recognition module, an extraction module, and an initial optimization module, wherein:
the determination module determines potential dynamic regions of the image based on the deep-learning semantic segmentation network model;
the identification module determines, by a motion consistency method, whether the points of two continuous frame images correspond to each other, judges whether the object to be identified in the potential dynamic region and in the background region of the images is consistent, and identifies the potential dynamic region as a dynamic region if it is consistent;
the extraction module performs the following operations: respectively extracting ORB feature points in the potential dynamic region and the background region, if the region where the ORB feature points are located is the dynamic region, deleting the ORB feature points in the current image frame and the reference frame, and otherwise, keeping the ORB feature points in the potential dynamic region and the background region;
and the initial optimization module adopts an ICP algorithm to match the ORB characteristic points so as to obtain the pose information of the robot.
In this embodiment, optionally, the determining module includes a semantic segmentation unit and a repair unit:
the semantic segmentation unit performs the following operations: extracting the outline edge of the object to be recognized, and inputting the RGB image containing the outline edge into a Mask-RCNN model for semantic segmentation to obtain a Mask image of the object to be recognized;
the repair unit performs the following operations: constructing a contour feature of the object to be identified, wherein the contour feature comprises an edge centroid; obtaining coordinate values of the edge centroid, and calculating the distance between the contour edge point of the object to be recognized and the edge centroid; and if the distance is greater than a preset distance threshold value, removing the contour edge point to determine a potential dynamic area in the image.
In this embodiment, optionally, the mask image of the object to be identified is repaired by using a canny edge algorithm.
In this embodiment, optionally, the identification module includes an optical flow field obtaining unit, a background region obtaining unit, a construction unit, and a cosine similarity obtaining unit;
the optical flow field acquisition unit is used for calculating the optical flow value of each pixel in the image of the potential dynamic region and obtaining the optical flow field of each point according to the optical flow values;
the background area acquisition unit tracks sparse points inside and outside an object to be identified in a potential dynamic area by adopting a Lucas-Kanade optical flow method, and divides a background area of an image according to an optical flow field of each point;
the construction unit is used for constructing a standardized histogram of the potential dynamic region and the background region, determining the range of each interval of the standardized histogram, and distributing all optical flow vectors to different clusters to form a plurality of bins; and constructing motion vectors of the potential dynamic region and the background region according to the optical flow vectors in each bin;
the cosine similarity obtaining unit is used for calculating the cosine similarity of the motion vectors of the potential dynamic area and the background area, if the cosine similarity is larger than the measured motion state tolerance, the potential dynamic area is determined to move, and the potential dynamic area is identified as the dynamic area.
In this embodiment, optionally, the system further includes a global optimization module: the global optimization module is used for carrying out global optimization on the pose information based on a closed loop detection mode and constraints among all frames of the image.
The system provided by this embodiment may execute the method provided by any one of the RGB-D SLAM methods based on a dynamic scene, and the detailed process is described in the method embodiment and is not described herein again.
An embodiment of the present application further provides a computing device. Referring to FIG. 3, the device comprises a memory 620, a processor 610, and a computer program stored in the memory 620 and executable by the processor 610; the computer program is stored in a space 630 for program code in the memory 620 and, when executed by the processor 610, implements the method steps 631 of any one of the above methods.
An embodiment of the present application further provides a computer-readable storage medium. Referring to FIG. 4, the computer-readable storage medium comprises a storage unit for program code, and the storage unit is provided with a program 631' which, when executed by a processor, performs the steps of any one of the above methods.
A computer program product containing instructions is also provided which, when run on a computer, causes the computer to perform the steps of any one of the above methods.
The computer instructions may be stored in a computer-readable storage medium, or transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer storage medium may be any available medium accessible by a computer, or a data storage device such as a server or a data center integrating one or more available media, such as a magnetic medium, an optical medium, or a semiconductor medium (e.g., a solid state disk).
It should further be appreciated that the exemplary units and algorithm steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, computer software, or a combination of both; to clearly illustrate the interchangeability of hardware and software, the exemplary components and steps have been described above generally in terms of their functionality.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An RGB-D SLAM method based on dynamic scenes, the method comprising the following steps:
adopting a semantic segmentation network model based on deep learning to determine a potential dynamic region of the image;
determining, by a motion consistency method, whether the feature points in two continuous frame images correspond to each other, so as to judge whether the object to be identified in the potential dynamic region and in the background region of the images is consistent, and if it is consistent, identifying the potential dynamic region as a dynamic region;
respectively extracting ORB feature points in a potential dynamic region and a background region, if the region where the ORB feature points are located is the dynamic region, deleting the ORB feature points in the current image frame and the reference frame, and otherwise, keeping the ORB feature points in the potential dynamic region and the background region;
and matching the ORB feature points by adopting an ICP (Iterative Closest Point) algorithm to obtain pose information of the robot.
2. The method of claim 1, wherein the adopting of the semantic segmentation network model based on deep learning to determine the potential dynamic region of the image comprises the following sub-steps:
extracting the outline edge of the object to be recognized, and inputting the RGB image containing the outline edge into a Mask-RCNN model for semantic segmentation to obtain a Mask image of the object to be recognized;
carrying out contour restoration on a mask image of an object to be identified: constructing a contour feature of the object to be identified, wherein the contour feature comprises an edge centroid; obtaining coordinate values of the edge centroid, and calculating the distance between the contour edge point of the object to be recognized and the edge centroid; and if the distance is greater than a preset distance threshold value, removing the contour edge point to determine a potential dynamic area in the image.
3. The method of claim 2, wherein: and repairing the mask image of the object to be identified by adopting a canny edge algorithm.
4. The method of claim 1, wherein:
the identifying of the potential dynamic region as a dynamic region comprises the following sub-steps:
calculating the optical flow value of each pixel in the image of the potential dynamic region, and obtaining the optical flow field of each point according to the optical flow values;
tracking sparse points inside and outside an object to be identified in a potential dynamic area by adopting a Lucas-Kanade optical flow method, and dividing a background area of an image according to an optical flow field of each point;
constructing a standardized histogram of the potential dynamic region and the background region, determining the range of each interval of the standardized histogram, and distributing all optical flow vectors to different clusters to form a plurality of bins; constructing motion vectors of the potential dynamic region and the background region according to the optical flow vectors in each bin;
and calculating cosine similarity of the motion vectors of the potential dynamic area and the background area, if the cosine similarity is greater than the measured motion state tolerance, determining that the potential dynamic area moves, and identifying the potential dynamic area as the dynamic area.
5. The method according to any one of claims 1 to 4, further comprising the step of:
and performing global optimization on the pose information based on a closed loop detection mode and constraints among all frames of the image.
6. An RGB-D SLAM system based on dynamic scenes, the system comprising a determination module, a recognition module, an extraction module, and an initial optimization module, wherein:
the determination module determines potential dynamic regions of the image based on the deep-learning semantic segmentation network model;
the identification module determines, by a motion consistency method, whether the points of two continuous frame images correspond to each other, judges whether the object to be identified in the potential dynamic region and in the background region of the images is consistent, and identifies the potential dynamic region as a dynamic region if it is consistent;
the extraction module performs the following operations: respectively extracting ORB feature points in the potential dynamic region and the background region, if the region where the ORB feature points are located is the dynamic region, deleting the ORB feature points in the current image frame and the reference frame, and otherwise, keeping the ORB feature points in the potential dynamic region and the background region;
and the initial optimization module adopts an ICP algorithm to match the ORB characteristic points so as to obtain the pose information of the robot.
7. The system of claim 6, wherein the determination module comprises a semantic segmentation unit and a repair unit:
the semantic segmentation unit performs the following operations: extracting the outline edge of the object to be recognized, and inputting the RGB image containing the outline edge into a Mask-RCNN model for semantic segmentation to obtain a Mask image of the object to be recognized;
the repair unit performs the following operations: constructing a contour feature of the object to be identified, wherein the contour feature comprises an edge centroid; obtaining coordinate values of the edge centroid, and calculating the distance between the contour edge point of the object to be recognized and the edge centroid; and if the distance is greater than a preset distance threshold value, removing the contour edge point to determine a potential dynamic area in the image.
8. The system according to claim 7, wherein the repairing unit adopts canny edge algorithm to repair the mask image of the object to be identified.
9. The system according to claim 6, wherein the identification module comprises an optical flow field acquisition unit, a background region acquisition unit, a construction unit and a cosine similarity acquisition unit;
the optical flow field acquisition unit is used for calculating the optical flow value of each pixel in the image of the potential dynamic region and obtaining the optical flow field of each point according to the optical flow values;
the background area acquisition unit tracks sparse points inside and outside an object to be identified in a potential dynamic area by adopting a Lucas-Kanade optical flow method, and divides a background area of an image according to an optical flow field of each point;
the construction unit is used for constructing a standardized histogram of the potential dynamic region and the background region, determining the range of each interval of the standardized histogram, and distributing all optical flow vectors to different clusters to form a plurality of bins; and constructing motion vectors of the potential dynamic region and the background region according to the optical flow vectors in each bin;
the cosine similarity obtaining unit is used for calculating the cosine similarity of the motion vectors of the potential dynamic area and the background area, if the cosine similarity is larger than the measured motion state tolerance, the potential dynamic area is determined to move, and the potential dynamic area is identified as the dynamic area.
10. The system according to any one of claims 6 to 9, wherein the system further comprises a global optimization module:
the global optimization module is used for carrying out global optimization on the pose information based on a closed loop detection mode and constraints among all frames of the image.
CN201910913318.1A 2019-09-25 2019-09-25 RGB-D SLAM method and system based on dynamic scene Pending CN110738667A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910913318.1A CN110738667A (en) 2019-09-25 2019-09-25 RGB-D SLAM method and system based on dynamic scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910913318.1A CN110738667A (en) 2019-09-25 2019-09-25 RGB-D SLAM method and system based on dynamic scene

Publications (1)

Publication Number Publication Date
CN110738667A 2020-01-31

Family

ID=69269616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910913318.1A Pending CN110738667A (en) 2019-09-25 2019-09-25 RGB-D SLAM method and system based on dynamic scene

Country Status (1)

Country Link
CN (1) CN110738667A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709982A (en) * 2020-05-22 2020-09-25 浙江四点灵机器人股份有限公司 Three-dimensional reconstruction method for dynamic environment
CN112381841A (en) * 2020-11-27 2021-02-19 广东电网有限责任公司肇庆供电局 Semantic SLAM method based on GMS feature matching in dynamic scene
CN112529934A (en) * 2020-12-02 2021-03-19 北京航空航天大学杭州创新研究院 Multi-target tracking method and device, electronic equipment and storage medium
CN112884835A (en) * 2020-09-17 2021-06-01 中国人民解放军陆军工程大学 Visual SLAM method for target detection based on deep learning
WO2022217794A1 (en) * 2021-04-12 2022-10-20 深圳大学 Positioning method of mobile robot in dynamic environment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658449A (en) * 2018-12-03 2019-04-19 华中科技大学 A kind of indoor scene three-dimensional rebuilding method based on RGB-D image

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109658449A (en) * 2018-12-03 2019-04-19 华中科技大学 A kind of indoor scene three-dimensional rebuilding method based on RGB-D image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LILI ZHAO et al.: "A Compatible Framework for RGB-D SLAM in Dynamic Scenes", IEEE ACCESS *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709982A (en) * 2020-05-22 2020-09-25 浙江四点灵机器人股份有限公司 Three-dimensional reconstruction method for dynamic environment
CN112884835A (en) * 2020-09-17 2021-06-01 中国人民解放军陆军工程大学 Visual SLAM method for target detection based on deep learning
CN112381841A (en) * 2020-11-27 2021-02-19 广东电网有限责任公司肇庆供电局 Semantic SLAM method based on GMS feature matching in dynamic scene
CN112529934A (en) * 2020-12-02 2021-03-19 北京航空航天大学杭州创新研究院 Multi-target tracking method and device, electronic equipment and storage medium
CN112529934B (en) * 2020-12-02 2023-12-19 北京航空航天大学杭州创新研究院 Multi-target tracking method, device, electronic equipment and storage medium
WO2022217794A1 (en) * 2021-04-12 2022-10-20 深圳大学 Positioning method of mobile robot in dynamic environment

Similar Documents

Publication Publication Date Title
JP6095018B2 (en) Detection and tracking of moving objects
CN110738667A (en) RGB-D SLAM method and system based on dynamic scene
Crivellaro et al. Robust 3D tracking with descriptor fields
Greene et al. Multi-level mapping: Real-time dense monocular slam
CN108198201A (en) A kind of multi-object tracking method, terminal device and storage medium
WO2015017539A1 (en) Rolling sequential bundle adjustment
US10268929B2 (en) Method and device for generating binary descriptors in video frames
US10249046B2 (en) Method and apparatus for object tracking and segmentation via background tracking
KR20110021500A (en) Method for real-time moving object tracking and distance measurement and apparatus thereof
Tang et al. Fmd stereo slam: Fusing mvg and direct formulation towards accurate and fast stereo slam
Zhang et al. An optical flow based moving objects detection algorithm for the UAV
Wientapper et al. Composing the feature map retrieval process for robust and ready-to-use monocular tracking
CN115511970B (en) Visual positioning method for autonomous parking
JP2014102805A (en) Information processing device, information processing method and program
Suttasupa et al. Plane detection for Kinect image sequences
Yu et al. Accurate motion detection in dynamic scenes based on ego-motion estimation and optical flow segmentation combined method
Lee et al. Multisensor fusion-based object detection and tracking using active shape model
Ahn et al. Human tracking and silhouette extraction for human–robot interaction systems
CN110910418B (en) Target tracking algorithm based on rotation invariance image feature descriptor
Zhou et al. Speeded-up robust features based moving object detection on shaky video
Rong et al. IMU-Assisted Online Video Background Identification
Talouki et al. An introduction to various algorithms for video completion and their features: a survey
Dargazany Human body parts tracking: Applications to activity recognition
Raju et al. Motion detection and optical flow
Hu et al. Research on a line-expanded visual odometry in dynamic environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200131