CN117570960A - Indoor positioning navigation system and method for blind guiding robot - Google Patents

Indoor positioning navigation system and method for blind guiding robot

Info

Publication number
CN117570960A
CN117570960A (application CN202311522530.8A)
Authority
CN
China
Prior art keywords
obstacle
information
rgb
image
blind
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311522530.8A
Other languages
Chinese (zh)
Inventor
高书苑
张敏慧
包蕊
束余鑫
华翔
王月
陈少飞
徐洋
马雁东
肖苏杰
孙杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University
Priority to CN202311522530.8A
Publication of CN117570960A
Legal status: Pending

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/38Electronic maps specially adapted for navigation; Updating thereof
    • G01C21/3804Creation or updating of map data
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/20Instruments for performing navigational calculations
    • G01C21/206Instruments for performing navigational calculations specially adapted for indoor navigation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05Geographic models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30261Obstacle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Automation & Control Theory (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The invention discloses an indoor positioning navigation system for a blind guiding robot, comprising: an acquisition input end, which uses an RGB-D camera to acquire an RGB-D visual image of the environment ahead, the RGB-D visual image comprising an RGB image and a depth image; a core processing end, which processes the RGB-D visual image acquired by the acquisition input end; and an information output end, which outputs target information, including the position information of obstacles in the RGB image and object information transmitted to an earphone by voice. The invention improves the walking orientation, navigation and spatial cognition of blind users in indoor spaces, and reminds them to avoid obstacles through combined feedback, helping blind users achieve autonomous, high-precision, real-time indoor positioning, navigation and guidance, and thereby improving their travel safety.

Description

Indoor positioning navigation system and method for blind guiding robot
Technical Field
The invention belongs to the field of blind guiding robot control, and particularly relates to an indoor positioning navigation system and method for a blind guiding robot.
Background
Travel has always been a major difficulty in the life of blind people. Traditionally, blind people rely on tactile paving (blind roads), guiding canes and guide dogs. However, tactile paving is often occupied or damaged, a guiding cane cannot actively guide, and guide dogs are scarce and often not accepted, all of which brings great inconvenience to the daily life of blind people.
According to relevant statistics, people carry out roughly 80% of their activities indoors, and compared with the outdoors, the indoor environment is safer for the blind. On the other hand, indoor spaces are more complex than outdoor ones, and because blind people lack visual perception they cannot sense their own position or the surrounding environment, and therefore cannot move independently and freely while avoiding obstacles. Accurate indoor positioning therefore plays an important role in enabling indoor activities for the blind, and a blind guiding robot capable of autonomous positioning and navigation has very important research significance and application value.
A search of the prior art shows that many researchers have produced results in this area. For example:
Patent CN110448436A discloses an intelligent blind guiding stick with tactile-paving detection and positioning functions. For outdoor tactile paving, a camera acquires the paving information, image processing is performed, and a heading angle is calculated and fed back to the visually impaired user. However, this method only achieves accurate path tracking along the paving and cannot provide accurate positioning.
Patent CN110368276A discloses an intelligent blind crutch in which RFID tags are laid along the tactile paving the blind person walks on and are then identified by an RFID reader; it further includes voice human-machine interaction, road-surface water detection and traffic-light recognition, making it quite powerful. However, laying RFID over a large area is a major project, is expensive, requires frequent maintenance, and the navigation accuracy depends on the RFID laying density.
Patent CN110623820A discloses a wearable intelligent blind guiding device comprising a 3D infrared imaging camera, an SoC (system on a chip), an ultrasonic matrix radar, a GPS module and a natural-language interaction processor; it receives the blind user's voice instructions and gives obstacle-avoidance prompts by voice. However, voice alone cannot describe environmental obstacle information accurately in real time, and a large angular error remains when prompting the blind person which direction to advance.
Ni et al. at Tonan University designed a walking-assistance robot system based on computer vision and vibration feedback, which extracts a safe travelling direction from depth-image information. The scheme can sense the environment and feed back information, but has no navigation capability.
Disclosure of Invention
To address the problems of low positioning and navigation accuracy and poor human-machine interaction experience in existing intelligent blind guiding robots, the invention discloses an indoor positioning navigation system and method for a blind guiding robot, which improve the walking orientation, navigation and spatial cognition of blind users in indoor spaces and remind them to avoid obstacles through combined feedback, helping blind users achieve autonomous, high-precision, real-time indoor positioning, navigation and guidance, and thereby improving their travel safety.
The technical aim of the invention is achieved by the following technical solution:
An indoor positioning navigation system for a blind guiding robot, characterized by comprising:
an acquisition input end, which uses an RGB-D camera to acquire an RGB-D visual image of the environment ahead, the RGB-D visual image comprising an RGB image and a depth image;
a core processing end, which processes the RGB-D visual image acquired by the acquisition input end, specifically comprising:
obstacle detection: inputting the RGB image into an improved YOLOv7 deep-learning target detection model for target recognition, and detecting and recognizing obstacles in the RGB image;
obstacle positioning: matching the detected obstacle bounding boxes against the map constructed by the ORB-SLAM2 system, and acquiring the position information of the obstacles based on the depth image;
positioning and mapping: using the ORB-SLAM2 system to process images frame by frame based on feature vectors created from RGB-image key points, obtaining mapping information in three-dimensional space through tracking, mapping and loop-closure detection, maintaining a global navigation map with the ORB-SLAM2 system, and computing, with a path-planning algorithm, the optimal path from the current position to the destination in the global navigation map for motion planning;
and an information output end, which outputs target information, including the position information of obstacles in the RGB image, and transmits object information to an earphone by voice.
Preferably, the ORB-SLAM2 system uses matching based on overlap degree and appearance-feature rules to associate each obstacle with feature points in the map, thereby providing the position information of the obstacle.
Preferably, the YOLOv7 deep-learning target detection model is combined with the ORB-SLAM2 system to build a YOLOv7-ORBSLAM2 indoor navigation network model, which performs real-time obstacle detection while outputting geometric environment information and announces the category and distance of obstacles by voice, so that the blind user obtains environmental semantic information containing the categories and coordinates of obstacles, and a semantic point cloud map is built on top of the sparse point cloud map.
Preferably, when building the YOLOv7-ORBSLAM2 indoor navigation network model, the backbone network module and the Neck module of the YOLOv7 model are improved: a dual-layer attention mechanism module is embedded into the backbone network module of the YOLOv7 model, the standard convolutions used for feature fusion in the Neck module are replaced with the lightweight convolution GSConv, and the Neck is further improved with a Slim-Neck design based on VoV-GSCSPC.
Preferably, the backbone network module of the YOLOv7 model is improved as follows: the RGB image is input into the backbone network model for feature extraction, and the extracted features are called feature layers; a coarse-to-fine dual-layer attention mechanism module is inserted after the modules that output the three effective feature layers, depth-wise convolution is used to strengthen local information, a BRA module is applied to filter out irrelevant redundant information at the coarse-region level, and the attention output undergoes nonlinear transformation and feature extraction through a multi-layer perceptron module.
Preferably, the lightweight convolution GSConv in the Neck module is a mixed convolution composed of SC, DSC and a channel shuffle: by uniformly exchanging local feature information across different channels, the shuffle infiltrates the information generated by the SC into every part of the information generated by the DSC, and the DSC operations reduce the number of parameters and FLOPs.
An indoor positioning navigation method for a blind guiding robot comprises the following steps:
S1: training an obstacle detection model: an improved YOLOv7 deep-learning target detection model is trained to obtain the obstacle detection model, which identifies and locates obstacles of various types and predicts their categories and bounding boxes;
S2: acquiring an RGB image and a depth image: an RGB-D camera acquires the RGB image and the depth image of the environment ahead;
S3: detecting obstacle types: the RGB image is used as the input of the obstacle detection model for obstacle detection and recognition, and the deep-learning target detection algorithm outputs the recognized obstacle category information and the pixel-coordinate position of its bounding box;
S4: building a semantic map: the detected obstacle targets are matched against the map constructed by the ORB-SLAM2 system, and a three-dimensional point cloud map with semantic position information is generated through point cloud processing and ICP point cloud matching;
S5: computing the obstacle distance: the distance between the obstacle target and the RGB-D camera is computed from the depth values of the region corresponding to the obstacle bounding box in the depth image, yielding the 3D coordinates of the obstacle target in the camera coordinate system; the position of the obstacle target in the world coordinate system is then obtained through the transformation between the camera coordinate system and the world coordinate system;
S6: human-machine interaction and obstacle avoidance: the category information and distance information of the obstacle are combined into a voice prompt, which is output in cooperation with the vibration module.
Beneficial effects: the invention discloses an indoor positioning navigation system and method for a blind guiding robot, which have the following advantages:
(1) The invention applies the improved YOLOv7-ORBSLAM2 to a module for autonomous, accurate positioning and navigation of the blind. The YOLOv7 deep-learning target detection algorithm is trained on an image library of the objects to be detected so that multiple targets in an image can be detected and recognized, while human-machine interaction uses combined feedback of voice information and vibration, so that the blind user quickly obtains rich and practical feedback, navigation safety is improved, and the problem that blind people moving in an indoor environment are easily hindered by obstacles and cannot achieve autonomous positioning and navigation is solved.
(2) By improving the YOLOv7 algorithm framework, the invention reduces missed detections in the environment and makes the network structure lightweight, meeting the real-time requirement of the subsequent SLAM processing and solving the problem that the original YOLOv7 framework is computationally heavy and cannot meet the real-time processing requirement of SLAM.
Drawings
FIG. 1 is a system block diagram of the present invention;
FIG. 2 is a diagram of a prior art ORB-SLAM2 system architecture;
FIG. 3 is a framework diagram of the improved YOLOv7-ORBSLAM2 system;
FIG. 4 is a schematic diagram of the dual-layer attention mechanism module;
FIG. 5 is a network architecture diagram of the improved YOLOv7 model;
FIG. 6 is a flow chart of a positioning navigation method of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
As shown in fig. 1, an indoor positioning navigation system of a blind guiding robot includes:
an acquisition input end, which uses an RGB-D camera to sense and acquire an RGB-D visual image of the environment ahead, the RGB-D visual image comprising an RGB image and a depth image;
a core processing end, which processes the RGB-D visual image acquired by the acquisition input end, specifically comprising:
obstacle detection: the RGB image is input into the improved YOLOv7 deep-learning target detection model for target recognition, and the obstacles in the RGB image (including their category information and bounding boxes) are detected and recognized so that the blind user knows what kinds of objects are nearby;
obstacle positioning: the detected obstacle bounding boxes are matched against the map constructed by the ORB-SLAM2 system. In this embodiment, matching is performed based on overlap degree and appearance-feature rules, and each obstacle is associated with feature points in the map, thereby providing the position information of the obstacle (an illustrative association sketch is given after this component list);
positioning and mapping: the ORB-SLAM2 system processes images frame by frame based on feature vectors created from RGB-image key points and obtains mapping information in three-dimensional space through tracking, mapping and loop-closure detection, so that the blind user knows where they are and what the surrounding environment looks like; the ORB-SLAM2 system also maintains a global navigation map in which a path-planning algorithm computes the optimal path from the current position to the destination for motion planning (an illustrative planning sketch also follows the component list);
and an information output end, which outputs target information, including the position information of obstacles in the RGB image, and transmits object information to an earphone by voice.
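The following is a minimal, illustrative sketch (Python with NumPy) of how the overlap-based association between detected obstacle boxes and projected map points might be realised. The data layout, the IoU threshold and the function names are assumptions made for illustration; the patent only specifies that matching uses overlap degree and appearance-feature rules.

```python
# Illustrative sketch of overlap-based obstacle/map association.
# The data layout and the IoU threshold are assumptions; the patent only states that
# matching uses overlap degree and appearance-feature rules.
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] pixel boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate_obstacles(detections, pixels, world, iou_thresh=0.5):
    """Attach map points to each detected obstacle box by overlap degree.

    detections : list of dicts {"label": str, "box": [x1, y1, x2, y2]}
    pixels     : (N, 2) array, map points projected into the current RGB frame
    world      : (N, 3) array, the corresponding 3D map-point coordinates
    Returns a list of (label, world_points) pairs for matched obstacles.
    """
    matches = []
    for det in detections:
        x1, y1, x2, y2 = det["box"]
        inside = ((pixels[:, 0] >= x1) & (pixels[:, 0] <= x2) &
                  (pixels[:, 1] >= y1) & (pixels[:, 1] <= y2))
        if inside.any():
            pts = pixels[inside]
            hull_box = [pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max()]
            if iou(det["box"], hull_box) >= iou_thresh:   # overlap-degree test
                matches.append((det["label"], world[inside]))
    return matches
```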
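The patent does not name the path-planning algorithm used on the global navigation map. As one possible choice, the sketch below runs A* over a 2D occupancy grid assumed to be derived from that map; the grid representation and uniform step cost are illustrative assumptions.

```python
# Illustrative A* planner over a 2D occupancy grid; the patent does not specify which
# path-planning algorithm is used, so this is only one possible choice.
import heapq

def astar(grid, start, goal):
    """grid: 2D list, 0 = free, 1 = obstacle; start/goal: (row, col) tuples."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])   # Manhattan heuristic
    open_set = [(h(start), 0, start, None)]
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue                                          # already expanded
        came_from[node] = parent
        if node == goal:                                      # reconstruct the path
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):     # 4-connected moves
            nxt = (node[0] + dr, node[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    heapq.heappush(open_set, (ng + h(nxt), ng, nxt, node))
    return None  # no path found

# Example: plan around a single obstacle cell.
print(astar([[0, 0, 0], [0, 1, 0], [0, 0, 0]], (0, 0), (2, 2)))
```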
In this embodiment, the ORB-SLAM2 system is a vision-based simultaneous localization and mapping algorithm that uses feature points to track the position of the RGB-D camera in a three-dimensional environment and construct an environment map. The ORB-SLAM2 system in this embodiment mainly comprises four modules: the front-end visual odometer, back-end optimization, loop-closure detection, and mapping. FIG. 2 shows the relationships among the modules of the conventional ORB-SLAM2 system.
The specific design principles of the ORB-SLAM2 system in Example 1 are as follows:
Sensor (RGB-D camera) information reading: the RGB-D camera receives light through its optical elements, converts the optical signal into an electrical signal projected onto the image sensor, and finally obtains a digital image signal through analog-to-digital conversion; processing this digital image data lays the foundation for the subsequent operations.
Initialization: to run the VSLAM architecture, a specific coordinate system must be defined so that the mobile machine can estimate the camera pose and build the map in an unknown environment. Therefore, at initialization, a global coordinate system is defined first, and an initial map in this global coordinate system is constructed when the system starts running.
Pose tracking: local feature point information is acquired from the data of successive image frames, feature matching between frames yields the 2D-3D correspondences between the image and the map, and the camera pose of the current frame is then obtained by solving the PnP problem (a minimal PnP sketch is given after these module descriptions). In this embodiment, the camera pose is the extrinsic translation and rotation of the camera in the global coordinate system.
Front-end visual odometer: the odometer uses the sensor to measure its relative motion, thereby estimating the incremental change of the sensor position over time. This system uses an RGB-D camera as the sensor, so the odometer is called a visual odometer. This part is an integral component of the whole VSLAM system; it contains the initialization and pose-tracking modules described above and, together with the back-end optimization module, supports the entire system.
Back-end (nonlinear) optimization: optimization is performed at this level in order to suppress the errors accumulated during the incremental operation of VSLAM. In this process, global optimization must consider the consistency of the information over the whole run: the back end re-examines the sensor information captured while the visual odometer was working, compares it with the reference information from loop-closure detection to suppress errors, and finally optimizes the trajectory of the moving target and the map.
Loop-closure detection: a closed-loop search is first performed by matching the current image frame with selected image frames and computing their similarity; if a closed loop is detected, the camera has captured a scene it passed through before. Loop-closure detection effectively reduces the error accumulated while the target is moving.
Mapping: during mapping, when the camera observes an unknown area that has not been mapped before, the map is expanded incrementally by computing the three-dimensional structure of the surrounding environment; that is, in the motion step the robot's pose is predicted using the robot motion model, and in the correction step landmark observations are used to refine the probabilistic pose representation on the map while the map is updated with newly detected landmarks.
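As a concrete illustration of the pose-tracking step, the sketch below solves the 2D-3D PnP problem with OpenCV's RANSAC-based solver. The intrinsic matrix shown is a placeholder filled in for illustration; real values come from calibrating the RGB-D camera, and ORB-SLAM2's internal optimisation differs from this single call.

```python
# Illustrative 2D-3D pose estimation with OpenCV's RANSAC PnP solver.
# The intrinsic matrix below is a placeholder; real values come from camera calibration.
import cv2
import numpy as np

K = np.array([[615.0,   0.0, 320.0],   # fx,  0, cx  (placeholder intrinsics)
              [  0.0, 615.0, 240.0],   #  0, fy, cy
              [  0.0,   0.0,   1.0]])

def estimate_pose(map_points_3d, image_points_2d, K):
    """map_points_3d: (N, 3) world coords; image_points_2d: (N, 2) pixels; N >= 4."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        map_points_3d.astype(np.float64),
        image_points_2d.astype(np.float64),
        K, None)                          # None = no lens distortion assumed
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)            # rotation vector -> 3x3 rotation matrix
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, tvec.ravel()
    return T                              # 4x4 world-to-camera transform
```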
However, the ORB-SLAM2 system has a problem in practical scene applications: the movement of the RGB-D camera may be limited by obstacles, yet ORB-SLAM2 only localizes the camera and lacks environmental semantic information. The constructed map is a sparse point cloud map that keeps only a subset of the feature points in each image as key points, so it can hardly represent the obstacles in the map, and navigation with obstacle avoidance is impossible. In many cases, the difficulty of navigation for visually impaired people lies precisely in localizing obstacles, and especially in classifying them.
Therefore, the improved YOLOv7 model is adopted within the improved YOLOv7-ORBSLAM2 system framework shown in FIG. 3. Visually impaired people encounter various obstacles in daily life, and in many cases the difficulty of their navigation lies in localizing and, especially, classifying these obstacles. The improved YOLOv7 model is therefore combined with the ORB-SLAM2 system so that real-time obstacle detection is performed while the SLAM algorithm outputs geometric environment information; the category and distance of each obstacle are announced by voice so that the blind user obtains environmental semantic information containing obstacle categories and coordinates, and a semantic point cloud map is built on top of the sparse point cloud map.
When the YOLOv7 model and the ORB-SLAM2 system are combined and applied to an actual scene, target objects in the scene may be missed, and adding a target detection thread enlarges the whole SLAM framework, so real-time performance cannot be guaranteed during operation. To address this, the invention improves the YOLOv7 model. First, a dual-layer attention mechanism module is embedded into the backbone network module of the YOLOv7 model, strengthening the feature-learning capability of the algorithm and making it focus more on useful information. Then, because the improved YOLOv7 model is computationally heavy and cannot meet the real-time processing requirement of SLAM, the invention simplifies the Neck module of the YOLOv7 model: the standard convolutions used for feature fusion in the Neck module are replaced with the lightweight convolution GSConv, and the Neck is improved with a Slim-Neck design based on VoV-GSCSPC, making the network architecture lighter and meeting the real-time requirement of SLAM.
In the present invention, as shown in FIG. 5, the YOLOv7 model mainly comprises four parts: Input, Backbone, Neck and Head. First, the RGB image is preprocessed by a series of operations such as data augmentation in the Input part and fed into the backbone network, which extracts features from the processed image. The Neck module then fuses the extracted features through a PANet structure, performing two upsampling steps followed by two downsampling steps so that feature maps of three sizes (large, medium and small) are obtained through top-down and bottom-up combination. Finally, the fused features are sent to the Head detection head, and the detection result is obtained.
On this basis, when building the YOLOv7-ORBSLAM2 indoor navigation network model, the invention provides an indoor positioning navigation method in which the improved YOLOv7 model is applied to the ORB-SLAM2 blind guiding robot.
Backbone network part: the backbone network of the YOLOv7 model is mainly built from convolutions, E-ELAN modules, MPConv modules and an SPPCSPC module. The E-ELAN module changes the computation block while keeping the transition-layer structure of the original ELAN; using the ideas of expand, shuffle and merge cardinality, it enhances the network's learning capability without destroying the original gradient paths.
The SPPCSPC module adds multiple parallel MaxPool operations within a series of convolutions, which avoids problems such as image distortion caused by image-processing operations and mitigates the extraction of redundant features by the convolutional neural network.
The MPConv module uses a MaxPool operation to expand the receptive field of the current feature layer and then fuses it with the feature information produced by a normal convolution, improving the generalization of the network.
The RGB image is input into the backbone network model for feature extraction, and the extracted features are called feature layers. In the backbone network part, three feature layers are obtained for building the subsequent network, and in this embodiment a coarse-to-fine dual-layer attention mechanism module is inserted after the modules that output these three effective feature layers, as shown in FIG. 4. The BRA module first filters out irrelevant key-value pairs at the coarse-region level and then applies fine-grained token-to-token attention within the remaining set of candidate regions. Each dual-layer attention mechanism module in this embodiment begins with a 3x3 depth-wise convolution to enhance local information exchange; a BRA module and a multi-layer perceptron (MLP) module are then applied in sequence for cross-position relationship modelling and per-position embedding, respectively. Adding the coarse-to-fine dual-layer attention mechanism module better captures the contextual relationships and semantic information of objects in the input image, improving the performance and accuracy of the model and addressing the missed detection of target objects in the scene.
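A simplified PyTorch skeleton of such a coarse-to-fine attention block is given below. The full bi-level routing attention (BRA) is considerably more involved; here a standard multi-head self-attention layer stands in for the BRA step so the sketch stays short, and the channel counts are illustrative.

```python
# Simplified skeleton of the coarse-to-fine attention block, in PyTorch.
# A standard multi-head self-attention layer stands in for the full bi-level routing
# attention (BRA) to keep the sketch short; channel counts are illustrative.
import torch
import torch.nn as nn

class BiLevelAttentionBlock(nn.Module):
    def __init__(self, dim, num_heads=8, mlp_ratio=4):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)   # 3x3 depth-wise conv
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)  # BRA stand-in
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))     # multi-layer perceptron

    def forward(self, x):                      # x: (B, C, H, W) effective feature layer
        x = x + self.dwconv(x)                 # local information enhancement
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        q = self.norm1(t)
        a, _ = self.attn(q, q, q)
        t = t + a                              # attention residual
        t = t + self.mlp(self.norm2(t))        # MLP residual (per-position embedding)
        return t.transpose(1, 2).reshape(b, c, h, w)

block = BiLevelAttentionBlock(512)
print(block(torch.randn(1, 512, 20, 20)).shape)   # torch.Size([1, 512, 20, 20])
```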
The Neck module: YOLOv7 adopts the conventional PAFPN structure, in which the FPN is YOLOv7's enhanced feature-extraction network. The three effective feature layers obtained from the backbone are fused in this part; the purpose of feature fusion is to combine feature information of different scales. In the FPN part, the already obtained effective feature layers are used to continue extracting features. YOLOv7 still uses the PANet structure, so features are upsampled and then downsampled again to achieve feature fusion.
In this embodiment, adding the attention mechanism module enlarges the overall framework and increases the amount of computation, so real-time performance cannot be guaranteed during operation. Therefore, the invention adopts the lightweight convolution GSConv to reduce the computational cost while preserving real-time performance. GSConv is a mixed convolution composed of SC (channel-dense convolution), DSC (channel-sparse convolution) and a channel shuffle: by uniformly exchanging local feature information across different channels, the shuffle infiltrates the information generated by the SC into every part of the information generated by the DSC, while the DSC operations reduce the number of parameters and FLOPs. In this embodiment, the lightweight convolution GSConv carries out the task of reducing computational complexity, so that the depth-wise separable convolution achieves an effect close to that of ordinary convolution while being more efficient.
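A minimal PyTorch sketch of a GSConv layer in the spirit of the Slim-Neck design is shown below; the kernel sizes and the BatchNorm/SiLU choices are assumptions for illustration rather than the exact configuration of the patented model.

```python
# Minimal GSConv sketch: standard conv + depth-wise conv + channel shuffle.
# Kernel sizes and normalization/activation choices are illustrative assumptions.
import torch
import torch.nn as nn

class GSConv(nn.Module):
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        c_half = c_out // 2
        self.sc = nn.Sequential(                      # channel-dense standard convolution
            nn.Conv2d(c_in, c_half, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.dsc = nn.Sequential(                     # channel-sparse depth-wise convolution
            nn.Conv2d(c_half, c_half, 5, 1, 2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        x1 = self.sc(x)
        y = torch.cat((x1, self.dsc(x1)), dim=1)      # (B, c_out, H, W)
        b, c, h, w = y.shape                          # channel shuffle: interleave the
        y = y.view(b, 2, c // 2, h, w)                # SC and DSC halves so their
        return y.transpose(1, 2).reshape(b, c, h, w)  # information mixes across channels

print(GSConv(256, 128)(torch.randn(1, 256, 40, 40)).shape)   # torch.Size([1, 128, 40, 40])
```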
Furthermore, based on GSConv, the invention uses the one-shot aggregation module VoV-GSCSP, which replaces the ELAN-W module whose many convolutions make it computationally heavy and complex; VoV-GSCSP is simple and direct and speeds up inference.
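Building on the GSConv class from the previous sketch, the following shows one possible VoV-GSCSP-style cross-stage aggregation block; the number of bottlenecks and the 1x1 projection layout are assumptions, since the patent does not give the exact layer configuration.

```python
# One possible VoV-GSCSP-style aggregation block built on the GSConv class defined in
# the previous sketch; bottleneck count and 1x1 projection layout are assumptions.
import torch
import torch.nn as nn
# GSConv: the class from the previous sketch is assumed to be in scope.

class GSBottleneck(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.conv = nn.Sequential(GSConv(c, c, k=3), GSConv(c, c, k=3))
    def forward(self, x):
        return x + self.conv(x)                       # residual pair of GSConv layers

class VoVGSCSP(nn.Module):
    def __init__(self, c_in, c_out, n=1):
        super().__init__()
        c_mid = c_out // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1)          # branch A: processed path
        self.cv2 = nn.Conv2d(c_in, c_mid, 1)          # branch B: cross-stage shortcut
        self.blocks = nn.Sequential(*(GSBottleneck(c_mid) for _ in range(n)))
        self.cv3 = nn.Conv2d(2 * c_mid, c_out, 1)     # fuse the two branches

    def forward(self, x):
        return self.cv3(torch.cat((self.blocks(self.cv1(x)), self.cv2(x)), dim=1))
```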
In the Head detection part, the YOLOv7 model uses detection heads corresponding to the three target sizes (large, medium and small); the RepConv module has a somewhat different structure during training and during inference, which introduces the idea of structural re-parameterization.
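The structural re-parameterization idea can be illustrated with the short sketch below, which folds a parallel 3x3 + 1x1 training-time structure into a single 3x3 convolution for inference. BatchNorm folding and the identity branch of a full RepConv are omitted for brevity, so this is a simplified view rather than the exact module.

```python
# Structural re-parameterization in miniature: fold parallel 3x3 + 1x1 branches into
# one 3x3 convolution for inference (BatchNorm folding omitted for brevity).
import torch
import torch.nn as nn
import torch.nn.functional as F

conv3 = nn.Conv2d(8, 8, 3, padding=1)
conv1 = nn.Conv2d(8, 8, 1)

# Training-time output: sum of the two parallel branches.
x = torch.randn(1, 8, 16, 16)
y_train = conv3(x) + conv1(x)

# Inference-time fusion: pad the 1x1 kernel to 3x3, then add kernels and biases.
fused = nn.Conv2d(8, 8, 3, padding=1)
with torch.no_grad():
    fused.weight.copy_(conv3.weight + F.pad(conv1.weight, (1, 1, 1, 1)))
    fused.bias.copy_(conv3.bias + conv1.bias)

print(torch.allclose(y_train, fused(x), atol=1e-5))   # True: same function, one conv
```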
The YOLO Head serves as the classifier and regressor of YOLOv7. Through the backbone network and the FPN, three enhanced effective feature layers are obtained. Each feature layer has a width, a height and a number of channels, and its feature map can be regarded as a set of feature points; each feature point carries three prior boxes, and each prior box has features across multiple channels. The YOLO Head works by judging these feature points and determining whether the prior boxes at each feature point contain an object that corresponds to them.
This embodiment improves the YOLOv7 deep-learning target detection model so that, combined with the ORB-SLAM2 system, real-time obstacle detection can be performed within ORB-SLAM2 and the category and position/distance of each obstacle can be accurately announced by voice. The image acquired by the RGB-D camera is input into the system, a target detection thread is added, and objects in the image are detected and recognized by the improved YOLOv7 algorithm. During mapping, the key frames extracted by the ORB-SLAM2 system are combined with the target recognition results, and a three-dimensional point cloud map with semantic information is generated through point cloud processing and ICP point cloud matching.
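As an illustration of the point cloud processing and ICP matching step, the sketch below back-projects a depth image into a point cloud and aligns a key-frame cloud against the map cloud with Open3D. The patent does not name a point cloud library, so Open3D, the millimetre depth scale and the ICP correspondence threshold are assumptions made for this sketch.

```python
# Illustrative depth-image back-projection and ICP alignment with Open3D.
# Open3D, the depth scale and the ICP threshold are assumptions for this sketch.
import numpy as np
import open3d as o3d

def depth_to_cloud(depth, K, depth_scale=1000.0):
    """Back-project a depth image (uint16, millimetres) into an Open3D point cloud."""
    h, w = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.indices((h, w))
    z = depth.astype(np.float32) / depth_scale
    pts = np.stack(((u - cx) * z / fx, (v - cy) * z / fy, z), axis=-1).reshape(-1, 3)
    cloud = o3d.geometry.PointCloud()
    cloud.points = o3d.utility.Vector3dVector(pts[pts[:, 2] > 0])   # drop invalid depth
    return cloud

def register_keyframe(map_cloud, keyframe_cloud, init_T=np.eye(4), max_dist=0.05):
    """Align a key-frame cloud onto the map cloud with point-to-point ICP."""
    result = o3d.pipelines.registration.registration_icp(
        keyframe_cloud, map_cloud, max_dist, init_T,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```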
Based on the improved YOLOv7-ORBSLAM2 indoor positioning navigation system for a blind guiding robot, the specific positioning navigation method of the blind guiding robot is shown in FIG. 6 and comprises the following steps:
S1: training an obstacle detection model: an improved YOLOv7 deep-learning target detection model is trained to obtain the obstacle detection model, which identifies and locates obstacles of various types and predicts their categories and bounding boxes;
S2: acquiring an RGB image and a depth image: an RGB-D camera acquires the RGB image and the depth image of the environment ahead;
S3: detecting obstacle types: the RGB image is used as the input of the obstacle detection model for obstacle detection and recognition, and the deep-learning target detection algorithm outputs the recognized obstacle category information and the pixel-coordinate position of its bounding box;
S4: building a semantic map: the detected obstacle targets are matched against the map constructed by the ORB-SLAM2 system, and a three-dimensional point cloud map with semantic position information is generated through point cloud processing and ICP point cloud matching;
S5: computing the obstacle distance: the distance of the obstacle relative to the RGB-D camera is computed from the depth values of the region corresponding to the obstacle bounding box in the depth image, yielding the 3D coordinates of the obstacle in the camera coordinate system; the position of the obstacle target in the world coordinate system is then obtained through the transformation between the camera coordinate system and the world coordinate system (an illustrative conversion sketch is given after these steps);
S6: human-machine interaction and obstacle avoidance: the category information and distance information of the obstacle are combined into a voice prompt, which is output in cooperation with the vibration module. For example, the blind user can hear that a person is moving 6 meters ahead and so avoid them; in addition, vibration feedback is provided when the blind user encounters an obstacle, so that the prompt is not lost to interference from external noise (a feedback sketch also follows these steps).
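The coordinate conversion of step S5 can be sketched as follows: the median depth inside the detected bounding box gives the obstacle distance, the box centre is back-projected into the camera frame with the intrinsics, and the camera pose from the SLAM tracker maps the point into world coordinates. The intrinsics, the millimetre depth scale and the median-based depth estimate are illustrative assumptions.

```python
# Sketch of step S5: median depth in the detection box -> camera coordinates -> world
# coordinates. Intrinsics/pose values and the median estimate are placeholders.
import numpy as np

def obstacle_position(box, depth_img, K, T_world_cam, depth_scale=1000.0):
    """box: [x1, y1, x2, y2] pixels; depth_img: uint16 depth in mm; K: 3x3 intrinsics;
    T_world_cam: 4x4 camera-to-world transform from the SLAM tracker."""
    x1, y1, x2, y2 = map(int, box)
    roi = depth_img[y1:y2, x1:x2].astype(np.float32) / depth_scale
    valid = roi[roi > 0]
    if valid.size == 0:
        return None, None
    z = float(np.median(valid))                       # robust distance to the obstacle (metres)
    u, v = (x1 + x2) / 2.0, (y1 + y2) / 2.0           # bounding-box centre pixel
    x_cam = (u - K[0, 2]) * z / K[0, 0]
    y_cam = (v - K[1, 2]) * z / K[1, 1]
    p_cam = np.array([x_cam, y_cam, z, 1.0])
    p_world = T_world_cam @ p_cam                     # camera frame -> world frame
    return z, p_world[:3]
```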
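For step S6, the small sketch below merges the obstacle class and distance into a spoken prompt and triggers vibration when the obstacle is close. The pyttsx3 text-to-speech engine, the vibrate() placeholder and the distance threshold are assumptions; the patent only states that voice and vibration feedback work together.

```python
# Sketch of the combined feedback in step S6: spoken prompt plus a vibration trigger.
# pyttsx3 and the vibrate() placeholder are illustrative assumptions, not the patented
# implementation.
import pyttsx3

VIBRATE_DISTANCE_M = 1.5          # assumed threshold for the vibration module

def vibrate(duration_s=0.5):
    """Placeholder for driving the vibration motor (hardware-specific)."""
    print(f"[vibration] buzz for {duration_s}s")

def announce_obstacle(label, distance_m, engine=None):
    engine = engine or pyttsx3.init()
    engine.say(f"{label} {distance_m:.1f} meters ahead")   # e.g. "person 6.0 meters ahead"
    engine.runAndWait()
    if distance_m <= VIBRATE_DISTANCE_M:
        vibrate()

announce_obstacle("person", 6.0)
```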
This embodiment is only an explanation of the present invention and is not a limitation of it. Modifications to this embodiment that a person skilled in the art may make after reading this specification without any creative contribution are protected by patent law as long as they fall within the scope of the claims of the present invention.

Claims (7)

1. An indoor positioning navigation system for a blind guiding robot, characterized by comprising:
an acquisition input end, which uses an RGB-D camera to acquire an RGB-D visual image of the environment ahead, the RGB-D visual image comprising an RGB image and a depth image;
a core processing end, which processes the RGB-D visual image acquired by the acquisition input end, specifically comprising:
obstacle detection: inputting the RGB image into an improved YOLOv7 deep-learning target detection model for target recognition, and detecting and recognizing obstacles in the RGB image;
obstacle positioning: matching the detected obstacle bounding boxes against the map constructed by the ORB-SLAM2 system, and acquiring the position information of the obstacles based on the depth image;
positioning and mapping: using the ORB-SLAM2 system to process images frame by frame based on feature vectors created from RGB-image key points, obtaining mapping information in three-dimensional space through tracking, mapping and loop-closure detection, maintaining a global navigation map with the ORB-SLAM2 system, and computing, with a path-planning algorithm, the optimal path from the current position to the destination in the global navigation map for motion planning;
and an information output end, which outputs target information, including the position information of obstacles in the RGB image, and transmits object information to an earphone by voice.
2. The indoor positioning navigation system for a blind guiding robot according to claim 1, wherein: the ORB-SLAM2 system uses matching based on overlap degree and appearance-feature rules to associate each obstacle with feature points in the map, thereby providing the position information of the obstacle.
3. The indoor positioning navigation system for a blind guiding robot according to claim 1, wherein: the YOLOv7 deep-learning target detection model is combined with the ORB-SLAM2 system to build a YOLOv7-ORBSLAM2 indoor navigation network model, which performs real-time obstacle detection while outputting geometric environment information and announces the category and distance of obstacles by voice, so that the blind user obtains environmental semantic information comprising the categories and coordinates of obstacles, and a semantic point cloud map is built on top of the sparse point cloud map.
4. The indoor positioning navigation system for a blind guiding robot according to claim 3, wherein: when building the YOLOv7-ORBSLAM2 indoor navigation network model, the backbone network module and the Neck module of the YOLOv7 model are improved: a dual-layer attention mechanism module is embedded into the backbone network module of the YOLOv7 model, the standard convolutions used for feature fusion in the Neck module are replaced with the lightweight convolution GSConv, and the Neck is further improved with a Slim-Neck design based on VoV-GSCSPC.
5. The indoor positioning navigation system for a blind guiding robot according to claim 4, wherein: the backbone network module of the YOLOv7 model is improved as follows: the RGB image is input into the backbone network model for feature extraction, and the extracted features are called feature layers; a coarse-to-fine dual-layer attention mechanism module is inserted after the modules that output the three effective feature layers, depth-wise convolution is used to strengthen local information, a BRA module is applied to filter out irrelevant redundant information at the coarse-region level, and the attention output undergoes nonlinear transformation and feature extraction through a multi-layer perceptron module.
6. The indoor positioning navigation system for a blind guiding robot according to claim 4, wherein: the lightweight convolution GSConv in the Neck module is a mixed convolution composed of SC, DSC and a channel shuffle: by uniformly exchanging local feature information across different channels, the shuffle infiltrates the information generated by the SC into every part of the information generated by the DSC, and the DSC operations reduce the number of parameters and FLOPs.
7. An indoor positioning navigation method for a blind guiding robot, characterized in that it is based on the indoor positioning navigation system for a blind guiding robot according to any one of claims 1-6 and comprises the following steps:
S1: training an obstacle detection model: an improved YOLOv7 deep-learning target detection model is trained to obtain the obstacle detection model, which identifies and locates obstacles of various types and predicts their categories and bounding boxes;
S2: acquiring an RGB image and a depth image: an RGB-D camera acquires the RGB image and the depth image of the environment ahead;
S3: detecting obstacle types: the RGB image is used as the input of the obstacle detection model for obstacle detection and recognition, and the deep-learning target detection algorithm outputs the recognized obstacle category information and the pixel-coordinate position of its bounding box;
S4: building a semantic map: the detected obstacle targets are matched against the map constructed by the ORB-SLAM2 system, and a three-dimensional point cloud map with semantic position information is generated through point cloud processing and ICP point cloud matching;
S5: computing the obstacle distance: the distance between the obstacle target and the RGB-D camera is computed from the depth values of the region corresponding to the obstacle bounding box in the depth image, yielding the 3D coordinates of the obstacle target in the camera coordinate system; the position of the obstacle target in the world coordinate system is then obtained through the transformation between the camera coordinate system and the world coordinate system;
S6: human-machine interaction and obstacle avoidance: the category information and distance information of the obstacle are combined into a voice prompt, which is output in cooperation with the vibration module.
CN202311522530.8A 2023-11-15 2023-11-15 Indoor positioning navigation system and method for blind guiding robot Pending CN117570960A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311522530.8A CN117570960A (en) 2023-11-15 2023-11-15 Indoor positioning navigation system and method for blind guiding robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311522530.8A CN117570960A (en) 2023-11-15 2023-11-15 Indoor positioning navigation system and method for blind guiding robot

Publications (1)

Publication Number Publication Date
CN117570960A (en) 2024-02-20

Family

ID=89894759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311522530.8A Pending CN117570960A (en) 2023-11-15 2023-11-15 Indoor positioning navigation system and method for blind guiding robot

Country Status (1)

Country Link
CN (1) CN117570960A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118172514A (en) * 2024-03-25 2024-06-11 苏州科技大学 Robot dynamic SLAM method and system for intelligent storage dynamic environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination