CN114488244A - Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning - Google Patents

Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning

Info

Publication number
CN114488244A
Authority
CN
China
Prior art keywords
module
gnss
information
blind
positioning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210143183.7A
Other languages
Chinese (zh)
Inventor
王庆
孙杨
严超
黎露
谭镕轩
王怀虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202210143183.7A priority Critical patent/CN114488244A/en
Publication of CN114488244A publication Critical patent/CN114488244A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42Determining position
    • G01S19/48Determining position by combining or switching between position solutions derived from the satellite radio beacon positioning system and position solutions derived from a further system
    • G01S19/485Determining position by combining or switching between position solutions derived from the satellite radio beacon positioning system and position solutions derived from a further system whereby the further system is an optical system or imaging system
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61HPHYSICAL THERAPY APPARATUS, e.g. DEVICES FOR LOCATING OR STIMULATING REFLEX POINTS IN THE BODY; ARTIFICIAL RESPIRATION; MASSAGE; BATHING DEVICES FOR SPECIAL THERAPEUTIC OR HYGIENIC PURPOSES OR SPECIFIC PARTS OF THE BODY
    • A61H3/00Appliances for aiding patients or disabled persons to walk about
    • A61H3/06Walking aids for blind persons
    • A61H3/061Walking aids for blind persons with electronic detecting or guiding means
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S19/00Satellite radio beacon positioning systems; Determining position, velocity or attitude using signals transmitted by such systems
    • G01S19/38Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system
    • G01S19/39Determining a navigation solution using signals transmitted by a satellite radio beacon positioning system the satellite radio beacon positioning system transmitting time-stamped messages, e.g. GPS [Global Positioning System], GLONASS [Global Orbiting Navigation Satellite System] or GALILEO
    • G01S19/42Determining position
    • G01S19/48Determining position by combining or switching between position solutions derived from the satellite radio beacon positioning system and position solutions derived from a further system
    • G01S19/49Determining position by combining or switching between position solutions derived from the satellite radio beacon positioning system and position solutions derived from a further system whereby the further system is an inertial position system, e.g. loosely-coupled

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Pain & Pain Management (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Rehabilitation Therapy (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-sensor-fusion wearable intelligent navigation aid for the blind based on semantic SLAM. Considering the advantages and limitations of each individual sensor, it adopts a multi-sensor-fusion SLAM method and guarantees the accuracy of indoor and outdoor positioning by fusing the data of multiple sensors. Combined with semantic SLAM, it helps the blind perceive the surrounding environment and feeds the information back to them, while the rich visual information is retained for further development. Aiming at the problem that existing visual SLAM cannot match feature points in poorly textured environments and therefore cannot obtain the depth of three-dimensional points in the camera coordinate system, the invention adopts a visual-inertial SLAM algorithm, which greatly improves the stability of the algorithm and the accuracy of the result; this SLAM method is used for indoor positioning.

Description

Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning
Technical Field
The invention belongs to the field of aided navigation devices, and particularly relates to a wearable aided navigation device and method for the blind based on semantic VISLAM and GNSS positioning.
Background
In recent years the number of visually impaired people in China has kept growing, and care and protection for this vulnerable group have drawn increasing public attention. Meanwhile, with the rapid pace of social development and the growing complexity of traffic, urban road and indoor environments, the daily travel of the blind faces severe challenges and ever more potential safety hazards. Besides the guide cane and the guide dog, a blind-person aided navigation device can be the most powerful supplement for the daily activities of the blind; it is compact, easy to put on and take off, efficient and accurate, relatively low in cost, widely applicable and quick to produce. Whether indoors or outdoors, such a device has obvious advantages in informing the blind about the surrounding environment, suggesting navigation routes and warning of potential risks.
For blind navigation, the main problems to be solved are indoor/outdoor positioning and environment perception. At present, the Global Navigation Satellite System (GNSS) is the most widely used means of outdoor positioning. GNSS can provide users anywhere on the earth's surface or in near-earth space with all-weather latitude, longitude and altitude information as well as time information. However, GNSS also has shortcomings. Satellite navigation signals are weak and, being high-frequency, short-wavelength signals in the gigahertz range, are easily blocked. GNSS receivers also suffer from poor anti-jamming capability, susceptibility to spoofing and low data-output rates. In dense urban areas, indoor environments, subways and other complex environments with high-rise buildings and dense trees, signal shielding interrupts the positioning output and the receiver cannot work. To compensate for these deficiencies, GNSS is usually combined with an Inertial Measurement Unit (IMU) to form an integrated outdoor positioning system. The IMU obtains the three-dimensional acceleration and angular velocity of the carrier with an accelerometer and a gyroscope, and by integrating these data it provides the position, attitude and other information required for navigation. The IMU is not limited by external or meteorological conditions and relies entirely on the motion of the carrier, so it can maintain a continuous positioning output when GNSS signals are blocked. Its greatest disadvantage, however, is error accumulation: the positioning accuracy degrades continuously as the positioning proceeds. A high-precision IMU can maintain high accuracy for a long time, but its price is high and it is difficult to meet the low-cost requirement of a blind navigation device.
Simultaneous localization and mapping (SLAM) using a camera as the sensor is called visual SLAM, or vSLAM. The camera captures rich scene information, while the IMU provides accurate short-term estimates at high frequency and reduces the influence of dynamic objects on the camera. In turn, the camera data can effectively correct the drift in the IMU readings, so the two sensors complement each other and give better results when used together. SLAM has also gained popularity in many potential applications thanks to the miniaturization and falling cost of cameras and IMUs. A visual SLAM system that uses both a binocular camera and an IMU is referred to as VISLAM. The SLAM problem is the process by which a robot, moving freely in an unknown environment without prior information, builds a map of that environment from the sensor data it carries. Accurate localization is required while building the map, and the map is in turn the reference for localization, so mapping and localization are complementary and inseparable. Visual SLAM estimates the position and attitude of the camera at each moment from the images of a monocular, binocular or depth camera, thereby forming the trajectory of the camera motion; this method is adopted to locate the current position of the blind user. Compared with other sensors, a camera acquires enough environmental detail to depict the appearance and shape of objects, read signs and so on, helping the carrier to understand the environment. A visual SLAM system is generally divided into four parts: front-end visual odometry, back-end nonlinear optimization, loop closure detection and mapping. The front end, also called the visual odometer, continuously matches information between images to solve the pose, generating new map points and expanding the existing map; a camera-plus-IMU front end is known as visual-inertial odometry (VIO). The back end mainly optimizes the results obtained by the front end to ensure better global consistency of the constructed map.
Semantic SLAM, as a branch of visual SLAM, uses the image information from the cameras to classify and identify the surroundings through deep learning, for example roads, vehicles, traffic lights, pedestrians, animals, tables and chairs.
Related manufacturers at home and abroad have produced integrated products. The Alice band wearable device designed by Microsoft receives information through a head-worn receiver from sensors installed on urban entities such as buildings or train cars; however, it requires sensors to be installed in advance and therefore has limited applicability. The backpack device Eyeronman from Tactile Navigation Tools integrates the lidar, ultrasonic and infrared sensors used by self-driving cars and converts their signals into vibration; however, such devices are very expensive, not widely adopted, and also bulky and heavy, which makes them hard to carry. In China, the AngelEye smart glasses designed by a terahertz security company can perform simple recognition and navigation functions, but because the device is small and contains neither a GNSS module nor a processor, it is limited in recognition efficiency and navigation accuracy, and its relatively high cost makes it difficult to popularize.
The invention relates to a wearable blind-person aided navigation device and method based on multi-sensor fusion, in particular to a wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning. The wearable blind-person aided navigation device comprises a GNSS antenna, a GNSS/IMU combined positioning module, a color binocular camera, an earphone, a vibration module, a processor and a 4G communication module; the GNSS/IMU combined positioning module and the color binocular camera allow the device to be miniaturized. The color binocular camera is mainly responsible for positioning and target detection: after left-right matching, the parallax of the image is calculated to obtain depth data, and the image is fed into a trained convolutional neural network model which outputs the class of each target in the image. The vibration module consists of a direct-current motor and its driver module, the rotating end of the motor being connected with a quarter-circle piece to realize vibration; the processor detects the distance of the object and outputs a control quantity negatively correlated with that distance to drive the motor, so the closer the object, the stronger the vibration. The 4G communication module supports LTE Cat 1 with a rate of up to 10 Mbps and provides telephone and short-message functions; the device sends the position coordinates of the blind user to a family member's reserved mobile phone number once every hour. In the navigation method, for indoor environments or outdoor environments where GNSS signals are easily shielded, and aiming at the problem that existing visual SLAM cannot match feature points in poorly textured environments and therefore cannot obtain the depth of three-dimensional points in the camera coordinate system, a visual-inertial SLAM algorithm is adopted, which greatly improves the stability of the algorithm and the accuracy of the result. In an outdoor open environment, the GNSS positioning result is loosely coupled with the IMU solution in the GNSS/IMU combined positioning module to obtain the global pose in the GNSS coordinate system; after the differential data stream of the GNSS reference station is received, the position information is sent to the family members through the 4G communication module. Meanwhile, the image obtained by the binocular camera is input to a trained Mask R-CNN semantic segmentation network, the output two-dimensional semantic labels are attached to the color image of the binocular camera, and every pixel of the image can thus be classified.
Different processing schemes are set for different identified objects. For a road, the tactile (blind) path is identified, or, according to the position of the road in the image, a voice prompt is issued through the earphone telling the blind user to walk on the tactile path or on the right side of the road. For a traffic light, the color information is read by the binocular camera and, after the state is judged, a voice prompt is issued through the earphone. For an approaching pedestrian or vehicle, a warning voice based on the object distance obtained by the binocular camera is issued through the earphone, and the vibration module vibrates. For obstacles such as tables, chairs, trees and walls, a prompt voice based on the object distance obtained by the binocular camera is issued through the earphone.
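The per-class handling above can be viewed as a small decision routine. The following Python sketch illustrates that routing under stated assumptions: speak() and vibrate() are hypothetical stand-ins for the voice synthesis and vibration modules, and the 5 m threshold and class names are illustrative, not taken from the embodiment.

```python
# Minimal sketch of the per-class feedback routing described above.
# speak() and vibrate() are hypothetical placeholders for the voice
# synthesis module and the vibration-motor driver; distances in metres.

SAFE_DISTANCE_M = 5.0   # assumed safety threshold

def speak(text):                 # placeholder: forward text to the TTS module
    print("[TTS]", text)

def vibrate(distance_m):         # placeholder: closer objects -> stronger vibration
    print("[VIB] distance =", distance_m)

def handle_detection(label, distance_m=None, side=None, light_color=None):
    """Map one detected object class to voice and/or vibration feedback."""
    if label == "road":
        speak("Please walk on the tactile paving or keep to the right side of the road")
    elif label == "traffic_light":
        if light_color == "red":
            speak("Traffic light ahead is red, do not cross")
        else:
            speak("Traffic light ahead is green, you may cross")
    elif label in ("pedestrian", "vehicle"):
        if distance_m is not None and distance_m < SAFE_DISTANCE_M:
            speak(f"{label} on the {side or 'front'}, {distance_m:.1f} metres, please avoid")
            vibrate(distance_m)
    elif label in ("table", "chair", "tree", "wall"):
        if distance_m is not None and distance_m < SAFE_DISTANCE_M:
            speak(f"{label} ahead, {distance_m:.1f} metres")

handle_detection("vehicle", distance_m=3.2, side="right")
```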
Disclosure of Invention
In order to solve the above technical problems, the invention provides a multi-sensor-fusion wearable intelligent navigation aid for the blind based on semantic SLAM. Considering the advantages and limitations of each individual sensor, it adopts a multi-sensor-fusion SLAM method and guarantees the accuracy of indoor and outdoor positioning by fusing the data of multiple sensors. Combined with semantic SLAM, it helps the blind perceive the surrounding environment and feeds the information back to them, while the rich visual information is retained for further development.
The invention provides a wearable blind-person aided navigation device based on semantic VISLAM and GNSS positioning, comprising a GNSS antenna, a GNSS/IMU combined positioning module, a color binocular camera, an earphone, a vibration module, a processor, a 4G communication module and a power module, the GNSS antenna being connected with the GNSS/IMU combined positioning module; the device is characterized in that: the binocular camera is responsible for positioning and target detection, and after left-right matching the parallax of the image is calculated to obtain depth data; the image is input into a trained convolutional neural network model, which outputs the class of each target in the image; the vibration module consists of a direct-current motor and its driver module, the rotating end of the motor being connected with a quarter-circle piece; the processor detects the distance of the object and outputs a control quantity negatively correlated with that distance to drive the motor, so the closer the object, the stronger the vibration; the 4G communication module supports LTE Cat 1 with a rate of up to 10 Mbps and provides telephone and short-message functions; the device sends the position coordinates of the blind user to a family member's reserved mobile phone number once every hour.
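As an illustration of the control quantity that is negatively correlated with distance, the sketch below maps obstacle distance to a PWM-style duty cycle. The 5 m cut-off, the linear law and the 0-100 % scale are assumptions made only for this sketch.

```python
# Illustrative mapping from obstacle distance to a motor control quantity
# that is negatively correlated with distance: the closer the obstacle,
# the larger the vibration command. The 5 m cut-off and the linear law
# are assumptions made only for this sketch.

MAX_DISTANCE_M = 5.0    # obstacles farther than this produce no vibration
MAX_DUTY = 100.0        # full-scale PWM duty cycle, in percent

def vibration_command(distance_m):
    """Return a duty cycle (0..100 %) that grows as the obstacle gets closer."""
    if distance_m is None or distance_m >= MAX_DISTANCE_M:
        return 0.0
    if distance_m <= 0.0:
        return MAX_DUTY
    return MAX_DUTY * (1.0 - distance_m / MAX_DISTANCE_M)   # linear law

if __name__ == "__main__":
    for d in (0.5, 1.5, 3.0, 6.0):
        print(f"{d:.1f} m -> {vibration_command(d):.0f} % duty cycle")
```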
The invention provides a navigation method of a wearable blind-person aided navigation device based on semantic VISLAM and GNSS positioning, which comprises the following steps:
S1, the GNSS/IMU combined positioning module collects GNSS observation data, acceleration and angular velocity information and transmits them to the processor, which calculates the displacement and attitude change of the carrier;
The step S1 includes the following steps:
S11, performing GNSS relative positioning;
S12, solving the pose based on the acceleration and angular velocity information;
S13, solving the combined pose of the GNSS and the IMU;
S2, the color image information acquired by the binocular camera and the camera position and attitude information read by the IMU module are transmitted to the processor, which solves and optimizes the position and attitude information of the carrier;
The step S2 includes the following steps:
S21, binocular image feature extraction;
The step S21 includes the following steps: establishing an image pyramid, extracting binocular image feature points, and matching the binocular image feature points;
S22, solving the attitude transformation matrix;
The step S22 includes the following steps: IMU pre-integration, estimating an initial pose and solving the attitude transformation matrix;
S3, the color image information acquired by the binocular camera is input into a trained Mask R-CNN semantic segmentation network in the processor, which outputs two-dimensional semantic labels, and each pixel of the binocular image is classified according to the semantic labels;
The step S3 includes the following steps:
S31, loading the weights of the trained model;
S32, outputting the classified images by model inference;
S33, performing depth estimation on the classified objects;
S4, according to the classification results, different objects are processed differently and the processing information is sent to the voice synthesis module and the vibration module.
The step S4 includes the following steps:
S41, judging whether the environment is indoor or outdoor;
S42, indoor: judging and outputting prompt information such as the position and distance of indoor objects to the voice synthesis module;
S43, outdoor: judging and outputting warning and prompt information about outdoor objects to the voice synthesis module;
S44, controlling the vibration frequency of the vibration motor according to the distance information.
As a further development of the invention, the outdoor objects comprise the traffic light color, the vehicle distance and the road position.
Beneficial effects:
the invention provides a semantic SLAM-based multi-sensor fusion wearable blind intelligent auxiliary navigation device, which is a multi-sensor fusion SLAM method considering the advantages and the limitations of a single sensor, and ensures the accuracy of indoor and outdoor positioning by fusing data of the multi-sensor. And the blind can be helped to sense the surrounding environment by combining the semantic SLAM, and the information is fed back to the blind. Abundant visual information is left for continuous development. In the invention, aiming at the problem that the depth of a three-dimensional point under a camera coordinate system cannot be obtained because feature point matching cannot be carried out in the environment with less texture in the conventional visual SLAM, the stability of the algorithm and the accuracy of the result can be greatly improved by adopting a visual and inertial SLAM algorithm, and the SLAM method is used for indoor positioning.
Drawings
FIG. 1 is a schematic structural diagram of an intelligent navigation aid for the blind;
FIG. 2 is a first design drawing of the intelligent navigation aid for the blind;
FIG. 3 is a second design drawing of the intelligent navigation aid for the blind according to the present invention;
FIG. 4 is a flow chart of a GNSS and IMU combined positioning;
FIG. 5 is a SLAM + BA optimized positioning flow diagram;
FIG. 6 is a flowchart of the system as a whole;
FIG. 7 is a schematic diagram of actual operation;
FIG. 8 is the Mask R-CNN training model in semantic SLAM.
Description of reference numerals: 1. earphone; 2. GNSS antenna; 3. IMU module; 4. color binocular camera; 5. vibration module; 6. speech synthesis module; 7. GNSS module; 8. processor; 9. power supply module; 10. 4G communication module.
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
As shown in fig. 1-3, a wearable blind-person aided navigation device based on semantic SLAM and GNSS positioning includes a GNSS antenna 2, a GNSS/IMU combined positioning module, a color binocular camera 4, an earphone 1, a vibration module 5, a processor 8, a power module 9 and a 4G communication module 10; the GNSS module 7, the IMU module 3 and the color binocular camera allow the device to be miniaturized. The GNSS positioning module 7, the IMU module 3, the color binocular camera 4, the vibration module 5 and the voice synthesis module 6 are all connected with the processor 8, and the earphone is connected with the voice synthesis module.
The overall flow of the system is shown in fig. 6 and the actual operation is illustrated in fig. 7. In an indoor environment, the blind user obtains color images of the surrounding environment through the color binocular camera while moving, and the motion of the user is calculated in combination with the attitude data provided by the IMU. In an outdoor environment, the motion data of the user are obtained by the combined positioning of the GNSS module and the IMU module, and these data are fused to obtain the user's position. The color images are then used to perceive the surrounding environment: the convolutional neural network Mask R-CNN recognizes surrounding objects, the object distance and color information are extracted and input into the voice synthesis module, and the blind user obtains the surrounding information through the earphone, thereby assisting navigation.
(I) Position and orientation estimation based on GNSS and IMU combined positioning
As shown in fig. 4, the GNSS module and the IMU module respectively acquire GNSS observation data, GNSS differential data, acceleration data, and angular velocity data.
1. GNSS double-differenced relative positioning (GNSS-RTK). Pseudo-range and carrier-phase double-difference observation equations are established from the GNSS observation data and the GNSS differential data. Let the instantaneous coordinates of satellite s be $(X^{s}, Y^{s}, Z^{s})$, the approximate coordinates of station r be $(X_{r}^{0}, Y_{r}^{0}, Z_{r}^{0})$ and the corrections be $(\mathrm{d}x, \mathrm{d}y, \mathrm{d}z)$. After linearization the observation equations are abbreviated as:
$\nabla\Delta P_{r,j}^{s} = \nabla\Delta\rho_{r}^{s} + \nabla\Delta\delta_{ion} + \nabla\Delta\delta_{trop} + \varepsilon_{P}$ (1)
$\lambda_{j}\,\nabla\Delta\varphi_{r,j}^{s} = \nabla\Delta\rho_{r}^{s} - \nabla\Delta\delta_{ion} + \nabla\Delta\delta_{trop} + \lambda_{j}\,\nabla\Delta N_{r,j}^{s} + \varepsilon_{\varphi}$ (2)
wherein $\nabla\Delta$ represents the inter-station, inter-satellite double-difference operator; $\Delta$ represents the inter-satellite single-difference operator; s and r are the indices of the satellite and the station; j is the pseudo-range or carrier observation type (j = 1, 2, 3); P is the pseudo-range observation; $\rho_{r}^{s}$ is the geometric distance between satellite and receiver; c is the speed of light in vacuum; $\delta_{ion}$ and $\delta_{trop}$ are the ionospheric and tropospheric refraction errors; $\lambda_{j}$ is the carrier wavelength; $\varphi_{r,j}^{s}$ and $N_{r,j}^{s}$ are the carrier-phase observation and the integer ambiguity; $\varepsilon_{P}$ and $\varepsilon_{\varphi}$ are the pseudo-range and carrier-phase observation noises.
The integer ambiguities are resolved with the LAMBDA algorithm, after which the corrections $(\mathrm{d}x, \mathrm{d}y, \mathrm{d}z)$ are solved, so that the coordinates $(X, Y, Z)$ of the carrier in the GNSS coordinate system are determined:
$(X, Y, Z) = (X_{r}^{0} + \mathrm{d}x,\; Y_{r}^{0} + \mathrm{d}y,\; Z_{r}^{0} + \mathrm{d}z)$ (3)
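A minimal numerical sketch of the last step, solving the linearized double-difference system for the corrections (dx, dy, dz) by weighted least squares, is given below. The design matrix B, residual vector l and weight matrix P are assumed to have been formed beforehand from the observation data and fixed ambiguities; the variable names and the synthetic example values are illustrative only.

```python
import numpy as np

def solve_corrections(B, l, P=None):
    """Weighted least-squares solution of the linearized double-difference
    system B @ [dx, dy, dz] = l, once the integer ambiguities are fixed.

    B : (n, 3) design matrix of linearized geometric-distance partials
    l : (n,)   observed-minus-computed double-difference residuals
    P : (n, n) optional weight matrix (identity if omitted)
    """
    if P is None:
        P = np.eye(len(l))
    N = B.T @ P @ B                      # normal matrix
    return np.linalg.solve(N, B.T @ P @ l)

# Carrier coordinates per eq. (3): (X, Y, Z) = (X0 + dx, Y0 + dy, Z0 + dz).
X0 = np.array([-2148744.0, 4426641.0, 4044655.0])   # illustrative approximate ECEF coords (m)
B = np.random.randn(8, 3)                            # placeholder design matrix
l = B @ np.array([0.12, -0.05, 0.30])                # synthetic residuals for a known correction
print(X0 + solve_corrections(B, l))
```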
2. Pose solving based on acceleration and angular velocity information. First, an IMU coordinate system B and a world coordinate system W are defined. The acceleration $\hat{a}_{B}$ and angular velocity $\hat{\omega}_{B}$ acquired by the IMU module are modeled as:
$\hat{a}_{B} = a_{B} + b_{a} + q_{WB}^{-1}\,g_{W} + \eta_{a}$ (4)
$\hat{\omega}_{B} = \omega_{B} + b_{g} + \eta_{g}$ (5)
wherein $a_{B}$, $\omega_{B}$ are the true values of acceleration and angular velocity, respectively; $b_{a}$, $b_{g}$ are the drift values of acceleration and angular velocity, respectively; $\eta_{a}$, $\eta_{g}$ are the measurement noises; $g_{W}$ is the gravity value; $q_{WB}$ represents the rotation quaternion from the IMU coordinate system B to the world coordinate system W.
The acceleration and angular velocity data are integrated, and the corresponding pose can be calculated:
$p_{WB_{k+1}} = p_{WB_{k}} + v_{WB_{k}}\,\Delta t + \iint_{t\in[k,k+1]} \left[ q_{WB_{t}}\left(\hat{a}_{B_{t}} - b_{a}\right) - g_{W} \right] \mathrm{d}t^{2}$ (6)
$v_{WB_{k+1}} = v_{WB_{k}} + \int_{t\in[k,k+1]} \left[ q_{WB_{t}}\left(\hat{a}_{B_{t}} - b_{a}\right) - g_{W} \right] \mathrm{d}t$ (7)
$q_{WB_{k+1}} = q_{WB_{k}} \otimes \int_{t\in[k,k+1]} \tfrac{1}{2}\, q_{B_{k}B_{t}} \otimes \begin{bmatrix} 0 \\ \hat{\omega}_{B_{t}} - b_{g} \end{bmatrix} \mathrm{d}t$ (8)
wherein $p_{WB_{k}}$, $v_{WB_{k}}$, $q_{WB_{k}}$ respectively represent the position, velocity and rotation quaternion of $B_{k}$ in the world coordinate system; $\Delta t$ is the time difference between $B_{k+1}$ and $B_{k}$; $q_{B_{k}B_{t}}$ represents the attitude at time $B_{t}$ relative to time $B_{k}$.
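The propagation in equations (6)-(8) can be sketched in discrete time as follows. This is a simplified Euler integration using scipy's Rotation class in place of explicit quaternion algebra, with bias-corrected measurements assumed as inputs; it is a sketch, not the pre-integration scheme itself.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

G_W = np.array([0.0, 0.0, 9.81])   # gravity term g_W used in eqs. (6)-(7)

def propagate(p, v, q, acc_b, gyro_b, dt):
    """One Euler step of eqs. (6)-(8) with bias-corrected IMU samples.

    p, v    : position / velocity in the world frame W
    q       : scipy Rotation representing q_WB (body B -> world W)
    acc_b   : specific force measured in B, bias removed  (m/s^2)
    gyro_b  : angular velocity measured in B, bias removed (rad/s)
    """
    acc_w = q.apply(acc_b) - G_W                 # q_WB * (a_hat - b_a) - g_W
    p_new = p + v * dt + 0.5 * acc_w * dt ** 2   # eq. (6)
    v_new = v + acc_w * dt                       # eq. (7)
    q_new = q * R.from_rotvec(gyro_b * dt)       # eq. (8): right-multiply body rotation
    return p_new, v_new, q_new

# Sanity check: a stationary IMU (measuring only gravity) should not drift.
p, v, q = np.zeros(3), np.zeros(3), R.identity()
for _ in range(100):
    p, v, q = propagate(p, v, q, np.array([0.0, 0.0, 9.81]), np.zeros(3), 0.01)
print(np.round(p, 6), np.round(v, 6))   # ~[0 0 0] [0 0 0]
```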
3. Solving the GNSS/IMU loosely coupled pose. The loose combination is a position- and velocity-based GNSS/IMU combination. When the earth coordinate system is used as the navigation coordinate system, the integrated navigation system has 15 state parameters, with the attitude, velocity, position, accelerometer zero offset and gyro drift taken as the state vector $X_{GI}$:
$X_{GI} = \left[\, \theta \quad V \quad P \quad \nabla_{a} \quad \varepsilon_{g} \,\right]^{T}$ (9)
wherein P and V are the position and velocity vectors; $\theta$ comprises the roll, pitch and yaw angles; $\nabla_{a}$ and $\varepsilon_{g}$ are respectively the accelerometer zero offset and the gyro drift along the directions of carrier motion.
The system state and measurement equations are established as follows:
$\dot{X}_{GI}(t) = A_{GI}(t)\,X_{GI}(t) + W_{GI}(t)$ (10)
$Z_{GI}(t) = H_{GI}(t)\,X_{GI}(t) + w_{GI}(t)$ (11)
wherein $A_{GI}(t)$ is the state-equation coefficient matrix of the GNSS/IMU loose combination; $W_{GI}(t)$ is the system noise of the state equation; $Z_{GI}(t)$ is the external measurement; $H_{GI}(t)$ is the measurement-equation coefficient matrix; $w_{GI}(t)$ is the measurement noise of the GNSS/IMU loose combination.
Finally, Kalman filtering is used to solve for the pose of the carrier.
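The loose combination of equations (9)-(11) reduces to a standard Kalman filter whose measurement is the difference between the GNSS and inertial position/velocity solutions. The sketch below shows only the generic predict/update cycle under that assumption; the matrices F, Q, H and R here are placeholders standing in for the discretized A_GI(t), the process noise, H_GI(t) and the measurement noise.

```python
import numpy as np

def kalman_predict(x, P, F, Q):
    """Time update: propagate the state and covariance with process noise Q."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kalman_update(x_pred, P_pred, z, H, R):
    """Measurement update with z = GNSS-minus-INS position/velocity differences."""
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(len(x)) - K @ H) @ P_pred
    return x, P

# 15 error states as in eq. (9): attitude (3), velocity (3), position (3),
# accelerometer zero offset (3), gyro drift (3).
n = 15
x, P = np.zeros(n), np.eye(n) * 1e-2
F = np.eye(n)                      # discretized A_GI(t) over one step (placeholder)
Q = np.eye(n) * 1e-6               # process noise (placeholder)
H = np.hstack([np.zeros((6, 3)), np.eye(6), np.zeros((6, 6))])  # picks velocity + position
R_meas = np.eye(6) * 0.25          # GNSS measurement noise (placeholder)
z = np.zeros(6)                    # GNSS-minus-INS velocity/position residual
x, P = kalman_predict(x, P, F, Q)
x, P = kalman_update(x, P, z, H, R_meas)
```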
(II) Pose estimation based on the binocular camera and IMU
The binocular camera adopted is a ZED 2i.
1. Feature extraction. First, an image pyramid is constructed: the image is scaled 7 times by a proportion coefficient k, giving an eight-level image pyramid in total, where the higher the pyramid level, the lower the image resolution. The resolution of each layer of the image pyramid is $\sigma_{i} = k^{i-1}\sigma_{0}$, where i denotes the layer number and $\sigma_{0}$ is the initial image resolution.
2. ORB feature point extraction. ORB feature points consist of FAST corners and BRIEF descriptors. Since FAST corners have no orientation, ORB computes the centroid of a circular patch of radius 16 pixels around each FAST corner and takes the line from the feature point to the centroid as the direction of the key point, giving it rotation invariance. The FAST corner extraction process is as follows:
Let a pixel be p with brightness $I_{p}$, and set a threshold $I_{t}$ at 10%-20% of that brightness. Taking p as the center, the brightness of the 16 pixels on the surrounding circle of radius 3 is examined; if 9 consecutive points are brighter than $I_{p}+I_{t}$ or darker than $I_{p}-I_{t}$, the pixel p is regarded as a FAST corner. Because FAST itself has no orientation, the ORB algorithm improves the corners so that they can be detected in images of different orientations: a gray-scale centroid method adds a direction attribute to the FAST corner, using the offset between the corner and the centroid of its gray values as the corner direction. After the position of the FAST corner is determined, ORB describes each key point with the improved BRIEF binary descriptor, comparing the gray values of the key point's neighboring pixels and storing the comparison results as a vector containing only 0s and 1s. With the FAST corners and BRIEF descriptors, the ORB features are obtained.
3. Pose optimization problem in VISLAM
The SLAM + BA optimized positioning flow is shown in fig. 5. The camera assumed in the VISLAM algorithm is a pinhole model. The quantities to be solved and optimized are the camera pose $T_{i} = [R_{i}, p_{i}] \in SE(3)$, the camera velocity $v_{i}$ in the world coordinate system, and the accelerometer and gyroscope biases $b_{i}^{a}$ and $b_{i}^{g}$, which are assumed to evolve according to a random walk. Together they constitute the state vector:
$S_{i} \doteq \{\, T_{i},\; v_{i},\; b_{i}^{a},\; b_{i}^{g} \,\}$
For VISLAM, IMU pre-integration between two frames is required to obtain the rotation, velocity and position changes, and together with the information matrix of all the measurements it defines the inertial residual:
$r_{\mathcal{I}_{i,i+1}} = \left[\, r_{\Delta R_{i,i+1}},\; r_{\Delta v_{i,i+1}},\; r_{\Delta p_{i,i+1}} \,\right]$
wherein $\Delta R_{i,i+1}$, $\Delta v_{i,i+1}$, $\Delta p_{i,i+1}$ are respectively the rotation, velocity and position measurements pre-integrated by the IMU. The reprojection error of map point j observed at $u_{ij}$ in frame i is then defined as:
$r_{ij} = u_{ij} - \Pi\left( T_{CB}\, T_{i}^{-1} \oplus x_{j} \right)$
wherein $\Pi$ represents the projection equation of the camera, $\oplus$ represents the transformation of a three-dimensional point by an element of the Lie group SE(3), $T_{CB}$ is the camera-IMU transformation and $x_{j}$ is the three-dimensional position of the map point. The visual-inertial optimization objective function minimizes the residuals, while a robust Huber kernel $\rho_{Hub}$ is used to reduce the effect of spurious matches. Finally, optimizing this function yields the best camera pose:
$\min_{S_{k},\,\mathcal{X}} \left( \sum_{i=1}^{k} \left\| r_{\mathcal{I}_{i-1,i}} \right\|_{\Sigma_{\mathcal{I}_{i-1,i}}^{-1}}^{2} + \sum_{j} \sum_{i\in\mathcal{K}^{j}} \rho_{Hub}\left( \left\| r_{ij} \right\|_{\Sigma_{ij}^{-1}} \right) \right)$
where the sums run over the keyframe states $S_{k}$ and the map points $\mathcal{X}$, and $\mathcal{K}^{j}$ is the set of keyframes observing point j.
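The inertial residual above compares the IMU-preintegrated motion terms with the states being optimized (an ORB-SLAM3-style formulation). The sketch below evaluates such a residual for given states and preintegrated terms; the bias-correction Jacobians and the information-matrix weighting are omitted, and all inputs are assumed to come from an existing pre-integration routine.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def inertial_residual(R_i, p_i, v_i, R_j, p_j, v_j,
                      dR, dv, dp, dt, g=np.array([0.0, 0.0, -9.81])):
    """Residual between preintegrated IMU terms (dR, dv, dp over dt)
    and two consecutive keyframe states i, j (rotation matrices R,
    positions p, velocities v). Bias Jacobian corrections are omitted."""
    r_R = Rotation.from_matrix(dR.T @ R_i.T @ R_j).as_rotvec()       # rotation residual (Log map)
    r_v = R_i.T @ (v_j - v_i - g * dt) - dv                          # velocity residual
    r_p = R_i.T @ (p_j - p_i - v_i * dt - 0.5 * g * dt ** 2) - dp    # position residual
    return np.concatenate([r_R, r_v, r_p])

# Consistency check: states generated exactly from the preintegrated terms
# give a (numerically) zero residual.
dt = 0.1
g = np.array([0.0, 0.0, -9.81])
R_i, v_i, p_i = np.eye(3), np.array([1.0, 0.0, 0.0]), np.zeros(3)
dR = Rotation.from_rotvec([0.0, 0.0, 0.02]).as_matrix()
dv = np.array([0.05, 0.0, 0.0])
dp = np.array([0.1, 0.0, 0.0])
R_j = R_i @ dR
v_j = v_i + g * dt + R_i @ dv
p_j = p_i + v_i * dt + 0.5 * g * dt ** 2 + R_i @ dp
print(inertial_residual(R_i, p_i, v_i, R_j, p_j, v_j, dR, dv, dp, dt))  # ~zeros
```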
(III) Obtaining semantic information and issuing prompt information
First, the binocular image data are input into the trained Mask R-CNN network, which outputs the semantic label information of the objects; the parallax of the matched points is computed from the binocular matching in VISLAM to obtain the depth of the objects. Mask R-CNN is an instance segmentation algorithm that performs segmentation on top of object detection, predicting object classes, candidate boxes and masks. Mask R-CNN outputs a class label and a bounding-box offset for each candidate target through the Faster R-CNN network, and adds a third branch that outputs the target mask; unlike the box output, this additional mask output is a pixel-level classification. The Mask R-CNN network has two stages: in the first stage, a Region Proposal Network (RPN) extracts target candidate boxes; in the second stage, the class of the image inside each box and the offset of the candidate box are extracted in parallel, and a binary mask is output for each region of interest (RoI). During model training, Mask R-CNN defines the overall loss of each RoI from three terms:
$L = L_{cls} + L_{box} + L_{mask}$ (15)
wherein $L_{cls}$ is the loss of the classifier, $L_{box}$ is the loss of the candidate box, and $L_{mask}$ is the loss of the output mask. The mask branch, one of the three branches of the algorithm, outputs $Km^{2}$-dimensional data for each RoI: for a total of K classes, K binary masks with a resolution of m × m are generated. A per-pixel sigmoid is applied, and $L_{mask}$ is defined as the average binary cross-entropy loss.
The Mask R-CNN network structure comprises two parts: a ResNet backbone network for feature extraction, and a head network that classifies each RoI, regresses the candidate box and predicts the mask. Using the layer-4 features of ResNet, the RoI is first pooled into 7×7×1024 RoI features and then expanded to 2048 channels. Two branches follow: the upper branch is responsible for classification and regression, and the lower branch generates the corresponding mask. Because convolution and pooling are applied several times in the network and the resolution is reduced accordingly, the mask branch first restores resolution by deconvolution while reducing the number of channels, giving 14×14×256 features, and finally outputs a 14×14×80 mask template. FIG. 8 is a schematic structural diagram of Mask R-CNN.
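For the segmentation step, a minimal inference sketch using the off-the-shelf Mask R-CNN (ResNet-50 FPN backbone) from torchvision is shown below as a stand-in for the network described above. The device's own model uses a ResNet C4 head and custom training, so the pretrained COCO weights, the score threshold and the image path here are assumptions, not the trained model of the embodiment.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Pretrained COCO Mask R-CNN as a stand-in for the trained network.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("left.png").convert("RGB")       # left camera frame (placeholder path)
with torch.no_grad():
    out = model([to_tensor(image)])[0]              # dict with boxes, labels, scores, masks

score_thresh = 0.7                                  # assumed confidence threshold
for box, label, score, mask in zip(out["boxes"], out["labels"],
                                   out["scores"], out["masks"]):
    if score < score_thresh:
        continue
    # mask is a 1xHxW soft mask; threshold it to a binary per-pixel label.
    binary_mask = (mask[0] > 0.5)
    print(int(label), float(score), box.tolist(), int(binary_mask.sum()), "px")
```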
Speech output uses the SYN6288 Chinese speech synthesis chip. The SYN6288 receives the text to be synthesized over an asynchronous serial port (UART) and converts text to speech (TTS). The command frame is defined in hexadecimal, with the following format:
frame head (1 byte) + data area length (2 bytes) + command word (1 byte) + command parameter (1 byte) + text to be sent (less than 200 bytes) + XOR check (1 byte)
The processor sets the command word to the synthesis-and-play command (0x01) and the encoding format in the command parameter to GBK (0x01), and sends the character string byte by byte. The structure of the output speech text is:
when the identification object is a pedestrian/a vehicle/a table, a chair/a tree, etc.: right/left side + pedestrian/vehicle/table chair/tree + "less than five meters" + Current distance + "attention avoidance"
When the identification object is a traffic light: the "traffic light state ahead" + red light/green light + "no pass"/"can pass" where the "content is fixed text. When the identification object is a pedestrian/a vehicle/a table, a chair/a tree, the distance is considered to be safe more than five meters, and voice prompt is not carried out.
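The command frame described above can be assembled as in the following sketch. The command word 0x01 and the GBK parameter 0x01 follow the text; the 0xFD head byte, the convention that the length field counts the command word through the checksum, and the XOR being taken over all preceding bytes are assumptions drawn from common SYN6288 usage rather than from the description above.

```python
# Sketch of assembling a SYN6288 command frame in the format described above:
# head + data-area length (2 bytes) + command word + command parameter +
# GBK text (< 200 bytes) + XOR check.

FRAME_HEAD = 0xFD   # assumed head byte (common SYN6288 usage)
CMD_PLAY = 0x01     # synthesis-and-play command word (per the text)
PARAM_GBK = 0x01    # GBK encoding parameter (per the text)

def build_frame(text):
    payload = text.encode("gbk")
    if len(payload) >= 200:
        raise ValueError("text must be shorter than 200 bytes")
    length = len(payload) + 3                      # command + parameter + checksum (assumed)
    frame = bytearray([FRAME_HEAD, (length >> 8) & 0xFF, length & 0xFF,
                       CMD_PLAY, PARAM_GBK]) + payload
    xor = 0
    for b in frame:                                # XOR over all preceding bytes (assumed)
        xor ^= b
    frame.append(xor)
    return bytes(frame)

if __name__ == "__main__":
    frame = build_frame("前方红灯 禁止通行")        # "red light ahead, no passing"
    print(frame.hex(" "))
    # The frame would then be written to the UART, e.g. with pyserial:
    #   serial.Serial("/dev/ttyS0", 9600).write(frame)
```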
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, but any modifications or equivalent variations made according to the technical spirit of the present invention are within the scope of the present invention as claimed.

Claims (3)

1. A wearable blind-person aided navigation device based on semantic VISLAM and GNSS positioning, comprising a GNSS antenna, a GNSS/IMU combined positioning module, a color binocular camera, an earphone, a vibration module, a processor, a 4G communication module and a power module; characterized in that: the binocular camera is responsible for positioning and target detection, and after left-right matching the parallax of the image is calculated to obtain depth data; the image is input into a trained convolutional neural network model, which outputs the class of each target in the image; the vibration module consists of a direct-current motor and its driver module, the rotating end of the motor being connected with a quarter-circle piece; the processor detects the distance of the object and outputs a control quantity negatively correlated with that distance to drive the motor, so the closer the object, the stronger the vibration; the 4G communication module supports LTE Cat 1 with a rate of up to 10 Mbps and provides telephone and short-message functions; the device sends the position coordinates of the blind user to a family member's reserved mobile phone number once every hour.
2. The navigation method of the wearable blind-person aided navigation device based on semantic VISLAM and GNSS positioning according to claim 1, comprising the following steps:
S1, the GNSS/IMU combined positioning module collects GNSS observation data, acceleration and angular velocity information and transmits them to the processor, which calculates the displacement and attitude change of the carrier;
The step S1 includes the following steps:
S11, performing GNSS relative positioning;
S12, solving the pose based on the acceleration and angular velocity information;
S13, solving the combined pose of the GNSS and the IMU;
S2, the color image information acquired by the binocular camera and the camera position and attitude information read by the IMU module are transmitted to the processor, which solves and optimizes the position and attitude information of the carrier;
The step S2 includes the following steps:
S21, binocular image feature extraction;
The step S21 includes the following steps: establishing an image pyramid, extracting binocular image feature points, and matching the binocular image feature points;
S22, solving the attitude transformation matrix;
The step S22 includes the following steps: IMU pre-integration, estimating an initial pose and solving the attitude transformation matrix;
S3, the color image information acquired by the binocular camera is input into a trained Mask R-CNN semantic segmentation network in the processor, which outputs two-dimensional semantic labels, and each pixel of the binocular image is classified according to the semantic labels;
The step S3 includes the following steps:
S31, loading the weights of the trained model;
S32, outputting the classified images by model inference;
S33, performing depth estimation on the classified objects;
S4, according to the classification results, different objects are processed differently and the processing information is sent to the voice synthesis module and the vibration module.
The step S4 includes the following steps:
S41, judging whether the environment is indoor or outdoor;
S42, indoor: judging and outputting prompt information such as the position and distance of indoor objects to the voice synthesis module;
S43, outdoor: judging and outputting warning and prompt information about outdoor objects to the voice synthesis module;
S44, controlling the vibration frequency of the vibration motor according to the distance information.
3. The method of claim 2, wherein the outdoor objects comprise the traffic light color, the vehicle distance and the road position.
CN202210143183.7A 2022-02-16 2022-02-16 Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning Pending CN114488244A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210143183.7A CN114488244A (en) 2022-02-16 2022-02-16 Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210143183.7A CN114488244A (en) 2022-02-16 2022-02-16 Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning

Publications (1)

Publication Number Publication Date
CN114488244A true CN114488244A (en) 2022-05-13

Family

ID=81482557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210143183.7A Pending CN114488244A (en) 2022-02-16 2022-02-16 Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning

Country Status (1)

Country Link
CN (1) CN114488244A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023245615A1 (en) * 2022-06-24 2023-12-28 中国科学院深圳先进技术研究院 Blind guiding method and apparatus, and readable storage medium


Similar Documents

Publication Publication Date Title
US11900536B2 (en) Visual-inertial positional awareness for autonomous and non-autonomous tracking
JP6763448B2 (en) Visually enhanced navigation
US10366508B1 (en) Visual-inertial positional awareness for autonomous and non-autonomous device
US10410328B1 (en) Visual-inertial positional awareness for autonomous and non-autonomous device
CN107505644B (en) Three-dimensional high-precision map generation system and method based on vehicle-mounted multi-sensor fusion
Vu et al. Real-time computer vision/DGPS-aided inertial navigation system for lane-level vehicle navigation
JP2022019642A (en) Positioning method and device based upon multi-sensor combination
CN108406731A (en) A kind of positioning device, method and robot based on deep vision
CN109405824A (en) A kind of multi-source perceptual positioning system suitable for intelligent network connection automobile
CN113916242B (en) Lane positioning method and device, storage medium and electronic equipment
JP2019527832A (en) System and method for accurate localization and mapping
CN103983263A (en) Inertia/visual integrated navigation method adopting iterated extended Kalman filter and neural network
Hu et al. Real-time data fusion on tracking camera pose for direct visual guidance
CN208323361U (en) A kind of positioning device and robot based on deep vision
CN114359744A (en) Depth estimation method based on fusion of laser radar and event camera
CN108613675B (en) Low-cost unmanned aerial vehicle movement measurement method and system
CN113140132A (en) Pedestrian anti-collision early warning system and method based on 5G V2X mobile intelligent terminal
CN114488244A (en) Wearable blind-person aided navigation device and method based on semantic VISLAM and GNSS positioning
CN112033422B (en) SLAM method for multi-sensor fusion
Jayasuriya et al. Leveraging deep learning based object detection for localising autonomous personal mobility devices in sparse maps
Hu et al. Fusion of vision, 3D gyro and GPS for camera dynamic registration
Wei Multi-sources fusion based vehicle localization in urban environments under a loosely coupled probabilistic framework
Tang et al. Environmental perception for intelligent vehicles
CN115574816B (en) Bionic vision multi-source information intelligent perception unmanned platform
Huang Wheel Odometry Aided Visual-Inertial Odometry in Winter Urban Environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination