CN114159281A

CN114159281A - Device and method capable of realizing stair detection

Info

Publication number: CN114159281A
Application number: CN202210007685.7A
Authority: CN
Inventors: 高瑞; 贺春秀; 王玉清; 刘丽丽
Original assignee: Yanan University
Current assignee: Yanan University
Priority date: 2022-01-06
Filing date: 2022-01-06
Publication date: 2022-03-11

Abstract

The invention relates to the technical field of computer vision, and discloses a device and a method capable of realizing stair detection. An RGB-D camera acquires three-dimensional environment images and feeds the real-time images into a trained deep learning model, faster-rcnn-inception-v2-coco, which performs real-time detection and judges whether the user is going upstairs or downstairs. An ultrasonic sensor measures the distance to the ground; a threshold is preset, and whether the user is going upstairs or downstairs is judged by comparing the measured height with the threshold. The system detects and identifies stairs by fusing the two methods, and a vibration sensor signals the stair category to warn the user. The blind-guiding walking stick for stair detection by the hybrid method provided by the invention combines the image recognition of a deep neural network with the recognition function of the ultrasonic sensor, and provides effective assistance for visually impaired people.

Description

Device and method capable of realizing stair detection

Technical Field

The invention relates to the technical field of computer vision, in particular to a stair detection method based on machine vision.

Background

With the rapid development of science and technology worldwide, new systems are developed every day to make daily life more comfortable. People with physical disabilities, however, require more assistance than others, and technological advances provide solutions that allow them to take part in society like everyone else. The development of aids for the visually impaired has become a prominent research area, and over the past decade much work has been done to help visually impaired people in different ways. To solve the travel problem of visually impaired people, we developed a stair detection framework using a hybrid approach, in which stair detection is accomplished by a pre-trained model and an ultrasonic sensor.

Existing stair detection methods include three-dimensional reconstruction based on binocular vision. Although three-dimensional reconstruction has high detection accuracy, it places high demands on equipment and is relatively costly. A bionic eye can help visually impaired people recover partial vision, but it only lets them see low-resolution grayscale images, makes it difficult to distinguish stairs from other scenes, and is only suitable for blindness caused by retinitis pigmentosa. The traditional interaction modes for visually impaired people are mainly voice prompts and tactile vibration. Voice prompts generally broadcast short messages, need a certain time to play, cause delay and accident risk, and can transmit only a small amount of information. Tactile vibration relies on hardware such as a vibrating waistband or a vibrating vest to convey some azimuth information through vibration; it solves the delay problem, but it burdens the visually impaired user, and wearing comfort differs from person to person.

In view of the above-mentioned shortcomings in the prior art, the present invention provides a device and a method for detecting stairs, which mainly combine image detection with an ultrasonic sensor to detect and identify stairs.

The difficulties in solving the above-mentioned prior art problems are: the three-dimensional reconstruction method places high demands on equipment and is relatively costly; the bionic eye only lets visually impaired people see low-resolution grayscale images, in which stairs and other scenes are difficult to distinguish; voice prompts generally broadcast short messages, need a certain time to play, cause delay and accident risk, and can transmit only a small amount of information; tactile vibration burdens visually impaired people, and wearing comfort differs from person to person.

The significance of solving the above-mentioned prior art problems is: machine vision technology and the ultrasonic sensor are combined to locate the stairs, so that detection is more accurate; going upstairs and going downstairs are detected and distinguished; the vibration sensor is arranged in the handle of the walking stick and reminds the user through vibration of the stick body, so that conventional auxiliary hardware such as a vibrating waistband or a vibrating vest is avoided; the method has the advantages of reliability, strong real-time performance, low cost and high precision.

Disclosure of Invention

In order to solve the problems, the invention provides a device and a method capable of realizing stair detection.

The invention is realized as follows: a device capable of realizing stair detection comprises a blind-guiding stick body and, connected with the blind-guiding stick body, an ultrasonic sensor, an RGB-D camera, a processor, a vibration sensor and a buzzer;

the RGB-D camera is used for collecting three-dimensional environment images, the ultrasonic sensor is used for measuring the distance from the ground, a threshold value is set, and whether the user ascends or descends stairs is judged by comparing the measured distance with the threshold value;

the ultrasonic sensor is an HC-SR04 module mounted at a fixed position on the stick body;

the camera is placed at the top of the walking stick, so that scene information can be conveniently acquired;

the processor is a Raspberry Pi board;

the vibration sensor is arranged in the handle of the walking stick.

Furthermore, the device further comprises a blind-guiding stick body, and an RGB-D camera, a processor, a vibration sensor and a buzzer which are connected with the blind-guiding stick body;

the processor is a Raspberry Pi board;

the vibration sensor is arranged in the handle of the walking stick.

Furthermore, the walking stick combines the image recognition of the deep neural network with the recognition function of the ultrasonic sensor, so that effective auxiliary information is provided for visually impaired people and missed detections are reduced; boundaries are then labeled by the bounding-box regression method, and the confidence of the recognition is marked.

Further, the walking stick combines the image recognition function of the deep neural network with the recognition function of the ultrasonic sensor, and provides effective assistance for visually impaired people.

Further, the deep neural network is a deep convolutional neural network trained on a labeled data set. An Inception network, which has more layers and stronger expressive capability, replaces the VGG-16 network used for extracting image features in Faster R-CNN to obtain a convolutional feature map; the convolutional feature map is input into the region proposal network (RPN); meanwhile, redundant anchor boxes are pruned at the region-of-interest layer using the soft-NMS algorithm, the position information of the object to be detected is adjusted, and missed detections are reduced. Boundaries are then labeled by the bounding-box regression method, and the confidence of the recognition is marked.

Another object of the present invention is to provide a method for enabling detection of stairs, which uses the device for enabling detection of stairs, and the method for enabling detection of stairs comprises the following steps:

(1) collecting images of stairs (including upstairs and downstairs) with different sizes and categories to form a data set;

(2) labeling the data set by adopting a labeling tool, wherein the data set is labeled as upstair and downstair;

(3) dividing the data set marked in the step 2 into a training set and a testing set;

(4) carrying out normalization processing on the data set;

(5) training on the training set with the faster-rcnn-inception-v2-coco model to obtain a recognition model, wherein in the training process the initial learning rate is set to 0.0002 and the maximum number of iterations is set to 200000;

(6) inputting the test set into the recognition model, and outputting the stair category in each test image and the accuracy of the prediction result;

(7) to report the accuracy of the prediction result, the identified stairs are located, their boundary is labeled by the bounding-box regression method, and the confidence of the identification is marked;

(8) the ultrasonic sensor assists detection and identification: it measures the distance to the ground, and a fixed measured value is set as the threshold. The measurement rule is to take 1000 readings per second and average them; the average is then compared with the threshold. If the measured average is lower than the threshold, the user is determined to be going upstairs; if the measured average is higher than the threshold, the user is determined to be going downstairs.
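The threshold comparison in step (8) can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the distance readings have already been collected (reading the HC-SR04 itself is not shown), and the function name `classify_by_distance`, the centimetre unit and the "level" result for an average exactly equal to the threshold are hypothetical choices not stated in the text.

```python
from statistics import mean

def classify_by_distance(samples_cm, threshold_cm):
    """Average one batch of ground-distance readings and compare with the threshold.

    samples_cm   : readings collected in one second (1000 in the text)
    threshold_cm : the fixed reference distance used as the threshold
    The "level" outcome for an average equal to the threshold is an assumption.
    """
    average = mean(samples_cm)
    if average < threshold_cm:
        return "upstair"    # ground is closer than the reference: a step rises ahead
    if average > threshold_cm:
        return "downstair"  # ground is farther than the reference: a step drops ahead
    return "level"

# Example with synthetic readings (values are made up for illustration)
print(classify_by_distance([40.0] * 1000, 50.0))
```

Averaging a large batch before comparing makes the decision less sensitive to individual noisy echoes than comparing each reading on its own.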

Another object of the present invention is to provide a storage medium for storing a program for implementing the above method.

Another object of the present invention is to provide a data processing terminal, which executes the above-mentioned detection method.

The invention has the advantages and positive effects that:

(1) the machine vision technology and the ultrasonic sensor are combined to realize the positioning of the stairs, so that the detection is more accurate;

(2) the detection and identification of going upstairs or downstairs are realized;

(3) the vibration sensor is arranged at the handle part of the walking stick, and reminds a user by the vibration of the walking stick body, so that the conventional auxiliary hardware equipment for vibrating a waistband or a vibration vest is avoided;

(4) the method has the advantages of reliability, strong real-time performance, low cost, high precision and the like.

Drawings

FIG. 1 is a schematic structural view of the present invention;

FIG. 2 is a block diagram of the hardware module connections of the present invention;

FIG. 3 is a flow chart of the Faster-RCNN-Inception-V2-COCO model training of the present invention;

FIG. 4 is a flow chart of Faster R-CNN according to an embodiment of the present invention;

FIG. 5 is a graph of loss functions during a model training phase according to an embodiment of the present invention;

FIG. 6 is a graph of test results of an example of the present invention.

In fig. 1: 1. a cane body; 2. a handle; 3. an RGB-D camera; 4. a vibration sensor; 5. a wire slot 1; 6. an ultrasonic sensor; 7. a Raspberry Pi processor; 8. a wire slot 2; 9. a switch; 10. a reminding module.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The application of the principles of the present invention will be further described with reference to the accompanying drawings and specific embodiments.

As shown in fig. 1, the device comprises a walking stick body 1 and a handle 2. An RGB-D camera 3 is arranged in the handle 2, and a vibration sensor 4 is connected in the handle 2. The walking stick body 1 is provided with a transparent wire slot 5 or 8, an ultrasonic sensor 6 and a Raspberry Pi processor 7. The ultrasonic sensor 6, the RGB-D camera 3 and the vibration sensor 4 are connected with the Raspberry Pi processor 7 through the wire slot 5; the detection result is transmitted by the Raspberry Pi processor 7 to the vibration sensor 4 and thereby conveyed to the user.

As shown in fig. 2, in an actual environment, images around a user are collected in real time through an RGB-D camera 3, and a trained model is used for detection and stair recognition; and then data fusion is carried out by combining the measured value of the ultrasonic sensor 6 through the raspberry pi processor 7, and the fused processing result is sent to the user through the reminding module 10.

As shown in fig. 3, the data set processing procedure:

data set

510 stair images of different sizes and categories are collected: 300 from different buildings in real environments and 210 from websites. The training set and the test set are divided at a ratio of 7:3;

the data is normalized, and the resolution is uniformly adjusted to 720x960 pixels.
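The 7:3 division of the 510 images can be illustrated with a short sketch. The file names, the random shuffle and the fixed seed are assumptions made for the example; the text only states the ratio.

```python
import random

def split_dataset(items, train_ratio=0.7, seed=0):
    """Shuffle the items reproducibly and split them into training and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file names standing in for the 510 collected stair images
images = [f"stair_{i:03d}.jpg" for i in range(510)]
train_set, test_set = split_dataset(images)
print(len(train_set), len(test_set))
```

With 510 images this yields 357 training images and 153 test images.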

As shown in fig. 4, the method runs on Faster R-CNN and comprises the following specific steps:

(1) inputting the image into a convolutional neural network and extracting features to obtain a feature map, wherein the convolutional neural network adopts the Inception v2 network architecture;

Inception v2 is a series of improvements made by the Google team in February 2015 on the basis of Inception v1. The network is composed of 5 convolution layers with 3 × 3 convolution kernels and 2 pooling layers of 3 × 3 and 8 × 8; finally, linear layers and classification layers are added to further process the features. Inception v2 introduced Batch Normalization and replaced the original 5 × 5 convolutional layer with two stacked 3 × 3 convolutional layers, which gives better feature extraction and fewer parameters while keeping the receptive field unchanged. In the training stage, the deep convolutional neural network Inception processes the part of the data set that serves as the training set and converts it into feature vectors of fixed length. Convolution calculation process:

for input stair image data I(w, h) with the size of w × h, the convolution kernel is K(m, n), the convolution result is S(w, h), and the corresponding convolution formula is

S(w, h) = (I * K)(w, h) = Σ_m Σ_n I(w - m, h - n) K(m, n)

where w is the image length, h is the image width, m is the convolution kernel length, and n is the convolution kernel width
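As a small worked illustration of the convolution step, the sketch below computes a two-dimensional discrete convolution over the positions where the kernel fits entirely inside the image. The "valid" border handling and the function name `conv2d` are assumptions for the example; note also that deep learning frameworks usually compute the unflipped variant (cross-correlation), which differs only in the orientation of the kernel.

```python
def conv2d(image, kernel):
    """Valid-mode 2-D discrete convolution of two nested lists."""
    img_h, img_w = len(image), len(image[0])
    ker_h, ker_w = len(kernel), len(kernel[0])
    # Convolution flips the kernel in both directions before sliding it
    flipped = [row[::-1] for row in kernel[::-1]]
    result = []
    for r in range(img_h - ker_h + 1):
        out_row = []
        for c in range(img_w - ker_w + 1):
            acc = 0
            for i in range(ker_h):
                for j in range(ker_w):
                    acc += image[r + i][c + j] * flipped[i][j]
            out_row.append(acc)
        result.append(out_row)
    return result

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
print(conv2d(image, kernel))  # [[4, 4], [4, 4]]
```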

The convolutional feature map is input into the region proposal network (RPN); candidate boxes are screened using the soft non-maximum suppression (soft-NMS) algorithm and a sampling technique, and the anchors with higher confidence are selected as candidate-box feature information; meanwhile, the feature map is passed to the region-of-interest pooling (ROI pooling) layer for pooling, generating candidate-region feature maps of fixed size;

through convolutional feature extraction, 9 anchor boxes are generated for each pixel of the 45 x 60 feature map, and the generated anchor boxes are filtered and labeled; the filtering and labeling rules are as follows:

Pixel-by-pixel classification labeling of the anchors

Anchors extending beyond the 720x960 border of the original image are removed

The anchor box with the maximum IoU with the ground truth is marked as a positive sample, label = 1

If the IoU of an anchor box with the ground truth is greater than 0.7, it is marked as a positive sample, label = 1

If the IoU of an anchor box with the ground truth is less than 0.3, it is marked as a negative sample, label = 0

The remaining anchors are neither negative nor positive samples and are not used for final training, label = -1

IoU defines the degree of overlap of two bounding boxes: it is the ratio of the area of overlap of rectangular boxes A and B to the area of their union;
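The labeling rules above can be sketched for a single ground-truth box as follows. This is a simplified illustration under stated assumptions: boxes are given as corner coordinates (x1, y1, x2, y2), the 720x960 size is read as width x height, and anchors crossing the image border are kept in the list with label -1 rather than physically removed. The function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt, img_w=720, img_h=960, pos=0.7, neg=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor."""
    labels = [-1] * len(anchors)
    overlaps = [iou(a, gt) for a in anchors]
    inside = [i for i, a in enumerate(anchors)
              if a[0] >= 0 and a[1] >= 0 and a[2] <= img_w and a[3] <= img_h]
    for i in inside:
        if overlaps[i] > pos:
            labels[i] = 1
        elif overlaps[i] < neg:
            labels[i] = 0
    # The in-bounds anchor with the highest IoU is positive even if below 0.7
    if inside:
        best = max(inside, key=lambda i: overlaps[i])
        if overlaps[best] > 0:
            labels[best] = 1
    return labels

gt_box = (100, 100, 300, 300)
anchor_boxes = [(100, 100, 300, 300), (500, 500, 600, 600),
                (100, 100, 300, 250), (100, 100, 300, 200),
                (-10, 100, 300, 300)]
print(label_anchors(anchor_boxes, gt_box))  # [1, 0, 1, -1, -1]
```

In the example, the fourth anchor has an IoU of 0.5 and is therefore ignored, and the fifth crosses the left border of the image.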

pixel-by-pixel regression correction

In addition to labeling, the offset between each anchor and the ground truth is calculated

Let the center point coordinates of the ground-truth box be (x^*, y^*) and its width and height be w^*, h^*

Let the center point coordinates of the anchor box be (x_a, y_a) and its width and height be w_a, h_a

Offset calculation

Δx＝(x^*-x_a)/w_a

Δy＝(y^*-y_a)/h_a

Δw＝log(w^*/w_a)

Δh＝log(h^*/h_a)
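The four offset formulas translate directly into code. In this sketch both boxes are given as (center x, center y, width, height); the function name `encode_offsets` and the sample numbers are illustrative only.

```python
import math

def encode_offsets(gt, anchor):
    """Regression targets (dx, dy, dw, dh) of a ground-truth box relative to an anchor."""
    x_star, y_star, w_star, h_star = gt
    x_a, y_a, w_a, h_a = anchor
    return ((x_star - x_a) / w_a,
            (y_star - y_a) / h_a,
            math.log(w_star / w_a),
            math.log(h_star / h_a))

# Ground truth shifted by (10, 20) and twice as wide as the anchor
dx, dy, dw, dh = encode_offsets((110, 120, 200, 100), (100, 100, 100, 100))
print(dx, dy, dw, dh)
```

Dividing the shifts by the anchor size and taking the logarithm of the size ratios keeps the targets on a similar scale for small and large anchors.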

Learning is carried out on the difference between the two boxes; prediction boxes that cross the image border are then removed, and the soft-NMS algorithm is applied so that overlapping boxes are eliminated.

The generated candidate-region feature maps are passed through a fully connected layer and a linear regression layer, and classification probability and bounding-box regression are trained jointly, realizing the classification of going upstairs and downstairs and the stair target detection box;

Bounding-box regression method

The input original window P is mapped, through translation and scale transformation, to a regression window Ĝ that is closer to the real window G

Given (P_x, P_y, P_w, P_h), find a mapping f such that f(P_x, P_y, P_w, P_h) = (Ĝ_x, Ĝ_y, Ĝ_w, Ĝ_h) ≈ (G_x, G_y, G_w, G_h)
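One common way to realize such a mapping, shown here as a sketch, is to apply predicted offsets (dx, dy, dw, dh) to the window P: a translation of the center scaled by the window size, and an exponential rescaling of width and height. This form is the inverse of the offset calculation given earlier; the function name `apply_regression` and the sample values are assumptions for illustration.

```python
import math

def apply_regression(window, deltas):
    """Map a window (cx, cy, w, h) to a refined window using predicted offsets."""
    p_x, p_y, p_w, p_h = window
    d_x, d_y, d_w, d_h = deltas
    g_x = p_x + p_w * d_x        # translation of the center along x
    g_y = p_y + p_h * d_y        # translation of the center along y
    g_w = p_w * math.exp(d_w)    # scale transformation of the width
    g_h = p_h * math.exp(d_h)    # scale transformation of the height
    return (g_x, g_y, g_w, g_h)

refined = apply_regression((100, 100, 100, 100), (0.1, 0.2, math.log(2), 0.0))
print(refined)
```

With these inputs the window center moves to about (110, 120) and the width roughly doubles while the height is unchanged.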

Soft non-maximum suppression algorithm (soft-NMS)

In target detection, the classifier calculates a class score for each bounding box (bbox), that is, the probability that the bbox belongs to each class, and NMS proceeds according to these values as follows:

for each class, a decay function is first applied to the score of every bbox whose score is below the threshold, giving it a low score;

all bboxes are sorted by score, and the highest score and its bbox are selected; the remaining bboxes are traversed, and a bbox is deleted if its overlap (IoU) with the current highest-scoring bbox is larger than a certain threshold; the highest-scoring bbox is then selected from the unprocessed bboxes and the process is repeated until all bboxes to be retained have been found;

the final prediction is drawn from the class score and class color of all retained bboxes.
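For comparison, the sketch below shows the linear-decay form of soft-NMS as it is commonly described: instead of deleting a box that overlaps the current best box, its score is multiplied by (1 - IoU), and a box is dropped only when its score falls below a small floor. The text does not specify the decay function or the thresholds, so the linear decay, `iou_thresh=0.3` and `score_thresh=0.001` are assumptions, as are the function names.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, iou_thresh=0.3, score_thresh=0.001):
    """Return (box, score) pairs in the order they were selected."""
    remaining = list(zip(boxes, scores))
    kept = []
    while remaining:
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best)
        kept.append((best_box, best_score))
        survivors = []
        for box, score in remaining:
            overlap = iou(best_box, box)
            if overlap > iou_thresh:
                score *= (1.0 - overlap)   # decay instead of outright deletion
            if score >= score_thresh:
                survivors.append((box, score))
        remaining = survivors
    return kept

boxes = [(0, 0, 100, 100), (10, 0, 110, 100), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
for box, score in soft_nms(boxes, scores):
    print(box, round(score, 3))
```

In the example the second box overlaps the first heavily, so its score is reduced and it is ranked last rather than removed; this is how the approach reduces missed detections when two real objects overlap.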

As shown in fig. 5, the loss function used in training is divided into two parts: classification loss (cls loss) and regression loss (bbox regression loss). During model training, a loss value is produced at each training step; the loss usually starts high and decreases as training progresses. Training is designed to continue until the loss stays below 0.05, which takes approximately 600 steps, and is then stopped.

The loss function equation is as follows:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i^*) + λ (1/N_reg) Σ_i p_i^* L_reg(t_i, t_i^*)

where L_cls is the classification loss, L_reg is the regression loss, i is the index of the chosen anchor, and p_i is the predicted probability for anchor i.

L_cls computes the logarithmic loss for each anchor; N_cls is the total number of anchors, here 256; p_i is the predicted target probability for the anchor, and p_i^* is its ground-truth label (1 for a positive anchor, 0 for a negative anchor);

L_cls(p_i, p_i^*) is the logarithmic loss over two classes (stairs and non-stairs); t_i is a vector, the predicted offset in the RPN training phase;

t_i^* is a vector of the same dimension as t_i, representing the actual offset of the anchor relative to the ground truth in the RPN training phase;

λ is 1, so that the classification and regression weights are the same.

The two loss terms are normalized by N_cls and N_reg, which are the numbers of samples classified and regressed, respectively.
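A numerical sketch of this two-part loss is given below. The text does not state the form of the regression loss L_reg; the smooth L1 function used here is the usual choice in Faster R-CNN and is an assumption, as are the function names and the sample inputs. The regression term is multiplied by the ground-truth label so that only positive anchors contribute to it.

```python
import math

def log_loss(p, p_star, eps=1e-12):
    """Two-class logarithmic loss for one anchor."""
    p = min(max(p, eps), 1.0 - eps)   # avoid log(0)
    return -(p_star * math.log(p) + (1 - p_star) * math.log(1.0 - p))

def smooth_l1(x):
    """Smooth L1 penalty (assumed form of the regression loss)."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5

def rpn_loss(p, p_star, t, t_star, n_cls, n_reg, lam=1.0):
    """Classification loss plus lambda-weighted regression loss over anchors."""
    cls_term = sum(log_loss(pi, ps) for pi, ps in zip(p, p_star))
    reg_term = sum(ps * sum(smooth_l1(a - b) for a, b in zip(ti, ts))
                   for ps, ti, ts in zip(p_star, t, t_star))
    return cls_term / n_cls + lam * reg_term / n_reg

# One positive anchor with predicted probability 0.5 and offsets (0.5, 0, 0, 2)
loss = rpn_loss([0.5], [1], [(0.5, 0.0, 0.0, 2.0)], [(0.0, 0.0, 0.0, 0.0)],
                n_cls=1, n_reg=1)
print(round(loss, 4))  # 2.3181
```

Here the classification part contributes log 2 and the regression part contributes 0.125 + 1.5.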

The working principle of the invention is as follows: the switch is turned on and the camera starts collecting images in real time. The system first detects, through the recognition model, whether stairs are present and whether they go up or down; the ultrasonic sensor 6 is then switched on and a second judgment is made from the distance difference. The two results are compared to achieve accurate detection, the fused result is passed to the Raspberry Pi processor 7, and the processor 7 sends a vibration signal to the user. If the stairs go up, the vibration sensor 4 of the walking stick vibrates at intervals; if the stairs go down, the vibration sensor 4 vibrates continuously at an accelerated rate; if no staircase is detected, the vibration sensor 4 does not react.
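The decision flow can be sketched as a small fusion function followed by a lookup of the vibration pattern. The text does not specify how disagreement between the camera and the ultrasonic sensor is resolved; the rule used here (keep the camera result when its confidence is at least `trust_conf`, otherwise follow the ultrasonic result) is purely an assumption, and all names and pattern strings are hypothetical.

```python
# Vibration pattern for each fused result, following the behavior described above
VIBRATION_PATTERNS = {
    "upstair": "intermittent",
    "downstair": "rapid_continuous",
    None: "off",
}

def fuse_results(vision_label, vision_conf, ultrasonic_label, trust_conf=0.9):
    """Combine the camera detection with the ultrasonic judgment.

    vision_label     : "upstair", "downstair" or None (no stairs detected)
    vision_conf      : confidence of the camera detection, between 0 and 1
    ultrasonic_label : "upstair", "downstair" or None
    The disagreement rule is an assumption, not taken from the text.
    """
    if vision_label is None:
        return None                   # the camera acts as the first gate
    if vision_label == ultrasonic_label or vision_conf >= trust_conf:
        return vision_label
    return ultrasonic_label

def vibration_pattern(fused_label):
    return VIBRATION_PATTERNS[fused_label]

print(vibration_pattern(fuse_results("upstair", 0.6, "upstair")))
```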

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A device capable of realizing stair detection is characterized by comprising:

the RGB-D camera is used for acquiring three-dimensional environment images and inputting the real-time images into a trained deep learning model, faster-rcnn-inception-v2-coco, which performs real-time detection and judges whether the user is going upstairs or downstairs;

the ultrasonic sensor is used for measuring the distance to the ground, a threshold is preset, and whether the user is going upstairs or downstairs is judged by comparing the measured height with the threshold; the ultrasonic sensor is mounted at a fixed position on the stick body;

the vibration sensor is used for identifying the stair type and giving a user warning.

2. The device for detecting stairs of claim 1, further comprising a blind-guiding stick body, and an RGB-D camera, a processor, a vibration sensor and a buzzer connected thereto;

the processor is a Raspberry Pi board;

the vibration sensor is arranged in the handle of the walking stick.

3. The device for detecting stairs according to claim 1, wherein the walking stick combines the image recognition of the deep neural network with the recognition function of the ultrasonic sensor, so as to provide effective auxiliary information for visually impaired people and reduce missed detections; boundaries are then labeled by the bounding-box regression method, and the confidence of the recognition is marked.

4. A method for detecting stairs by using the device for detecting stairs of claim 1, wherein the method for detecting stairs comprises the following steps:

(1) collecting images of upstairs and downstairs stairs with different sizes and types to form a data set;

(3) dividing the data set labeled in the step (2) into a training set and a testing set;

(4) carrying out normalization processing on the data set;

(6) inputting the test set into an identification model, outputting the stair category in the test image and the accuracy of the prediction result;

5. The method for detecting stairs according to claim 2, wherein the deep neural network is a deep convolutional neural network trained on a labeled data set, and an Inception network, which has more layers and stronger expressive capability, replaces the VGG-16 network used for extracting image features in Faster R-CNN to obtain a convolutional feature map; the convolutional feature map is input into the region proposal network (RPN); meanwhile, redundant anchor boxes are pruned at the region-of-interest layer using the soft-NMS algorithm, the position information of the object to be detected is adjusted, and missed detections are reduced. Boundaries are then labeled by the bounding-box regression method, and the confidence of the recognition is marked.

6. A storage medium for storing a program for implementing the method of claim 4.

7. A data processing terminal, which executes the method of claim 4.