CN109902592B - Blind person auxiliary walking method based on deep learning - Google Patents

Blind person auxiliary walking method based on deep learning

Info

Publication number: CN109902592B
Application number: CN201910094124.3A
Authority: CN (China)
Prior art keywords: blind, deep learning, formula, network, follows
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN109902592A
Inventors: 周泓, 杨利娟
Current assignee / original assignee: Zhejiang University ZJU
Application filed 2019-01-30 by Zhejiang University ZJU; priority to CN201910094124.3A
Publication of application CN109902592A; patent granted and published as CN109902592B

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a blind person auxiliary walking method based on deep learning. First, a camera is started for real-time data acquisition. When the user travels outdoors, dangerous objects and traffic lights in the environment are detected by a deep convolutional neural network. If a dangerous object is encountered, Kalman filtering tracking is performed: the object's motion states over a period of time are compared and calculated, its motion trend is analyzed, and a danger reminder is issued; if a traffic light is encountered, its state is identified. When the user travels indoors, markers in the environment are detected in real time, the relevant marker regions are extracted, and key information such as text is extracted. The method uses deep learning for object detection and is characterized by high robustness, high accuracy, and high speed, giving it strong practicability.

Description

Blind person auxiliary walking method based on deep learning
Technical Field
The invention relates to the field of video detection and analysis technology and blind person auxiliary walking, in particular to a blind person auxiliary walking method based on deep learning.
Background
China has the largest blind population in the world. With social development and technological progress, society pays ever more attention to the lives of the blind, yet travel remains an extremely serious challenge in their daily life. Travel can generally be divided into outdoor walking and indoor walking. Outdoors, most cities lack the corresponding infrastructure and have no traffic system for blind travel, so blind people's movement is severely limited; meanwhile, with urbanization, roads criss-cross and traffic conditions are very complex, so a blind person traveling without a sighted companion faces very dangerous road situations. Indoors, blind guiding facilities are basically absent in most places; in buildings such as teaching buildings and shopping malls, the complex internal structure makes it difficult for a blind person to find a target location without guidance from a sighted person.
Researchers have proposed a series of assisted walking methods to help the blind walk better. However, most current methods focus on only one walking area, such as indoors only or outdoors only, and can hardly meet users' needs for accurate guidance throughout a whole journey. Most methods also provide only static obstacle reminders and lack a real-time alarm function for dynamic dangerous objects such as oncoming vehicles, so users' safety is difficult to guarantee.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a blind person auxiliary walking method based on deep learning.
The invention is realized by the following technical scheme: a blind person auxiliary walking method based on deep learning comprises the following steps:
(1) when the blind person walks, pictures are collected in real time, and the method enters the outdoor blind guiding mode or the indoor blind guiding mode according to the current situation;
(2) the indoor blind guiding mode in step 1 includes marker recognition and text recognition; for the marker recognition, a deep convolutional neural network is trained with a data enhancement method, and the trained network is then used to detect indoor markers;
(3) the text recognition in step 2: after a key marker is recognized, text analysis is performed, and the text on the marker is recognized to help the blind user localize;
(4) the outdoor blind guiding mode in step 1 includes dangerous object detection and traffic signal identification; object detection is performed through deep learning; if a dangerous target is detected, Kalman filtering tracking is performed, the object's motion trend is judged by analyzing its tracking results over a period of time, and an alarm prompt is issued if the dangerous target keeps approaching;
(5) the traffic signal identification in step 4: if a traffic light is detected, the region image is converted from the RGB color space to the HSV color space, a statistical histogram counts the numbers of pixels whose hue H falls in the red, green and yellow ranges, and their ratios are calculated to judge the color information of the traffic light.
Further, in step (2), the marker recognition is implemented as follows:
(1) Data preparation and preprocessing
First, an original marker data set is constructed. The problems of pictures taken by blind users while moving are analyzed, a degradation model is established, and the original data set is processed through the degradation model to obtain a data-enhanced training set;
(2) Network model training
The data-enhanced data set is used to train an object detection network based on a deep convolutional neural network, yielding a network model for actual detection;
(3) Marker recognition
The trained network model detects indoor markers and distinguishes markers from the background according to a preset threshold to obtain the final result.
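As an illustration of the preset-threshold step above, the following minimal Python/NumPy sketch keeps only detections whose confidence exceeds the threshold; the function name, dummy boxes, and the value 0.5 are illustrative assumptions, not details from the patent.

```python
import numpy as np

def filter_detections(boxes, scores, threshold=0.5):
    """Keep only detections whose confidence score exceeds the preset threshold."""
    keep = scores > threshold          # boolean mask over detections
    return boxes[keep], scores[keep]

# Dummy detections: two boxes (x1, y1, x2, y2) with network confidences.
boxes = np.array([[10, 20, 110, 220], [30, 40, 90, 180]])
scores = np.array([0.92, 0.31])
print(filter_detections(boxes, scores))   # only the first box survives
```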
Further, in step (2), the data enhancement method is as follows:
(1) Motion blur: let the normal image be f(x, y) and the blurred image be g(x, y). Since motion blur is caused by images overlapping each other on the imaging plane during the exposure time T,
g(x, y) = \int_0^T f(x - x_0(t), y - y_0(t)) dt
where x_0(t) and y_0(t) are the motion components;
(2) Affine transformation: as shown in formula 1.1, the pixel coordinates of the original image are (v, w), and a spatial coordinate transformation through the affine matrix T yields the coordinates (x, y):
[x, y, 1]^T = T [v, w, 1]^T   (1.1)
(3) Noise: pictures taken by blind users often introduce Gaussian noise (formula 1.2) and salt-and-pepper noise (formula 1.3):
p(z) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(z-\mu)^2 / 2\sigma^2}   (1.2)
p(z) = \begin{cases} P_a, & z = a \\ P_b, & z = b \\ 0, & \text{otherwise} \end{cases}   (1.3)
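The following Python/OpenCV sketch illustrates the three degradations above; the kernel size, rotation angle, noise variance, and salt-and-pepper ratio are assumed example parameters, not values specified by the patent.

```python
import numpy as np
import cv2

def motion_blur(img, ksize=9):
    """Horizontal motion blur: average ksize shifted copies of the image."""
    kernel = np.zeros((ksize, ksize), np.float32)
    kernel[ksize // 2, :] = 1.0 / ksize
    return cv2.filter2D(img, -1, kernel)

def affine(img, angle=5.0, scale=1.0):
    """Affine warp (formula 1.1): here T is a rotation about the image center."""
    h, w = img.shape[:2]
    T = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    return cv2.warpAffine(img, T, (w, h))

def add_noise(img, sigma=10.0, sp_ratio=0.01):
    """Gaussian noise (formula 1.2) plus salt-and-pepper noise (formula 1.3)."""
    noisy = img.astype(np.float32) + np.random.normal(0.0, sigma, img.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)
    mask = np.random.rand(*img.shape[:2])
    noisy[mask < sp_ratio / 2] = 0          # pepper
    noisy[mask > 1 - sp_ratio / 2] = 255    # salt
    return noisy

img = np.full((64, 64, 3), 128, np.uint8)   # dummy gray image
degraded = add_noise(affine(motion_blur(img)))
```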
Further, in step (3), the text recognition is implemented as follows:
Text recognition is performed by a convolutional recurrent network composed of convolutional layers, recurrent layers, and a transcription layer. The image to be examined is fed into the network; the convolutional layers automatically extract a feature sequence from the input image, the recurrent layers then predict each frame of that feature sequence, and finally the transcription layer converts the per-frame predictions of the recurrent layers into a label sequence to obtain the final recognition result.
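For illustration, a minimal PyTorch sketch of such a convolution-recurrence-transcription pipeline follows; the layer sizes and class count are assumptions, not the patent's actual architecture, and the transcription step would decode the per-frame logits (e.g. with CTC).

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes, img_h=32):
        super().__init__()
        self.conv = nn.Sequential(          # convolutional layers: feature maps
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
        )
        feat_h = img_h // 4                 # height after two 2x poolings
        self.rnn = nn.LSTM(128 * feat_h, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)   # per-frame label logits

    def forward(self, x):                    # x: (batch, 1, H, W)
        f = self.conv(x)                     # (batch, C, H', W')
        b, c, h, w = f.shape
        seq = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # width = time steps
        out, _ = self.rnn(seq)               # recurrent layers predict each frame
        return self.fc(out)                  # (batch, time, classes) for transcription

logits = CRNN(num_classes=37)(torch.randn(2, 1, 32, 100))
print(logits.shape)  # torch.Size([2, 25, 37])
```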
Further, in step (4), the Kalman filtering tracking is modeled as follows:
Kalman filtering adopts a state-space model of signal and noise. Using the previous estimate and the current measurement, it evaluates the state and measurement equations to predict and correct continuously, gradually reducing the error and obtaining state parameters closest to the ideal values.
Now, assume the state and measurement equations of a discrete dynamic system are:
The state equation: x_k = A x_{k-1} + w_{k-1}   (1.4)
The measurement equation: y_k = C x_k + v_k   (1.5)
where x_k is the n×1 state vector of the discrete system at time k; A is the n×n state transition matrix of the discrete system; w_{k-1} is the n×1 random noise of the discrete system at time k-1; C is the m×n observation matrix; y_k is the m×1 observation vector at time k; and v_k is the m×1 random noise present in the discrete system's measurement at time k. Ideally, the random noise vectors w_k and v_k are assumed to be zero-mean white Gaussian noise vectors, independent of each other, with covariance matrices given by formulas 1.6-1.8:
E[w_k w_j^T] = Q \delta_{kj}   (1.6)
E[v_k v_j^T] = R \delta_{kj}   (1.7)
E[w_k v_j^T] = 0   (1.8)
Further, in step (4), the Kalman filtering tracking is implemented as follows:
(1) From the best state estimate \hat{x}_{k-1} of the discrete system at time k-1, the a priori estimate \hat{x}_k^- at time k is obtained through the matrix A. This estimate is the state prediction at time k, given by formula 1.9:
\hat{x}_k^- = A \hat{x}_{k-1}   (1.9)
(2) The a priori estimation error e_k^- is obtained by formula 1.10, and its covariance matrix P_k^- by formula 1.11:
e_k^- = x_k - \hat{x}_k^-   (1.10)
P_k^- = E[e_k^- (e_k^-)^T]   (1.11)
(3) The measured observation data y_k is used to correct the a priori estimate \hat{x}_k^-, as in formula 1.12:
\hat{x}_k = \hat{x}_k^- + K_k (y_k - C \hat{x}_k^-)   (1.12)
where \hat{x}_k, called the a posteriori estimate, is the corrected value of the a priori estimate \hat{x}_k^-, and K_k is the Kalman gain matrix;
(4) Similarly, the a posteriori estimation error e_k is obtained from formula 1.13, and the corresponding covariance matrix P_k from formula 1.14:
e_k = x_k - \hat{x}_k   (1.13)
P_k = E[e_k (e_k)^T]   (1.14)
(5) The key to designing the Kalman filter is to determine the system gain matrix K_k that minimizes the covariance P_k of the a posteriori estimation error. It can be shown that the minimizing gain matrix K_k and the corresponding a posteriori error covariance matrix P_k are determined by formulas 1.15 and 1.16, respectively:
K_k = P_k^- C^T (C P_k^- C^T + R)^{-1}   (1.15)
P_k = (I - K_k C) P_k^-   (1.16)
where the covariance matrix of the a priori estimation error, P_k^-, is computed by formula 1.17:
P_k^- = A P_{k-1} A^T + Q   (1.17)
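A minimal NumPy sketch of the recursion in formulas 1.9-1.17 follows, applied to a constant-velocity 2-D centroid model; the A, C, Q, R values are illustrative assumptions, not parameters from the patent.

```python
import numpy as np

class Kalman:
    def __init__(self, A, C, Q, R, x0, P0):
        self.A, self.C, self.Q, self.R = A, C, Q, R
        self.x, self.P = x0, P0

    def predict(self):
        self.x = self.A @ self.x                       # prediction, formula 1.9
        self.P = self.A @ self.P @ self.A.T + self.Q   # prior covariance, formula 1.17
        return self.x

    def update(self, y):
        S = self.C @ self.P @ self.C.T + self.R
        K = self.P @ self.C.T @ np.linalg.inv(S)       # gain, formula 1.15
        self.x = self.x + K @ (y - self.C @ self.x)    # correction, formula 1.12
        self.P = (np.eye(len(self.x)) - K @ self.C) @ self.P  # formula 1.16
        return self.x

# State = [x, y, vx, vy]; only the centroid (x, y) is observed.
A = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], float)
C = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)
kf = Kalman(A, C, Q=np.eye(4) * 1e-2, R=np.eye(2), x0=np.zeros(4), P0=np.eye(4))
kf.predict()
print(kf.update(np.array([5.0, 3.0])))  # corrected state estimate
```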
Further, in step (4), the conversion formulas from the RGB color space to the HSI color space are as follows:
H = 60°·(g - b)/(max - min), if max = r and g ≥ b;
H = 60°·(g - b)/(max - min) + 360°, if max = r and g < b;
H = 60°·(b - r)/(max - min) + 120°, if max = g;
H = 60°·(r - g)/(max - min) + 240°, if max = b   (1.18)
S = (max - min)/(max + min), if I ≤ 1/2; S = (max - min)/(2 - max - min), if I > 1/2   (1.19)
I = (max + min)/2   (1.20)
where H is the hue value, S the saturation value, and I the brightness value; r, g and b are the red, green and blue components describing a color in the RGB color space; and max and min represent the maximum and minimum of r, g and b, respectively. Through the above transformation, the candidate region can be converted from the RGB color space to the HSI color space.
Further, in step (4), the traffic light state identification is implemented as follows:
Within the detected traffic light candidate region, a color histogram is used to count the numbers of pixels N_R, N_G, N_B whose hue H falls within the red, green and yellow hue ranges, respectively, together with the total number of pixels N of the candidate region, and the corresponding ratios are calculated:
Ratio_R = N_R / N,  Ratio_G = N_G / N,  Ratio_B = N_B / N   (1.21)
Assuming the threshold is T, the traffic light color is judged as:
color = red, if Ratio_R > T;  green, if Ratio_G > T;  yellow, if Ratio_B > T   (1.22)
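For illustration, the following Python/OpenCV sketch applies the hue-histogram judgment of formulas 1.21-1.22; OpenCV stores hue in 0-179, and the hue ranges and threshold T = 0.3 are assumed example values.

```python
import cv2
import numpy as np

def light_color(roi_bgr, T=0.3):
    """Judge traffic light color from hue ratios in the candidate region."""
    hsv = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2HSV)
    h = hsv[:, :, 0].ravel()
    n = float(h.size)                       # total pixel count N
    ratios = {                              # formula 1.21 for each hue range
        "red":    np.count_nonzero((h < 10) | (h > 170)) / n,
        "green":  np.count_nonzero((h >= 45) & (h <= 90)) / n,
        "yellow": np.count_nonzero((h >= 20) & (h <= 35)) / n,
    }
    best = max(ratios, key=ratios.get)
    return best if ratios[best] > T else "unknown"   # formula 1.22

roi = np.full((20, 20, 3), (0, 0, 255), np.uint8)    # pure red patch (BGR)
print(light_color(roi))                              # -> "red"
```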
the invention has the following beneficial effects:
1. according to the invention, indoor and outdoor blind guiding modes are integrated according to the actual walking requirements of the blind, and complete fine blind guiding is carried out, so that the blind can more accurately reach the destination;
2. the indoor blind guiding mode adopts an object detection algorithm based on deep learning, reduces the laying cost of traditional RFID hardware, and is beneficial to the blind to know the surrounding environment more timely and make more accurate walking judgment by utilizing the rapidity and high accuracy of the algorithm
3. According to the deep learning detection algorithm, simulation is carried out on pictures shot by the blind when the blind walks, and special data enhancement processing is carried out during model training, so that the generalization and robustness of the model are better;
4. the outdoor blind guiding mode has unique dangerous object detection, can detect the continuously approaching dangerous objects such as vehicles and the like, helps the blind avoid encountering unexpected dangers, and can greatly improve the accuracy and the robustness of the algorithm by creatively combining the deep learning-based object detection algorithm and the Kalman filtering algorithm.
5. The outdoor blind guiding mode has a traffic signal lamp identification function, the color of the traffic signal lamp is determined by converting the color space of the candidate area and counting the pixel proportion of the hue H in three colors of red, yellow and green by using the color histogram, and the user can be ensured to pass through the red road lamp intersection without obstacles.
6. The core of the invention is a detection algorithm which can be carried out on a remote server, so that in the actual production, the requirement on the intelligent interactive terminal is not high, and the method can be applied to any android mobile phone or embedded terminal capable of carrying out photos and networking.
Drawings
FIG. 1 is a flow chart of the indoor mode;
FIG. 2 is a model of an object detection network based on deep learning;
FIG. 3 is a flow chart of the outdoor mode;
FIG. 4 is a flow chart of the Kalman filtering tracking algorithm;
FIG. 5 is a flow chart of the dangerous object detection algorithm of the invention.
Detailed Description
The invention is further illustrated by the following examples and figures. The blind person auxiliary walking method based on deep learning comprises the following steps:
step 1: when the blind person walks, the traditional auxiliary walking method generally only conducts indoor blind guide or outdoor blind guide, integrates the two modes, collects pictures in real time, and enters an outdoor blind guide mode or an indoor blind guide mode according to the current situation.
Step 2: The indoor blind guiding mode includes marker recognition and text recognition, as shown in fig. 1. For marker recognition, a deep convolutional neural network is trained with a specific data enhancement method, and the trained network is then used to detect indoor markers.
The network training process is as follows:
1) data preparation and preprocessing
First, an original marker data set is constructed. The problems of pictures taken by blind users while moving are analyzed, a degradation model is established, and the original data set is processed through the degradation model to obtain a data-enhanced training set.
2) Network model training
The data-enhanced data set is used to train an object detection network based on a deep convolutional neural network, as shown in fig. 2, to obtain a network model for actual detection.
3) Marker identification
The trained network model detects indoor markers and distinguishes markers from the background according to a preset threshold to obtain the final result.
By training the network on the degradation-model-enhanced data, the generalization performance and detection accuracy of the network can be improved.
Step 3: The text recognition in step 2: after a key marker is recognized, text analysis is performed to recognize the text on the marker and help the blind user localize.
Text recognition is performed by a convolutional recurrent network composed of convolutional layers, recurrent layers, and a transcription layer. The image to be examined is fed into the network; the convolutional layers automatically extract a feature sequence from the input image, the recurrent layers then predict each frame of that feature sequence, and finally the transcription layer converts the per-frame predictions into a label sequence to obtain the final recognition result. Compared with traditional OCR, neural-network-based text recognition greatly improves accuracy.
Step 4: The outdoor blind guiding mode includes dangerous object detection and traffic signal identification, as shown in figure 3. Object detection is performed through deep learning; if a dangerous target is detected, Kalman filtering tracking is performed, and the object's motion trend is judged by analyzing its tracking results over a period of time; if the dangerous target keeps approaching, an alarm prompt is issued. The Kalman filtering flow is shown in figure 4 and the dangerous object detection flow in figure 5; the specific flow is as follows:
the initial dangerous target is obtained by using the detection result of the Yolov2, the centroid coordinate of the current target is predicted by using a Kalman filtering algorithm according to the centroid coordinate of the moving object in the previous image, the centroid coordinate obtained by using the detection result of the Yolov2 is compared with the predicted value obtained by using the Kalman filtering algorithm, and the centroid position of the moving target and the position and the size of the next frame detection target are finally determined according to the result, so that the moving speed is deduced, and a blind user is prompted.
If i represents a frame sequence in a video sequence, the major steps of the moving object detection algorithm based on the combination of the Yolov2 object detection algorithm and the kalman filter are described as follows:
1) YOLOv2 detects a dangerous object in the video, and the size and position of its detection box initialize the tracking window. The centroid coordinates of the window are extracted and denoted (x_i, y_i), where i = 1.
2) The Kalman filtering algorithm predicts the centroid coordinates of the dangerous target in the (i+1)-th image, denoted (\hat{x}_{i+1}, \hat{y}_{i+1}).
3) YOLOv2 detects the coordinate position in the (i+1)-th image, giving the centroid coordinates (x_{i+1}, y_{i+1}).
4) The centroid position predicted by the Kalman filtering algorithm is compared with the centroid position detected by YOLOv2. If the condition
(x_{i+1}, y_{i+1}) is absent, or |x_{i+1} - \hat{x}_{i+1}| > T, or |y_{i+1} - \hat{y}_{i+1}| > T (T is a set threshold)   (3.1)
is met, the YOLOv2 detection algorithm is considered to have been disturbed, with problems such as missed detection or occlusion, and the centroid coordinates predicted by the Kalman filtering algorithm are selected as the centroid position of the current moving object, namely:
(x_{i+1}, y_{i+1}) = (\hat{x}_{i+1}, \hat{y}_{i+1})
Meanwhile, the target size in the previous image is taken by default as the size of the moving target in the current image, with the centroid as the center position; the speed relation between the previous and current images is calculated to obtain the motion trend, and the blind user is prompted.
Conversely, if the condition shown in formula 3.1 is not met, the centroid coordinates (x_{i+1}, y_{i+1}) detected by YOLOv2 are selected as the centroid position of the moving object in the current frame. Meanwhile, the target window obtained by the YOLOv2 detection algorithm is set as the center position; the speed relation between the previous and current frames is calculated to obtain the motion trend, and the blind user is prompted.
5) Jump to step 2) and execute steps 3), 4) and 5) in turn; repeat until the dangerous target disappears.
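The branch in step 4) can be sketched as follows in Python; the threshold value and the wrapper function are illustrative assumptions consistent with condition 3.1, not code from the patent.

```python
import numpy as np

def fuse_centroid(predicted, detected, T=30.0):
    """Choose the current-frame centroid per condition 3.1.

    predicted: Kalman-predicted (x, y); detected: YOLOv2 centroid or None.
    """
    if detected is None or \
       abs(detected[0] - predicted[0]) > T or \
       abs(detected[1] - predicted[1]) > T:
        # Detection missing or inconsistent: assume missed detection or
        # occlusion and fall back to the Kalman prediction.
        return predicted
    return detected            # otherwise trust the YOLOv2 detection

print(fuse_centroid(np.array([100.0, 50.0]), np.array([104.0, 53.0])))  # detection kept
print(fuse_centroid(np.array([100.0, 50.0]), None))                     # prediction used
```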
Step 5: The traffic signal identification in step 4: if a traffic light is detected, the region image is converted from the RGB color space to the HSV color space, a statistical histogram counts the numbers of pixels whose hue H falls in the red, green and yellow ranges, and their ratios are calculated to judge the color of the traffic light.
First, the candidate region is converted from the RGB color space to the HSI color space. A color histogram counts the numbers of pixels N_R, N_G, N_B whose hue falls within the red, green and yellow hue ranges, together with the total number of pixels N of the candidate region, and the corresponding ratios Ratio_R, Ratio_G, Ratio_B are calculated. Assuming the threshold is T, if some ratio is greater than T, the corresponding color state is identified.
Through deep learning and video tracking analysis, the invention provides corresponding assistance for the blind in both indoor and outdoor environments while traveling. The method has high accuracy, fast detection speed, good robustness and good universality.

Claims (8)

1. A blind person auxiliary walking method based on deep learning is characterized by comprising the following steps:
step (1), when the blind person walks, pictures are collected in real time, and the method enters the outdoor blind guiding mode or the indoor blind guiding mode according to the current situation;
step (2), the indoor blind guiding mode in step (1) includes marker recognition and text recognition; for the marker recognition, a deep convolutional neural network is trained with a data enhancement method, and the trained network is then used to detect indoor markers;
step (3), the text recognition in step (2): after a key marker is recognized, text analysis is performed, and the text on the marker is recognized to help the blind user localize;
step (4), the outdoor blind guiding mode in step (1) includes dangerous object detection and traffic signal identification; object detection is performed through deep learning; if a dangerous target is detected, Kalman filtering tracking is performed, the object's motion trend is judged by analyzing its tracking results over a period of time, and an alarm prompt is issued if the dangerous target keeps approaching;
step (5), the traffic signal identification in step (4): if a traffic light is detected, the detected traffic light candidate region is converted from the RGB color space to the HSV color space, a statistical histogram counts the numbers of pixels whose hue H falls in the red, green and yellow ranges, and their ratios are calculated to judge the color information of the traffic light.
2. The blind person auxiliary walking method based on deep learning as claimed in claim 1, wherein in step (2), the marker recognition is implemented as follows:
(1) data preparation and preprocessing
first, an original marker data set is constructed; the problems of pictures taken by blind users while moving are analyzed, a degradation model is established, and the original data set is processed through the degradation model to obtain a data-enhanced training set;
(2) network model training
the data-enhanced data set is used to train an object detection network based on a deep convolutional neural network, yielding a network model for actual detection;
(3) marker recognition
the trained network model detects indoor markers and distinguishes markers from the background according to a preset threshold to obtain the final result.
3. The blind person auxiliary walking method based on deep learning as claimed in claim 1, wherein in step (2), the data enhancement method is as follows:
(1) motion blur: let the normal image be f(x, y) and the blurred image be g(x, y); since motion blur is caused by images overlapping each other on the imaging plane during the exposure time T,
g(x, y) = \int_0^T f(x - x_0(t), y - y_0(t)) dt
(2) affine transformation: as shown in formula (1.1), the pixel coordinates of the original image are (v, w), and a spatial coordinate transformation through the affine matrix T yields the coordinates (x, y):
[x, y, 1]^T = T [v, w, 1]^T   (1.1)
(3) noise: pictures taken by blind users often introduce Gaussian noise (formula (1.2)) and salt-and-pepper noise (formula (1.3)):
p(z) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(z-\mu)^2 / 2\sigma^2}   (1.2)
p(z) = \begin{cases} P_a, & z = a \\ P_b, & z = b \\ 0, & \text{otherwise} \end{cases}   (1.3)
4. The blind person auxiliary walking method based on deep learning as claimed in claim 1, wherein in step (3), the text recognition is implemented as follows:
text recognition is performed by a convolutional recurrent network composed of convolutional layers, recurrent layers and a transcription layer; the image to be examined is fed into the network, the convolutional layers automatically extract a feature sequence from the input image, the recurrent layers then predict each frame of that feature sequence, and finally the transcription layer converts the per-frame predictions of the recurrent layers into a label sequence to obtain the final recognition result.
5. The blind person auxiliary walking method based on deep learning as claimed in claim 1, wherein in step (4), the Kalman filtering tracking is modeled as follows:
Kalman filtering adopts a state-space model of signal and noise; using the previous estimate and the current measurement, it evaluates the state and measurement equations to predict and correct continuously, gradually reducing the error and obtaining state parameters closest to the ideal values;
now, assume the state and measurement equations of a discrete dynamic system are:
the state equation: x_k = A x_{k-1} + w_{k-1}   (1.4)
the measurement equation: y_k = C x_k + v_k   (1.5)
wherein x_k is the n×1 state vector of the discrete system at time k; A is the n×n state transition matrix of the discrete system; w_{k-1} is the n×1 random noise of the discrete system at time k-1; C is the m×n observation matrix; y_k is the m×1 observation vector at time k; v_k is the m×1 random noise present in the discrete system's measurement at time k; ideally, the random noise vectors w_k and v_k are assumed to be zero-mean white Gaussian noise vectors, independent of each other, with covariance matrices given by formulas (1.6)-(1.8) respectively:
E[w_k w_j^T] = Q \delta_{kj}   (1.6)
E[v_k v_j^T] = R \delta_{kj}   (1.7)
E[w_k v_j^T] = 0   (1.8).
6. The blind person auxiliary walking method based on deep learning as claimed in claim 1, wherein in step (4), the Kalman filtering tracking is implemented as follows:
(1) from the best state estimate \hat{x}_{k-1} of the discrete system at time k-1, the a priori estimate \hat{x}_k^- at time k is obtained through the matrix A; this estimate is the state prediction at time k, given by formula (1.9):
\hat{x}_k^- = A \hat{x}_{k-1}   (1.9)
(2) the a priori estimation error e_k^- is obtained by formula (1.10), and its covariance matrix P_k^- by formula (1.11):
e_k^- = x_k - \hat{x}_k^-   (1.10)
P_k^- = E[e_k^- (e_k^-)^T]   (1.11)
(3) the measured observation data y_k is used to correct the a priori estimate \hat{x}_k^-, as in formula (1.12):
\hat{x}_k = \hat{x}_k^- + K_k (y_k - C \hat{x}_k^-)   (1.12)
wherein \hat{x}_k, called the a posteriori estimate, is the corrected value of the a priori estimate \hat{x}_k^-, and K_k is the Kalman gain matrix;
(4) similarly, the a posteriori estimation error e_k is obtained from formula (1.13), and the corresponding covariance matrix P_k from formula (1.14):
e_k = x_k - \hat{x}_k   (1.13)
P_k = E[e_k (e_k)^T]   (1.14)
(5) the key to designing the Kalman filter is to determine the system gain matrix K_k that minimizes the covariance P_k of the a posteriori estimation error; it can be shown that the minimizing gain matrix K_k and the corresponding a posteriori error covariance matrix P_k are determined by formulas (1.15) and (1.16), respectively:
K_k = P_k^- C^T (C P_k^- C^T + R)^{-1}   (1.15)
P_k = (I - K_k C) P_k^-   (1.16)
wherein the covariance matrix of the a priori estimation error, P_k^-, is computed by formula (1.17):
P_k^- = A P_{k-1} A^T + Q   (1.17).
7. The blind person auxiliary walking method based on deep learning as claimed in claim 1, wherein in step (5), the conversion formulas from the RGB color space to the HSI color space are as follows:
H = 60°·(g - b)/(max - min), if max = r and g ≥ b;
H = 60°·(g - b)/(max - min) + 360°, if max = r and g < b;
H = 60°·(b - r)/(max - min) + 120°, if max = g;
H = 60°·(r - g)/(max - min) + 240°, if max = b   (1.18)
S = (max - min)/(max + min), if I ≤ 1/2; S = (max - min)/(2 - max - min), if I > 1/2   (1.19)
I = (max + min)/2   (1.20)
wherein H is the hue value, S the saturation value, and I the brightness value; r, g and b are the red, green and blue components describing a color in the RGB color space; max and min represent the maximum and minimum of r, g and b, respectively; through the above transformation, the candidate region can be converted from the RGB color space to the HSI color space.
8. The blind person auxiliary walking method based on deep learning as claimed in claim 1, wherein in step (5), the traffic light recognition is implemented as follows:
within the detected traffic light candidate region, a color histogram counts the numbers of pixels N_R, N_G, N_B whose hue H falls within the red, green and yellow hue ranges, together with the total number of pixels N of the candidate region, and the corresponding ratios are calculated:
Ratio_R = N_R / N,  Ratio_G = N_G / N,  Ratio_B = N_B / N   (1.21)
assuming the threshold is T, the traffic light color is judged as:
color = red, if Ratio_R > T;  green, if Ratio_G > T;  yellow, if Ratio_B > T   (1.22).

Publications (2)

CN109902592A (application), published 2019-06-18
CN109902592B (grant), published 2021-03-30
