CN112674998A - Blind person traffic intersection assisting method based on rapid deep neural network and mobile intelligent device - Google Patents

Blind person traffic intersection assisting method based on rapid deep neural network and mobile intelligent device

Info

Publication number
CN112674998A
Authority
CN
China
Prior art keywords
blind
network
zebra crossing
blind person
traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011531293.8A
Other languages
Chinese (zh)
Other versions
CN112674998B (en)
Inventor
何坚
刘新远
魏鑫
王钦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202011531293.8A priority Critical patent/CN112674998B/en
Publication of CN112674998A publication Critical patent/CN112674998A/en
Application granted granted Critical
Publication of CN112674998B publication Critical patent/CN112674998B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

A blind person traffic intersection assisting method based on a fast deep neural network and a mobile intelligent device, belonging to the field of electronic information. Drawing on other fast deep neural networks, the method designs a neural network, Q-DNN, suited to mobile intelligent devices; by introducing proven network structures, the feature extraction capability of Q-DNN is strengthened and redundant network parameters are reduced. The network achieves accurate recognition of the important targets at a traffic intersection and meets the requirement of real-time operation on a mobile intelligent device. Secondly, the blind person is guided to the correct zebra-crossing position by combining GPS with the detection results. By detecting the zebra-crossing direction, the blind person is effectively assisted in adjusting walking direction so as to cross the zebra crossing smoothly. A traffic light signal classification method based on the HSI space is proposed on top of the traffic light detection results, solving the problem of traffic light signal recognition. Finally, the invention combines these methods to design a traffic intersection assistance flow for the blind based on the mobile intelligent device, effectively helping blind people cross traffic intersections safely and smoothly.

Description

Blind person traffic intersection assisting method based on rapid deep neural network and mobile intelligent device
Technical Field
The invention belongs to the field of electronic information, and relates to an auxiliary method capable of assisting blind people in identifying traffic intersection signal lamps and zebra crossings based on a rapid deep neural network.
Background
According to statistics, there are about 17.3 million blind people in China, and their safe travel is a matter of wide concern. The main travel aids currently available to the blind are the guide cane and the guide dog. The guide cane has a single function and a narrow detection range, and the information it provides depends entirely on the blind person's own sense of touch, so it can hardly supply effective information such as zebra crossings and traffic lights in complex scenes like traffic intersections. Guide dogs can lead the blind around obstacles, but they require long training and are expensive and scarce.
With the rapid development of information technology, Electronic Travel Aids (ETA) designed for the blind have become a trend. At present, ETAs mainly take the form of blind-guiding glasses and electronic canes, but they require the blind to purchase expensive dedicated equipment separately, a price most blind people cannot afford.
With the development of mobile communication in recent years, mobile intelligent devices such as smartphones and tablets have become increasingly popular and are gradually becoming everyday equipment for blind people. The cameras, BeiDou/GPS receivers and inertial sensors integrated in these mobile terminals offer a new path for developing blind-assistance technology at traffic intersections. Deep neural networks have achieved great success in computer vision in recent years, but their complex structures and large numbers of parameters demand powerful computing resources, while mobile intelligent devices have limited computing power; recognizing zebra crossings and signal lights at traffic intersections therefore requires accurate and efficient algorithm support. To this end, a fast deep neural network that runs efficiently on mobile intelligent devices is designed to help the blind quickly and accurately identify zebra crossings, traffic light signals and obstacles at traffic intersections. As fully functional smart mobile devices, mobile phones have become popular among blind people. The phone is worn on the blind person's chest as a mobile wearable device to capture images of the traffic intersection scene; the fast deep neural network analyses the scene, and the blind person is finally guided through the intersection safely by voice.
Disclosure of Invention
The invention provides a blind person traffic intersection assisting method based on a fast deep neural network and a mobile intelligent device, aiming at the problems of expensive, complex and hard-to-use equipment in existing blind-assistance approaches at traffic intersections. In this method, the mobile intelligent device is worn on the blind person's chest; the integrated BeiDou/GPS senses the blind person's walking position and direction, and the camera collects data such as zebra crossings and traffic lights at the intersection, so that the blind person is guided to wait for the green light at the correct position outside the zebra crossing. Secondly, drawing on the Convolutional Neural Network (CNN) architecture, a fast deep neural network Q-DNN (Quick Deep Neural Network) is designed, so that the mobile intelligent device can accurately detect traffic marks such as zebra crossings and traffic lights at the intersection in real time. Finally, the blind person is guided safely through the traffic intersection by voice or vibration.
The present invention relates to the following 4 aspects:
(1) obtaining walking position and orientation information of the blind by using a Beidou/GPS position sensor integrated with mobile intelligent equipment; the blind people are guided to accurately wait for green light to pass outside the zebra crossing by combining the information of the blind road and the zebra crossing collected by the mobile intelligent equipment;
(2) by referring to a convolutional neural network architecture, a fast deep neural network Q-DNN is designed and realized by improving a convolutional layer structure. The Q-DNN requires less computing resources, has good operation efficiency while ensuring the accuracy of the algorithm, enables the blind to detect barriers, zebra crossings and traffic light marks of the traffic intersection through mobile intelligent equipment, and provides decision support for assisting the blind to pass through the traffic intersection.
(3) According to the zebra crossing and traffic light images detected by the Q-DNN, the traffic light region in the image is extracted, the region is converted to the HSI (Hue-Saturation-Intensity) color space, and the H values are counted to achieve accurate detection of the traffic light signal.
(4) According to the traffic light detection result, the walking position and direction information of the blind person is sensed by combining the Beidou/GPS, and the blind person is guided to safely pass through the traffic intersection in a voice/vibration mode.
Core algorithm of the invention
(1) Guiding the blind person with the mobile phone to wait for the green light at the zebra crossing
A flow chart of guiding the blind person to the zebra crossing position with the mobile phone is shown in Fig. 1. First, the accurate position of the blind person is obtained from the GPS/BeiDou sensor built into the phone, the destination set by the blind person is prompted, and the crossing route through the traffic intersection is determined on an electronic map. The walking direction of the blind person is determined with the phone's built-in magnetic sensor; whether it follows the planned route is checked, and if not, the blind person is guided by voice to adjust direction. The Q-DNN network can effectively detect the position of the zebra crossing in the picture; the area in front of the blind person is divided into a trapezoidal region, and if the zebra crossing is not inside this trapezoidal region, the blind person is guided to the zebra crossing by voice.
And finally, calculating the angle between the walking direction of the blind and the zebra crossing direction, and judging whether the blind faces the zebra crossing. Firstly, a zebra crossing area detected by the Q-DNN network is intercepted into a single picture for analysis. The zebra stripes are characterized by being equally spaced and parallel, and the edge color contrast is strong, so that the Canny edge detection algorithm and Hough transformation are adopted to realize the direction detection of the zebra stripes.
The Canny algorithm is used for extracting the edge of the zebra crossing, and the method comprises 4 steps:
1. Convert the RGB image to a grayscale image using a weighted-average method, in which the R, G, B components are averaged with different weights. The calculation formula is:
R = G = B = (ω_R·R + ω_G·G + ω_B·B)/3 (1)
where ω_R, ω_G, ω_B are the weights of R, G, B respectively; different weight values produce different grayscale images. According to experiments, ω_R = 0.299, ω_G = 0.587, ω_B = 0.114 gives the best grayscale image.
2. Denoise with Gaussian filtering to reduce the influence of noise on detection: each pixel and its neighborhood are multiplied by a Gaussian matrix, and the weighted average is taken as the pixel's final gray value. The calculation formula is:
g(m, n) = f(m, n) ⊗ G_σ(m, n),  G_σ(m, n) = (1/(2πσ²))·exp(−(m² + n²)/(2σ²)) (2)
where f(m, n) is the gray value of the pixel at position (m, n), ⊗ denotes convolution, and σ is the standard deviation of the Gaussian distribution, which describes the spread of the data and determines the final Gaussian filter template. The invention denoises the grayscale image with a 3 × 3 Gaussian filter template with σ = 0.8.
3. Compute the image gradient, i.e. the positions where the pixel values change most strongly. The first-order partial-derivative differences G_x and G_y in the horizontal and vertical directions are used to compute the gradient magnitude G and direction θ, as shown in (3) and (4):
G = √(G_x² + G_y²) (3)
θ = arctan(G_y / G_x) (4)
4. Hysteresis thresholding: two thresholds are set (according to experimental results, typically V1 = 300 and V2 = 400). When the gradient magnitude is above V2, the pixel is kept as a definite boundary; when it is below V1, it is not considered a boundary and is discarded. If it lies between V1 and V2, the pixel is kept only if one of its neighboring edge pixels has a gradient magnitude above V2; otherwise it is discarded. This step determines which of the extracted boundaries are true boundaries.
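For illustration, the four steps above can be written as a short Python/OpenCV sketch. It is only a minimal example: the weights, the 3 × 3 template with σ = 0.8 and the thresholds V1 = 300, V2 = 400 are taken from this section, while the function name and the use of cv2 are assumptions of this sketch rather than part of the invention.

```python
import cv2

def extract_zebra_edges(bgr_image):
    """Edge extraction for a zebra-crossing region, following steps 1-4 above."""
    # Step 1: weighted-average grayscale conversion; cv2.cvtColor applies the
    # 0.299 / 0.587 / 0.114 weights given for formula (1).
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)

    # Step 2: Gaussian denoising with a 3x3 template and sigma = 0.8.
    smoothed = cv2.GaussianBlur(gray, (3, 3), sigmaX=0.8)

    # Steps 3-4: gradient computation (formulas (3)-(4)) and hysteresis
    # thresholding with V1 = 300 and V2 = 400 are handled inside cv2.Canny.
    return cv2.Canny(smoothed, threshold1=300, threshold2=400)
```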
The longer straight lines in the zebra-crossing edges are then extracted by the Hough transform, with the aim of obtaining the slope, length, number and spacing of the zebra stripes. The Hough transform is based on point-line duality: points on the same straight line (y = kx + b) in the image space (original space) correspond to straight lines in the parameter space (k-b space) that intersect at the point (k, b). Through the Hough transform, the problem of detecting straight lines in the image space is converted into the problem of detecting a point in the parameter space, and detecting that point only requires simple accumulation statistics. In practice, the steps are as follows:
1. obtaining edges in the image through a Canny algorithm;
2. drawing a straight line in k-b space for each point of the edge;
3. for each point on that line, use a "voting" (accumulation) scheme: every time a drawn line passes through a point of the k-b space, add 1 to that point's value;
4. and traversing the k-b space to find local maximum value points, wherein the coordinates (k, b) of the points are the slope and the intercept of a possible straight line in the original image.
Several approximately parallel zebra-crossing lines are obtained from the straight lines extracted by the Hough transform, as shown in Fig. 2. By constructing perpendiculars to these parallel lines, several lines with approximately the same angle are obtained (the angle being the angle with the blind person's walking direction). In this way the included angle θ between the zebra-crossing walking direction and the blind person's current walking direction is obtained. The average angle θ_avg is calculated according to equation (5):

θ_avg = (1/m) Σ_{i=1}^{m} θ_i (5)

where θ_i is the angle between the i-th perpendicular line and the blind person's walking direction, and m is the number of perpendicular lines. According to the angle θ_avg, the blind person can be guided to adjust direction, ensuring that the blind person walks in the correct direction of the zebra crossing.
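A possible Python/OpenCV sketch of the direction estimation is given below. It assumes the edge map produced by the Canny step, uses OpenCV's probabilistic Hough transform (so the k-b accumulator described above is handled internally by cv2.HoughLinesP), and assumes that the blind person's walking direction points straight ahead (vertically) in the image; the Hough parameters and function names are illustrative only.

```python
import numpy as np
import cv2

def zebra_crossing_angle(edges):
    """Estimate the average angle between the walking direction and the
    zebra-crossing direction from a Canny edge map, per equation (5)."""
    # Extract the longer straight lines along the zebra-crossing edges.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=60, maxLineGap=10)
    if lines is None:
        return None

    angles = []
    for x1, y1, x2, y2 in lines[:, 0]:
        # Angle of the stripe edge relative to the image's horizontal axis.
        # Assuming the walking direction points straight up in the image,
        # this equals theta_i, the angle between the stripe's perpendicular
        # and the walking direction.
        stripe_angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        theta_i = (stripe_angle + 90.0) % 180.0 - 90.0  # wrap into [-90, 90)
        angles.append(theta_i)

    # Equation (5): average over the m perpendiculars.
    return float(np.mean(angles))
```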
(2) Fast deep neural network Q-DNN architecture
The architecture is shown in Fig. 3 and Fig. 4, and the whole structure is divided into 4 parts. Part 1 is the Input end, which resizes the picture taken by the mobile intelligent device to a specified pixel size. Part 2 is the Backbone network, responsible for extracting picture features; it uses a CSP structure to enhance feature extraction. Part 3 is the Neck, which adopts a Spatial Pyramid Pooling (SPP) module and an FPN (Feature Pyramid Network) + PAN (Path Aggregation Network) structure to achieve multi-scale extraction and fusion of the Backbone features. Finally, the Prediction module outputs 3 feature maps of different scales and uses CIOU_Loss as the loss function to solve for the optimal network parameters. The 4 parts are described in detail as follows:
1) The Input end adaptively resizes the image to a square when it is fed into the neural network. For example, when implementing the network, the invention resizes the picture to a 3-channel RGB image of 608 × 608 pixels.
2) The Backbone network draws on the CSP (Cross Stage Partial) structure and the residual network (ResNet) idea and consists of 3 conv1_X convolution modules. The X in conv1_X denotes the number of residual components (Res units) in the module: the Backbone contains 1 conv1_1 and 2 conv1_3 modules, the conv1_1 module contains 1 Res unit, and the conv1_2 module contains 2 Res units. The specific structure of conv1_X is: a convolutional layer followed by a group of Batch Normalization (BN in the figures) and an activation function; all activation functions in the neural network designed by the invention use Leaky ReLU. Then, drawing on the CSP structure, the X residual components (Res units) are connected to 1 conv layer, and a concat tensor-splicing operation is performed with another conv branch. Tensor splicing expands the channel dimension of two tensors, e.g. concatenating 26 × 26 × 256 and 26 × 26 × 512 yields 26 × 26 × 768. The output is followed by 1 group of BN + Leaky ReLU. The CSP structure splits the feature map of the base layer into two parts and then merges them through a cross-stage hierarchy, which reduces computation while maintaining accuracy. The Backbone network ends with an SPP module, which draws on the SPP-Net idea to fuse local and global features: SPP extracts features with several max-pooling layers of different scales and finally integrates them with a concat operation. The Backbone network designed by the invention borrows classic frameworks and proven design ideas from convolutional neural networks, and greatly reduces computation time while keeping efficient extraction of image features.
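The building block described above can be sketched in PyTorch as follows. This is an interpretation for illustration only, assuming a conv + BN + Leaky ReLU unit, a Res unit made of a 1 × 1 and a 3 × 3 convolution with a shortcut, and a CSP-style split into two branches that are concatenated; the exact channel splits and wiring of the actual Q-DNN may differ.

```python
import torch
import torch.nn as nn

class ConvBNAct(nn.Module):
    """Convolution followed by Batch Normalization and Leaky ReLU."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResUnit(nn.Module):
    """Residual component: 1x1 conv then 3x3 conv with a shortcut connection."""
    def __init__(self, c):
        super().__init__()
        self.conv1 = ConvBNAct(c, c, k=1)
        self.conv2 = ConvBNAct(c, c, k=3)

    def forward(self, x):
        return x + self.conv2(self.conv1(x))

class Conv1X(nn.Module):
    """conv1_X block: a stride-2 down-sampling conv, X Res units on one branch,
    a plain conv on the other branch, then concat and a fusing conv (CSP-style)."""
    def __init__(self, c_in, c_out, num_res):
        super().__init__()
        self.down = ConvBNAct(c_in, c_out, k=3, s=2)
        half = c_out // 2
        self.branch_res = nn.Sequential(
            ConvBNAct(c_out, half, k=1),
            *[ResUnit(half) for _ in range(num_res)])
        self.branch_skip = ConvBNAct(c_out, half, k=1)
        self.fuse = ConvBNAct(c_out, c_out, k=1)  # after concat of the two halves

    def forward(self, x):
        x = self.down(x)
        y = torch.cat([self.branch_res(x), self.branch_skip(x)], dim=1)
        return self.fuse(y)
```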
The Neck part mainly consists of 2 modules, FPN + PAN. The FPN layer in the FPN + PAN module extracts the semantic features of the picture in a top-down manner: the spatial dimensions of the deep features are expanded to those of the shallow features by up-sampling, and tensor splicing then fuses the deep and shallow features. The PAN layer extracts the positional features of the picture in a bottom-up manner: the high-dimensional features are down-sampled and concatenated (concat) with the low-dimensional features to complete the fusion. In this combination, the FPN layer conveys strong semantic features from top to bottom, while the PAN layer conveys strong localization features from bottom to top. The conv2 component uses the CSP structure to improve the network's feature-fusion capability, like the conv1 component of the Backbone, except that the conv2 component has no residual structure.
The Prediction part finally outputs three groups of matrices with different dimensions N. The meaning is the same as in YOLO v3: a matrix of dimension N × N means the picture is divided into N × N grid cells, and each grid cell generates B predicted bounding boxes to detect targets inside it. That is, each grid cell has B bounding boxes predicted from anchor values and a confidence CS indicating whether the cell contains a target, which jointly reflects the probability that a target exists in the bounding box under the current model and the accuracy of the predicted target position.
CS = Pr(Object) × IOU_pred^truth (6)

where Pr(Object) indicates whether the center of an object falls inside the grid cell (1 if it does, 0 otherwise), and IOU_pred^truth is the intersection-over-union between the bounding box predicted by the grid cell and the ground-truth bounding box of the object.
Each predicted bounding box contains 5 parameters [x, y, w, h, confidence], where [x, y] are the coordinates of the target's center point within the grid cell, [w, h] are the width and height of the predicted bounding box, and confidence is the intersection-over-union between the predicted bounding box and the object's ground-truth bounding box. In addition, each grid cell has a predicted value C_i for whether it contains a target of a given class, expressed as

C_i = Pr(Class_i | Object) (7)

where Class_i is the i-th class. The formula means that the probability of the i-th class is judged on the basis of Pr(Object): when Pr(Object) of the box is 0, the predicted value C_i is simply 0; when it is 1, the probability that the object in the box belongs to the i-th class is predicted.
(3) Traffic light status determination
The shape of the traffic lights and the position information thereof can be recognized by using the Q-DNN network, but the colors of the traffic lights cannot be recognized. Therefore, the invention provides a traffic light classification algorithm based on the HSI color space. The classification of traffic lights is performed in the H (hue) subspace of the HSI space. The actually photographed traffic light image is not pure red or pure green due to the influence of illumination and the like, so that a color statistic needs to be performed on the positive sample set to obtain a relatively accurate setting of the color threshold range during detection. When the statistics is carried out, a sample needs to be converted into an HSI color space from an RGB color space, and the conversion formula is as follows:
θ = arccos{ [ (R − G) + (R − B) ] / [ 2 √( (R − G)² + (R − B)(G − B) ) ] } (8)
H = θ, if B ≤ G; H = 360° − θ, if B > G (9)
S = 1 − 3·min(R, G, B) / (R + G + B) (10)
I = (R + G + B) / 3 (11)
where H is the hue value, S is the saturation value, and I is the intensity (brightness) value; R, G and B are the red, green and blue components used to describe color in the RGB color space. Through the above transformation, the candidate region can be converted from the RGB color space to the HSI color space.
The red information in actually captured samples is counted: each sample picture is converted to the HSI color space and the H value of every pixel in the picture is obtained; then a color counter is set up and the counter of each corresponding color value is accumulated; finally a histogram of the color counts is built, and the range in which the counts are concentrated determines which of the red and green signal lamps is lit.
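A minimal Python/NumPy sketch of this hue-statistics procedure is given below, using the H-value ranges reported later in the embodiment ([240, 300] and [0, 50] for red, [120, 210] for green); the helper names and the per-region majority vote are illustrative assumptions.

```python
import numpy as np

def hue_hsi(rgb):
    """Per-pixel hue (degrees) of an RGB image, following formulas (8)-(9)."""
    r, g, b = [rgb[..., i].astype(np.float64) for i in range(3)]
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-6  # avoid divide-by-zero
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    return np.where(b <= g, theta, 360.0 - theta)

def classify_traffic_light(rgb_region):
    """Classify a detected traffic-light region as 'red' or 'green' by counting
    which hue range dominates the histogram of H values."""
    h = hue_hsi(rgb_region)
    red_count = np.count_nonzero(((h >= 240) & (h <= 300)) | (h <= 50))
    green_count = np.count_nonzero((h >= 120) & (h <= 210))
    return "red" if red_count >= green_count else "green"
```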
Effects of the invention
The blind traffic intersection assisting method based on the mobile intelligent device is simple and easy to implement. A fast deep neural network is introduced to quickly and accurately identify zebra crossings, traffic lights, and obstacles such as vehicles and pedestrians at the traffic intersection, and, together with the walking position and orientation information collected by GPS/BeiDou, the blind person is assisted to pass through the traffic intersection safely in a way suited to the perception characteristics of the blind.
Invention difficulties
(1) A Q-DNN network based on a convolutional neural network is designed. Considering that the network has to run on mobile devices with limited computing power, the difficulty lies in making the designed Q-DNN meet both the accuracy and the real-time requirements of detection.
(2) The difficulty in guiding the blind person to the zebra crossing is how to make the blind person directly face the zebra crossing. The invention designs a method for calculating the direction of the zebra crossing and combines it with the blind person's walking direction to solve the guidance problem.
(3) The overall system design of the assisting method is also a difficulty. The judgment flow of the whole system has to be designed, including what needs to be judged, the judgment procedure, which problems require a reminder, in what way to remind, and so on.
Drawings
FIG. 1 is a flow chart of guiding a blind person to a zebra crossing location by a mobile phone
FIG. 2 is a schematic view of the zebra crossing direction
FIG. 3 is the first half of a Q-DNN network architecture, including Input and backhaul.
Fig. 4 is the second half of the Q-DNN network structure, including Neck and Prediction.
Fig. 5 shows the histogram distribution of H values in HSI when the traffic light is red.
Fig. 6 is a flow chart of a traffic intersection blind person assisting method.
Detailed Description
The present invention pre-trains the Q-DNN network on Microsoft's COCO dataset. In addition, 2000 groups of traffic intersection scene training data are collected separately, the obstacles, zebra crossings and traffic lights in them are manually annotated, and this dataset is used for further training of the pre-trained Q-DNN network. For training, the samples are divided into a training set and a validation set at a ratio of 9:1; the batch size is set to 64; and CIOU_loss is adopted as the loss function of the bounding box. CIOU_loss builds on DIOU by additionally considering the aspect-ratio scale information of the bounding boxes. The CIOU_loss calculation is shown in formula (12):
CIOU_loss = 1 − IOU + Distance_2² / Distance_C² + v² / ((1 − IOU) + v) (12)

where IOU is the intersection-over-union of the predicted box and the target box, Distance_2 is the Euclidean distance between the center points of the predicted box and the target box, and Distance_C is the diagonal distance of the minimum enclosing box. v is a parameter measuring the consistency of the aspect ratios, calculated as shown in formula (13):

v = (4/π²) · ( arctan(w_gt / h_gt) − arctan(w_p / h_p) )² (13)

where w_gt is the width of the target box, h_gt is the height of the target box, w_p is the width of the predicted box, and h_p is the height of the predicted box.
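For illustration, formulas (12) and (13) can be computed as in the following Python sketch; it assumes boxes given in (x1, y1, x2, y2) corner form, which is an assumption of the sketch rather than something stated in this description.

```python
import numpy as np

def ciou_loss(pred, target):
    """CIOU loss for two boxes given as (x1, y1, x2, y2), per formulas (12)-(13)."""
    # Intersection-over-union of the two boxes.
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + 1e-9)

    # Squared center distance (Distance_2) over the squared diagonal of the
    # minimum enclosing box (Distance_C).
    cxp, cyp = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cxt, cyt = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    d2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2
    ex1, ey1 = min(pred[0], target[0]), min(pred[1], target[1])
    ex2, ey2 = max(pred[2], target[2]), max(pred[3], target[3])
    dc2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9

    # Aspect-ratio consistency term v, formula (13).
    wp, hp = pred[2] - pred[0], pred[3] - pred[1]
    wt, ht = target[2] - target[0], target[3] - target[1]
    v = (4 / np.pi ** 2) * (np.arctan(wt / ht) - np.arctan(wp / hp)) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - iou + d2 / dc2 + alpha * v
```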
The specific implementation steps of each module of Q-DNN are described in detail below, with reference to fig. 3 and 4:
(1) The Input end receives a picture of size 608 × 608 × 3.
(2) The Backbone network. The first conv component is a convolutional layer with 64 convolution kernels of size 3 × 3 and stride 2; this layer performs down-sampling, and the size of its output feature matrix is 304 × 304 × 64. Next come 1 conv1_1 component and 2 conv1_3 components. The conv1_X component, which draws on the CSP structure, starts with a conv convolutional layer followed by a BN + Leaky ReLU combination; this is the only convolutional layer in conv1_X responsible for down-sampling, with a kernel size of 3 and a stride of 2. The Res Unit component follows, designed with a residual structure: its first convolutional layer has a 1 × 1 kernel, stride 1 and padding 1, and its second convolutional layer uses 3 × 3 kernels with stride 1. The remaining two convolutional layers in the conv1_X component each use 3 × 3 kernels with stride 1. The output of the conv1_1 component is 152 × 152 × 128, the output of the first conv1_3 component is 76 × 76 × 256, and the output of the second conv1_3 component is 38 × 38 × 512. The specific network structure is shown in Table 1.
TABLE 1 (the detailed layer-by-layer structure of the Backbone is reproduced as an image in the original publication)
The SPP module first performs a padding operation, then applies max pooling with kernel sizes k = {1 × 1, 5 × 5, 9 × 9, 13 × 13} and stride 1, and finally performs a concat splicing operation on the feature maps of different scales.
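A minimal PyTorch sketch of such an SPP module is shown below; using padding = k // 2 inside each max-pooling layer to keep the spatial size fixed is an assumption standing in for the padding operation mentioned above.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: parallel stride-1 max pools, then concat."""
    def __init__(self, kernel_sizes=(1, 5, 9, 13)):
        super().__init__()
        # padding = k // 2 keeps every pooled map at the same spatial size,
        # playing the role of the padding operation described above.
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
             for k in kernel_sizes])

    def forward(self, x):
        # Concatenate the multi-scale pooled maps along the channel dimension.
        return torch.cat([pool(x) for pool in self.pools], dim=1)
```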
(3) The Neck section. All convolutional layers are in the conv2 component, which differs from the conv1_X component in that the residual structure (Res Unit) is removed. The second conv convolutional layer in the conv2 component has a 1 × 1 kernel; the remaining three conv layers have 3 × 3 kernels, all with stride 1. However, the first conv2 component after the SPP component performs down-sampling, so the kernel stride in its first conv layer is 2. The up-sampling module uses bilinear interpolation, which computes the value to be interpolated from the 2 × 2 nearest known values; the weight of each known value is determined by its distance to the interpolation point, with closer values weighted more. Suppose we want the value of an unknown function f at point P = (x, y), and we know the values of f at the four points Q11 = (x1, y1), Q12 = (x1, y2), Q21 = (x2, y1) and Q22 = (x2, y2). First, linear interpolation is carried out in the x direction:

f(x, y1) ≈ ((x2 − x)/(x2 − x1))·f(Q11) + ((x − x1)/(x2 − x1))·f(Q21)
f(x, y2) ≈ ((x2 − x)/(x2 − x1))·f(Q12) + ((x − x1)/(x2 − x1))·f(Q22)

Then linear interpolation is performed in the y direction to obtain f(x, y):

f(x, y) ≈ ((y2 − y)/(y2 − y1))·f(x, y1) + ((y − y1)/(y2 − y1))·f(x, y2)
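The two interpolation steps above translate directly into a short Python function, sketched below for illustration.

```python
def bilinear_interpolate(f11, f21, f12, f22, x1, x2, y1, y2, x, y):
    """Bilinear interpolation of f at (x, y) from the four known corner values
    f(Q11), f(Q21), f(Q12), f(Q22)."""
    # Linear interpolation in the x direction.
    f_x_y1 = (x2 - x) / (x2 - x1) * f11 + (x - x1) / (x2 - x1) * f21
    f_x_y2 = (x2 - x) / (x2 - x1) * f12 + (x - x1) / (x2 - x1) * f22
    # Linear interpolation in the y direction.
    return (y2 - y) / (y2 - y1) * f_x_y1 + (y - y1) / (y2 - y1) * f_x_y2
```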
the downsampling module uses convolution layer implementation, the size of convolution kernel is 3 x 3, and the step size is 2. The number of convolution kernels in the upper downsampling block in fig. 4 is 256, and the processed output feature matrix size is 38 × 256. The number of convolution kernels in the down-sampling module is 512, and the processed output is 76 × 512.
In the Prediction part, the kernels in all 3 conv convolutional layers have size 1 × 1, there are 255 kernels, and the stride is 1. Three kinds of feature maps are output, with sizes 76 × 76 × 255, 38 × 38 × 255 and 19 × 19 × 255. The 255 dimensions represent 3 prior boxes of 85 values each (i.e. 3 × 85): each grid cell of each feature map has 3 prior boxes, and each prior box consists of a box position (4 dimensions), a detection confidence (1 dimension) and class scores (80 dimensions, since the training dataset used contains 80 classes), 85 dimensions in total. The other dimensions of the feature maps, 76, 38 and 19, give the feature-map size.
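As an illustration, an N × N × 255 feature map can be split into the 3 prior boxes of 85 values each as in the following NumPy sketch; the ordering of the 85 values (box position, confidence, class scores) is an assumption of the sketch.

```python
import numpy as np

def split_prediction(feature_map, num_anchors=3, num_classes=80):
    """Split an (N, N, 255) feature map into per-anchor box, confidence and
    class-score tensors (4 + 1 + 80 = 85 values per prior box)."""
    n = feature_map.shape[0]
    per_box = 5 + num_classes
    grid = feature_map.reshape(n, n, num_anchors, per_box)
    boxes = grid[..., 0:4]        # x, y, w, h
    confidence = grid[..., 4]     # objectness confidence CS
    class_scores = grid[..., 5:]  # C_i per class
    return boxes, confidence, class_scores
```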
(2) A total of 2000 traffic-light pictures were taken at intersections in different places. The shooting conditions include daytime, nighttime, close range, long range, frontal and non-frontal views, with the non-frontal angle controlled within ±20°. 1800 of the collected traffic-light pictures are used as training samples and the remaining 200 as test samples. The correct red and green histogram distributions are finally obtained through statistics over these pictures. As shown in Fig. 5, the color counts are mainly distributed in the regions [240, 300] and [0, 50], so the H values of the red-light color can be defined in the corresponding ranges marked by the red dotted lines, i.e. [240, 300] and [0, 50]. Similarly, according to the statistical histogram of the green-light color, the corresponding H values of green can be defined as [120, 210].
(3) The invention uses speech-synthesis software to convert the instructions that assist the blind person's walking into voice. Through voice broadcasting, the blind person can quickly learn about the various obstacles and objects around his or her position, which enriches the blind person's perception and lets the blind person sense the surrounding world as sighted people do.
The flow of the auxiliary method is described in detail below with reference to fig. 6.
(1) When the blind person needs to pass through the traffic intersection, the mobile intelligent device is manually turned on first, and the auxiliary function is turned on. The GPS/Beidou is used for positioning to the current position of the blind person, the blind person is helped to set a destination, and a walking route is automatically planned in an electronic map. The walking direction of the blind person is judged by using the magnetic sensor, and the blind person is guided to move forward along the correct direction by combining the walking direction and the planned route.
(2) The camera is started to acquire traffic intersection scene information as a video stream at 10 fps with a frame size of 1280 × 1080 pixels, and every frame of the video stream is passed to the Q-DNN network for analysis. The trained Q-DNN network runs on the mobile intelligent device, recognizes important targets such as obstacles, traffic lights and zebra crossings, and marks their positions.
(3) The traffic-light region is cropped from the picture according to the detection result, and the current state of the traffic light is detected. If the light is red and the inertial sensor detects that the blind person is walking, the blind person is reminded by voice and vibration to stop and wait. If the light is green, whether the zebra crossing lies in the walking area ahead is judged, and the blind person is guided to the zebra crossing position by voice.
(4) Whether obstacles such as vehicles and pedestrians are present in the blind person's walking area is detected in real time. If an obstacle is detected, the blind person is reminded by both voice and vibration that there is an obstacle ahead to avoid.
(5) Finally, according to the accurate position information given by GPS/BeiDou, it is confirmed that the blind person has passed through the intersection smoothly, and the assistance ends.
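The flow of steps (1)-(5) can be summarised as a decision loop, sketched below purely for illustration; every object and helper (camera, navigator, sensors, detector, notifier and their methods) is a hypothetical placeholder, not an API of the invention.

```python
def assist_at_intersection(camera, navigator, sensors, detector, notifier):
    """Illustrative decision loop for the intersection-assistance flow above.
    All objects and their methods are hypothetical placeholders."""
    navigator.plan_route(sensors.gps_position(), navigator.destination)
    for frame in camera.stream(fps=10):
        objects = detector.detect(frame)          # Q-DNN: obstacles, lights, zebra crossing
        light = detector.traffic_light_state(objects, frame)  # HSI hue statistics

        if light == "red" and sensors.is_walking():
            notifier.speak("Red light, please stop and wait")
            notifier.vibrate()
        elif light == "green":
            if not detector.zebra_in_front_area(objects):
                notifier.speak("Please move towards the zebra crossing")

        if detector.obstacle_ahead(objects):      # vehicles, pedestrians, etc.
            notifier.speak("Obstacle ahead, please avoid")
            notifier.vibrate()

        if navigator.crossing_completed(sensors.gps_position()):
            notifier.speak("You have crossed the intersection")
            break
```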

Claims (1)

1. The blind traffic intersection assisting method based on the rapid deep neural network and the mobile intelligent device is characterized by comprising the following steps of:
(1) mobile phone for guiding blind person to wait green light at zebra crossing position
Firstly, obtaining the accurate position of the blind by a GPS/Beidou sensor arranged in a mobile phone, prompting the blind to set a destination, and determining a passing route of a traffic intersection in an electronic map; determining the walking direction of the blind by using a magnetic sensor built in the mobile phone, judging whether the walking direction is along a planned route, and if not, guiding the blind to adjust the direction by using a voice mode; the Q-DNN network effectively detects the position of the zebra crossing in the picture, divides the front area of the blind into a trapezoidal area, and guides the blind to go to the zebra crossing by voice if the zebra crossing is not in the trapezoidal area;
finally, calculating the angle between the walking direction of the blind and the zebra crossing direction, and judging whether the blind faces the zebra crossing; firstly, intercepting a zebra crossing area detected by a Q-DNN network into an independent picture for analysis; the zebra stripes are characterized by being equally spaced and parallel, and the edge color contrast is strong, so that the Canny edge detection algorithm and Hough transformation are adopted to realize the direction detection of the zebra stripes;
the Canny algorithm is used for extracting the edge of the zebra crossing, and the method comprises 4 steps:
1. Convert the RGB image to a grayscale image using a weighted-average method, averaging the R, G, B components with different weights; the calculation formula is:
R = G = B = (ω_R·R + ω_G·G + ω_B·B)/3 (1)
where ω_R, ω_G, ω_B are the weights of R, G, B; different values produce different grayscale images; here ω_R = 0.299, ω_G = 0.587, ω_B = 0.114;
2. Denoise with Gaussian filtering to reduce the influence of noise on detection: each pixel and its neighborhood are multiplied by a Gaussian matrix, and the weighted average is taken as the final gray value; the calculation formula is:
g(m, n) = f(m, n) ⊗ G_σ(m, n),  G_σ(m, n) = (1/(2πσ²))·exp(−(m² + n²)/(2σ²)) (2)
where f(m, n) is the gray value of the pixel at position (m, n) and σ is the standard deviation of the Gaussian distribution; a 3 × 3 Gaussian filter template with σ = 0.8 is used to denoise the grayscale image;
3. Compute the image gradient, i.e. the positions where the pixel values change most strongly; the first-order partial-derivative differences G_x and G_y in the horizontal and vertical directions are used to compute the gradient magnitude G and direction θ, as shown in (3) and (4);
G = √(G_x² + G_y²) (3)
θ = arctan(G_y / G_x) (4)
4. Hysteresis thresholding: two thresholds V1 = 300 and V2 = 400 are set; when the gradient magnitude is above V2 the pixel is kept as a definite boundary, and when it is below V1 it is not considered a boundary and is discarded; if it lies between V1 and V2, the pixel is kept only if one of its neighboring edge pixels has a gradient magnitude above V2, otherwise it is discarded; this step determines which of the extracted boundaries are true boundaries;
Then the longer straight lines in the zebra-crossing edges are extracted by the Hough transform, with the aim of obtaining the slope, length, number and spacing of the zebra stripes; the Hough transform is based on point-line duality, i.e. points on the same straight line y = kx + b in the image space correspond to straight lines in the parameter space that intersect at the point (k, b);
Several approximately parallel zebra-crossing lines are obtained from the straight lines extracted by the Hough transform, and several lines with approximately the same angle are obtained by constructing perpendiculars to these parallel lines; from the zebra-crossing walking direction thus obtained and the blind person's current walking direction, the included angle θ between them is obtained;
The average angle θ_avg is calculated according to equation (5):

θ_avg = (1/m) Σ_{i=1}^{m} θ_i (5)

where θ_i is the angle between the i-th perpendicular line and the blind person's walking direction, and m is the number of perpendicular lines; according to the angle θ_avg, the blind person is guided to adjust direction, ensuring that the blind person walks in the correct direction of the zebra crossing;
(2) fast deep neural network Q-DNN architecture
The Q-DNN architecture is divided into 4 parts; part 1 is the Input end, which resizes the pixels of the pictures acquired by the mobile intelligent device; part 2 is the Backbone network, which is responsible for extracting picture features and uses a CSP structure to enhance feature extraction; part 3 is the Neck, which adopts a Spatial Pyramid Pooling (SPP) module and an FPN (Feature Pyramid Network) + PAN (Path Aggregation Network) structure to achieve multi-scale extraction and fusion of the Backbone features; finally, the Prediction module outputs 3 feature maps of different scales and uses CIOU_Loss as the loss function to solve for the optimal network parameters;
the detailed description of the 4 sections is as follows:
1) the Input end firstly self-adaptively adjusts the size of an image when the Input end inputs the neural network, and adjusts the image into a square;
2) The Backbone network draws on the CSP (Cross Stage Partial) structure and the residual network (ResNet) idea and is formed from 3 conv1_X convolution modules; the X in conv1_X denotes the number of residual components (Res units) in the module: the Backbone contains 1 conv1_1 and 2 conv1_3 modules, the conv1_1 module contains 1 Res unit, and the conv1_2 module contains 2 Res units; the specific structure of conv1_X is: a group of convolutional layers conv, each followed by a group of Batch Normalization and an activation function, where the activation function is Leaky ReLU;
then, by taking the CSP structure as a reference, designing X residual error components Res unit to connect 1 conv and another conv to perform concat tensor splicing operation; tensor splicing is to expand the dimensionality of two tensors, and 1 group of BN + Relu is connected after output; the CSP structure is adopted to divide the feature mapping of the basic layer into two parts, and then the two parts are combined through a cross-stage hierarchical structure, so that the calculated amount is reduced and the accuracy is ensured; finally, the backhaul network is connected with an SPP module to extract features, and finally concat operation is carried out to integrate the features together;
3) The Neck section, comprising 2 modules, FPN + PAN; the FPN layer in the FPN + PAN module extracts the semantic features of the picture in a top-down manner, expands the spatial dimensions of the deep features to those of the shallow features by up-sampling, and then performs tensor splicing;
the PAN layer extracts the positional features of the picture in a bottom-up manner; the high-dimensional features are down-sampled and concatenated (concat) with the low-dimensional features to complete the fusion;
4) a Prediction part finally outputs three groups of matrixes with different dimensionalities N; b predicted boundary frames are generated by each grid to detect the target in the grid, namely, each grid is provided with B boundary frames generated by the anchors value prediction and confidence CS indicating whether the grid contains the target or not, so that the possibility of the target existing in the boundary frame based on the current model and the accuracy of the predicted target position are comprehensively reflected;
CS = Pr(Object) × IOU_pred^truth (6)

where Pr(Object) indicates whether the center of an object falls inside the grid cell, being 1 if it does and 0 otherwise, and IOU_pred^truth is the intersection-over-union between the bounding box predicted by the grid cell and the ground-truth bounding box of the object;
Each predicted bounding box contains 5 parameters [x, y, w, h, confidence], where [x, y] are the coordinates of the target's center point within the grid cell, [w, h] are the width and height of the predicted bounding box, and confidence is the intersection-over-union between the predicted bounding box and the object's ground-truth bounding box; each grid cell also has a predicted value C_i for whether it contains a target of a given class, expressed as

C_i = Pr(Class_i | Object) (7)

where Class_i is the i-th class; the formula means that whether the predicted bounding box belongs to the i-th class is judged on the basis of Pr(Object): when Pr(Object) of the box is 0, the predicted value C_i is 0; when it is 1, the probability that the object in the box belongs to the i-th class is predicted;
identifying the shape of the traffic light and position information thereof by using a Q-DNN network;
(3) traffic light status determination
The classification of the traffic lights is carried out in the H subspace of the HSI space;
The red information in actually captured samples is counted: each sample picture is converted to the HSI color space and the H value of every pixel in the picture is obtained; then a color counter is set up and the counter of each corresponding color value is accumulated; finally a histogram of the color counts is built, and the range in which the counts are most concentrated determines which of the red and green signal lamps is lit.
CN202011531293.8A 2020-12-23 2020-12-23 Blind person traffic intersection assisting method based on rapid deep neural network and mobile intelligent device Active CN112674998B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011531293.8A CN112674998B (en) 2020-12-23 2020-12-23 Blind person traffic intersection assisting method based on rapid deep neural network and mobile intelligent device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011531293.8A CN112674998B (en) 2020-12-23 2020-12-23 Blind person traffic intersection assisting method based on rapid deep neural network and mobile intelligent device

Publications (2)

Publication Number Publication Date
CN112674998A true CN112674998A (en) 2021-04-20
CN112674998B CN112674998B (en) 2022-04-22

Family

ID=75450735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011531293.8A Active CN112674998B (en) 2020-12-23 2020-12-23 Blind person traffic intersection assisting method based on rapid deep neural network and mobile intelligent device

Country Status (1)

Country Link
CN (1) CN112674998B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792930A (en) * 2021-04-26 2021-12-14 青岛大学 Blind person walking track prediction method, electronic device and storage medium
CN114699287A (en) * 2022-03-02 2022-07-05 北京工业大学 Daily trip assisting method for blind person based on mobile terminal rapid deep neural network
WO2023197913A1 (en) * 2022-04-13 2023-10-19 华为技术有限公司 Image processing method and related device
CN117475389A (en) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 Pedestrian crossing signal lamp control method, system, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106821694A (en) * 2017-01-18 2017-06-13 西南大学 A kind of mobile blind guiding system based on smart mobile phone
CN108108761A (en) * 2017-12-21 2018-06-01 西北工业大学 A kind of rapid transit signal lamp detection method based on depth characteristic study
US20180185232A1 (en) * 2015-06-19 2018-07-05 Ashkon Namdar Wearable navigation system for blind or visually impaired persons with wireless assistance
CN109376747A (en) * 2018-12-11 2019-02-22 北京工业大学 A kind of video flame detecting method based on double-current convolutional neural networks
CN109784372A (en) * 2018-12-17 2019-05-21 北京理工大学 A kind of objective classification method based on convolutional neural networks
CN109902592A (en) * 2019-01-30 2019-06-18 浙江大学 A kind of blind person's secondary row path method based on deep learning

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180185232A1 (en) * 2015-06-19 2018-07-05 Ashkon Namdar Wearable navigation system for blind or visually impaired persons with wireless assistance
CN106821694A (en) * 2017-01-18 2017-06-13 西南大学 A kind of mobile blind guiding system based on smart mobile phone
CN108108761A (en) * 2017-12-21 2018-06-01 西北工业大学 A kind of rapid transit signal lamp detection method based on depth characteristic study
CN109376747A (en) * 2018-12-11 2019-02-22 北京工业大学 A kind of video flame detecting method based on double-current convolutional neural networks
CN109784372A (en) * 2018-12-17 2019-05-21 北京理工大学 A kind of objective classification method based on convolutional neural networks
CN109902592A (en) * 2019-01-30 2019-06-18 浙江大学 A kind of blind person's secondary row path method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
徐丽珍 et al.: "Automatic recognition of pedestrian-crossing traffic lights based on smartphones", 《计算机工程与应用》 *
熊平 et al.: "A method for fast zebra-crossing recognition using CNN and hand-crafted feature extraction", 《电子设计工程》 *
胡彩霞 et al.: "Research on sidewalk recognition algorithms based on convolutional neural networks", 《信息与电脑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792930A (en) * 2021-04-26 2021-12-14 青岛大学 Blind person walking track prediction method, electronic device and storage medium
CN113792930B (en) * 2021-04-26 2023-08-22 青岛大学 Blind person walking track prediction method, electronic equipment and storage medium
CN114699287A (en) * 2022-03-02 2022-07-05 北京工业大学 Daily trip assisting method for blind person based on mobile terminal rapid deep neural network
WO2023197913A1 (en) * 2022-04-13 2023-10-19 华为技术有限公司 Image processing method and related device
CN117475389A (en) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 Pedestrian crossing signal lamp control method, system, equipment and storage medium
CN117475389B (en) * 2023-12-27 2024-03-15 山东海润数聚科技有限公司 Pedestrian crossing signal lamp control method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN112674998B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN112674998B (en) Blind person traffic intersection assisting method based on rapid deep neural network and mobile intelligent device
CN110175576B (en) Driving vehicle visual detection method combining laser point cloud data
CN111695448B (en) Roadside vehicle identification method based on visual sensor
CN110866079A (en) Intelligent scenic spot real scene semantic map generating and auxiliary positioning method
CN107305635A (en) Object identifying method, object recognition equipment and classifier training method
KR101261409B1 (en) System for recognizing road markings of image
CN115082855B (en) Pedestrian shielding detection method based on improved YOLOX algorithm
CN110532961B (en) Semantic traffic light detection method based on multi-scale attention mechanism network model
CN107689157B (en) Traffic intersection passable road planning method based on deep learning
CN105426861A (en) Method and device for determining lane line
CN103605953A (en) Vehicle interest target detection method based on sliding window search
CN111461221B (en) Multi-source sensor fusion target detection method and system for automatic driving
CN112258559B (en) Intelligent running timing scoring system and method based on multi-target tracking
CN107704853A (en) A kind of recognition methods of the traffic lights based on multi-categorizer
CN109711379B (en) Complex environment traffic signal lamp candidate area extraction and identification method
CN113298024A (en) Unmanned aerial vehicle ground small target identification method based on lightweight neural network
CN113158768A (en) Intelligent vehicle lane line detection method based on ResNeSt and self-attention distillation
CN114926747A (en) Remote sensing image directional target detection method based on multi-feature aggregation and interaction
CN112613668A (en) Scenic spot dangerous area management and control method based on artificial intelligence
CN105809716A (en) Superpixel and three-dimensional self-organizing background subtraction algorithm-combined foreground extraction method
CN111931683A (en) Image recognition method, image recognition device and computer-readable storage medium
CN114187664B (en) Rope skipping counting system based on artificial intelligence
CN111626241A (en) Face detection method and device
CN114596548A (en) Target detection method, target detection device, computer equipment and computer-readable storage medium
CN110188607A (en) A kind of the traffic video object detection method and device of multithreads computing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant