CN114995450A - Intelligent navigation method and system for blind people by using multi-eye stereoscopic vision

Intelligent navigation method and system for blind people by using multi-eye stereoscopic vision

Info

Publication number
CN114995450A
CN114995450A
Authority
CN
China
Prior art keywords
blind
image
data
preset
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210705969.3A
Other languages
Chinese (zh)
Inventor
王智博
王志健
王斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tuowang Data Technology Co ltd
Original Assignee
Shanghai Tuowang Data Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tuowang Data Technology Co ltd filed Critical Shanghai Tuowang Data Technology Co ltd
Priority to CN202210705969.3A priority Critical patent/CN114995450A/en
Publication of CN114995450A publication Critical patent/CN114995450A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0238Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors
    • G05D1/024Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using obstacle or wall sensors in combination with a laser
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0214Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0231Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means
    • G05D1/0246Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means
    • G05D1/0251Control of position or course in two dimensions specially adapted to land vehicles using optical position detecting means using a video camera in combination with image processing means extracting 3D information from a plurality of images taken from different locations, e.g. stereo vision
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507Summing image-intensity values; Histogram projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Remote Sensing (AREA)
  • Databases & Information Systems (AREA)
  • Automation & Control Theory (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Electromagnetism (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Optics & Photonics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an intelligent navigation method and system for blind people using multi-view stereoscopic vision. The method comprises: acquiring image data with a multi-view image acquisition device, obtaining a camera calibration result with a preset calibration method, and acquiring positioning information; uploading the image data, the positioning information and the camera calibration result to a preset data processing center; acquiring a three-dimensional model and performing obstacle detection, moving-object tracking, face detection and traffic-light detection according to the image data, the positioning information and the camera calibration result, so as to plan the blind person's travel route; planning the blind person's travel route according to the traffic-light prompt information, the person identification data, the fixed-obstacle positions and the moving-obstacle trajectories; and processing the traffic-light prompt information, the person identification data, the fixed-obstacle positions, the moving-obstacle trajectories and the travel route to obtain prompt signals, from which a sensory early warning for the blind person is generated and issued. The invention solves the technical problems of large blind-guiding error, poor portability and high use cost.

Description

Intelligent navigation method and system for blind people by using multi-eye stereoscopic vision
Technical Field
The invention relates to the field of machine-vision navigation, and in particular to an intelligent navigation method and system for blind people using multi-view stereoscopic vision.
Background
1. Traditional blind-guiding products on the market include the guide dog and the blind-guiding stick. A guide dog can effectively help a blind person travel, but its selection and training process is very strict and the training period is long, so most blind people cannot afford one. A blind-guiding stick can probe the road condition ahead of the blind person and is easy to operate, but its function is single and its detection range limited, so it cannot adequately guarantee travel safety. Some intelligent sticks add obstacle-avoidance and blind-road (tactile paving) detection functions to the ordinary stick, but the camera is mounted low and swings with the stick while walking, so the field of view is small and the image quality is poor. The intelligent blind-guiding stick disclosed in prior application publication No. CN114569417A comprises a stick body with a handle fixedly connected to its top end, a control module arranged on the body, a distance-measuring mechanism and an obstacle-avoidance mechanism at one end of the body, and a power supply module electrically connected to the control module; a loudspeaker is fixedly mounted on the body. The distance between the stick and an obstacle is measured by an ultrasonic ranging module, the obstacle-avoidance module detects obstacles based on the infrared-reflection principle, and the obstacle distance information is then broadcast through the loudspeaker to alert the blind person. That prior application does not disclose the technical solution of the present application and cannot achieve its technical effect.
2. For blind-guiding schemes based on laser radar, three-dimensional laser scanning can provide point-cloud data of object surfaces and can therefore be used to acquire a high-precision, high-resolution digital scene model. However, the approach is limited by its high cost, is difficult to mass-produce, and is not suited to nationwide large-scale deployment. Moreover, the three-dimensional point clouds generated by the laser radar are sparse; for distant or small objects the number of reflection points is very small, which is unfavorable for data processing. Prior patent application CN111427343A, an intelligent blind-guiding method and intelligent wheelchair, comprises the steps of timed synchronous data acquisition, data transmission, computation and judgment, and wheelchair navigation. Its intelligent blind-guiding system comprises a satellite navigation system, a camera system, a voice recognition system and a laser radar system, with memory allocated to each; a data processing system processes the data output by these subsystems to form instructions that guide the movement direction of the motion control system, and the intelligent wheelchair adopts this method. That prior application does not disclose the technical solution of the present application and cannot achieve its technical effect.
3. Binocular schemes are used on the market to obtain depth information, but in practice binocular depth has a very large error in the z coordinate, so an accurate result is difficult to obtain and many problems cannot be remedied in application. Patent document CN108245385A discloses a device for helping visually impaired people travel, which includes: a depth-image calculation unit, a point-cloud construction unit, a ground detection unit and an object segmentation unit, the segmentation unit removing points on and below the ground and segmenting the point cloud with a clustering method to obtain individual objects; an object tracking unit for tracking each object and calculating its size, movement direction, trajectory and speed in three-dimensional space; an object recognition unit for projecting each object to obtain an image region, extracting the RGB image and recognizing the object based on it; and a voice synthesis and output unit for synthesizing the type, position, movement direction and speed of each object into speech to inform the visually impaired person. The specification of that document also discloses a parallax calculation method that obtains an initial disparity, then constructs a graph model over all pixels of the image, in which the nodes are the disparity values of the pixels and the edges are similarity measures between pixels; the disparity values are propagated through multiple iterations in the graph model to achieve global optimization, and the disparity information is converted into depth information according to the extrinsic and intrinsic parameters of the camera. That prior application does not fully disclose the specific implementation logic and related devices of the technical solution of the present application and cannot achieve its technical effects.
In conclusion, the prior art has the technical problems of large blind guiding error, poor portability and high use cost.
Disclosure of Invention
The invention aims to solve the technical problems of large blind guiding error, poor portability and high use cost in the prior art.
The invention adopts the following technical scheme to solve the technical problems: an intelligent navigation method for blind people using multi-view stereoscopic vision, comprising the following steps:
s1, collecting image data by a multi-view image collecting device, obtaining a camera calibration result by utilizing a preset calibration method and collecting positioning information;
s2, uploading the image data, the positioning information and the camera calibration result to a preset data processing center;
s3, acquiring a three-dimensional model, and performing obstacle detection, moving object tracking, face detection and traffic light detection according to the image data, the positioning information and the camera calibration result so as to plan the blind person traveling route, wherein the step S3 comprises the following steps:
s31, extracting image feature points in the image data, obtaining feature point pairs by solving the logic matching processing image feature points according to a preset Hamming distance, obtaining image depth information through fusion processing of a parallax computation principle to obtain object surface three-dimensional point clouds, and obtaining a blind person peripheral scene three-dimensional stereo model through point cloud meshing processing;
s32, uploading the image data to a preset remote blind guiding platform, so as to utilize a preset traffic light filtering and identifying logic and identify the traffic light state according to the image data;
s33, acquiring a face image from the image data and aligning the face, and using the preset face recognition logic to recognize the face according to the face image and the face information stored in the preset database so as to obtain the figure recognition data;
s34, identifying and classifying barriers according to a three-dimensional model of a scene surrounding the blind person to obtain barrier types and barrier orientations, processing image depth information, barrier types and barrier orientations by using a preset multi-target monitoring model to obtain motion information, extracting eigenvectors by using a Deepsort algorithm, performing association processing on the motion information by using a Kalman filtering algorithm, performing association processing on the eigenvectors by using a Hungary algorithm, and obtaining fixed barrier positions and moving barrier tracks;
s4, planning a blind person traveling route according to the traffic light prompt information, the person identification data, the fixed obstacle position and the moving obstacle track;
s5, processing the traffic light prompt information, the person identification data, the fixed obstacle position, the moving obstacle track and the blind person traveling route to obtain a prompt signal, and generating and sending out the blind person sensory early warning.
Aiming at the defects of the prior art and the prior art, the invention provides navigation equipment and a system for helping the blind to safely go out by utilizing a multi-view camera array and a corresponding navigation algorithm. Compared with the existing navigation system based on the laser radar, the navigation device and the navigation system reduce the dependence of the navigation system on the laser radar, realize the high availability of the multi-camera array device in the field of navigation for the blind, and can effectively guarantee the travel safety of blind people. Meanwhile, the system can cover more blind people due to lower cost, and helps more blind people to realize safe and convenient travel. Through 5G communication and distributed calculation, a navigation algorithm for the travel of the blind is designed by utilizing a multi-eye stereoscopic vision three-dimensional reconstruction technology, a multi-target detection and tracking technology, a traffic light detection and identification technology and a face detection and identification technology, real-time data acquisition and real-time data processing are achieved, and the data are transmitted back to a remote blind guiding platform in real time. The high availability of the multi-view camera array equipment in the field of blind person navigation is realized, and the travel safety of blind person groups can be effectively guaranteed.
In a more specific embodiment, step S1 includes:
s11, receiving a processor acquisition command by a multi-view image acquisition device to acquire image data of a peripheral scene through a multi-view camera array;
s12, calibrating the multi-view camera array by the following logic by using a preset calibration method to obtain a camera calibration result of the multi-view camera array, wherein the camera calibration result comprises: internal and external parameters:
let the two-dimensional image point m = [u, v]^T and its corresponding three-dimensional point M = [X, Y, Z]^T; their augmented vectors are expressed as:
m' = [u, v, 1]^T and M' = [X, Y, Z, 1]^T;
S13, obtaining a general formula of Zhangzhengyou scaling method according to preset camera model processing:
s·m' = A[R|t]M'
where
A = [[α, γ, u_0], [0, β, v_0], [0, 0, 1]]
is the camera intrinsic matrix (in the standard form of Zhang's method, with focal terms α and β, skew γ and principal point (u_0, v_0)), [R|t] is the camera extrinsic matrix, and s is a scale factor;
s14, solving an equation according to a general formula of Zhangyingyou calibration method to obtain internal parameters and external parameters;
s15, optimizing the internal parameters and the external parameters by utilizing the following logics and combining a nonlinear optimization method of maximum likelihood estimation:
min Σ_{i=1..n} Σ_{j=1..m} || m_ij − m̂(A, R_i, t_i, M_j) ||^2   (the reprojection error)
wherein n is the number of calibration planes, and m is the number of corner points on each calibration plane;
s16, obtaining the current positioning information by using the preset Beidou satellite positioning device, and sending out vibration by the vibration navigation device to prompt the current position of the blind.
The invention can locate the blind person's position in real time and inform them through a vibration prompt; meanwhile, the blind person's family members can obtain the blind person's current position in real time through the electronic map of the remote blind-guiding platform.
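As an illustration of the calibration flow in steps S12–S15, the following minimal sketch calibrates one camera of the multi-view array with OpenCV's planar (Zhang-style) calibration. It is not the patent's own implementation; the board size, image paths and parameter choices are assumptions.

```python
# Minimal sketch (assumptions: 9x6 chessboard, images under calib/cam0/) of the
# Zhang-style planar calibration described in S12-S15, using OpenCV.
import glob
import cv2
import numpy as np

BOARD = (9, 6)                                   # inner-corner grid of the assumed board
objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2)   # planar points, Z = 0

obj_points, img_points, image_size = [], [], None
for path in glob.glob("calib/cam0/*.png"):       # assumed image folder for one camera
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    image_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# calibrateCamera solves s*m' = A[R|t]M' over all calibration planes and refines
# A, R, t by minimizing the reprojection error (the maximum-likelihood step S15).
rms, A, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("reprojection RMS:", rms)
print("intrinsic matrix A:\n", A)
```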
In a more specific technical solution, in step S2, the image data, the positioning information and the camera calibration result are uploaded to the data processing center through 5G communication.
The invention transmits data through 5G communication, and realizes high-bandwidth and low-delay real-time return of the terminal equipment. The navigation system realizes the functions of image and three-dimensional point cloud data acquisition and processing, and greatly reduces the data processing time of the remote blind guiding platform.
In a more specific embodiment, step S31 includes:
s311, judging the difference degree between the pixel and the surrounding neighborhood according to a preset threshold value T, judging and acquiring a pixel corner point according to the difference degree, using the pixel corner point as a first version detection feature point, utilizing non-maximum value to inhibit and process the first version detection feature point so as to reserve and respond to the maximum value feature point, selecting n pairs of p and q pixel pairs around the maximum value feature point, and obtaining an n-dimensional feature point description vector according to the n pairs of p and q pixel pairs.
S312, calculating the description-vector distances between each feature-point description vector and all remaining feature-point description vectors and sorting them, taking the image feature point with the smallest description-vector distance as the matched point, and keeping as correct matching data the matched point pairs whose Hamming distance does not exceed twice the minimum distance, the Hamming distance being computed as:
d(x, y) = Σ x[i] ⊕ y[i]
where i = 0, 1, …, n−1, x and y are n-bit codes, and ⊕ denotes exclusive OR;
S313, using the parallax calculation principle to process the correct matching data and obtain the distance between the left-camera imaging point P_L and the right-camera imaging point P_R:
P_L P_R = b − (X_L − X_R)
where b is the length of the line connecting the projection centers O_L and O_R of the left and right cameras, and X_L and X_R are respectively the distances of the left-camera and right-camera imaging points from the left edge of their imaging planes;
S314, processing X_L and X_R to obtain the parallax data:
d = |X_L − X_R|;
S315, processing the parallax data to obtain the image depth data Z; by similar triangles,
(b − (X_L − X_R)) / b = (Z − f) / Z
and therefore
Z = f·b / d
where f is the camera focal length;
s316, processing not less than 2 initial disparity maps in combination according to a preset fusion criterion, disparity data and image depth data to obtain an applicable precision disparity map, and converting the disparity data into image depth information to obtain three-dimensional point cloud on the surface of the object;
s317, defining the three-dimensional point cloud P { (P) on the surface of the object 1 ,n 1 ),…,(p n ,n N ) Converting the reconstructed curved surface S into reconstructed x according to the following indication function M To obtain a gridded point cloud:
Figure BDA0003706176720000053
and
Figure BDA0003706176720000054
s318, utilizing Stokes formula to connect gridded point cloud, point cloud normal vector and indication function chi M And reconstructing a three-dimensional point cloud grid of the surrounding scene according to the data to obtain a three-dimensional model of the surrounding scene of the blind.
The method comprises the steps of acquiring image data in real time through a multi-view image acquisition device, extracting image characteristic points, acquiring characteristic point matching pairs by using a matching algorithm, acquiring depth information through a parallax calculation principle, thereby acquiring a three-dimensional point cloud on the surface of an object, and acquiring a three-dimensional model of a scene surrounding the blind after point cloud meshing.
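To make steps S311–S315 concrete, the sketch below extracts ORB features (FAST corners with binary descriptors), matches them by Hamming distance with the twice-minimum-distance filter, and converts disparity to depth with Z = f·b/d. It is illustrative only; the focal length and baseline values are assumptions, not taken from the patent.

```python
# Illustrative sketch of S311-S315 (not the patent's own code): ORB extraction,
# Hamming-distance matching with the 2x-minimum-distance filter, and depth from
# disparity. F_PX and BASELINE_M are assumed example values.
import cv2

F_PX, BASELINE_M = 700.0, 0.12                    # assumed focal length (px) and baseline (m)

def sparse_depth(img_left, img_right):
    orb = cv2.ORB_create(nfeatures=1000)          # FAST corners + binary descriptors
    kpL, desL = orb.detectAndCompute(img_left, None)
    kpR, desR = orb.detectAndCompute(img_right, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desL, desR)
    d_min = min(m.distance for m in matches)
    good = [m for m in matches if m.distance <= max(2 * d_min, 30)]   # 2x-min filter

    points = []
    for m in good:
        xL = kpL[m.queryIdx].pt[0]
        xR = kpR[m.trainIdx].pt[0]
        d = abs(xL - xR)                          # disparity d = |X_L - X_R|
        if d > 1e-6:
            z = F_PX * BASELINE_M / d             # depth Z = f*b/d
            points.append((kpL[m.queryIdx].pt, z))
    return points                                 # sparse depth cues for the point cloud
```

For the meshing in S316–S318, one possible realization of the indicator-function (Poisson) reconstruction is the Open3D library, used below purely as an illustration; the library choice and parameter values are assumptions and are not named in the patent.

```python
# Illustrative sketch of the S316-S318 meshing step using Open3D's Poisson surface
# reconstruction (assumed library and parameters).
import numpy as np
import open3d as o3d

def mesh_from_points(xyz):
    """xyz: (N, 3) array of object-surface points recovered from disparity."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.asarray(xyz, dtype=np.float64))
    # Estimate oriented normals (p_i, n_i), the input required by Poisson reconstruction.
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    pcd.orient_normals_consistent_tangent_plane(k=15)
    # Solve for the indicator function chi_M and extract its iso-surface as a mesh.
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=8)
    return mesh
```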
In a more specific embodiment, step S32 includes:
s321, according to the three-dimensional model of the scene around the blind, performing color space conversion by using the following logics:
Figure BDA0003706176720000055
s322, extracting pixel values corresponding to red and green under a Cb channel, finding a preliminary color region, highlighting the red region and the green region, carrying out binarization processing on the red region and the green region, filtering and traversing all contours in a binary image according to the following logic and filtering conditions to obtain the areas of all contours, and screening out signal lamp feature conforming regions, wherein the filtering mode comprises the following steps: area filtration, shape filtration, and density filtration:
s3221, area filtering is performed on all the contours in the binary image according to the following logic and filtering conditions to obtain areas of all the contours, so as to screen out a region where the signal lamp features conform to:
Figure BDA0003706176720000056
wherein N is the number of candidate image regions, and each region is marked as R i I-1, 2, …, N, a denotes the area of the contour, and the filtering conditions include: width-to-height ratio of circumscribed rectangle:
R=R i (W)/R i (H)
Figure BDA0003706176720000061
wherein, W and H are the width and height of the circumscribed rectangle respectively;
s3222, performing density filtering on all the contours in the binary map according to the following logic and filtering conditions to obtain a signal lamp feature conforming area:
Figure BDA0003706176720000062
Figure BDA0003706176720000063
where ρ represents density, f (x, y) represents a pixel value of a point (x, y) in the binary map;
s3223, processing the signal lamp feature matching area and density according to the following logic, so as to perform area filtering on all the contours in the binary image:
Figure BDA0003706176720000064
extracting the central coordinates (X, Y) of the reserved color blocks, the width W and the height H of the circumscribed rectangle;
S323, selecting a detection-window block according to the center coordinates (X, Y) and the width W and height H of the circumscribed rectangle, and extracting HOG features from it; the gradient values of the image in the horizontal and vertical directions are obtained as:
G_x(x, y) = H(x+1, y) − H(x−1, y)
G_y(x, y) = H(x, y+1) − H(x, y−1)
where G_x(x, y) and G_y(x, y) are the horizontal and vertical gradient values and H(x, y) is the pixel value at point (x, y);
S324, processing the horizontal and vertical gradient values to obtain the gradient magnitude and direction of each pixel:
G(x, y) = sqrt( G_x(x, y)^2 + G_y(x, y)^2 )
α(x, y) = arctan( G_y(x, y) / G_x(x, y) );
S325, uniformly dividing the image into a plurality of cells to obtain HOG characteristics of the cells, forming a block by adjacent cells, and normalizing the block to obtain the HOG characteristics of the block;
s326, classifying the HOG characteristics of the block by using a linear SVM classifier to obtain traffic light prompt information.
The invention can automatically detect and identify the state of the traffic lights at a traffic intersection and prompt the blind person through vibration and other senses the blind person can perceive, so as to guide the blind person safely across the road.
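A rough sketch of the traffic-light pipeline of S321–S326 is given below: chroma-channel thresholding to find candidate lamp regions, area/aspect-ratio/density filtering, HOG features over each candidate window, and a linear SVM decision. All thresholds, the HOG layout and the classifier are assumptions for illustration; the patent's own values are not disclosed here.

```python
# Illustrative sketch of S321-S326 with assumed parameter values; svm is assumed
# to be a pre-trained sklearn LinearSVC over the same HOG layout.
import cv2

hog = cv2.HOGDescriptor((32, 32), (16, 16), (8, 8), (8, 8), 9)   # assumed HOG layout

def candidate_regions(bgr, cb_low=140, min_area=30):
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    cb = ycrcb[:, :, 2]                            # index 2 is Cb in OpenCV's YCrCb order
    _, mask = cv2.threshold(cb, cb_low, 255, cv2.THRESH_BINARY)   # binarisation
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        area = cv2.contourArea(c)
        density = area / float(w * h + 1e-6)
        # area / aspect-ratio / density filtering as in S3221-S3223 (assumed thresholds)
        if area >= min_area and 0.7 <= w / float(h) <= 1.4 and density >= 0.5:
            boxes.append((x, y, w, h))
    return boxes

def classify_lamp(bgr, box, svm):
    x, y, w, h = box
    patch = cv2.resize(bgr[y:y + h, x:x + w], (32, 32))
    feat = hog.compute(cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)).reshape(1, -1)
    return svm.predict(feat)[0]                    # e.g. 0 = red, 1 = green (assumed labels)
```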
In a more specific embodiment, step S33 includes:
s331, constructing a multi-scale face image pyramid by using a preset deep learning network model, scanning image data, eliminating a detection window with a preset proportion, and adjusting the size and the position of the detection window to be close to a potential face image;
s332, suppressing and combining the detection windows by using a non-maximum value to obtain candidate detection windows, and normalizing image areas corresponding to the candidate detection windows;
s333, correcting the candidate detection window, and inhibiting the candidate detection window by utilizing the non-maximum value to obtain an applicable detection window;
s334, normalizing the applicable detection window to a preset specification, classifying to obtain face candidate windows, performing window combination by using non-maximum suppression, and correcting the face candidate windows to obtain a face detection result;
s335, predicting feature points of five sense organs by using a convolutional neural network in a preset feature point positioning algorithm, connecting each convolutional neural network through an averaging method, and carrying out position averaging;
s336, predicting and processing a face detection result by utilizing each convolutional neural network to carry out face alignment so as to obtain an aligned face image;
S337, converting the aligned face image to grayscale and calculating the gradient value of each pixel of the aligned face image as:
G(x, y) = dx(i, j) + dy(i, j)
where
dx(i, j) = I(i+1, j) − I(i, j)
is the x-axis image gradient,
dy(i, j) = I(i, j+1) − I(i, j)
is the y-axis image gradient, I is the image pixel value, and (i, j) are the pixel coordinates;
S338, dividing the aligned face image into blocks, computing the gradient histogram of each block to obtain the face description vectors, combining the face description vectors to obtain the face-image feature vector, and comparing the face-image feature vector with the feature vectors stored in the preset database by calculating their cosine distance:
cos θ = (A · B) / (‖A‖ ‖B‖) = Σ_{i=1..n} A_i B_i / ( sqrt(Σ_{i=1..n} A_i^2) · sqrt(Σ_{i=1..n} B_i^2) )
where A and B are n-dimensional vectors and θ is the angle between them, so as to judge and obtain the person identification data;
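A minimal sketch of the S338 comparison step follows: cosine similarity between the query face descriptor and the descriptors stored in the database. The similarity threshold and the in-memory database format are assumptions; how the descriptor is produced (the HOG blocks of S337–S338) is abstracted away.

```python
# Illustrative sketch of the cosine-distance comparison; threshold and database
# layout are assumptions.
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def identify(query_vec, database, threshold=0.75):   # database: {name: feature vector}
    best_name, best_sim = None, -1.0
    for name, vec in database.items():
        sim = cosine_similarity(query_vec, vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return (best_name, best_sim) if best_sim >= threshold else (None, best_sim)
```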
In a more specific embodiment, step S34 includes:
s341, obtaining a data sample according to the three-dimensional model of the blind peripheral scene, calculating a real boundary frame in the data sample by using a K-means algorithm to obtain a sample containing the real boundary frame, and loading the data sample of the preset target real boundary frame to obtain a preset multi-target detection model, wherein the preset multi-target detection model comprises: a YOLO multi-target detection model based on DenseNet;
s342, comparing the real boundary frame serving as priori knowledge with the output of the multi-target detection model to optimize the current multi-target detection model, adding a CBAM (cubic boron nitride) module into the multi-target detection model to fuse shallow features and deep features to obtain a difference size feature map, and acquiring a prediction boundary frame according to the difference size feature map;
S343, computing the intersection-over-union (IoU) of the real and predicted bounding boxes to be used in the loss function:
IoU = |A ∩ B| / |A ∪ B|
where A is the real bounding box and B is the predicted bounding box;
s344, training a preset multi-target detection model according to the loss function, and taking the current multi-target detection model as an applicable multi-target detection model when the conformity of the predicted boundary box and the real boundary box meets the loss function;
s345, inputting the image data into an applicable multi-target detection model, inhibiting and reserving a detection boundary frame with the confidence coefficient higher than a preset threshold value through a non-maximum value to output a video frame sequence containing the detection boundary frame, and identifying the position of a fixed obstacle;
s346, inputting a video frame sequence to a pre-training feature extraction network to extract and detect the feature vector of the image at the position of the bounding box by using a Deepsort algorithm;
s347 obtains a first video frame in the sequence of video frames, and obtains an applicable bounding box according to a preset confidence threshold and a preset bounding box non-maximum suppression logic screening. Initializing a series of motion variables in a Kalman filter, and creating a tracking boundary box corresponding to a detection boundary box;
S348, for subsequent video frames, performing motion-information association with the Kalman filtering algorithm according to the positions of the detection bounding boxes, and performing feature association with the Hungarian algorithm according to the feature vectors of the detection-bounding-box image regions, using the Mahalanobis distance between the current detection bounding boxes and the tracking bounding boxes predicted from the previous frame:
d(i, j) = (d_j − y_i)^T S_i^{-1} (d_j − y_i)
where d_j denotes the detection-bounding-box pixel coordinates, y_i denotes the tracking-bounding-box pixel coordinates, and S_i^{-1} is the inverse of the covariance matrix of the i-th tracking-bounding-box coordinates, thereby obtaining the moving-obstacle trajectory.
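To make the two association quantities concrete, the sketch below computes the IoU used in S343 and the Mahalanobis distance used for the S348 gating. It is not the full detection-and-tracking pipeline, and the box and state formats are assumptions.

```python
# Illustrative sketch of the IoU (S343) and Mahalanobis gate (S348) computations;
# box format (x1, y1, x2, y2) and state format are assumptions.
import numpy as np

def iou(box_a, box_b):
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-12)

def mahalanobis_gate(detection, track_mean, track_cov):
    # d^2 = (d_j - y_i)^T S_i^{-1} (d_j - y_i); S_i comes from the Kalman filter
    diff = np.asarray(detection, dtype=float) - np.asarray(track_mean, dtype=float)
    return float(diff @ np.linalg.inv(track_cov) @ diff)
```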
The invention detects and identifies the surrounding obstacles to obtain their types and orientations, combines this with the depth information to locate obstacles within 3.5 meters of the device, and uses multi-target tracking technology to identify and track moving objects and judge their motion trajectories. The positions of obstacles can thus be effectively ascertained, helping the blind person avoid them and ensuring travel safety.
In a more specific embodiment, step S4 includes:
s41, identifying acquaintances according to the person identification data to obtain names, person orientation information and person distances of the acquaintances;
s42, planning the blind person traveling route according to the three-dimensional model of the scene surrounding the blind person, the traffic light state, the moving obstacle track, the fixed obstacle position, the character direction information and the character distance, and generating and sending the real-time positioning electronic map of the blind person to the monitoring terminal.
According to the invention, when the blind person meets an emergency, the blind person can send a signal to the remote blind guiding platform by using the emergency help-seeking key arranged in the equipment for seeking help, and meanwhile, the equipment uploads the position information of the blind person to the platform. The platform timely informs the blind of family members or a place which is closest to the blind in emergency contact according to the meaning of the help seeking signal so as to guarantee the traveling safety of the blind.
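As an illustration of how the emergency help signal and the position information might be packaged for upload to the remote blind-guiding platform, a minimal sketch follows. The patent does not specify a message format, so the endpoint URL, field names and transport are all assumptions.

```python
# Minimal sketch only: uploading an emergency help signal with the current position
# to the remote blind-guiding platform. Endpoint and payload fields are assumptions.
import json
import urllib.request

def send_help_signal(key_code, latitude, longitude, device_id,
                     endpoint="https://platform.example.com/api/help"):   # assumed URL
    payload = {
        "device_id": device_id,
        "key_code": key_code,                     # which preset emergency key was pressed
        "position": {"lat": latitude, "lon": longitude},
    }
    req = urllib.request.Request(
        endpoint, data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status == 200
```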
In a more specific embodiment, step S5 includes:
s51, receiving traffic light prompt information, character identification data, fixed obstacle positions, moving obstacle tracks and blind person traveling routes;
s52, sending out obstacle avoidance vibration early warning according to the fixed obstacle position and the moving obstacle track;
s53, informing the blind of the crossing passing indication information according to the traffic light prompt information vibration;
s54, vibrating to prompt acquaintance information around the blind according to the character identification data;
and S55, indicating the blind person traveling information according to the blind person traveling route vibration.
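The S51–S55 prompts can be thought of as a dispatch from prompt signals to distinct vibration patterns. The sketch below shows one way this might look; the motor interface and the pattern encodings are assumptions for illustration only.

```python
# Minimal sketch of the S51-S55 prompt dispatch; drive_motor is an assumed callback
# that runs a vibration motor for the given duration, and the patterns are assumptions.
VIBRATION_PATTERNS = {
    "obstacle_warning":   [0.2, 0.1, 0.2, 0.1, 0.2],   # short bursts: obstacle ahead
    "traffic_light_go":   [0.5],                        # single long pulse: safe to cross
    "traffic_light_stop": [0.5, 0.2, 0.5],              # long-short-long: wait
    "acquaintance":       [0.1, 0.1, 0.1],              # triple tap: acquaintance nearby
    "route_direction":    [0.3, 0.3],                   # double pulse: next travel direction
}

def dispatch_prompt(signal, drive_motor):
    """Send the vibration pattern for one prompt signal to the motor driver callback."""
    for duration in VIBRATION_PATTERNS.get(signal, []):
        drive_motor(duration)
```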
The invention uses face recognition technology to recognize persons appearing in the surrounding scene, helping the blind person identify acquaintances and providing a humanized service for the blind community.
In a more specific technical solution, a blind intelligent navigation system using multi-view stereo vision includes:
the data acquisition module is used for acquiring image data by the multi-view image acquisition device, acquiring a camera calibration result by utilizing a preset calibration method and acquiring positioning information;
the data transmission module is used for uploading the image data, the positioning information and the camera calibration result to a preset data processing center and is connected with the data acquisition module;
the data processing module is used for acquiring the three-dimensional model and performing obstacle detection, moving-object tracking, face detection and traffic-light detection according to the image data, the positioning information and the camera calibration result, so as to plan the blind person's travel route; the data processing module is connected with the data transmission module and comprises:
the three-dimensional model acquisition module is used for extracting image characteristic points in the image data, solving logic matching processing image characteristic points according to a preset Hamming distance to obtain characteristic point pairs, obtaining image depth information through fusion processing of a parallax computation principle to acquire three-dimensional point clouds on the surface of an object, and obtaining a three-dimensional stereo model of a blind person surrounding scene through point cloud meshing processing;
the traffic light identification processing module is used for uploading the image data to a preset remote blind guide platform, so that a preset traffic light filtering identification logic is utilized, and the traffic light state is identified according to the image data;
the figure recognition processing module is used for acquiring a face image from the image data and aligning the face, and using a preset face recognition logic to recognize the face according to the face image and face information stored in a preset database so as to obtain figure recognition data;
the obstacle detection and tracking module is used for identifying and classifying obstacles according to the three-dimensional model of the scene around the blind person to obtain obstacle types and orientations, processing the image depth information, obstacle types and obstacle orientations with a preset multi-target detection model to obtain motion information, extracting feature vectors with the DeepSORT algorithm, and associating the motion information with the Kalman filtering algorithm and the feature vectors with the Hungarian algorithm, so as to obtain fixed-obstacle positions and moving-obstacle trajectories;
the remote blind guiding module is used for planning a blind person traveling route according to traffic light prompt information, character identification data, fixed barrier positions and moving barrier tracks and is connected with the data processing module;
the vibration navigation module is used for processing traffic light prompt information, figure identification data, fixed obstacle positions, moving obstacle tracks and blind person traveling routes so as to obtain prompt signals and generate and send out blind person sensory early warning, and the vibration navigation module is connected with the remote blind guiding module.
Aiming at the defects of the prior art and the prior art, the invention provides navigation equipment and a system for helping the blind to safely go out by utilizing a multi-view camera array and a corresponding navigation algorithm. Compared with the existing navigation system based on the laser radar, the navigation device and the navigation system reduce the dependence of the navigation system on the laser radar, realize the high availability of the multi-camera array device in the field of navigation for the blind, and can effectively guarantee the traveling safety of blind people. Meanwhile, the system can cover more blind people due to lower cost, and helps more blind people to realize safe and convenient travel. Through 5G communication and distributed calculation, a navigation algorithm for the travel of the blind is designed by utilizing a multi-eye stereoscopic vision three-dimensional reconstruction technology, a multi-target detection and tracking technology, a traffic light detection and identification technology and a face detection and identification technology, real-time data acquisition and real-time data processing are achieved, and the data are transmitted back to a remote blind guiding platform in real time. The high availability of the multi-view camera array equipment in the field of blind person navigation is realized, and the travel safety of blind person groups can be effectively guaranteed.
The invention can locate the blind person's position in real time and inform them through a vibration prompt; meanwhile, the blind person's family members can obtain the blind person's current position in real time through the electronic map of the remote blind-guiding platform.
The invention transmits data through 5G communication, and realizes high-bandwidth and low-delay real-time return of the terminal equipment. The navigation system realizes the functions of image and three-dimensional point cloud data acquisition and processing, and greatly reduces the data processing time of the remote blind guiding platform.
The method comprises the steps of acquiring image data in real time through a multi-view image acquisition device, extracting image characteristic points, acquiring characteristic point matching pairs by using a matching algorithm, acquiring depth information through a parallax calculation principle, thereby acquiring three-dimensional point cloud on the surface of an object, and acquiring a three-dimensional model of a blind person surrounding scene through point cloud meshing.
The invention can automatically detect and identify the state of the traffic lights at a traffic intersection and prompt the blind person through vibration and other senses the blind person can perceive, so as to guide the blind person safely across the road.
The invention detects and identifies the surrounding obstacles to obtain their types and orientations, combines this with the depth information to locate obstacles within 3.5 meters of the device, and uses multi-target tracking technology to identify and track moving objects and judge their motion trajectories. The positions of obstacles can thus be effectively ascertained, helping the blind person avoid them and ensuring travel safety.
The invention can enable the blind to send a signal to the remote blind guiding platform for help by using the emergency help-seeking key arranged in the equipment when the blind meets an emergency, and meanwhile, the equipment uploads the position information of the blind to the platform. The platform timely informs the blind of family members or a place which is closest to the blind in emergency contact according to the meaning of the help seeking signal so as to guarantee the traveling safety of the blind. The invention utilizes the face recognition technology to carry out face recognition on the characters appearing in the surrounding scene, helps the blind to identify the acquaintances and provides humanized service for the blind group. The invention solves the technical problems of large blind guiding error, poor portability and high use cost in the prior art.
Drawings
Fig. 1 is a basic flow diagram of a blind intelligent navigation method using multi-view stereo vision in embodiment 1 of the present invention;
fig. 2 is a schematic algorithm flow diagram of a blind intelligent navigation method using multi-view stereo vision according to embodiment 1 of the present invention;
fig. 3 is a schematic view of a parallax calculation principle in embodiment 1 of the present invention;
FIG. 4 is a schematic connection diagram of the intelligent navigation system for blind people using multi-view stereo vision in embodiment 2 of the present invention;
fig. 5 is a schematic diagram of a setup position of the multi-view camera array according to embodiment 2 of the present invention;
fig. 6 is a schematic view of a first view structure of a navigation device according to embodiment 2 of the present invention;
fig. 7 is a second view structural diagram of the navigation device in embodiment 2 of the present invention;
fig. 8 is a schematic structural diagram of a third view angle of the navigation apparatus according to embodiment 2 of the present invention;
fig. 9 is a fourth view structural diagram of a navigation device in embodiment 2 of the present invention;
fig. 10 is a front view schematically illustrating a navigation device according to embodiment 2 of the present invention;
fig. 11 is a schematic view of a back surface of a navigation device in accordance with embodiment 2 of the present invention;
fig. 12 is a schematic connection diagram of subsystems of a blind intelligent navigation system using multi-view stereo vision according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 1, the intelligent navigation method for the blind using multi-view stereo vision includes the following steps:
s1, the processor sends out a command;
s2, the navigation equipment collects and stores image data, positioning information and camera calibration results;
s3, uploading acquired data to a data processing center through 5G communication, in the embodiment, after the navigation equipment is started and the processor sends an acquisition command, the multi-view image acquisition device acquires image data of surrounding scenes through the multi-view camera array, the Beidou satellite positioning device acquires positioning information of the current equipment and sends a vibration prompt to inform a blind person of the current position through the vibration navigation device, the equipment finishes calibrating the multi-view camera array, and a calibration result of the multi-view camera array is obtained, wherein the calibration result comprises internal parameters and external parameters. Uploading the acquired data to a data processing center through 5G communication;
s4, acquiring a three-dimensional model, and performing obstacle detection, moving object tracking, face detection and traffic light detection;
s5, uploading the acquired data to a remote blind guiding platform through 5G communication, in the embodiment, extracting image feature points by a data processing center, obtaining feature point matching pairs by using a matching algorithm, obtaining depth information by using a parallax calculation principle, thereby obtaining a three-dimensional point cloud on the surface of an object, and obtaining a three-dimensional stereo model of a blind person surrounding scene after point cloud meshing. Meanwhile, surrounding obstacles are detected and identified to obtain the types and the directions of the obstacles, the distance of the obstacles within 3.5 meters of the depth information acquisition equipment is combined, a multi-target tracking technology is utilized to identify and track moving objects, and the motion tracks of the moving objects are judged. In addition, the data processing center detects and identifies the states of traffic lights at the traffic intersection, and a face image appearing in a scene is detected by using a face identification technology. Uploading the acquired data to a remote blind guiding platform through 5G communication;
s6, judging whether a traffic light exists or not;
s7, if yes, identifying the traffic light state;
s8, if not, carrying out face recognition on the acquired face image and the face information stored in the database;
s9, judging whether an acquaintance exists or not;
and S10, if yes, acquiring the name, orientation and distance of the other party. In this embodiment, the remote blind-guiding platform performs face recognition on the acquired face images against the face information stored in the database to identify acquaintances; if no acquaintance is found the face is ignored, and if an acquaintance is identified the person's name, orientation information and distance are acquired. The travel route is then planned according to the three-dimensional model, the traffic-light state, the motion trajectories of moving objects, and the orientations and distances of obstacles and persons. In addition, the remote blind-guiding platform provides an electronic map with the blind person's real-time position, so that the current position can be obtained in real time;
and S11, if not, the acquisition platform plans a travel route according to the three-dimensional model, the traffic light state, the motion track of the moving object, the orientation and the distance of the obstacle and the person, and in the embodiment, the navigation equipment receives the peripheral scene information, the obstacle information, the motion information of the moving object, the traffic light state, the judgment result of the face recognition and the travel route sent back by the remote blind guiding platform through 5G communication and sends out corresponding vibration prompts. The vibration navigation device sends out vibration early warning of obstacles within 3.5 meters to the blind person through a plurality of vibration motors according to signals; the vibration informs the blind of the motion track of the moving object at present, and reminds the blind to avoid danger; the vibration informs the blind whether a traffic intersection has a traffic light or not, and helps the blind to safely pass through the intersection; the blind person is informed of whether acquaintances exist around the blind person or not through vibration, and the names, the directions and the distances of the acquaintances; the vibration informs the blind of the next available travel route;
S12, after receiving the indication signals of the remote blind guiding platform through 5G communication, the navigation device completes navigation through vibration prompts. In this embodiment, when an emergency occurs, the blind person can use the emergency help key arranged in the device to send a signal to the remote blind guiding platform through 5G communication to seek help, while the device uploads the blind person's position information to the platform. The blind person can send different help signals to the platform through the corresponding preset keys according to the specific emergency, and the platform can, according to the meaning of the help signal, promptly notify the blind person's family members or the emergency contact point nearest to the blind person.
As shown in fig. 2, the implementation algorithm of the blind intelligent navigation method using multi-view stereo vision includes the following steps:
S1', camera calibration. In this embodiment, the camera calibration may adopt a calibration-plane-based method such as the Zhang Zhengyou calibration method. For a two-dimensional point m = [u, v]^T and its corresponding three-dimensional point M = [X, Y, Z]^T, the augmented vectors can be expressed as m' = [u, v, 1]^T and M' = [X, Y, Z, 1]^T. According to the camera model, the general formula of the Zhang Zhengyou calibration method is obtained as s·m' = A[R|t]M', where
A = [α γ u0; 0 β v0; 0 0 1]
is the camera intrinsic matrix, [R|t] is the camera extrinsic matrix, and s is a scale factor. The intrinsic and extrinsic parameters can be obtained by solving this equation and are then refined with a nonlinear optimization method combined with maximum likelihood estimation, which minimises
Σ_{i=1}^{n} Σ_{j=1}^{m} || m_ij − m̂(A, R_i, t_i, M_j) ||²
where n is the number of calibration planes and m is the number of corner points on each calibration plane;
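By way of illustration only, a minimal Python sketch of this plane-based calibration step is given below. It relies on OpenCV's cv2.calibrateCamera, which implements the same homography-plus-refinement idea as the Zhang Zhengyou method; the chessboard size, square size and image list are placeholder assumptions and not values taken from the disclosure.

```python
import cv2
import numpy as np

def calibrate_from_chessboard(image_paths, board_size=(9, 6), square_size=0.025):
    # 3-D corner coordinates of the planar target (Z = 0 for every point)
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2) * square_size

    obj_points, img_points, image_size = [], [], None
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        image_size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            corners = cv2.cornerSubPix(
                gray, corners, (11, 11), (-1, -1),
                (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
            obj_points.append(objp)
            img_points.append(corners)

    # Returns the RMS reprojection error, the intrinsic matrix A,
    # distortion coefficients, and per-view extrinsics [R|t]
    rms, A, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    return rms, A, dist, rvecs, tvecs
```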
S2', extracting image feature points. In this embodiment, whether a pixel is a corner point can be determined from its difference with the surrounding neighbourhood: if a pixel differs significantly from its neighbourhood, it can be considered a corner. Select a pixel p in the image with grey value I_p and set a threshold T, for example T equal to 20% of I_p. Select M pixels on a circle of radius r around p as comparison pixels. If the selected circle contains N consecutive pixels whose grey values are all greater than I_p + T or all less than I_p − T, the pixel p can be regarded as a corner point, i.e. a feature point. After this first round of detection, non-maximum suppression is applied so that only the feature point with the maximum response is kept within a given area, avoiding clusters of feature points. A binary string of 0s and 1s is then used as the description vector: for pixels p and q in the neighbourhood of the feature point, take 1 if the grey value of p is greater than that of q, otherwise take 0. Selecting n pairs of p and q pixels around the feature point yields an n-dimensional vector consisting of 0s and 1s;
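The corner test and binary description vector described above correspond to FAST keypoints with a BRIEF-style descriptor, which the ORB feature in OpenCV packages together. The short sketch below uses ORB as a stand-in for this step; the file name and feature count are assumptions for the example.

```python
import cv2

# Hedged sketch: ORB = FAST-style corner detection + rotated-BRIEF binary
# descriptors, matching the 0/1 description vector described above.
img = cv2.imread("scene_left.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
orb = cv2.ORB_create(nfeatures=1000)
keypoints, descriptors = orb.detectAndCompute(img, None)
# descriptors is an N x 32 uint8 array: 256 binary comparisons packed per keypoint
```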
S3', feature point matching. In this embodiment, the distances between a feature point description vector and all other feature point description vectors are calculated, the obtained distances are sorted, and the closest one is taken as the matching point. Matched point pairs are then screened with the criterion that the Hamming distance must be less than twice the minimum distance: if a pair's distance exceeds this value it is considered an incorrect match and filtered out, otherwise it is kept as a correct match. The Hamming distance is d(x, y) = Σ x[i] ⊕ y[i], i = 0, 1, …, n − 1, where x and y are both n-bit codes and ⊕ denotes exclusive OR;
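A small Python sketch of this matching-and-filtering rule is shown below, assuming two binary descriptor arrays desc_left and desc_right such as those produced by ORB; the distance floor is an illustrative assumption so the threshold never collapses to zero.

```python
import cv2

def match_binary_descriptors(desc_left, desc_right, dist_floor=30.0):
    # Brute-force Hamming matching followed by the 2x-minimum-distance filter
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_left, desc_right)
    if not matches:
        return []
    min_dist = min(m.distance for m in matches)
    # Keep only pairs whose Hamming distance is below twice the minimum distance
    return [m for m in matches if m.distance <= max(2 * min_dist, dist_floor)]
```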
As shown in fig. 3, S4', acquiring depth information according to the parallax calculation principle. In this embodiment, the distance between the projection centres O_L and O_R of the left and right cameras is b, i.e. the baseline. The imaging point of any point P in three-dimensional space on the left camera is P_L, and its imaging point on the right camera is P_R. X_L and X_R are respectively the distances from the left and right imaging points to the left edge of their imaging planes, so the parallax of point P between the two cameras can be defined as d = |X_L − X_R|. The distance between the two imaging points P_L and P_R is P_L P_R = b − (X_L − X_R). According to the principle of similar triangles,
(b − (X_L − X_R)) / b = (Z − f) / Z
which, with f the focal length, gives
Z = f·b / (X_L − X_R) = f·b / d
i.e. the depth information of point P;
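A minimal numeric sketch of the relation Z = f·b/d for a rectified stereo pair follows; the focal length, baseline and pixel coordinates are made-up example values used only to illustrate the arithmetic.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    d = abs(x_left - x_right)          # disparity in pixels
    if d == 0:
        return float("inf")            # zero disparity: point effectively at infinity
    return focal_px * baseline_m / d   # depth Z in metres

# Example: f = 700 px, b = 0.12 m, disparity 35 px  ->  Z = 2.4 m
print(depth_from_disparity(412.0, 377.0, 700.0, 0.12))
```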
s5', obtaining a three-dimensional point cloud on the surface of the object, in this embodiment, synthesizing a plurality of obtained initial disparity maps into a disparity map according to a fusion criterion to improve the accuracy of the disparity map, and converting the disparity into depth information according to the relationship between disparity and depth, thereby obtaining the three-dimensional point cloud on the surface of the object;
S6', meshing the point cloud. In this embodiment, for the acquired three-dimensional point cloud P = {(p_1, n_1), …, (p_N, n_N)}, the measured object M and the curved surface S to be reconstructed, the indicator function is
χ_M(p) = 1 if p ∈ M, and χ_M(p) = 0 otherwise
and the surface S is recovered as an isosurface of χ_M, so that the problem of reconstructing the surface S is converted into reconstructing χ_M. Using the Stokes formula, the point cloud and its normal vectors can be associated with the indicator function χ_M, realising the mesh reconstruction of the three-dimensional point cloud of the surrounding scene;
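One widely used implementation of this indicator-function approach is Poisson surface reconstruction; the sketch below shows the meshing step with Open3D under that assumption. The input array `points`, the normal-estimation radius, the octree depth and the density cut-off are all placeholder assumptions for the example.

```python
import numpy as np
import open3d as o3d

def mesh_from_points(points, depth=9):
    # points: (N, 3) array of surface points from the stereo step (assumed input)
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.asarray(points, dtype=np.float64))
    # Oriented normals stand in for the gradient of the indicator function chi_M
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
    pcd.orient_normals_consistent_tangent_plane(30)
    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=depth)
    # Drop poorly supported, low-density vertices at the mesh boundary
    mask = np.asarray(densities) < np.quantile(np.asarray(densities), 0.02)
    mesh.remove_vertices_by_mask(mask)
    return mesh
```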
S7', performing obstacle detection and moving object tracking. In this embodiment, data samples are acquired, and the real bounding boxes in the data samples are calculated with the K-means algorithm to obtain data samples containing real bounding boxes. The data samples pre-labelled with target real bounding boxes are loaded into a pre-established DenseNet-based YOLO multi-target detection model. The real bounding boxes serve as prior knowledge during model training and are compared with the output of the model to optimise the existing model. In the multi-target detection model, a CBAM module for fusing shallow and deep features is added to the feature fusion network of the DenseNet neural network architecture to obtain the feature extraction network architecture. DenseNet uses a feature pyramid structure and, combined with the CBAM module, performs weight distribution over the channels of the extracted feature maps so as to learn the feature correlation between different channels, and fuses the shallow and deep features at different scales. Prediction bounding boxes are obtained by applying 1 × 1 convolution blocks to the fused feature maps of different sizes. Each prediction bounding box contains 4 location coordinates, a confidence and class probabilities. The multi-target detection model is trained with a preset loss function, and the trained multi-target detection model is obtained when the coincidence between the predicted bounding boxes and the real bounding boxes satisfies the preset loss function. The loss function is the intersection-over-union of the real bounding box and the predicted bounding box corresponding to the video frame sequence, defined as
IoU = |A ∩ B| / |A ∪ B|
where A is the true bounding box and B is the predicted bounding box. The video data to be detected are input into the trained multi-target detection model, non-maximum suppression retains the detection bounding boxes whose confidence is higher than a preset threshold, and a video frame sequence containing the detection bounding boxes is output, thereby realising detection and identification of obstacles. The video frame sequence containing the detection bounding boxes is then input into the trained feature extraction network, and features of the data inside the detection bounding boxes are extracted to obtain the corresponding feature vectors used for tracking moving objects.
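For clarity, a small sketch of the intersection-over-union criterion used as the loss and overlap measure is given below; boxes are assumed here to be (x1, y1, x2, y2) tuples, which is a representation chosen for the example only.

```python
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0   # |A ∩ B| / |A ∪ B|
```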
For moving object tracking, the DeepSORT algorithm is adopted: the first video frame in the video frame sequence is acquired, the positions of the detection bounding boxes are read, and the feature vectors of the images inside the bounding boxes are extracted. Candidate detection bounding boxes are preliminarily screened according to a preset confidence threshold, and non-maximum suppression is used to further screen the candidate boxes so as to eliminate the case where one target has several detection bounding boxes. A series of motion variables in the Kalman filter are initialised and a tracking bounding box corresponding to each detection bounding box is created. For subsequent video frames, motion information association is performed based on the positions of the detection bounding boxes and feature association is performed based on the feature vectors. The motion information association is mainly performed through the Kalman filtering algorithm: the Kalman filter gives the covariance prediction between the tracking bounding box of the previous video frame and the detection bounding box of the current frame. The feature association is realised through the deep feature extraction network and the Hungarian algorithm: the trained feature extraction network extracts feature vectors from the detection bounding boxes of the current video frame, the Hungarian algorithm is then used for matching, and the Mahalanobis distance between the detection bounding boxes of the current frame and the tracking bounding boxes of the previous frame is calculated:
d(i, j) = (d_j − y_i)^T S_i^{-1} (d_j − y_i)
where d_j is the pixel coordinates of the detection bounding box, y_i is the pixel coordinates of the tracking bounding box, and S_i^{-1} is the inverse of the coordinate covariance matrix of the i-th tracking bounding box. The tracking of multiple objects is thereby finally realised;
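A short sketch of this Mahalanobis gating term follows; d_j, y_i and S_i are assumed inputs (detection measurement, Kalman-predicted track mean and covariance), and the chi-square gate value in the comment is the commonly used figure, not one taken from the disclosure.

```python
import numpy as np

def mahalanobis_sq(d_j, y_i, S_i):
    # Squared Mahalanobis distance between a detection and a predicted track state
    diff = np.asarray(d_j, dtype=float) - np.asarray(y_i, dtype=float)
    return float(diff.T @ np.linalg.inv(S_i) @ diff)

# Detections whose squared distance exceeds a chi-square gate (e.g. 9.4877 for
# 4 degrees of freedom at 95%) would typically be excluded before Hungarian matching.
```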
s8', obtaining a three-dimensional model, a motion track of a moving object, and the direction and distance of an obstacle;
s9', traffic light detection, in this embodiment, the RGB space is first converted into the YCbCr space, and the conversion formula is as follows:
Y = 0.299R + 0.587G + 0.114B, Cb = 0.564(B − Y) + 128, Cr = 0.713(R − Y) + 128 (standard ITU-R BT.601 form)
and after color space conversion, extracting pixel values corresponding to red and green under a Cb channel to find a preliminary color area. After the preliminary color regions are determined, the color regions of red and green are highlighted by color enhancing the preliminary color regions and color suppressing the non-relevant regions. After binarization processing is carried out, all contours in the binary image are traversed in sequence, region filtering is carried out according to the contour characteristics, and regions which accord with the characteristics of the traffic signal lamps are screened out. Wherein, the region filtering method comprises area filtering, shape filtering and density filtering. Area filtering traverses all the outlines in the binary image and calculates the areas of all the outlines;
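The sketch below illustrates the colour-space conversion and preliminary colour-region extraction with OpenCV; the file name, the use of both chroma channels and the numeric thresholds are illustrative assumptions rather than the exact rule of this embodiment.

```python
import cv2
import numpy as np

bgr = cv2.imread("intersection.png")                # placeholder file name
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)      # OpenCV's BT.601 conversion
y, cr, cb = cv2.split(ycrcb)

# Preliminary red / green colour regions via chroma thresholds (assumed values)
red_mask = cv2.inRange(cr, 160, 255)
green_mask = cv2.bitwise_and(cv2.inRange(cr, 0, 110), cv2.inRange(cb, 0, 110))

# Binary map of candidate lamp pixels, lightly cleaned with an opening
binary = cv2.morphologyEx(cv2.bitwise_or(red_mask, green_mask),
                          cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
```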
S10', judging whether there is a traffic light. In this embodiment, suppose the image has N candidate regions, each marked R_i, i = 1, 2, …, N, and let a denote the area of the contour; area filtering retains only the contours whose area a lies within a preset range. Shape filtering is then carried out on each remaining contour: the circumscribed rectangle of each contour is obtained in turn together with its width W and height H, and the filtering condition is imposed on the width-to-height ratio of the circumscribed rectangle, R = R_i(W) / R_i(H), which must lie within a preset interval, with both the width and the height additionally required to be less than 100 pixels. Density filtering calculates the density of each contour from its area,
a = Σ_x Σ_y f(x, y)
ρ = a / (W × H)
where f(x, y) represents the pixel value of point (x, y) in the binary image. After the area, shape and density filtering, the regions whose density ρ exceeds a preset threshold are retained, and the central coordinates (X, Y) of the reserved colour blocks together with the width W and height H of their circumscribed rectangles are extracted;
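By way of illustration, the region-filtering chain can be sketched as below, operating on the binary image from the previous step; the area range, aspect-ratio interval and density threshold are placeholder assumptions introduced for the example.

```python
import cv2

def filter_lamp_regions(binary, area_range=(30, 3000), ratio_range=(0.5, 2.0),
                        min_density=0.6, max_side=100):
    # Returns (centre_x, centre_y, W, H) for blobs surviving the three filters
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    kept = []
    for c in contours:
        area = cv2.contourArea(c)
        x, y, w, h = cv2.boundingRect(c)
        if w == 0 or h == 0:
            continue
        if not (area_range[0] <= area <= area_range[1]):
            continue                                  # area filter
        if w >= max_side or h >= max_side:
            continue                                  # size limit from the text
        if not (ratio_range[0] <= w / float(h) <= ratio_range[1]):
            continue                                  # shape (aspect-ratio) filter
        if area / float(w * h) < min_density:
            continue                                  # density filter
        kept.append((x + w / 2.0, y + h / 2.0, w, h))
    return kept
```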
S11', if yes, performing traffic light recognition. In this embodiment, a detection window region block is selected according to the obtained centre coordinates and the width and height of the circumscribed rectangle. HOG features are extracted inside the detection window: the gradients of the image in the horizontal and vertical directions are calculated as G_x(x, y) = H(x + 1, y) − H(x − 1, y) and G_y(x, y) = H(x, y + 1) − H(x, y − 1), and the gradient magnitude and direction of each pixel point are then obtained as
G(x, y) = sqrt(G_x(x, y)² + G_y(x, y)²)
α(x, y) = arctan(G_y(x, y) / G_x(x, y))
The image is uniformly divided into a plurality of cells to obtain HOG characteristics of the cells, adjacent cells form a block, and the block is normalized to obtain the HOG characteristics of the block. Classifying the extracted HOG characteristics by combining a linear SVM, and obtaining the type, color and direction information of the traffic signal lamp through an SVM classifier;
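A brief sketch of classifying one candidate window with HOG features and a linear SVM follows; the window size, HOG cell/block geometry, and the pre-trained classifier `svm` (a cv2.ml.SVM object) are all assumptions for the example.

```python
import cv2
import numpy as np

def classify_lamp(candidate_patch, svm):
    # candidate_patch: cropped 8-bit ROI around a candidate lamp (assumed input)
    hog = cv2.HOGDescriptor((32, 32), (16, 16), (8, 8), (8, 8), 9)
    patch = cv2.resize(candidate_patch, (32, 32))
    feature = hog.compute(patch).reshape(1, -1).astype(np.float32)
    _, result = svm.predict(feature)
    return int(result[0, 0])   # class label, e.g. red / green / arrow direction
```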
S12', if not, performing face detection. In this embodiment, the face detection may adopt a deep-learning-based detection algorithm such as Cascade CNN. Cascade CNN first constructs a multi-scale face image pyramid; 12-net densely scans the whole image and quickly removes 90% of the detection windows, and the remaining windows are sent to 12-calibration-net to adjust their size and position so that they lie closer to potential face regions. Non-maximum suppression is then used to merge highly overlapping detection windows, and the image regions corresponding to the remaining candidate windows are normalized to 24 × 24 as the input of 24-net, which eliminates nearly 90% of the remaining detection windows. The remaining detection windows are corrected by 24-calibration-net, and non-maximum suppression further reduces their number;
S13', judging whether a face appears. In this embodiment, the detection windows are normalized to 48 × 48 and sent into 48-net for classification to obtain face candidate windows; non-maximum suppression merges the windows, and the face candidate windows are sent into 48-calibration-net to correct the detection windows, giving the final face detection result;
s14 ', if yes, performing face alignment, otherwise, executing step S19' to plan a travel route according to the three-dimensional model, the traffic light state, the motion trajectory of the moving object, the orientation and distance of the obstacle and the person. And the DCNN completes the feature point positioning through three parts of level 1, level 2 and level 3. First, level 1 is composed of three convolutional neural networks, F1, EN1, NM 1. The input of F1 is the whole face image, the input size is 39 x 39, and a 10-dimensional feature vector is output through a convolutional neural network and is used for predicting five feature points of eyes, a nose, left and right mouth corners. Similarly, EN1 is used to predict three feature points of both eyes and nose, and NM1 is used to predict three feature points of left and right mouth corners and nose. Then, F1, EN1, and NM1 are connected by an averaging method, and feature points repeatedly predicted are averaged, so that an excessive deviation of the feature point positions is prevented, and the accuracy is improved. Then, taking five predicted feature points obtained by level 1 as the center, cutting the area where each feature point is located, wherein the cutting size is 15 × 15, and the cutting size is used as the input of level 2. level 2 is composed of 10 convolutional neural networks, every two convolutional neural networks are paired, one feature point is predicted, and the prediction result is also averaged. level 3 also consists of 10 convolutional neural networks, and every two convolutional neural networks predict one characteristic point. And (3) cutting again on the basis that the level 2 obtains the position of the predicted characteristic point, wherein the cutting size is 15 multiplied by 15, and predicting again. Therefore, the face alignment can be finally realized;
S15', representing the face features. In this embodiment, the image is converted to greyscale and normalised in colour space using a gamma correction method, which adjusts the contrast of the image and reduces the influence of local shadows and illumination changes; the gradient value of each pixel of the image is then calculated. The image gradient is G(x, y) = dx(i, j) + dy(i, j), and the central difference can be used to calculate the gradient components:
dx(i, j) = (l(i + 1, j) − l(i − 1, j)) / 2
dy(i, j) = (l(i, j + 1) − l(i, j − 1)) / 2
where l is the value of an image pixel and (i, j) is the coordinates of the pixel. Dividing the image into cells, and counting the histogram of gradients of each cell may form a description vector for each cell. And forming a block by every several cells, and combining the description vectors of all the cells in each block to obtain the description vector of the block. Combining the description vectors of all blocks of the image to obtain a characteristic vector of the image;
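A hand-rolled sketch of this per-cell gradient-histogram descriptor is shown below; the 8 × 8 cell size and 9 orientation bins are illustrative assumptions, and the subsequent block grouping and normalization would be applied on top of the returned histograms.

```python
import numpy as np

def cell_histograms(gray, cell=8, bins=9):
    gray = gray.astype(np.float32)
    # Central-difference gradients (edges wrap around; acceptable for a sketch)
    dx = (np.roll(gray, -1, axis=1) - np.roll(gray, 1, axis=1)) / 2.0
    dy = (np.roll(gray, -1, axis=0) - np.roll(gray, 1, axis=0)) / 2.0
    mag = np.hypot(dx, dy)
    ang = np.mod(np.arctan2(dy, dx), np.pi)           # unsigned orientation
    h, w = gray.shape
    hists = np.zeros((h // cell, w // cell, bins), np.float32)
    for i in range(h // cell):
        for j in range(w // cell):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            idx = np.minimum((a / np.pi * bins).astype(int), bins - 1)
            np.add.at(hists[i, j], idx, m)            # magnitude-weighted bins
    return hists.reshape(-1)                          # concatenated descriptor
```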
S16', face recognition. In this embodiment, the extracted face feature vector is compared with the face feature vectors stored in the database by calculating the cosine distance between the two feature vectors; the smaller the included angle between the two vectors, the more similar the features;
S17', determining whether there is an acquaintance. In this embodiment, the identity information of the face is determined according to the similarity between the two feature vectors, where the cosine distance is
cos θ = (A · B) / (|A| · |B|)
where A and B are n-dimensional vectors and θ is the included angle between them;
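A tiny sketch of the cosine-similarity comparison against stored vectors is given below; the database mapping, the query vector and the acceptance threshold are assumptions introduced only for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_acquaintance(query_vec, database, threshold=0.6):
    # database: {person_id: stored_feature_vector}; threshold is illustrative
    best_id, best_score = max(
        ((pid, cosine_similarity(query_vec, vec)) for pid, vec in database.items()),
        key=lambda kv: kv[1])
    return (best_id, best_score) if best_score > threshold else (None, best_score)
```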
s18', if yes, acquiring the name, the azimuth and the distance of the person;
s19', if not, planning a traveling route according to the three-dimensional model, the traffic light state, the motion trail of the moving object, the orientation and the distance of the obstacle and the person;
s20', an indication signal is sent out.
Example 2
As shown in fig. 4, the intelligent navigation device for the blind using the multi-view stereo vision provided by the invention comprises a power supply device 1, a multi-view image acquisition device 2, a data control device 3, a Beidou satellite positioning device 4, a 5G communication device 5 and a vibration navigation device 6.
The starting key in the power supply device 1 is used for starting equipment, the lithium battery serves as a power supply, a 3.5mm power interface is reserved for charging, and portable wearing and normal work of the navigation equipment are guaranteed.
The multi-view image acquisition device 2 is composed of a multi-view camera array and is used for acquiring image data of surrounding scenes and realizing real-time acquisition of the conditions of the surrounding scenes.
As shown in fig. 5, the multi-view camera array of the present device mainly adopts a four-view camera arrangement, and the number of cameras can be chosen according to actual requirements. The four-view camera array comprises four cameras arranged approximately in a kite shape on the head, chest and shoulders of the wearer: the head camera 21 is located at the centre of the forehead, the chest camera 23 at the chest, and the shoulder cameras 22 are each inclined inwards. The larger the focal length of a camera, the farther its visible distance and the smaller its field angle; for three-dimensional reconstruction of the blind person's surrounding scene, the shooting range of the device is therefore preferably within 20 meters.
The data control device 3 includes an edge calculation main board 31, a memory 32, and a processor 33. The processor 33 is configured to send an acquisition command to control the multi-view image acquisition device 2 to acquire image data, and control the Beidou satellite positioning device 4 to acquire positioning information. The memory 32 is used for storing multi-view image data, Beidou positioning information and calibration results of the multi-view camera array, wherein the calibration results comprise internal parameters and external parameters.
The Beidou satellite positioning device 4 comprises a Beidou module 41 and a Beidou antenna 42, and positioning information of equipment is acquired through a Beidou satellite navigation system, so that the positioning information of the equipment can be accurately acquired by a remote blind guiding platform.
The 5G communication device 5 includes a 5G module 51 and a 5G antenna 52, and implements low-delay, high-bandwidth, real-time data return to the remote blind guiding platform.
The vibration navigation device 6 comprises a vibration motor group 61 and a vibration communication interface 62, the vibration communication interface is used for realizing data signal transmission in the equipment, and the vibration motor generates a corresponding vibration prompt according to a received signal to realize accurate navigation and help the blind person to go out safely.
As shown in fig. 6 to 9, the navigation device includes a vibration communication interface 1', a heat dissipation hole 2', an RJ45 interface 3', a USB camera interface 4', an upper cover 5', a 3.5 mm power supply port 6', a start button 7', a data bus 8', and a shock-absorbing bracket 9'. Fig. 6 to 9 also show the installation positions of the 5G module 51, the 5G antenna 52, the Beidou module 41, the Beidou antenna 42, the edge calculation main board 31 and the power supply device 1 on the navigation device.
As shown in fig. 10 and 11, the navigation device further includes a plurality of emergency help keys 10', a data bus 8', a hat 12' and a strap 13'. The RJ45 interface 3' serves as the network card interface for network data transmission. The USB camera interface 4' is connected with the multi-view image acquisition device 2 and is used for transmitting the image data acquired by the multi-view image acquisition device 2. The shock-absorbing bracket 9' cushions and protects the internal components of the device. The emergency help key 10' is used to help the blind person promptly send a help signal to the remote blind guiding platform in an emergency, ensuring the blind person's safety.
In the navigation device, the vibration motor set 61 is distributed on the inner side of the hat 12', and the multi-view cameras are fixed on the outer side of the hat 12' or on the shoulder strap 13' according to the part they act on. The shoulder strap 13' can be a 4 cm wide band, and the clips and buckles on the shoulder strap 13' serve as fasteners; the main body of the device is fixed at the waist through the clip on the housing, connecting the device main body 101, the shoulder strap 13' and the hat 12' together. When wearing the device, the blind person first fixes the device main body 101 at the waist with the clip, then puts on the hat 12' connected with the cameras and the vibration motor set 61, fixes the straps 13' on both shoulders one by one, and finally presses the start button 7' to switch on the device; after start-up, the blind person confirms from the vibration prompt of the vibration motor set 61 in the hat that the device has begun to work normally.
Example 3
As shown in fig. 12, the intelligent navigation system for blind people using multi-view stereo vision provided by the present invention comprises a data acquisition system 101 ', a data transmission system 102 ', a data processing system 103 ', a remote blind guiding system 104 ' and a vibration navigation system 105 '.
The data acquisition system 101' is used for acquiring image data shot by the multi-view image acquisition device 2 on a peripheral scene after a navigation command is triggered, acquiring positioning information of current equipment through the Beidou satellite navigation system, completing calibration on the multi-view camera array and acquiring a calibration result of the multi-view camera array, wherein the calibration result comprises internal parameters and external parameters.
The data transmission system 102 'is used for uploading the acquired image data, the Beidou positioning information and the calibration result of the multi-view camera array to the data processing system 103' through 5G communication.
The data processing system 103' extracts image feature points from the acquired image data, obtains feature point matching pairs with a matching algorithm, and obtains depth information through the parallax calculation principle, thereby obtaining a three-dimensional point cloud of the object surface; after point cloud meshing, a three-dimensional model of the blind person's surrounding scene is obtained. Meanwhile, surrounding obstacles are detected and identified to obtain their types and orientations and, combined with the depth information, the distances of obstacles within 3.5 meters of the device are acquired; a multi-target tracking technology is used to identify and track moving objects and judge their motion tracks. In addition, the data processing system detects and identifies the states of traffic lights at traffic intersections, and face images appearing in the scene are detected using face recognition technology. The acquired data are uploaded to the remote blind guiding system 104' through 5G communication.
The remote blind guiding system 104' is used for further analyzing and processing the data uploaded by the data processing system 103': it performs face recognition on the obtained face images against the face information stored in the database to identify acquaintances; if no acquaintance is found the face is ignored, and if an acquaintance is found the person's name, azimuth information and distance are acquired. A traveling route is finally planned according to the three-dimensional model, the traffic light state, the motion track of the moving object, and the orientation information and distance of obstacles and persons. In addition, the remote blind guiding system 104' generates an electronic map that locates the blind person in real time, so that the blind person's family members can obtain the blind person's current position in real time.
The vibration navigation system 105' receives the peripheral scene information, the obstacle information, the moving object motion information, the traffic light state, the judgment result of the face recognition and the travelling route sent back by the remote blind guiding platform through 5G communication, and sends out corresponding vibration prompts. The vibration navigation system 105' sends out vibration early warning of obstacles within 3.5 meters to the blind according to the signals; the vibration informs the blind of the motion track of the moving object at present, and reminds the blind to avoid danger; the vibration informs the blind whether a traffic intersection has a traffic light or not, and helps the blind to safely pass through the intersection; the blind person is informed of whether acquaintances exist around the blind person or not through vibration, and the names, the directions and the distances of the acquaintances; the vibration informs the blind of the next available travel route. Therefore, accurate navigation is realized, and the blind can safely go out.
Aiming at the defects of the prior art, the invention provides navigation equipment and a system that help the blind travel safely by using a multi-view camera array and a corresponding navigation algorithm. Compared with existing navigation systems based on laser radar, the navigation device and system reduce the dependence on laser radar, realise the high availability of multi-camera array equipment in the field of navigation for the blind, and can effectively guarantee the travel safety of blind people. Meanwhile, because of its lower cost, the system can cover more blind people and help more of them travel safely and conveniently. Through 5G communication and distributed computation, a navigation algorithm for blind travel is designed using multi-view stereoscopic three-dimensional reconstruction, multi-target detection and tracking, traffic light detection and recognition, and face detection and recognition technologies, achieving real-time data acquisition and real-time data processing with the data returned to the remote blind guiding platform in real time. The high availability of multi-view camera array equipment in the field of blind navigation is realised, and the travel safety of blind people can be effectively guaranteed.
The invention can position the position of the blind in real time and inform the blind through vibration prompt, and meanwhile, the family members of the blind can acquire the current position of the blind through the electronic map of the remote blind guiding platform in real time.
The invention transmits data through 5G communication, and realizes high-bandwidth and low-delay real-time return of the terminal equipment. The navigation system realizes the functions of image and three-dimensional point cloud data acquisition and processing, and greatly reduces the data processing time of the remote blind guiding platform.
The method comprises the steps of acquiring image data in real time through a multi-view image acquisition device, extracting image characteristic points, acquiring characteristic point matching pairs by using a matching algorithm, acquiring depth information through a parallax calculation principle, thereby acquiring three-dimensional point cloud on the surface of an object, and acquiring a three-dimensional model of a blind person surrounding scene through point cloud meshing.
The invention can automatically detect and identify the states of traffic lights at the traffic intersection, and prompt the blind in a way suitable for the blind to sense by vibration and the like, so as to guide the blind to safely pass through the road.
The invention detects and identifies surrounding obstacles to obtain their types and orientations and, combined with the depth information, acquires the distances of obstacles within 3.5 meters of the device; a multi-target tracking technology is used to identify and track moving objects and judge their motion tracks. The position of an obstacle can thus be effectively ascertained, helping the blind person avoid it and guaranteeing the safety of the blind person when going out.
The invention enables the blind person, when encountering an emergency, to send a signal to the remote blind guiding platform for help using the emergency help key arranged in the equipment, while the equipment uploads the blind person's position information to the platform. According to the meaning of the help signal, the platform promptly notifies the blind person's family members or the emergency contact point nearest to the blind person, so as to guarantee the blind person's traveling safety. The invention uses face recognition technology to recognise persons appearing in the surrounding scene, helping the blind identify acquaintances and providing humanised service for the blind. The invention solves the technical problems of large blind guiding error, poor portability and high use cost in the prior art.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A blind intelligent navigation method utilizing multi-view stereo vision is characterized by comprising the following steps:
s1, collecting image data by a multi-view image collecting device, obtaining a camera calibration result by utilizing a preset calibration method and collecting positioning information;
s2, uploading the image data, the positioning information and the camera calibration result to a preset data processing center;
s3, acquiring a three-dimensional model, and performing obstacle detection, moving object tracking, face detection and traffic light detection according to the image data, the positioning information and the camera calibration result so as to plan the traveling route of the blind, wherein the step S3 comprises the following steps:
s31, extracting image feature points in the image data, solving logic matching according to a preset Hamming distance to process the image feature points to obtain feature point pairs, obtaining image depth information through fusion processing of a parallax computation principle to obtain three-dimensional point clouds on the surface of an object, and obtaining a three-dimensional stereo model of a blind person surrounding scene through point cloud meshing processing;
s32, uploading the image data to a preset remote blind guiding platform, and identifying the traffic light state according to the image data by utilizing a preset traffic light filtering and identifying logic;
s33, acquiring a face image from the image data and aligning the face, and using a preset face recognition logic to recognize the face according to the face image and face information stored in a preset database so as to obtain character recognition data;
s34, identifying and classifying barriers according to the three-dimensional model of the blind person peripheral scene to obtain the types and the orientations of the barriers, processing the image depth information, the types and the orientations of the barriers by using a preset multi-target monitoring model to obtain motion information, extracting eigenvectors by using a Deepsort algorithm, performing association processing on the motion information by using a Kalman filtering algorithm, and performing association processing on the eigenvectors by using a Hungary algorithm to obtain fixed barrier positions and moving barrier tracks;
s4, planning a blind person traveling route according to the traffic light prompt information, the character recognition data, the fixed obstacle position and the moving obstacle track;
s5, processing the traffic light prompting information, the character recognition data, the fixed obstacle position, the moving obstacle track and the blind person traveling route to obtain a prompting signal, and generating and sending out a blind person sense early warning.
2. The intelligent navigation method for the blind using multi-view stereo vision as claimed in claim 1, wherein said step S1 includes:
s11, receiving a processor acquisition command by a multi-view image acquisition device to acquire the image data of the surrounding scene through a multi-view camera array;
s12, calibrating the multi-view camera array by the following logic by using the preset calibration method to obtain the camera calibration result of the multi-view camera array, wherein the camera calibration result comprises: internal and external parameters:
let the two-dimensional point m = [u, v]^T and its corresponding three-dimensional point M = [X, Y, Z]^T have augmented vectors expressed as:
m' = [u, v, 1]^T and M' = [X, Y, Z, 1]^T;
S13, obtaining a general formula of a Zhang Zhengyou scaling method according to the preset camera model processing:
sm′=A[R|t]M′
wherein
A = [α γ u0; 0 β v0; 0 0 1]
is the camera intrinsic matrix, [R|t] is the camera extrinsic matrix, and s is a scaling coefficient;
s14, solving an equation according to the general formula of the Zhangyingyou scaling method to obtain the internal parameters and the external parameters;
s15, optimizing the internal parameters and the external parameters by utilizing the following logics and combining a nonlinear optimization method of maximum likelihood estimation:
Σ_{i=1}^{n} Σ_{j=1}^{m} || m_ij − m̂(A, R_i, t_i, M_j) ||²
wherein n is the number of calibration planes, and m is the number of calibration plane angular points;
and S16, acquiring the current positioning information by using a preset Beidou satellite positioning device, and vibrating by using a vibration navigation device to prompt the current position of the blind person.
3. The intelligent navigation method for the blind using multi-view stereo vision according to claim 1, wherein in step S2, the image data, the positioning information and the camera calibration result are uploaded to a data processing center via 5G communication.
4. The intelligent navigation method for the blind using multi-view stereo vision as claimed in claim 1, wherein said step S31 includes:
s311, judging the difference degree between the pixel and the surrounding neighborhood according to a preset threshold value T, judging and acquiring a pixel corner point according to the difference degree, using the pixel corner point as a first version detection feature point, utilizing a non-maximum value to inhibit and process the first version detection feature point so as to reserve and respond to the maximum value feature point, selecting n pairs of p and q pixel pairs around the maximum value feature point, and obtaining an n-dimensional feature point description vector according to the n pairs of p and q pixel pairs.
S312, calculating and sorting the description vector distances between each feature point description vector and all the remaining feature point description vectors, taking the image feature point with the closest description vector distance as the matched point, and taking the matched point pairs whose Hamming distance is less than twice the minimum distance as correct matching data, the Hamming distance being computed by the following logic:
d(x, y) = Σ x[i] ⊕ y[i]
where i = 0, 1, …, n − 1, x and y are both n-bit codes, and ⊕ denotes exclusive OR.
S313, processing the correct matching data using the parallax calculation principle to obtain the distance between the left camera imaging point P_L and the right camera imaging point P_R by the following logic:
P_L P_R = b − (X_L − X_R)
wherein b is the distance between the projection centres O_L and O_R of the left and right cameras, and X_L and X_R are respectively the distances from the left and right camera imaging points to the left imaging plane;
S314, processing the distances from the left and right camera imaging points to the left imaging plane by the following logic to obtain parallax data:
d = |X_L − X_R|;
s315, processing the disparity data according to the following logic to obtain the image depth data Z:
(b − (X_L − X_R)) / b = (Z − f) / Z
Z = f·b / d
s316, processing not less than 2 initial disparity maps in combination according to a preset fusion criterion, the disparity data and the image depth data to obtain an applicable precision disparity map, and converting the disparity data into the image depth information to obtain the three-dimensional point cloud on the object surface;
S317, for the three-dimensional point cloud P = {(p_1, n_1), …, (p_N, n_N)} of the object surface, converting the reconstruction of the curved surface S into the reconstruction of χ_M according to the following indicator function, so as to obtain a gridded point cloud:
χ_M(p) = 1 if p ∈ M, and χ_M(p) = 0 otherwise
S318, using the Stokes formula to link the gridded point cloud and the point cloud normal vectors with the indicator function χ_M, and reconstructing the three-dimensional point cloud grid of the peripheral scene from these data to obtain the three-dimensional stereo model of the blind person's peripheral scene.
5. The intelligent navigation method for the blind using multi-view stereo vision as claimed in claim 1, wherein said step S32 includes:
s321, according to the three-dimensional model of the blind peripheral scene, performing color space conversion by using the following logics:
Y = 0.299R + 0.587G + 0.114B, Cb = 0.564(B − Y) + 128, Cr = 0.713(R − Y) + 128 (standard ITU-R BT.601 form)
s322, extracting pixel values corresponding to red and green under a Cb channel, finding a preliminary color region, highlighting the red region and the green region, carrying out binarization processing on the red region and the green region, filtering and traversing all contours in a binary image according to the following logic and filtering conditions to obtain the areas of all contours, and screening out signal lamp feature conforming regions, wherein the filtering mode comprises the following steps: area filtration, shape filtration, and density filtration:
s3221, area filtering is performed on all the contours in the binary image according to the following logic and filtering conditions to obtain areas of all the contours, so as to screen out a region where the signal lamp features conform to:
T_min ≤ a ≤ T_max (a preset area range for the contour area a)
wherein N is the number of candidate image regions, each region is marked R_i, i = 1, 2, …, N, and a denotes the area of the contour; the filtering conditions include the width-to-height ratio of the circumscribed rectangle:
R = R_i(W) / R_i(H)
which is required to lie within a preset interval,
wherein, W and H are the width and height of the circumscribed rectangle respectively;
s3222, performing density filtering on all the contours in the binary map according to the following logic and filtering conditions to obtain the signal lamp feature conforming area:
a = Σ_x Σ_y f(x, y)
ρ = a / (W × H)
where ρ represents density, f (x, y) represents a pixel value of a point (x, y) in the binary map;
S3223, performing region filtering on all contours in the binary image according to the signal lamp feature conforming regions and the density, retaining the regions whose density ρ exceeds a preset threshold, and
extracting the central coordinates (X, Y) of the reserved color blocks, the width W and the height H of the circumscribed rectangle;
s323, selecting a detection window region block according to the central coordinates (X, Y), the width W and the height H of the circumscribed rectangle to extract the HOG feature, and obtaining the horizontal and vertical gradient values of the image by using the following logic processing:
G_x(x, y) = H(x + 1, y) − H(x − 1, y)
G_y(x, y) = H(x, y + 1) − H(x, y − 1)
wherein G_x(x, y) and G_y(x, y) represent the horizontal and vertical gradient values;
S324, processing the horizontal and vertical gradient values of the image with the following logic to obtain the gradient magnitude and direction of each pixel point:
G(x, y) = sqrt(G_x(x, y)² + G_y(x, y)²)
α(x, y) = arctan(G_y(x, y) / G_x(x, y));
S325, uniformly dividing the image into a plurality of cells to obtain HOG characteristics of the cells, forming adjacent cells into a block, and normalizing the block to obtain the HOG characteristics of the block;
and S326, classifying the HOG characteristics of the block by using a linear SVM classifier to obtain the traffic light prompt information.
6. The intelligent navigation method for the blind using multi-view stereo vision as claimed in claim 1, wherein said step S33 includes:
s331, constructing a multi-scale face image pyramid by using a preset deep learning network model, scanning the image data, eliminating a detection window with a preset proportion, and adjusting the size and the position of the detection window to approach a potential face image;
s332, restraining and merging the detection windows by using a non-maximum value to obtain candidate detection windows, and normalizing image areas corresponding to the candidate detection windows;
s333, correcting the candidate detection window, and inhibiting the candidate detection window by utilizing a non-maximum value to obtain an applicable detection window;
s334, normalizing the applicable detection window to a preset specification, classifying to obtain face candidate windows, carrying out window combination by using non-maximum suppression, and correcting the face candidate windows to obtain a face detection result;
s335, predicting feature points of five sense organs by using a convolutional neural network in a preset feature point positioning algorithm, connecting each convolutional neural network through an averaging method, and carrying out position averaging;
s336, predicting and processing the face detection result by utilizing each convolutional neural network to carry out face alignment so as to obtain an aligned face image;
s337, graying the aligned face image, and calculating the gradient value of each pixel in the aligned face image by using the following logics:
G(x,y)=dx(i,j)+dy(i,j)
wherein
dx(i, j) = (l(i + 1, j) − l(i − 1, j)) / 2
is the x-axis image gradient,
dy(i, j) = (l(i, j + 1) − l(i, j − 1)) / 2
is the y-axis image gradient, l is the value of the image pixel, and (i, j) is the coordinate of the pixel;
s338, dividing the aligned face image, obtaining a face description vector according to a statistical gradient histogram, combining to obtain a face image feature vector, comparing the face image feature vector with feature vectors stored in the preset database, and calculating the cosine distance between the face image feature vector and the feature vectors stored in the preset database by using the following logic so as to judge and obtain person identification data:
cos θ = (A · B) / (|A| · |B|)
wherein A, B is an n-dimensional vector, and θ is an included angle between the two.
7. The intelligent navigation method for the blind using multi-view stereo vision as claimed in claim 1, wherein said step S34 includes:
s341, obtaining a data sample according to the three-dimensional model of the blind peripheral scene, calculating a real boundary frame in the data sample by using a K-means algorithm to obtain a sample containing the real boundary frame, and loading a preset target real boundary frame data sample into a preset multi-target detection model, wherein the preset multi-target detection model comprises: a YOLO multi-target detection model based on DenseNet;
S342, comparing the real bounding box, serving as prior knowledge, with the output of the multi-target detection model to optimise the current multi-target detection model, adding a CBAM (Convolutional Block Attention Module) into the multi-target detection model to fuse shallow features and deep features to obtain feature maps of different sizes, and acquiring prediction bounding boxes from the different-size feature maps;
s343, obtaining the intersection ratio of the real prediction boundary box by using the following logic processing as a loss function:
IoU = |A ∩ B| / |A ∪ B|
wherein A is a real bounding box and B is a predicted bounding box;
s344, training the preset multi-target detection model according to the loss function, and taking the current multi-target detection model as an applicable multi-target detection model when the conformity of the prediction boundary box and the real boundary box meets the loss function;
s345, inputting the image data into the applicable multi-target detection model, inhibiting and reserving a detection boundary frame with a confidence coefficient higher than a preset threshold value through a non-maximum value to output a video frame sequence containing the detection boundary frame, and identifying a fixed obstacle position according to the video frame sequence;
s346, inputting the video frame sequence to a pre-training feature extraction network to extract and detect the feature vector of the image at the position of the bounding box by using a deep sort algorithm;
s347 obtains a first video frame in the video frame sequence, and obtains an applicable bounding box according to a preset confidence threshold and a preset bounding box non-maximum suppression logic. Initializing a series of motion variables in a Kalman filter, and creating a tracking boundary box corresponding to a detection boundary box;
S348, for the subsequent video frames, performing motion information association with the Kalman filtering algorithm according to the positions of the detection bounding boxes, and performing feature association with the following Hungarian algorithm logic according to the feature vectors of the detection bounding box images, so as to obtain the Mahalanobis distance between the detection bounding boxes of the current frame and the tracking bounding boxes of the previous frame, thereby acquiring the moving obstacle track:
d(i, j) = (d_j − y_i)^T S_i^{-1} (d_j − y_i)
wherein d_j is the pixel coordinates of the detection bounding box, y_i is the pixel coordinates of the tracking bounding box, and S_i^{-1} is the inverse of the coordinate covariance matrix of the i-th tracking bounding box.
8. The intelligent navigation method for the blind using multi-view stereo vision as claimed in claim 1, wherein said step S4 includes:
s41, identifying acquaintances according to the character identification data, and acquiring names, character azimuth information and character distances of the acquaintances;
s42, planning the blind person advancing route according to the three-dimensional model of the blind person surrounding scene, the traffic light state, the moving obstacle track, the fixed obstacle position, the character direction information and the character distance, generating and sending the real-time positioning electronic map of the blind person to the monitoring terminal.
9. The intelligent navigation method for the blind using multi-view stereo vision as claimed in claim 1, wherein said step S5 includes:
s51, receiving the traffic light prompt information, the person identification data, the fixed obstacle position, the moving obstacle track and the blind person traveling route;
s52, sending out obstacle avoidance vibration early warning according to the fixed obstacle position and the moving obstacle track;
s53, informing the blind of the crossing passing indication information according to the traffic light prompt information vibration;
s54, vibrating to prompt acquaintance information around the blind according to the character recognition data;
and S55, the blind person traveling information is indicated according to the blind person traveling route vibration.
10. An intelligent navigation system for the blind using multi-view stereo vision, the system comprising:
the data acquisition module is used for acquiring image data by the multi-view image acquisition device, acquiring a camera calibration result by utilizing a preset calibration method and acquiring positioning information;
the data transmission module is used for uploading the image data, the positioning information and the camera calibration result to a preset data processing center, and the data transmission module is connected with the data acquisition module;
a data processing module, configured to obtain a three-dimensional model, and perform obstacle detection, moving object tracking, face detection, and traffic light detection according to the image data, the positioning information, and the camera calibration result, so as to plan a travel route for a blind person, where the data processing module is connected to the data transmission module and includes:
the three-dimensional model acquisition module is used for extracting image characteristic points in the image data, solving logic matching processing of the image characteristic points according to a preset Hamming distance to obtain characteristic point pairs, obtaining image depth information through fusion processing of a parallax computation principle to acquire three-dimensional point clouds on the surface of an object, and obtaining a three-dimensional model of a blind person surrounding scene through point cloud meshing processing;
the traffic light identification processing module is used for uploading the image data to a preset remote blind guide platform, so that a preset traffic light filtering identification logic is utilized, and the traffic light state is identified according to the image data;
the figure recognition processing module is used for acquiring a face image from the image data and aligning the face, and using a preset face recognition logic to perform face recognition according to the face image and face information stored in a preset database so as to obtain figure recognition data;
the obstacle identification processing module is used for identifying and classifying obstacles according to the three-dimensional stereo model of the blind person peripheral scene, so as to obtain the types and orientations of the obstacles, processing the image depth information and the types and orientations of the obstacles with the preset multi-target monitoring model to obtain motion information, extracting feature vectors with the Deepsort algorithm, performing association processing on the motion information with the Kalman filtering algorithm, and performing association processing on the feature vectors with the Hungarian algorithm, so as to obtain the fixed obstacle position and the moving obstacle track;
the remote blind guiding module is used for planning a blind person traveling route according to the traffic light prompt information, the figure identification data, the fixed barrier position and the moving barrier track, and is connected with the data processing module;
the vibration navigation module is used for processing the traffic light prompt information, the person identification data, the fixed barrier position, the moving barrier track and the blind person traveling route so as to obtain a prompt signal and generate and send out blind person sensory early warning, and the vibration navigation module is connected with the remote blind guiding module.
CN202210705969.3A 2022-06-21 2022-06-21 Intelligent navigation method and system for blind people by using multi-eye stereoscopic vision Pending CN114995450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210705969.3A CN114995450A (en) 2022-06-21 2022-06-21 Intelligent navigation method and system for blind people by using multi-eye stereoscopic vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210705969.3A CN114995450A (en) 2022-06-21 2022-06-21 Intelligent navigation method and system for blind people by using multi-eye stereoscopic vision

Publications (1)

Publication Number Publication Date
CN114995450A true CN114995450A (en) 2022-09-02

Family

ID=83037380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210705969.3A Pending CN114995450A (en) 2022-06-21 2022-06-21 Intelligent navigation method and system for blind people by using multi-eye stereoscopic vision

Country Status (1)

Country Link
CN (1) CN114995450A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115164931A (en) * 2022-09-08 2022-10-11 南开大学 System, method and equipment for assisting blind people in going out
CN115375890A (en) * 2022-10-25 2022-11-22 苏州千里雪智能科技有限公司 Based on four mesh stereovision cameras governing system of 5G
CN116311361A (en) * 2023-03-02 2023-06-23 北京化工大学 Dangerous source indoor staff positioning method based on pixel-level labeling
CN116311361B (en) * 2023-03-02 2023-09-15 北京化工大学 Dangerous source indoor staff positioning method based on pixel-level labeling
CN116631022A (en) * 2023-04-11 2023-08-22 广东德融汇科技有限公司 Face accurate recognition method, device, equipment and storage medium
CN117690122A (en) * 2024-02-02 2024-03-12 中科数创(临沂)数字科技有限公司 Channel obstacle detection system for archive warehouse
CN117690122B (en) * 2024-02-02 2024-04-26 中科数创(临沂)数字科技有限公司 Channel obstacle detection system for archive warehouse

Similar Documents

Publication Publication Date Title
CN114995450A (en) Intelligent navigation method and system for blind people by using multi-eye stereoscopic vision
US20220405947A1 (en) Vehicle speed intelligent measurement method based on binocular stereo vision system
WO2021004312A1 (en) Intelligent vehicle trajectory measurement method based on binocular stereo vision system
CN109819208B (en) Intensive population security monitoring management method based on artificial intelligence dynamic monitoring
CN111081064B (en) Automatic parking system and automatic passenger-replacing parking method of vehicle-mounted Ethernet
CA2950791C (en) Binocular visual navigation system and method based on power robot
CN104700414B (en) A kind of road ahead pedestrian's fast ranging method based on vehicle-mounted binocular camera
EP3196853A1 (en) Machine vision-based method and system for aircraft docking guidance and aircraft type identification
WO2016015546A1 (en) System and method for aircraft docking guidance and aircraft type identification
CN106871906B (en) Navigation method and device for blind person and terminal equipment
WO2015096507A1 (en) Method for recognizing and locating building using constraint of mountain contour region
CN103824070A (en) Rapid pedestrian detection method based on computer vision
CN108288047A (en) A kind of pedestrian/vehicle checking method
CN112819895A (en) Camera calibration method and device
KR102265980B1 (en) Device and method for monitoring ship and port
US11657592B2 (en) Systems and methods for object recognition
CN108320304A (en) A kind of automatic edit methods and system of unmanned plane video media
CN114905512B (en) Panoramic tracking and obstacle avoidance method and system for intelligent inspection robot
CN115359474A (en) Lightweight three-dimensional target detection method, device and medium suitable for mobile terminal
Sun et al. Obstacle Detection of Intelligent Vehicle Based on Fusion of Lidar and Machine Vision.
Le Saux et al. Railway detection: From filtering to segmentation networks
Wang et al. High accuracy and low complexity LiDAR place recognition using unitary invariant frobenius norm
CN112668493A (en) Reloading pedestrian re-identification, positioning and tracking system based on GAN and deep learning
Li et al. Semantic labelling of road furniture in mobile laser scanning data
CN116343095A (en) Vehicle track extraction method based on video stitching and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination