CN106874830B - Assistance method for visually impaired people based on RGB-D camera and face recognition - Google Patents
Assistance method for visually impaired people based on RGB-D camera and face recognition
- Publication number
- CN106874830B (application CN201611140457.8A)
- Authority
- CN
- China
- Prior art keywords
- face
- image
- pixel
- depth
- facial image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
The invention discloses an assistance method for visually impaired people based on an RGB-D camera and face recognition. The method comprises: tracking faces using the color and depth images collected by the RGB-D camera, and automatically assigning labels to these faces; the labels are entered by the user through a microphone and include, but are not limited to, the person's name, personal information, and telephone number; facial images are corrected by frontalization so that faces in different poses can be recognized; the corrected facial images are used to train a face recognition model in a neural network; a facial image to be recognized is input into the trained face recognition model, and the recognition result output by the model is conveyed to the user through 3D stereo sound; the information conveyed by the 3D stereo sound includes the direction of the face and the distance of the face from the user, obtained from the depth image.
Description
Technical field
The present invention relates to the technical fields of pattern classification, machine learning, face recognition, and assistance for people with visual impairment, and in particular to an assistance method for visually impaired people based on an RGB-D camera and face recognition.
Background art
According to data from the World Health Organization (WHO), there are 285 million visually impaired people worldwide, of whom 39 million are blind. In the daily life of visually impaired people, identifying the people around them is a prominent need. Without assistive devices, a visually impaired person can only recognize others by their voice, which is largely limited by the person's familiarity with those around them, the distance, and factors such as ambient noise. Traditional face recognition methods generally capture facial images with a color camera and require a frontal face under uniform illumination; this means that during face sample acquisition the face must be as close to the camera as possible and facing it directly. Therefore, a face recognition system designed specifically for visually impaired people, using a natural mode of interaction, would provide them with great convenience.
Summary of the invention
The purpose of the present invention is to use an RGB-D camera and face recognition technology to address the difficulty visually impaired people have in recognizing and identifying others, and to provide them with an assistance method that is easy to use and offers humanized interaction.
The present invention is achieved through the following technical solution: an assistance method for visually impaired people based on an RGB-D camera and face recognition, comprising the following steps: (1) face entry and establishment of a face database; (2) correction of facial images; (3) neural network training; (4) face recognition; (5) interaction conveying the recognition result through 3D stereo sound.
Step (1) specifically: for each person to be recognized, acquire multiple consecutive frames of color images and depth images, then detect the facial image through the color image channel of the RGB-D camera, and use the facial image detected in the first frame as the initialization starting point for face tracking. If a face is missed or misdetected in the n-th frame, the face tracking mode is started to locate the face region. The facial image data of all persons to be recognized, together with the corresponding names, are entered to establish the face database.
The face tracking mode comprises the following steps:
First, from the face detected in the (n-1)-th frame, compute separately the histograms of the face region in the color image and the depth image. The abscissa of the color histogram is the color value and the ordinate is the number of pixels with each color value; the abscissa of the depth histogram is the depth value and the ordinate is the number of pixels with each depth value.
Second, in the n-th frame, compute the back-projection images of the color image and the depth image. The back-projection of the color image is obtained by replacing the color value of each pixel of the color image with the corresponding ordinate in the color histogram; the back-projection of the depth image is obtained by replacing the depth value of each pixel of the depth image with the corresponding ordinate in the depth histogram. After the two back-projection images are fused, a face region prediction that better matches the actual situation is obtained.
Third, apply the mean-shift algorithm (MeanShift) to the fused back-projection image to compute the face region in the n-th frame.
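For illustration, the histogram and back-projection computations of the first and second steps can be sketched in a few lines of NumPy; the equal 50/50 fusion weight is an assumption, since the description does not prescribe how the two back-projections are combined:

```python
import numpy as np

def histogram(region, bins=256):
    # Step one: count how often each (color or depth) value occurs
    # in the face region of the previous frame.
    return np.bincount(region.ravel(), minlength=bins)

def back_project(image, hist):
    # Step two: replace every pixel value in the new frame with the
    # count that value had in the face-region histogram.
    return hist[image]

def fuse(bp_color, bp_depth, w_color=0.5):
    # Normalize each back-projection to [0, 1] and blend them; the
    # 50/50 weighting is an assumption, not specified by the method.
    bc = bp_color / max(bp_color.max(), 1)
    bd = bp_depth / max(bp_depth.max(), 1)
    return w_color * bc + (1.0 - w_color) * bd
```

High values in the fused map mark pixels whose color and depth both resemble the previous frame's face region.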
Step (2) specifically:
First, resize the facial image to a uniform size of 100 × 100 pixels.
Second, detect the feature points of the face region; the feature points include the cheek contour, eyes, eyebrows, nose, and mouth. The detection of the feature points is based on the color image.
Third, take a three-dimensional face model with the above feature points as the reference coordinate system, and calibrate the RGB-D camera coordinates according to the feature point positions in the color image to obtain the camera coordinate system.
Fourth, project all points of the three-dimensional model into the camera coordinate system.
Fifth, assign each point of the three-dimensional model, projected into the camera coordinate system, the RGB information from the color image.
Sixth, perform a frontal projection of the assigned three-dimensional model to obtain the corrected facial image.
Seventh, convert the color facial image to grayscale and apply histogram equalization.
Step (3) specifically: the corrected facial images, with a uniform size of 100 × 100 pixels, can be regarded as 10000-dimensional vectors. Dimensionality reduction is then performed by principal component analysis (PCA).
Each face corresponds to a data label composed of 0s and 1s; the data label of the m-th face is [a1, a2, …, am, …, ak], where am = 1, the rest are 0, and k is the total number of faces. Using the reduced data as input and the data labels as output, the neural network model is trained with the back-propagation (BP) algorithm.
Further, recognition is performed by the following method: a facial image to be recognized is acquired, corrected, and dimension-reduced, and then input into the trained neural network. Among the elements of the output vector, if exactly one is greater than the threshold 0.5, the class of the input data is determined to be the class corresponding to that element; if more than one element exceeds the threshold, or all elements are below it, the input data is judged not to belong to the training dataset, i.e., the face is a stranger.
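The decision rule on the output vector can be stated compactly; the `names` list and the example scores below are hypothetical:

```python
def classify(output, names, threshold=0.5):
    """Decision rule on the network's output vector: exactly one
    element above the threshold names the person; zero or several
    elements above it means the face is not in the training set."""
    above = [i for i, v in enumerate(output) if v > threshold]
    if len(above) == 1:
        return names[above[0]]
    return "stranger"
```

For example, `classify([0.1, 0.9, 0.2], ["Ann", "Bob", "Cal"])` picks the single confident class, while an ambiguous or uniformly low output is rejected.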
Further, interaction is performed by the following method: for the face recognized in step (4), its name is obtained, and its direction and distance are further obtained from the depth map; the name is played to the user as 3D sound, where the angle of the 3D sound indicates the direction of the face and the volume of the 3D sound indicates the distance of the face.
The beneficial effects of the present invention are:
1. The present invention provides a method for visually impaired people to identify the people around them.
2. The face tracking method proposed by the present invention improves the face recall rate and can automatically label the images.
3. The facial image correction method proposed by the present invention removes the influence of head pose variation and non-uniform illumination on face recognition.
4. The neural-network-based training and face recognition system proposed by the present invention achieves real-time face recognition.
5. The 3D stereo sound interaction for recognition results proposed by the present invention effectively improves the user experience of the face recognition system.
Description of the drawings
Fig. 1 is a schematic diagram of the system structure;
Fig. 2 shows a face detection result;
Fig. 3 shows a color histogram or depth histogram after gray-level processing;
Fig. 4 shows a fused back-projection image;
Fig. 5 compares facial images before and after correction.
Specific embodiment
An assistance method for visually impaired people based on an RGB-D camera and face recognition, with the following specific steps:
(1) Face entry and establishment of the face database
For each person to be recognized, acquire multiple consecutive frames of color images and depth images, then detect the facial image through the color image channel of the RGB-D camera, and use the facial image detected in the first frame as the initialization starting point for face tracking. If a face is missed or misdetected in the n-th frame, the face tracking mode is started to locate the face region. The facial image data of all persons to be recognized, together with the corresponding names, are entered to establish the face database.
The face tracking mode comprises the following steps:
First, from the face detected in the (n-1)-th frame (the face region is outlined as shown in Fig. 2), compute separately the histograms of the face region in the color image and the depth image, as shown in Fig. 3. The abscissa of the color histogram is the color value and the ordinate is the number of pixels with each color value; the abscissa of the depth histogram is the depth value and the ordinate is the number of pixels with each depth value.
Second, in the n-th frame, compute the back-projection images of the color image and the depth image, as shown in Fig. 4. The back-projection of the color image is obtained by replacing the color value of each pixel of the color image with the corresponding ordinate in the color histogram; the back-projection of the depth image is obtained analogously from the depth histogram. A back-projection image is a grayscale image: in the back-projections of both the color image and the depth image, regions with larger gray values are more likely to be the face region. After the two back-projection images are fused, a face region prediction that better matches the actual situation is obtained.
Third, apply the mean-shift algorithm (MeanShift) to the fused back-projection image to compute the face region in the n-th frame.
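The third step can be illustrated with a minimal mean-shift iteration on the fused back-projection. OpenCV's `cv2.meanShift` provides a production implementation; the plain-NumPy sketch below only shows the core idea of repeatedly moving the search window to the local weighted centroid:

```python
import numpy as np

def mean_shift(bp, window, n_iter=20):
    """Shift an (x, y, w, h) window toward the weighted centroid of
    the back-projection values inside it until it settles. bp is a
    2-D array of non-negative weights (the fused back-projection)."""
    x, y, w, h = window
    for _ in range(n_iter):
        patch = bp[y:y + h, x:x + w]
        total = patch.sum()
        if total == 0:
            break  # no evidence under the window; give up
        ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
        cx = (xs * patch).sum() / total   # centroid inside the patch
        cy = (ys * patch).sum() / total
        nx = int(round(x + cx - w / 2))   # recenter window on centroid
        ny = int(round(y + cy - h / 2))
        nx = min(max(nx, 0), bp.shape[1] - w)
        ny = min(max(ny, 0), bp.shape[0] - h)
        if (nx, ny) == (x, y):
            break
        x, y = nx, ny
    return x, y, w, h
```

Starting from the previous frame's face window, the window drifts onto the bright blob of the fused back-projection, which is the predicted face region in the n-th frame.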
(2) Correction of the facial images
The correction of faces removes the influence of head pose variation and non-uniform illumination on face recognition. Face recognition is equivalent to a classification problem: during classifier training, the between-class differences of the samples should be large and the within-class differences small. Head pose variation and non-uniform illumination increase the within-class differences, sometimes to a degree comparable to the between-class differences; on such samples the classifier has difficulty finding the differences between classes during training, and as a result it cannot classify correctly. Likewise, uncorrected facial images are more error-prone during recognition.
The correction of the facial images is divided into the following steps:
First, resize the facial image to a uniform size of 100 × 100 pixels.
Second, detect the feature points of the face region; the feature points include the cheek contour, eyes, eyebrows, nose, and mouth. The detection of the feature points is based on the color image.
Third, find the three-dimensional coordinates of the corresponding feature points in a generic three-dimensional face model; these coordinates are in the world coordinate system. From the two-dimensional coordinates of the feature points in the color image, the camera parameters, and the three-dimensional coordinates in the model, compute the transformation between the world coordinate system and the camera coordinate system.
Fourth, project all points of the three-dimensional model into the camera coordinate system according to this coordinate transformation, obtaining the RGB information of each point.
Fifth, project the three-dimensional face model with the assigned RGB information onto the frontal direction to obtain the corrected facial image.
Sixth, convert the color facial image to grayscale and apply histogram equalization.
Fig. 5 compares facial images before and after correction, where a, b, and c are the images before correction, and d, e, and f are the corresponding corrected images.
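The coordinate transformation of the third step can be sketched as follows. The method above computes the world-to-camera transform from 2-D feature points and the camera parameters (a perspective-n-point problem, e.g. OpenCV's `solvePnP`); the sketch below instead assumes the depth channel is used to lift the feature points to 3-D, which reduces the problem to a rigid 3-D alignment (Kabsch algorithm). That assumption keeps the example self-contained:

```python
import numpy as np

def rigid_align(model_pts, camera_pts):
    """Least-squares rotation R and translation t mapping the generic
    3-D face model's landmarks onto the landmarks measured by the
    RGB-D camera (Kabsch algorithm)."""
    mc = model_pts.mean(axis=0)
    cc = camera_pts.mean(axis=0)
    H = (model_pts - mc).T @ (camera_pts - cc)   # covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cc - R @ mc
    return R, t
```

Once R and t are known, every point of the face model can be mapped into the camera frame, colored from the RGB image, and re-rendered from the frontal direction, which is the frontalization described above.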
(3) Neural network training
The corrected facial images, with a uniform size of 100 × 100 pixels, can be regarded as 10000-dimensional vectors. Such a dimension is too large for a neural network that must run in real time. Principal component analysis (PCA) is therefore applied to preprocess the data; this preprocessing is dimensionality reduction.
Each face corresponds to a data label composed of 0s and 1s; the data label of the m-th face is [a1, a2, …, am, …, ak], where am = 1, the rest are 0, and k is the total number of faces. Using the reduced data as input and the data labels as output, the neural network model is trained with the back-propagation (BP) algorithm.
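A toy version of the PCA-plus-BP pipeline can be written in plain NumPy. The layer size, learning rate, and epoch count are illustrative assumptions, and low-dimensional synthetic data stands in for the 10000-dimensional face vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_fit(X, n_components):
    # PCA via SVD of the centered data: returns the mean and the
    # projection basis used to reduce the face vectors.
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T

def pca_transform(X, mean, basis):
    return (X - mean) @ basis

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bp(X, Y, hidden=16, lr=1.0, epochs=5000):
    # One-hidden-layer network trained with plain back-propagation
    # (batch gradient descent on squared error); sizes and learning
    # rate are illustrative assumptions.
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)          # forward pass
        O = sigmoid(H @ W2 + b2)
        dO = (O - Y) * O * (1 - O)        # backward pass (MSE + sigmoid)
        dH = (dO @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dO / len(X); b2 -= lr * dO.mean(axis=0)
        W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
```

Each row of Y is a one-hot label as described above; at recognition time the output vector is compared against the 0.5 threshold.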
(4) Face recognition
A facial image to be recognized is acquired, corrected, and dimension-reduced, and then input into the trained neural network. Among the elements of the output vector, if exactly one is greater than the threshold 0.5, the class of the input data is determined to be the class corresponding to that element; if more than one element exceeds the threshold, or all elements are below it, the input data is judged not to belong to the training dataset, i.e., the face is a stranger.
(5) Interaction conveying recognition results through 3D stereo sound
For the face recognized in step (4), its name is obtained, and its direction and distance are further obtained from the depth map; the name is played to the user as 3D sound, where the angle of the 3D sound indicates the direction of the face and the volume of the 3D sound indicates the distance of the face.
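One possible realization of "angle indicates direction, volume indicates distance" is a constant-power stereo pan driven by the face's pixel column and depth. The field-of-view value and the 1/distance roll-off below are assumptions, since the exact audio rendering is not specified:

```python
import math

def face_azimuth(col, image_width, hfov_deg=58.0):
    # Horizontal angle of the face relative to the camera axis,
    # estimated from its pixel column; 58 degrees is an assumed
    # field of view, typical for consumer RGB-D cameras.
    return (col / image_width - 0.5) * hfov_deg

def stereo_gains(azimuth_deg, distance_m, hfov_deg=58.0):
    # Constant-power pan from the azimuth plus a 1/distance volume
    # roll-off; both mappings are illustrative assumptions.
    pan = (azimuth_deg / (hfov_deg / 2) + 1) / 2      # 0 = left, 1 = right
    pan = min(max(pan, 0.0), 1.0)
    volume = min(1.0, 1.0 / max(distance_m, 1.0))
    left = volume * math.cos(pan * math.pi / 2)
    right = volume * math.sin(pan * math.pi / 2)
    return left, right
```

A face straight ahead yields equal left/right gains; a face at the right edge of the view is rendered almost entirely in the right channel, and a farther face is quieter overall.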
Claims (5)
1. An assistance method for visually impaired people based on an RGB-D camera and face recognition, characterized in that the specific steps are as follows:
(1) face entry and establishment of a face database;
for each person to be recognized, acquiring multiple consecutive frames of color images and depth images, and detecting the facial image through the color image channel of the RGB-D camera, the facial image detected in the first frame serving as the initialization starting point of face tracking; if a face is missed or misdetected in the n-th frame, starting a face tracking mode to locate the face region; entering the facial image data of all persons to be recognized together with the corresponding names to establish the face database; the face tracking mode comprising the following steps:
first, from the face detected in the (n-1)-th frame, computing separately the histograms of the face region in the color image and the depth image; the abscissa of the color histogram being the color value and the ordinate the number of pixels with each color value; the abscissa of the depth histogram being the depth value and the ordinate the number of pixels with each depth value;
second, in the n-th frame, computing the back-projection images of the color image and the depth image; the back-projection of the color image being obtained by replacing the color value of each pixel of the color image with the corresponding ordinate in the color histogram; the back-projection of the depth image being obtained by replacing the depth value of each pixel of the depth image with the corresponding ordinate in the depth histogram; after the two back-projection images are fused, a face region prediction that better matches the actual situation being obtained;
third, applying the mean-shift algorithm (MeanShift) to the fused back-projection image to compute the face region in the n-th frame;
(2) correction of the facial images;
(3) neural network training;
(4) face recognition;
(5) interaction conveying the recognition result through 3D stereo sound.
2. The method according to claim 1, characterized in that step (2) specifically comprises:
first, resizing the facial image to a uniform size of 100 × 100 pixels;
second, detecting the feature points of the face region, the feature points including the cheek contour, eyes, eyebrows, nose, and mouth, the detection of the feature points being based on the color image;
third, taking a three-dimensional face model with the above feature points as the reference coordinate system, and calibrating the RGB-D camera coordinates according to the feature point positions in the color image to obtain the camera coordinate system;
fourth, projecting all points of the three-dimensional model into the camera coordinate system;
fifth, assigning each point of the three-dimensional model projected into the camera coordinate system the RGB information from the color image;
sixth, performing a frontal projection of the assigned three-dimensional model to obtain the corrected facial image;
seventh, converting the color facial image to grayscale and applying histogram equalization.
3. The method according to claim 1, characterized in that step (3) specifically comprises: the corrected facial images, with a uniform size of 100 × 100 pixels, are regarded as 10000-dimensional vectors and then reduced in dimension by principal component analysis (PCA);
each face corresponds to a data label composed of 0s and 1s, the data label of the m-th face being [a1, a2, …, am, …, ak], where am = 1, the rest are 0, and k is the total number of faces; using the reduced data as input and the data labels as output, the neural network model is trained with the back-propagation (BP) algorithm.
4. The method according to claim 1, characterized in that recognition is performed by the following method: a facial image to be recognized is acquired, corrected, and dimension-reduced, and then input into the trained neural network; among the elements of the output vector, if exactly one is greater than the threshold 0.5, the class of the input data is determined to be the class corresponding to that element; if more than one element exceeds the threshold, or all elements are below it, the input data is judged not to belong to the training dataset, i.e., the face is a stranger.
5. The method according to claim 1, characterized in that interaction is performed by the following method: for the face recognized in step (4), its name is obtained, and its direction and distance are further obtained from the depth map; the name is played to the user as 3D sound, the angle of the 3D sound indicating the direction of the face and the volume of the 3D sound indicating the distance of the face.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611140457.8A CN106874830B (en) | 2016-12-12 | 2016-12-12 | Assistance method for visually impaired people based on RGB-D camera and face recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611140457.8A CN106874830B (en) | 2016-12-12 | 2016-12-12 | Assistance method for visually impaired people based on RGB-D camera and face recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106874830A CN106874830A (en) | 2017-06-20 |
CN106874830B true CN106874830B (en) | 2019-09-24 |
Family
ID=59164100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611140457.8A Active CN106874830B (en) | 2016-12-12 | 2016-12-12 | A kind of visually impaired people's householder method based on RGB-D camera and recognition of face |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106874830B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109299639B (en) * | 2017-07-25 | 2021-03-16 | ArcSoft Corporation Limited | Method and device for facial expression recognition |
CN107977650B (en) * | 2017-12-21 | 2019-08-23 | Beijing HJIMI Technology Co., Ltd. | Face detection method and device |
CN108197587B (en) * | 2018-01-18 | 2021-08-03 | SeetaTech (Beijing) Technology Co., Ltd. | Method for performing multi-modal face recognition through face depth prediction |
CN108537191B (en) * | 2018-04-17 | 2020-11-20 | CloudWalk Technology Group Co., Ltd. | Three-dimensional face recognition method based on structured light camera |
CN109993086B (en) * | 2019-03-21 | 2021-07-27 | Beijing HJIMI Technology Co., Ltd. | Face detection method, device and system and terminal equipment |
CN110059678A (en) * | 2019-04-17 | 2019-07-26 | NextVPU (Shanghai) Co., Ltd. | Detection method, device and computer-readable storage medium |
CN110472610B (en) * | 2019-08-22 | 2023-08-01 | Wang Xumin | Face recognition device and method with self-depth optimization function |
CN114419697B (en) * | 2021-12-23 | 2022-12-02 | Beijing Shenrui Bolian Technology Co., Ltd. | Method and device for prompting visually impaired people based on mechanical vibration |
CN114612959A (en) * | 2022-01-28 | 2022-06-10 | Beijing Shenrui Bolian Technology Co., Ltd. | Face recognition system and method for assisting blind people in interpersonal communication |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101339607B (en) * | 2008-08-15 | 2012-08-01 | Beijing Vimicro Corporation | Face recognition method and system, and face recognition model training method and system |
CN204542562U (en) * | 2015-04-02 | 2015-08-12 | Chongqing University | Intelligent glasses for blind people |
CN104899869A (en) * | 2015-05-14 | 2015-09-09 | Zhejiang University | Plane and obstacle detection method based on RGB-D camera and attitude sensor |
CN105267013A (en) * | 2015-09-16 | 2016-01-27 | University of Electronic Science and Technology of China | Head-mounted intelligent assistance system for the visually impaired |
Non-Patent Citations (2)
Title |
---|
Binocular CCD camera ranging based on a vehicle-mounted system; Zhang Yingjiang et al.; Information Security and Technology; 2016-01-31; full text *
CamShift target tracking algorithm based on color histogram and depth information; Gu Chao; Journal of Jiamusi University (Natural Science Edition); 2015-07-31; Vol. 33, No. 4; full text *
Also Published As
Publication number | Publication date |
---|---|
CN106874830A (en) | 2017-06-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106874830B (en) | Assistance method for visually impaired people based on RGB-D camera and face recognition | |
CN107862299B (en) | Living body face detection method based on near-infrared and visible light binocular cameras | |
WO2019127262A1 (en) | Cloud end-based human face in vivo detection method, electronic device and program product | |
CN109376582A (en) | An interactive face cartoon method based on a generative adversarial network | |
CN106600640B (en) | Face recognition auxiliary glasses based on RGB-D camera | |
CN112418095A (en) | Facial expression recognition method and system combined with attention mechanism | |
CN106570447B (en) | Based on the matched human face photo sunglasses automatic removal method of grey level histogram | |
CN104008364B (en) | Face identification method | |
CN102184016B (en) | Noncontact type mouse control method based on video sequence recognition | |
CN109101949A (en) | A kind of human face in-vivo detection method based on colour-video signal frequency-domain analysis | |
WenJuan et al. | A real-time lip localization and tacking for lip reading | |
CN107862298B (en) | Winking living body detection method based on infrared camera device | |
KR20200012355A (en) | Online lecture monitoring method using constrained local model and Gabor wavelets-based face verification process | |
CN111860394A (en) | Gesture estimation and gesture detection-based action living body recognition method | |
CN109725721A (en) | Human-eye positioning method and system for naked eye 3D display system | |
Özbudak et al. | Effects of the facial and racial features on gender classification | |
CN109101925A (en) | Biopsy method | |
CN104573628A (en) | Three-dimensional face recognition method | |
CN113627256A (en) | Method and system for detecting counterfeit video based on blink synchronization and binocular movement detection | |
Lai et al. | Skin colour-based face detection in colour images | |
CN109159129A (en) | An intelligent companion robot based on facial expression recognition | |
Montazeri et al. | Automatic extraction of eye field from a gray intensity image using intensity filtering and hybrid projection function | |
CN102968636A (en) | Human face contour extracting method | |
Riaz et al. | A model based approach for expressions invariant face recognition | |
Ouellet et al. | Multimodal biometric identification system for mobile robots combining human metrology to face recognition and speaker identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
CB02 | Change of applicant information | Address after: 9, 181, 310000, Wuchang Road, Wuchang Street, Yuhang District, Zhejiang, Hangzhou, 202-7; Applicant after: Hangzhou vision krypton Technology Co., Ltd. Address before: Room 589, C building, No. 525 Xixi Road, Xihu District, Zhejiang, Hangzhou 310007, China; Applicant before: Hangzhou vision krypton Technology Co., Ltd. | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |