CN106469465A - Three-dimensional face reconstruction method based on grayscale and depth information - Google Patents
- Publication number
- CN106469465A CN106469465A CN201610794122.1A CN201610794122A CN106469465A CN 106469465 A CN106469465 A CN 106469465A CN 201610794122 A CN201610794122 A CN 201610794122A CN 106469465 A CN106469465 A CN 106469465A
- Authority
- CN
- China
- Prior art keywords
- face
- data
- dimensional
- feature
- rigid
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T17/00—Three dimensional [3D] modelling, e.g. data description of 3D objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
- G06V20/653—Three-dimensional objects by matching three-dimensional models, e.g. conformal mapping of Riemann surfaces
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/165—Detection; Localisation; Normalisation using facial parts and geometric relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/04—Indexing scheme for image data processing or generation, in general involving 3D image data
Abstract
The present invention proposes a three-dimensional face reconstruction method based on grayscale and depth information. Its main contents include: recognizing face grayscale information; recognizing face depth information; multi-modal face recognition; matching by 3D model; and performing 3D reconstruction of the face. The process locates feature regions in the face data, then uses the feature points for registration and feature extraction. The Adaboost algorithm selects the features most effective for classification; a nearest-neighbor classifier then computes matching scores to achieve multi-modal face recognition; finally the face is reconstructed by matching a local 3D model. Through this fusion strategy the invention effectively improves the performance and efficiency of the face recognition system. Cascaded 3D regression with a dense 3D point set labels the face fully, avoids shifts in landmark positions, solves the problems of inconsistent landmarks and self-occlusion under motion, and reduces computational cost. The method is highly versatile and performs well in real time.
Description
Technical field
The present invention relates to the technical field of face recognition, and in particular to a three-dimensional face reconstruction method based on grayscale and depth information.
Background technology
3D face mesh reconstruction can be used for criminal surveillance, reconstructing a face without requiring the suspect's fingerprints or identity information; it can also be used for 3D printing, three-dimensional face modeling, animation production, and other fields, where its impact is substantial. Compared with two-dimensional face recognition, three-dimensional face recognition is robust to illumination and less affected by pose and expression. Consequently, after 3D data acquisition technology developed rapidly and the quality and precision of 3D data improved greatly, many researchers turned their attention to this field.
Face grayscale images are easily affected by illumination changes, while face depth images are easily affected by acquisition precision and expression changes; these factors reduce the stability and accuracy of face recognition systems to some extent. Multi-modal fusion systems have therefore attracted increasing attention. By collecting multi-modal data, such a system can exploit the advantages of each modality and, through a fusion strategy, overcome the inherent weaknesses of single-modality systems (such as the illumination sensitivity of grayscale images and the expression sensitivity of depth images), effectively improving the performance of the face recognition system.
The present invention obtains a multi-modal system by fusing grayscale and depth information: it collects two-dimensional grayscale information and three-dimensional depth information, and reconstructs the facial contour by matching a local 3D model with the collected information points. The fusion strategy overcomes the inherent weaknesses of single-modality systems (such as the illumination sensitivity of grayscale images and the expression sensitivity of depth images) and effectively improves recognition performance, making face recognition more accurate and faster. Cascaded 3D regression keeps landmarks consistent while the face moves; by selecting a dense 3D point set the face is fully labeled, landmark positions do not shift, and the problems of inconsistent landmarks and self-occlusion under motion are solved; computational cost is greatly reduced. The 3D mesh contains no background, so the method is highly versatile and performs well in real time.
Content of the invention
To address the problems that face grayscale images are easily affected by illumination changes while face depth images are easily affected by acquisition precision and expression changes, the object of the present invention is to provide a three-dimensional face reconstruction method based on grayscale and depth information: a multi-modal system is obtained by fusing grayscale and depth information, two-dimensional grayscale information and three-dimensional depth information are collected, and the facial contour is reconstructed by matching a local 3D model with the collected information points.
To solve the above problems, the present invention provides a three-dimensional face reconstruction method based on grayscale and depth information whose main contents include:
(1) recognizing face grayscale information;
(2) recognizing face depth information;
(3) multi-modal face recognition;
(4) matching by 3D model;
(5) performing 3D reconstruction of the face.
Wherein, the recognition of face grayscale information comprises the following steps:
(1) Feature-region localization: the eye region is obtained with an eye detector. The eye detector is a cascaded classifier H, obtained by the following algorithm:
Given a training sample set S = {(x_1, y_1), ..., (x_m, y_m)} and a weak classifier space, where x_i ∈ χ is a sample vector, y_i = ±1 is the class label and m is the number of samples, initialize the sample probability distribution D_1(i) = 1/m.
For each round t = 1, ..., T perform the following operations:
Partition the sample space χ into X_1, X_2, ..., X_n; for each block X_j accumulate the weighted masses W_+^j and W_-^j of the positive and negative samples falling in X_j, and let the weak classifier output h(x) = (1/2) ln((W_+^j + ε)/(W_-^j + ε)) for x ∈ X_j,
where ε is a small positive constant;
Compute the normalization factor Z = 2 Σ_j √(W_+^j W_-^j);
Select the weak classifier h_t in the weak classifier space that minimizes Z;
Update the training sample distribution D_{t+1}(i) = D_t(i) exp(−y_i h_t(x_i)) / Z_t,
where Z_t is the normalization factor that makes D_{t+1} a probability distribution.
Finally the strong classifier is H(x) = sign(Σ_{t=1}^T h_t(x)).
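The weight-update loop above can be sketched as follows. This is a minimal stand-alone illustration of domain-partitioning Real Adaboost on one-dimensional features; the equal-width bin layout, round count, and toy data are assumptions for illustration, not part of the patent.

```python
import math

def train_real_adaboost(samples, labels, n_rounds=5, n_bins=4, eps=1e-6):
    """Domain-partitioning Real Adaboost on 1-D features.

    Each round partitions the feature axis into bins X_1..X_n, scores each
    bin by h = 0.5*ln((W+ + eps)/(W- + eps)), and reweights the samples.
    """
    m = len(samples)
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins or 1.0
    bin_of = lambda x: min(int((x - lo) / width), n_bins - 1)
    D = [1.0 / m] * m                      # D_1(i) = 1/m
    stages = []                            # per-round lists of per-bin outputs
    for _ in range(n_rounds):
        W_pos = [eps] * n_bins             # weighted positive mass per bin
        W_neg = [eps] * n_bins             # weighted negative mass per bin
        for x, y, d in zip(samples, labels, D):
            (W_pos if y > 0 else W_neg)[bin_of(x)] += d
        h = [0.5 * math.log(W_pos[j] / W_neg[j]) for j in range(n_bins)]
        stages.append(h)
        # D_{t+1}(i) = D_t(i) * exp(-y_i h_t(x_i)) / Z_t
        D = [d * math.exp(-y * h[bin_of(x)]) for x, y, d in zip(samples, labels, D)]
        Z = sum(D)
        D = [d / Z for d in D]
    def strong(x):                         # H(x) = sign(sum_t h_t(x))
        return 1 if sum(h[bin_of(x)] for h in stages) >= 0 else -1
    return strong

# separable toy data: negatives near 0, positives near 1
xs = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
ys = [-1, -1, -1, 1, 1, 1]
H = train_real_adaboost(xs, ys)
print([H(x) for x in xs])  # -> [-1, -1, -1, 1, 1, 1]
```

In the detector each cascade layer would be one such strong classifier; regions rejected by any layer are discarded early, which is what makes the cascade fast.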
(2) Registration is carried out using the obtained eye-region positions, and the LBP algorithm is applied to the registered eye-position data to obtain the LBP histogram feature, whose value is computed as
LBP(x_c, y_c) = Σ_{p=0}^{P−1} s(g_p − g_c) 2^p, with s(u) = 1 if u ≥ 0 and 0 otherwise,
where g_c is the gray value of the center pixel and g_p are the gray values of its P neighbors. This feature is fed into the grayscale image classifier to obtain the grayscale matching score.
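A minimal sketch of this LBP step, assuming the basic 8-neighbor operator given above; the neighbor ordering and the tiny test images are illustrative assumptions.

```python
def lbp_8neighbors(img, x, y):
    """Basic 8-neighbor LBP code: threshold each neighbor g_p against the
    center g_c and pack the bits s(g_p - g_c) * 2^p, p = 0..7."""
    c = img[y][x]
    # neighbors in a fixed clockwise order starting at the top-left
    offs = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]
    code = 0
    for p, (dx, dy) in enumerate(offs):
        if img[y + dy][x + dx] >= c:
            code |= 1 << p
    return code

def lbp_histogram(img):
    """256-bin LBP histogram over all interior pixels of a grayscale image."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            hist[lbp_8neighbors(img, x, y)] += 1
    return hist

img = [[10, 10, 10],
       [10, 20, 10],
       [10, 10, 10]]
print(lbp_8neighbors(img, 1, 1))  # all neighbors < center -> code 0
```

In practice the histogram would be computed per image block after eye-based registration and the blocks concatenated, but the per-pixel code is as shown.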
Wherein, the recognition of face depth information comprises the following steps:
(1) Feature-region localization: determine the position of the nose-tip region of the face;
(2) For three-dimensional data under different poses, after the reference region for registration is obtained, register the data with the ICP algorithm; after registration, compute the Euclidean distances between the input data and the three-dimensional face model data in the registry;
(3) Obtain the depth image from the depth information, compensate and denoise the noise points of the mapped depth image with a filter, and finally select the expression-robust region to obtain the final three-dimensional face depth image;
(4) Extract the visual-dictionary histogram feature vector of the three-dimensional depth image: after the test face image is input and Gabor-filtered, each filter response vector is compared with all primitive words of the visual dictionary corresponding to its position and, by distance matching, mapped onto the closest primitive; the visual-dictionary histogram feature of the original depth image is thus extracted and fed into the depth image classifier to obtain the matching score.
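The distance-matching step in (4) can be sketched as below; the toy two-word dictionary and response vectors are assumptions for illustration, standing in for real per-position Gabor dictionaries.

```python
def nearest_primitive(vec, dictionary):
    """Map a filter response vector to the index of the closest dictionary
    primitive (squared Euclidean distance), as in the distance-matching step."""
    best, best_d = 0, float("inf")
    for i, w in enumerate(dictionary):
        d = sum((a - b) ** 2 for a, b in zip(vec, w))
        if d < best_d:
            best, best_d = i, d
    return best

def visual_dict_histogram(responses, dictionary):
    """Histogram over dictionary words: each response votes for its nearest
    primitive; the histogram is the depth-image feature vector."""
    hist = [0] * len(dictionary)
    for v in responses:
        hist[nearest_primitive(v, dictionary)] += 1
    return hist

dictionary = [[0.0, 0.0], [1.0, 1.0]]               # toy 2-word dictionary
responses = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8]]    # toy filter responses
print(visual_dict_histogram(responses, dictionary))  # -> [1, 2]
```

A full implementation would keep one dictionary per local texture region, as the later extraction steps describe.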
Wherein, the multi-modal face recognition uses a multi-modal fusion system with multiple data sources, such as two-dimensional grayscale images and three-dimensional depth images:
(1) For the two-dimensional grayscale image, feature points (the eyes) are detected first, the obtained feature-point positions are used for registration, and after registration the LBP algorithm extracts the LBP histogram feature from the data;
(2) For the three-dimensional depth data, feature points (the nose tip) are detected first and used for registration; the registered three-dimensional data is then mapped to a face depth image, and the visual dictionary algorithm extracts the visual-dictionary histogram feature from the data.
Further, this multi-modal system uses a feature-level fusion strategy: after the features of each data source are obtained, all features are concatenated into a feature pool and a weak classifier is built for each feature in the pool; the Adaboost algorithm then selects from the pool the features most effective for classification. Finally, based on the features obtained by multi-modal feature-level fusion, a nearest-neighbor classifier computes the matching score, thereby achieving multi-modal face recognition.
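The concatenation and nearest-neighbor matching can be sketched as follows; the toy histograms and the identity names in the gallery are invented for illustration.

```python
def l1_distance(a, b):
    """L1 distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def fuse_features(lbp_hist, vdict_hist):
    """Feature-level fusion: concatenate the grayscale LBP histogram and the
    depth visual-dictionary histogram into one feature-pool vector."""
    return list(lbp_hist) + list(vdict_hist)

def nearest_neighbor_identify(probe, gallery):
    """Return the gallery identity whose fused feature vector has the
    smallest L1 distance to the probe (the matching score)."""
    return min(gallery, key=lambda name: l1_distance(probe, gallery[name]))

gallery = {
    "alice": fuse_features([3, 1, 0], [2, 0]),
    "bob":   fuse_features([0, 2, 4], [0, 3]),
}
probe = fuse_features([3, 1, 1], [2, 0])            # closest to alice
print(nearest_neighbor_identify(probe, gallery))     # -> alice
```

The Adaboost feature-selection stage would shrink the fused vector before this matching; the sketch compares the full concatenation for brevity.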
Wherein, the matching by 3D model comprises the following steps:
(1) Refining the correspondences with an iterative algorithm
Using the previously collected two-dimensional grayscale information and three-dimensional depth information, the 3D shape is rebuilt from the 2D shape by minimizing the reconstruction error
min ‖z − P x(p, r, s)‖²,
where P is the 2D projection matrix, z is the target 2D shape, and x(p, r, s) is the 3D shape generated from the parameters. The iterative method registers the 3D model to the 2D feature points, establishing the rigid transformation (p = {s, α, β, γ, t}) and the non-rigid transformation (r and s).
Increasing the number of vertices reduces the reconstruction error rate only weakly, while it affects the regression model and the fitting speed, so a lower vertex count is taken; increasing the number of iterations reduces the reconstruction error rate markedly while barely affecting the model size, so a higher iteration count is taken.
(2) Correction by matrix
Assuming a semantic correspondence between the 2D and 3D feature points, the 2D feature points corresponding to the correct 3D points are selected in matrix form; this semantic correspondence was already established in the modeling phase, and the 2D projections of the 3D landmarks are obtained by cascaded regression.
(3) Constraining the visible landmarks
By constraining the processing to visible landmarks, the cascaded regression evaluates the landmark definition
ξ = {j | v_j = 1}, the subset of landmark indices that are visible.
(4) Two-dimensional measurements
Time-synchronized two-dimensional measurements (z(1), ..., z(C)) are introduced; all C measurements depict the same three-dimensional face, but from different angles. The above formula is extended by requiring one reconstruction for all measurements: the superscript (k) denotes the k-th measurement, each with its own visibility set ξ(k); since we observe the same face from different angles, the overall rigid part (r) and the non-rigid part (s) are shared by all measurements.
(5) Determining the rigid and non-rigid parameters
Assume the rigid structure of the face changes little (parameter r) while the expression changes (parameter s); to handle this, the problem is solved in the time domain:
1) Compute the rigid correction parameters: T = {z^(t) | t = 1, ..., T} denotes the set of temporal measurements and r_T the rigid correction parameters computed from T; in this step the non-rigid parameters are set to 0;
2) Compute the rigid correction parameters frame by frame for each t ∈ [1, ..., T].
Wherein, the 3D reconstruction of the face gathers the parameters into one parameter vector
q: p(q) ∝ N(q; 0, Λ)
The prior on the parameters follows a normal distribution with mean 0 and variance Λ. Principal component analysis determines the d-dimensional part of the 3D basis; the rigid and non-rigid parts are then modeled separately:
the d-dimensional part of the 3D basis (θ = [θ_1; ...; θ_M] ∈ R^{3M×d}) describes the rigid deformation, and the e-dimensional part (ψ = [ψ_1; ...; ψ_M] ∈ R^{3M×e}) describes the non-rigid deformation.
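A minimal sketch of such a linear shape model and its Gaussian prior; the tiny three-coordinate mean shape and the one-column bases are illustrative assumptions (a hypothetical stand-in for the 3M-dimensional model, not the patent's actual bases).

```python
def reconstruct_shape(mean, theta, psi, r, s):
    """Linear shape model sketch: the stacked vertex-coordinate vector is the
    mean plus the rigid basis theta (3M x d) times coefficients r plus the
    non-rigid basis psi (3M x e) times coefficients s."""
    out = list(mean)
    for i in range(len(mean)):
        out[i] += sum(theta[i][j] * r[j] for j in range(len(r)))
        out[i] += sum(psi[i][j] * s[j] for j in range(len(s)))
    return out

def log_prior(q, var):
    """Log of the Gaussian prior p(q) ~ N(0, var), up to an additive
    constant: independent zero-mean components with variances 'var'."""
    return -0.5 * sum(qi * qi / v for qi, v in zip(q, var))

mean = [0.0, 0.0, 0.0]
theta = [[1.0], [0.0], [0.0]]   # d = 1 rigid direction
psi = [[0.0], [1.0], [0.0]]     # e = 1 non-rigid direction
print(reconstruct_shape(mean, theta, psi, [2.0], [3.0]))  # -> [2.0, 3.0, 0.0]
```

Fitting then searches over q = (r, s) to minimize the projected reconstruction error while the prior penalizes implausible coefficients.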
Further, the feature-region localization comprises the following steps:
(1) Thresholding: determine the threshold of the regional average negative effective energy density, denoted thr;
(2) Select the data to be processed using depth information: with the depth values of the data, extract the face data within a certain depth range as the data to be processed;
(3) Normal vector computation: compute the normal vectors of the face data selected by depth;
(4) Regional average negative effective energy density: according to its definition, obtain the connected regions of average negative effective energy density in the data to be processed and select the connected region with the largest density value;
(5) Decide whether the nose region has been found: when the value of the current region exceeds the predefined threshold thr, the region is the nose region; otherwise return to step (1) and restart the loop.
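The search loop in steps (1)-(5) can be sketched as follows; the region names, their density scores, and the threshold-relaxation rule on retry are assumptions for illustration (the patent only says the loop restarts at step (1)).

```python
def find_nose_region(regions, thr):
    """Sketch of the nose-localization loop: 'regions' maps a connected-region
    id to its average negative effective energy density (assumed precomputed
    from the normal vectors). The densest region is accepted as the nose tip
    when its density exceeds thr; otherwise the search repeats with a
    relaxed threshold (a hypothetical restart rule)."""
    while thr > 0:
        best = max(regions, key=regions.get)     # step (4): densest region
        if regions[best] > thr:                  # step (5): accept as nose
            return best
        thr *= 0.5                               # back to step (1), retry
    return None

regions = {"cheek": 0.2, "nose": 0.9, "brow": 0.4}
print(find_nose_region(regions, thr=0.8))  # -> nose
```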
Further, the main steps of the ICP algorithm are as follows:
(1) Determine the matched data-set pair: select the reference point data set P from the three-dimensional nose-tip data of the reference template;
(2) Using the nearest point-to-point distances, select the data point set Q in the input three-dimensional face that matches the reference data;
(3) Compute the rigid motion parameters: the rotation matrix R and the translation vector t; when the determinant of X is 1, R = X and t = P̄ − R·Q̄, where P̄ and Q̄ are the centroids of the two sets;
(4) Judge from the error between the rigidly transformed data set RQ + t and the reference data set P whether the three-dimensional data is registered; after registration, compute the Euclidean distance between the input data and the three-dimensional face model data in the registry
d(P, Q) = (1/N) Σ_{i=1}^{N} ‖p_i − (R q_i + t)‖,
where P and Q are the feature point sets to be matched, each containing N feature points.
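For illustration, the rigid-motion step can be sketched in 2-D, where the optimal rotation has a closed form (the 3-D case uses an SVD of the cross-covariance instead); the point sets and the example motion are assumptions.

```python
import math

def rigid_fit_2d(P, Q):
    """One alignment step of ICP in 2-D: find rotation R and translation t
    minimizing sum ||p_i - (R q_i + t)||^2 for already-paired point sets,
    via centroids and a closed-form angle."""
    n = len(P)
    cp = [sum(p[k] for p in P) / n for k in (0, 1)]   # centroid of P
    cq = [sum(q[k] for q in Q) / n for k in (0, 1)]   # centroid of Q
    s_cos = s_sin = 0.0                               # cross/dot sums of centered pairs
    for p, q in zip(P, Q):
        px, py = p[0] - cp[0], p[1] - cp[1]
        qx, qy = q[0] - cq[0], q[1] - cq[1]
        s_cos += qx * px + qy * py
        s_sin += qx * py - qy * px
    a = math.atan2(s_sin, s_cos)
    R = [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]
    t = [cp[0] - (R[0][0] * cq[0] + R[0][1] * cq[1]),   # t = centroid_P - R*centroid_Q
         cp[1] - (R[1][0] * cq[0] + R[1][1] * cq[1])]
    return R, t

def mean_error(P, Q, R, t):
    """Mean Euclidean distance after the transform (the registration check)."""
    err = 0.0
    for p, q in zip(P, Q):
        rx = R[0][0] * q[0] + R[0][1] * q[1] + t[0]
        ry = R[1][0] * q[0] + R[1][1] * q[1] + t[1]
        err += math.hypot(p[0] - rx, p[1] - ry)
    return err / len(P)

# Q is P rotated 90 degrees then shifted; the fit should recover that motion
P = [(0, 0), (1, 0), (0, 2)]
Q = [(3, 0), (3, 1), (1, 0)]
R, t = rigid_fit_2d(P, Q)
print(round(mean_error(P, Q, R, t), 6))  # -> 0.0
```

Full ICP alternates this fit with re-pairing each point of Q to its nearest point in P until the mean error stops decreasing.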
Further, extracting the visual-dictionary histogram feature vector of the three-dimensional depth image comprises the following steps:
1) Segment the three-dimensional face depth image into several local texture regions;
2) For each Gabor filter response vector, map it, according to its position, to a word of the corresponding visual dictionary, and on this basis build the visual-dictionary histogram vector as the feature representation of the three-dimensional face;
3) Concatenate the LBP histogram feature of the grayscale image and the visual-dictionary histogram feature of the depth image into a feature pool, and use a feature-selection algorithm such as Adaboost to choose from the obtained pool the feature combination most effective for face recognition, realizing data fusion at the feature level;
4) After the face features are obtained, a nearest-neighbor classifier performs the final face recognition, with the L1 distance chosen as the distance metric.
Further, for the rigid part, an intermediate frame is selected from each video, and principal component analysis determines the basis vectors (θ) and the mean, giving an overall linear subspace that describes the variation of the face shape.
Further, the goal of building the linear subspace describing the non-rigid deformation (ψ) is to build a model composed of independently trained PCA models that share soft boundaries. A part-based model is built so that highly correlated vertices form dense regions, since such regions are compressed better by PCA, and the segmentation is driven by facial expression data. 6000 selected frames of the data set are used; the data set D ∈ R^{6000×3072} consists of 6000 frames and 1024 three-dimensional vertices. D is split into three subsets Dx, Dy, Dz ∈ R^{6000×1024}, each containing one spatial coordinate of the vertices and describing the correlation between vertices; the correlation matrices computed from Dx, Dy and Dz are normalized and then averaged into one correlation matrix C. Vertices of the same region should also be close to each other on the face surface, so we compute the distances between model vertices to form a distance matrix G, normalize it to the range [0, 1], and integrate the two matrices into one matrix.
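This construction can be sketched as follows; the two-vertex toy data and the specific integration rule (correlation minus normalized distance) are assumptions, since the patent does not state how the two matrices are combined.

```python
import math

def corr_matrix(D):
    """Pearson correlation between the columns (vertices) of a
    frames-by-vertices data matrix, as an n x n list of lists."""
    m, n = len(D), len(D[0])
    mean = [sum(D[f][v] for f in range(m)) / m for v in range(n)]
    cent = [[D[f][v] - mean[v] for v in range(n)] for f in range(m)]
    std = [math.sqrt(sum(cent[f][v] ** 2 for f in range(m))) or 1.0
           for v in range(n)]
    return [[sum(cent[f][a] * cent[f][b] for f in range(m)) / (std[a] * std[b])
             for b in range(n)] for a in range(n)]

def combine(Dx, Dy, Dz, G):
    """Average the per-coordinate correlation matrices into C, normalize the
    vertex distance matrix G to [0, 1], and integrate the two into one
    matrix (here: C minus normalized G, an assumed integration rule)."""
    Cs = [corr_matrix(D) for D in (Dx, Dy, Dz)]
    n = len(Cs[0])
    C = [[sum(c[a][b] for c in Cs) / 3 for b in range(n)] for a in range(n)]
    gmax = max(max(row) for row in G) or 1.0
    return [[C[a][b] - G[a][b] / gmax for b in range(n)] for a in range(n)]

# two vertices, three frames per coordinate; vertex 1 mirrors vertex 0 exactly
Dx = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
Dy = Dz = Dx
G = [[0.0, 2.0], [2.0, 0.0]]
M = combine(Dx, Dy, Dz, G)
print(round(M[0][0], 6), round(M[0][1], 6))  # -> 1.0 0.0
```

A clustering of vertices on the combined matrix M would then yield the soft-boundary parts on which the per-part PCA models are trained.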
Brief description of the drawings
Fig. 1 is the system flow chart of the three-dimensional face reconstruction method based on grayscale and depth information of the present invention.
Fig. 2 is the two-dimensional face eye-detection schematic diagram of the method.
Fig. 3 is the two-dimensional face LBP feature schematic diagram of the method.
Fig. 4 is the two-dimensional face grayscale image feature-extraction schematic diagram of the method.
Fig. 5 is the three-dimensional face nose-tip localization schematic diagram of the method.
Fig. 6 is the three-dimensional face spatial-mapping schematic diagram of the method.
Fig. 7 is the three-dimensional face depth image feature-extraction schematic diagram of the method.
Fig. 8 is the multi-modal face recognition flow block diagram of the method.
Fig. 9 is the multi-modal face recognition system block diagram of the method.
Fig. 10 is the flow chart of matching by 3D model of the method.
Fig. 11 is the graph of the relation of the iteration count and landmark count to the reconstruction error rate of the method.
Fig. 12 is the flow chart of the 3D reconstruction of the face of the method.
Fig. 13 is a face reconstruction image of the method.
Specific embodiments
It should be noted that, where there is no conflict, the embodiments of this application and the features of the embodiments can be combined with one another. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is the system flow chart of the three-dimensional face reconstruction method based on grayscale and depth information of the present invention, comprising: recognizing face grayscale information; recognizing face depth information; multi-modal face recognition; matching by 3D model; and performing 3D reconstruction of the face.
Fig. 2 is the two-dimensional face eye-detection schematic diagram of the method. As shown in Fig. 2, the eye region is obtained by an eye detector. The eye detector is a cascaded classifier: each layer is a strong classifier (e.g. Adaboost) and filters out part of the non-eye regions, so the image region finally obtained is the eye region. The Adaboost algorithm can be summarized as follows:
Given a training sample set S = {(x_1, y_1), ..., (x_m, y_m)} and a weak classifier space, where x_i ∈ χ is a sample vector, y_i = ±1 is the class label and m is the number of samples, initialize the sample probability distribution D_1(i) = 1/m.
For each round t = 1, ..., T perform the following operations:
Partition the sample space χ into X_1, X_2, ..., X_n; for each block X_j accumulate the weighted masses W_+^j and W_-^j of the positive and negative samples falling in X_j, and let the weak classifier output h(x) = (1/2) ln((W_+^j + ε)/(W_-^j + ε)) for x ∈ X_j, where ε is a small positive constant;
Compute the normalization factor Z = 2 Σ_j √(W_+^j W_-^j);
Select the weak classifier h_t in the weak classifier space that minimizes Z;
Update the training sample distribution D_{t+1}(i) = D_t(i) exp(−y_i h_t(x_i)) / Z_t, where Z_t is the normalization factor that makes D_{t+1} a probability distribution.
Finally the strong classifier is H(x) = sign(Σ_{t=1}^T h_t(x)).
Fig. 3 is the two-dimensional face LBP feature schematic diagram of the method. As shown in Fig. 3, registration is performed using the obtained eye-region positions, and the LBP algorithm processes the eye-position data to obtain the LBP histogram feature, whose value is computed as LBP(x_c, y_c) = Σ_{p=0}^{P−1} s(g_p − g_c) 2^p, with s(u) = 1 if u ≥ 0 and 0 otherwise.
Fig. 4 is the two-dimensional face grayscale image feature-extraction schematic diagram of the present invention. As shown in Fig. 4, the two-dimensional face data is input; eye detection first extracts the key points, then the face image is adjusted to an upright frontal pose by a rigid transformation according to the eye positions. The LBP histogram feature is extracted from the registered grayscale image and fed into the grayscale image classifier to obtain the grayscale matching score.
Fig. 5 is the three-dimensional face nose-tip localization schematic diagram of the method. As shown in Fig. 5, for the three-dimensional depth data, the face nose-tip region is detected first, specifically by the following steps:
(1) Thresholding: determine the threshold of the regional average negative effective energy density, denoted thr;
(2) Select the data to be processed using depth information: with the depth values of the data, extract the face data within a certain depth range as the data to be processed;
(3) Normal vector computation: compute the normal vectors of the face data selected by depth;
(4) Regional average negative effective energy density: according to its definition, obtain the connected regions of average negative effective energy density in the data to be processed and select the connected region with the largest density value;
(5) Decide whether the nose region has been found: when the value of the current region exceeds the predefined threshold thr, the region is the nose region; otherwise return to step (1) and restart the loop.
Fig. 6 is the three-dimensional face spatial-mapping schematic diagram of the method. As shown in Fig. 6, registration is carried out using the obtained nose-tip region; the data is registered with the ICP algorithm in the following steps:
(1) Determine the matched data-set pair: select the reference point data set P from the three-dimensional nose-tip data of the reference template;
(2) Using the nearest point-to-point distances, select the data point set Q in the input three-dimensional face that matches the reference data;
(3) Compute the rigid motion parameters: the rotation matrix R and the translation vector t; when the determinant of X is 1, R = X and t = P̄ − R·Q̄, where P̄ and Q̄ are the centroids of the two sets;
(4) Judge from the error between the rigidly transformed data set RQ + t and the reference data set P whether the three-dimensional data is registered; after registration, compute the Euclidean distance between the input data and the three-dimensional face model data in the registry, where P and Q are the feature point sets to be matched, each containing N feature points.
Fig. 7 is the three-dimensional face depth image feature-extraction schematic diagram of the method. As shown in Fig. 7, after the test face image is input and Gabor-filtered, each filter response vector is compared with all primitive words of the visual dictionary corresponding to its position and, by distance matching, mapped onto the closest primitive, extracting the visual-dictionary histogram feature of the original depth image. The flow is as follows:
1) Segment the three-dimensional face depth image into several local texture regions;
2) For each Gabor filter response vector, map it, according to its position, to a word of the corresponding visual dictionary, and on this basis build the visual-dictionary histogram vector as the feature representation of the three-dimensional face;
3) Concatenate the LBP histogram feature of the grayscale image and the visual-dictionary histogram feature of the depth image into a feature pool, and use a feature-selection algorithm such as Adaboost to choose from the obtained pool the feature combination most effective for face recognition, realizing data fusion at the feature level;
4) After the face features are obtained, a nearest-neighbor classifier performs the final face recognition, with the L1 distance chosen as the distance metric.
Fig. 8 is the multi-modal face recognition flow block diagram of the method, and Fig. 9 is the multi-modal face recognition system block diagram. As shown in Figs. 8 and 9, the multi-modal fusion system includes multiple data sources, such as two-dimensional grayscale images and three-dimensional depth images:
(1) For the two-dimensional grayscale image, feature points (the eyes) are detected first, the obtained feature-point positions are used for registration, and after registration the LBP algorithm extracts the LBP histogram feature from the data;
(2) For the three-dimensional depth data, feature points (the nose tip) are detected first and used for registration; the registered three-dimensional data is then mapped to a face depth image, and the visual dictionary algorithm extracts the visual-dictionary histogram feature from the data.
This multi-modal system uses a feature-level fusion strategy: after the features of each data source are obtained, all features are concatenated into a feature pool and a weak classifier is built for each feature in the pool; the Adaboost algorithm then selects from the pool the features most effective for classification. Finally, based on the features obtained by multi-modal feature-level fusion, a nearest-neighbor classifier computes the matching score, thereby achieving multi-modal face recognition.
Fig. 10 is the flow chart of matching by 3D model of the method, which mainly comprises the following steps:
(1) Refining the correspondences with an iterative algorithm
Using the previously collected two-dimensional grayscale information and three-dimensional depth information, the 3D shape is rebuilt from the 2D shape by minimizing the reconstruction error min ‖z − P x(p, r, s)‖², where P is the 2D projection matrix, z is the target 2D shape, and x(p, r, s) is the 3D shape generated from the parameters. The iterative method registers the 3D model to the 2D feature points, establishing the rigid transformation (p = {s, α, β, γ, t}) and the non-rigid transformation (r and s).
(2) Correction by matrix
Assuming a semantic correspondence between the 2D and 3D feature points, the 2D feature points corresponding to the correct 3D points are selected in matrix form; this semantic correspondence was already established in the modeling phase, and the 2D projections of the 3D landmarks are obtained by cascaded regression.
(3) Constraining the visible landmarks
By constraining the processing to visible landmarks, the cascaded regression evaluates the landmark definition ξ = {j | v_j = 1}, the subset of landmark indices that are visible.
(4) Two-dimensional measurements
Time-synchronized two-dimensional measurements (z(1), ..., z(C)) are introduced; all C measurements depict the same three-dimensional face, but from different angles. The above formula is extended by requiring one reconstruction for all measurements: the superscript (k) denotes the k-th measurement, each with its own visibility set ξ(k); since we observe the same face from different angles, the overall rigid part (r) and the non-rigid part (s) are shared by all measurements.
(5) Determining the rigid and non-rigid parameters
Assume the rigid structure of the face changes little (parameter r) while the expression changes (parameter s); to handle this, the problem is solved in the time domain:
1) Compute the rigid correction parameters: T = {z^(t) | t = 1, ..., T} denotes the set of temporal measurements and r_T the rigid correction parameters computed from T; in this step the non-rigid parameters are set to 0;
2) Compute the rigid correction parameters frame by frame for each t ∈ [1, ..., T].
Figure 11 is a graph of the reconstruction error rate against the number of iterations and the number of vertices for the three-dimensional face reconstruction method based on gray-scale and depth information of the present invention. It can be seen that increasing the vertex count has only a weak effect on reducing the reconstruction error rate, while it enlarges the regression model and slows fitting, so the vertex count is set to a lower value; increasing the number of iterations reduces the reconstruction error rate significantly and has little impact on the model size, so the iteration count is set to a higher value. When monocular camera images are used, the corresponding formula has multiple solutions; to avoid producing 3D hallucinations, multiple image frames are used simultaneously.
Figure 12 is the flow chart of the 3D reconstruction of the face in the three-dimensional face reconstruction method based on gray-scale and depth information of the present invention. For a parameter vector q:
p(q) ∝ N(q; 0, Λ)
the prior on the parameters follows a normal distribution with mean 0 and covariance Λ. Principal component analysis is used to determine the d-dimensional part of the 3-dimensional basis vectors, and the rigid and non-rigid parts are then modeled separately:
the d-dimensional part of the 3-dimensional basis (θ = [θ1; ...; θM] ∈ R^(3M×d)) describes the rigid deformation, and the e-dimensional part (ψ = [ψ1; ...; ψM] ∈ R^(3M×e)) describes the non-rigid deformation.
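Combining the prior with this split basis, the linear shape model implied by the text can be written as follows; the mean shape s̄ is an assumption of this sketch, not reproduced from the patent:

```latex
% Hedged sketch of the part-based linear shape model: \bar{s} is the mean
% shape, r \in \mathbb{R}^{d} the rigid coefficients and
% s \in \mathbb{R}^{e} the non-rigid coefficients.
s(q) = \bar{s} + \theta\, r + \psi\, s, \qquad q = (r, s), \qquad
p(q) \propto \mathcal{N}(q;\, 0, \Lambda)
```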
Further, to build the rigid part, we select an intermediate frame from each video and apply principal component analysis to determine the basis vectors (θ) and the mean value, which provide an overall linear subspace describing the variation of face shape.
Further, the goal of the linear subspace describing non-rigid deformation (ψ) is to build a part-based model, assembled from independently trained PCA models that share soft boundaries; highly correlated vertices are grouped into dense regions, because such regions are compressed better by PCA, and a data-driven segmentation is used to find facial expressions. 6000 selected frames from the data set are used; the data set D ∈ R^(6000×3072) consists of 6000 frames and 1024 three-dimensional vertices. D is divided into three subsets Dx, Dy, Dz ∈ R^(6000×1024), each containing one spatial coordinate of the vertices; to describe the correlation between vertices, the correlation matrices computed from Dx, Dy and Dz are normalized and then averaged into a single correlation matrix C. Vertices of the same region should also be close to each other on the face surface, so we compute the distances between model vertices to form a distance matrix G normalized to the range [0, 1], and the two matrices are combined into one matrix.
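The correlation-plus-distance construction described above can be sketched as follows; the use of absolute correlation values and the equal weighting `alpha` are assumptions of this illustration, not values fixed by the text:

```python
import numpy as np

# Hedged sketch of the part-segmentation preprocessing: average per-axis
# vertex correlation matrices into C, build a normalized inter-vertex
# distance matrix G, and combine them into one affinity matrix.
# Names (build_affinity, alpha) are illustrative, not from the patent.
def build_affinity(Dx, Dy, Dz, vertices, alpha=0.5):
    """Dx, Dy, Dz : (n_frames, n_verts) per-axis vertex trajectories.
       vertices   : (n_verts, 3) mean vertex positions on the face surface."""
    # Per-axis correlation matrices, averaged into a single matrix C.
    C = (np.abs(np.corrcoef(Dx, rowvar=False))
         + np.abs(np.corrcoef(Dy, rowvar=False))
         + np.abs(np.corrcoef(Dz, rowvar=False))) / 3.0

    # Pairwise Euclidean distances between model vertices, normalized to [0, 1].
    diff = vertices[:, None, :] - vertices[None, :, :]
    G = np.linalg.norm(diff, axis=-1)
    G /= G.max()

    # Integrate the two matrices; nearby, highly correlated vertices score high.
    return alpha * C + (1.0 - alpha) * (1.0 - G)

A = build_affinity(np.random.rand(100, 16), np.random.rand(100, 16),
                   np.random.rand(100, 16), np.random.rand(16, 3))
```

A segmentation algorithm can then cluster vertices on this affinity so that each part is compact both in motion and on the face surface.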
Figure 13 is a face reconstruction image produced by the three-dimensional face reconstruction method based on gray-scale and depth information of the present invention. It can be seen that, using multi-frame video images, 3D mesh vertices are obtained, the face is completely covered by the 3D point set, the anchor points remain consistent under action changes, and the face reconstruction is completed successfully.
For those skilled in the art, the present invention is not restricted to the details of the above embodiments, and the present invention can be realized in other concrete forms without departing from its spirit and scope. In addition, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope, and these improvements and modifications should also be regarded as within the protection scope of the present invention. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present invention.
Claims (10)
1. A three-dimensional face reconstruction method based on gray-scale and depth information, characterized in that it mainly includes: identifying face gray-scale information (one); identifying face depth information (two); multi-modal face recognition (three); matching by a 3D model (four); and carrying out 3D reconstruction of the face (five).
2. The identification of face gray-scale information (one) according to claim 1, characterized in that it includes the following steps:
(1) Feature region localization: the human eye region is obtained using a human eye detector, the human eye detector being a hierarchical classifier H obtained by the following algorithm:
Given a training sample set S = {(x1, y1), ..., (xm, ym)} and a weak classifier space, where xi ∈ χ is a sample vector, yi = ±1 is the class label and m is the total number of samples, initialize the sample probability distribution;
For each weak classifier t = 1, ..., T, do the following:
Partition the sample space χ to obtain X1, X2, ..., Xn,
where ε is a small positive constant;
Compute the normalization factor;
Select the weak classifier in the weak classifier space that minimizes Z;
Update the training sample probability distribution,
where Zt is the normalization factor that makes Dt+1 a probability distribution;
Finally, the strong classifier H is obtained;
(2) Registration is carried out using the obtained human eye region positions, and the LBP histogram feature is obtained by processing the human eye position data with the LBP algorithm;
this feature is fed into the gray-scale image classifier to obtain the gray-scale matching score.
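The training loop in step (1) can be sketched as a discrete-AdaBoost-style round; the decision stumps and the exponential weight update with normalizer Zt are standard AdaBoost choices assumed here, since the claim's own formula images are not reproduced:

```python
import numpy as np

# Hedged sketch of the boosting loop: decision stumps stand in for the
# patent's space-partitioned weak classifiers.
def stump_predict(X, f, thr, s):
    """Decision stump: predict s if X[:, f] >= thr, else -s."""
    return s * np.where(X[:, f] >= thr, 1, -1)

def train_strong_classifier(X, y, T=10):
    m = len(y)
    D = np.full(m, 1.0 / m)  # initialize the sample distribution D_1(i) = 1/m
    model = []
    for _ in range(T):
        # Select the weak classifier with minimal weighted error
        # (a stand-in for "select the classifier minimizing Z").
        cands = [(f, thr, s) for f in range(X.shape[1])
                 for thr in np.unique(X[:, f]) for s in (1, -1)]
        f, thr, s = min(cands, key=lambda c: D[stump_predict(X, *c) != y].sum())
        h = stump_predict(X, f, thr, s)
        err = np.clip(D[h != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        # D_{t+1}(i) = D_t(i) * exp(-alpha * y_i * h(x_i)) / Z_t
        D = D * np.exp(-alpha * y * h)
        D /= D.sum()  # Z_t: renormalize so D_{t+1} is a probability distribution
        model.append((alpha, f, thr, s))
    def H(Xq):  # strong classifier: sign of the weighted vote
        return np.sign(sum(a * stump_predict(Xq, f, t, s) for a, f, t, s in model))
    return H

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
H = train_strong_classifier(X, y, T=3)
```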
3. The identification of face depth information (two) according to claim 1, characterized in that it includes the following steps:
(1) Feature region localization: determine the position of the nose region of the face;
(2) For three-dimensional data of different poses, after obtaining the registration reference region, register the data according to the ICP algorithm; after registration, compute the Euclidean distance between the input data and the three-dimensional face model data in the registry;
(3) Obtain the depth image from the depth information, compensate and denoise the noise points in the mapped depth image using a filter, and finally select the region of robust expression to obtain the final three-dimensional face depth image;
(4) Extract the visual dictionary histogram feature vector of the three-dimensional depth image: after the face image under test is input and Gabor-filtered, each filter vector is compared with all the primitive words at its corresponding position in the visual sub-dictionary and mapped to the closest primitive by distance matching; the visual dictionary histogram feature of the original depth image is extracted, and this feature is fed into the depth image classifier to obtain a matching score.
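The distance matching of filter vectors to dictionary primitives in step (4) might look like the following sketch; the sub-dictionary contents and function names are illustrative, not the patent's trained vocabulary:

```python
import numpy as np

# Hedged sketch: each filter response vector is mapped to the nearest
# primitive of its positional sub-dictionary, and the assignments are
# accumulated into a concatenated visual dictionary histogram.
def visual_dictionary_histogram(responses, dictionaries):
    """responses    : (n_positions, dim) filter response vectors
       dictionaries : list of (n_words_i, dim) sub-dictionaries, one per position"""
    hists = []
    for vec, words in zip(responses, dictionaries):
        d = np.linalg.norm(words - vec, axis=1)   # distance matching
        h = np.zeros(len(words))
        h[np.argmin(d)] = 1.0                     # map to the closest primitive
        hists.append(h)
    return np.concatenate(hists)

dicts = [np.array([[0.0, 0.0], [1.0, 1.0]]),              # 2 words at position 0
         np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 2.0]])]  # 3 words at position 1
resp = np.array([[0.9, 1.1], [1.9, 2.1]])
hist = visual_dictionary_histogram(resp, dicts)
```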
4. The feature region localization (1) according to claim 3, characterized in that it comprises the following steps:
(1) Thresholding: determine the threshold of the regional average negative effective energy density, defined as thr;
(2) Select the data to be processed using depth information: using the depth information of the data, extract the face data within a certain depth range as the data to be processed;
(3) Normal vector computation: compute the normal vector information of the face data selected by depth information;
(4) Computation of the regional average negative effective energy density: according to the definition of the regional average negative effective energy density, obtain the connected domains of this density in the data to be processed, and select the connected domain with the maximum density value;
(5) Determine whether the nose region has been found: when the value of the current region exceeds the predefined threshold thr, this region is the nose region; otherwise return to step (1) and restart the loop.
5. The ICP algorithm according to claim 3, characterized in that it includes the following steps:
(1) Determine the matched data set pair: select the reference point data set P from the three-dimensional nose data of the reference template;
(2) Select the data point set Q in the input three-dimensional face that matches the reference data, using the nearest point-to-point distance;
(3) Compute the rigid motion parameters: compute the rotation matrix R and the translation vector t; when the determinant of X is 1, R = X and t = P − R*Q;
(4) Judge whether the three-dimensional data sets are registered according to the error between the rigidly transformed data set RQ + t and the reference data set P; after registration, compute the Euclidean distance between the input data and the three-dimensional face model data in the registry,
where P and Q are the feature point sets to be matched, each containing N feature points.
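The claim does not reproduce how the matrix X is constructed; a common closed-form realization of step (3), assumed here rather than quoted from the patent, is the SVD-based (Kabsch) alignment of the matched point sets:

```python
import numpy as np

# Hedged sketch of one ICP alignment step: compute the rotation R and
# translation t between matched point sets P and Q via SVD (Kabsch).
def rigid_motion(P, Q):
    """P, Q: (N, 3) matched feature point sets. Returns R, t with P ≈ Q @ R.T + t."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)       # centroids
    H = (Q - cQ).T @ (P - cP)                     # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    X = Vt.T @ U.T
    if np.linalg.det(X) < 0:                      # guard against reflections
        Vt[-1] *= -1
        X = Vt.T @ U.T
    R = X                                         # "when det(X) is 1, R = X"
    t = cP - R @ cQ                               # "t = P - R*Q" in centroid form
    return R, t

# Usage: recover a known rotation about the z-axis plus a translation.
np.random.seed(0)
theta = np.pi / 6
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
Q = np.random.rand(10, 3)
P = Q @ R_true.T + np.array([1.0, -2.0, 0.5])
R, t = rigid_motion(P, Q)
```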
6. The step (4) according to claim 3, characterized in that it comprises the following steps:
1) Segment the three-dimensional face depth image into several local texture regions;
2) For each Gabor filter response vector, map it to the vocabulary of its corresponding visual sub-dictionary according to its position, and on this basis build the visual dictionary histogram vector as the feature expression of the three-dimensional face;
3) Concatenate the LBP histogram feature of the gray-scale image and the visual dictionary histogram feature of the depth image into a feature pool, and use a feature selection algorithm, such as Adaboost, to choose from the obtained feature pool the feature combination most effective for face recognition, realizing data fusion at the feature level;
4) After the face features are obtained, a nearest neighbor classifier is used for the final face recognition, with the L1 distance selected as the distance metric.
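The nearest neighbor decision under the L1 distance in step 4) can be sketched as follows; the gallery contents and names are illustrative only:

```python
import numpy as np

# Hedged sketch of step 4): nearest neighbor classification under the
# L1 (city-block) distance over the fused feature vectors.
def nn_classify_l1(gallery, labels, probe):
    """gallery: (n, d) enrolled feature vectors; labels: (n,); probe: (d,)."""
    d = np.abs(gallery - probe).sum(axis=1)   # L1 distance to every template
    return labels[np.argmin(d)]

gallery = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
labels = np.array(["alice", "bob"])
pred = nn_classify_l1(gallery, labels, np.array([0.9, 0.8, 0.1]))
```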
7. The multi-modal face recognition (three) according to claim 1, characterized in that the multi-modal fusion system includes multiple data sources, such as two-dimensional gray-scale images and three-dimensional depth images;
(1) For the two-dimensional gray-scale image, feature point detection (human eyes) is carried out first, registration is then performed using the obtained feature point positions, and after the gray-scale image is registered, the LBP histogram feature is extracted from this data using the LBP algorithm;
(2) For the depth data, feature point detection (nose) is carried out first, registration is performed using the obtained feature points, the registered three-dimensional spatial data is then mapped to a face depth image, and the visual dictionary histogram feature is extracted from this data using the visual dictionary algorithm;
This multi-modal system uses a feature-level fusion strategy: after the features of each data source are obtained, all the features are concatenated to form a feature pool, a weak classifier is constructed for each feature in the pool, and the Adaboost algorithm is then used to pick out the most effective features for classification from the feature pool; finally, based on the features obtained by multi-modal feature-level fusion, the nearest neighbor classifier computes the matching score, thereby realizing multi-modal face recognition.
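The feature-level fusion above can be sketched as follows; the hard-coded selected indices stand in for the Adaboost-chosen features and are purely illustrative:

```python
import numpy as np

# Hedged sketch of the feature-level fusion strategy: the LBP histogram
# from the gray-scale image and the visual dictionary histogram from the
# depth image are concatenated into one feature pool, from which a
# selected index set (chosen by Adaboost in the patent, hard-coded here)
# forms the fused feature vector.
def fuse_features(lbp_hist, vdict_hist, selected_idx):
    pool = np.concatenate([lbp_hist, vdict_hist])   # feature pool
    return pool[selected_idx]                       # selected feature subset

lbp = np.array([0.2, 0.5, 0.3])   # illustrative LBP histogram
vd = np.array([0.1, 0.9])         # illustrative visual dictionary histogram
fused = fuse_features(lbp, vd, np.array([1, 3]))
```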
8. The matching by a 3D model (four) according to claim 1, characterized in that it comprises the following steps:
(1) Refining the correspondence by an iterative algorithm:
using the previously collected two-dimensional gray-scale information and three-dimensional depth information, the 3D shape is rebuilt from the 2D shape by minimizing the reconstruction error;
here P denotes the two-dimensional projection matrix and z is the target's two-dimensional shape; the iterative method registers the 3D model to the 2D feature points and establishes the rigid (p = {s, α, β, γ, t}) and non-rigid (r and s) transformations;
increasing the vertex count has only a weak effect on reducing the reconstruction error rate while enlarging the regression model and slowing fitting, so the vertex count is set to a lower value; increasing the number of iterations reduces the reconstruction error rate significantly and has little impact on the model size, so the iteration count is set to a higher value;
(2) Correction by matrix:
assuming a semantic correspondence between the 2D and 3D feature points, a matrix is used to select the 2D feature points corresponding to the correct 3D ones; this semantic correspondence was established in the modeling phase, and the two-dimensional projections of the 3D landmarks are obtained by cascaded regression;
(3) Constraining the visible landmarks:
the cascaded regression evaluates only the visible landmarks; the set ξ = { j | vj = 1 } denotes the subset of landmark indices that are visible;
(4) Two-dimensional measurements:
time-synchronized two-dimensional measurements (z(1), ..., z(C)) are introduced; all C measurements represent the same three-dimensional face viewed from different angles; by constraining the reconstruction over all measurements, the above formula is extended, where the superscript (k) denotes the k-th measurement and the visibility set is ξ(k); because the same face is observed from different viewpoints, the rigid (r) and non-rigid (s) parameters are identical across all measurements;
(5) Determining the rigid and non-rigid parameters:
it is assumed that the rigid structure of the face varies little (parameter r) while the expression changes (parameter s); to handle this, the problem is solved in the time domain:
1) Compute the rigid deformation parameters: т = { z(t) | t = 1, ..., T } denotes the set of measurements over time, and rт denotes the rigid deformation parameters computed from т; the non-rigid parameters are set to 0 in this step;
2) Compute the non-rigid deformation parameters for each frame t ∈ [1, ..., T].
9. The 3D reconstruction of the face (five) according to claim 1, characterized in that, for a parameter vector q:
p(q) ∝ N(q; 0, Λ)
the prior on the parameters follows a normal distribution with mean 0 and covariance Λ; principal component analysis is used to determine the d-dimensional part of the 3-dimensional basis vectors, and the rigid and non-rigid parts are then modeled separately:
the d-dimensional part of the 3-dimensional basis (θ = [θ1; ...; θM] ∈ R^(3M×d)) describes the rigid deformation, and the e-dimensional part (ψ = [ψ1; ...; ψM] ∈ R^(3M×e)) describes the non-rigid deformation.
10. The rigid part according to claim 9, characterized in that it includes selecting an intermediate frame from each video and applying principal component analysis to determine the basis vectors (θ) and the mean value, which provide an overall linear subspace describing the variation of face shape; the non-rigid deformation is characterized in that the goal of the linear subspace describing non-rigid deformation (ψ) is to build a part-based model, assembled from independently trained PCA models that share soft boundaries; highly correlated vertices are grouped into dense regions, because such regions are compressed better by PCA, and a data-driven segmentation is used to find facial expressions; 6000 selected frames from the data set are used, and the data set D ∈ R^(6000×3072) consists of 6000 frames and 1024 three-dimensional vertices; D is divided into three subsets Dx, Dy, Dz ∈ R^(6000×1024), each containing one spatial coordinate of the vertices; to describe the correlation between vertices, the correlation matrices computed from Dx, Dy and Dz are normalized and then averaged into a single correlation matrix C; vertices of the same region should also be close to each other on the face surface, so the distances between model vertices are computed to form a distance matrix G normalized to the range [0, 1], and the two matrices are combined into one matrix.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610794122.1A CN106469465A (en) | 2016-08-31 | 2016-08-31 | A kind of three-dimensional facial reconstruction method based on gray scale and depth information |
PCT/CN2016/098100 WO2018040099A1 (en) | 2016-08-31 | 2016-09-05 | Three-dimensional face reconstruction method based on grayscale and depth information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106469465A true CN106469465A (en) | 2017-03-01 |
Family
ID=58230456
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610794122.1A Pending CN106469465A (en) | 2016-08-31 | 2016-08-31 | A kind of three-dimensional facial reconstruction method based on gray scale and depth information |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN106469465A (en) |
WO (1) | WO2018040099A1 (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107045631A (en) * | 2017-05-25 | 2017-08-15 | 北京华捷艾米科技有限公司 | Facial feature points detection method, device and equipment |
CN107886568A (en) * | 2017-12-09 | 2018-04-06 | 东方梦幻文化产业投资有限公司 | A kind of method and system that human face expression is rebuild using 3D Avatar |
CN107992797A (en) * | 2017-11-02 | 2018-05-04 | 中控智慧科技股份有限公司 | Face identification method and relevant apparatus |
CN108876708A (en) * | 2018-05-31 | 2018-11-23 | Oppo广东移动通信有限公司 | Image processing method, device, electronic equipment and storage medium |
CN109697749A (en) * | 2017-10-20 | 2019-04-30 | 虹软科技股份有限公司 | A kind of method and apparatus for three-dimensional modeling |
CN109729285A (en) * | 2019-01-17 | 2019-05-07 | 广州华多网络科技有限公司 | Fuse lattice special efficacy generation method, device, electronic equipment and storage medium |
WO2019100216A1 (en) * | 2017-11-21 | 2019-05-31 | 深圳市柔宇科技有限公司 | 3d modeling method, electronic device, storage medium and program product |
CN110032927A (en) * | 2019-02-27 | 2019-07-19 | 视缘(上海)智能科技有限公司 | A kind of face identification method |
CN110070611A (en) * | 2019-04-22 | 2019-07-30 | 清华大学 | A kind of face three-dimensional rebuilding method and device based on depth image fusion |
CN110163953A (en) * | 2019-03-11 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Three-dimensional facial reconstruction method, device, storage medium and electronic device |
WO2019219012A1 (en) * | 2018-05-15 | 2019-11-21 | 清华大学 | Three-dimensional reconstruction method and device uniting rigid motion and non-rigid deformation |
CN110689609A (en) * | 2019-09-27 | 2020-01-14 | 北京达佳互联信息技术有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
WO2020108304A1 (en) * | 2018-11-29 | 2020-06-04 | 广州市百果园信息技术有限公司 | Method for reconstructing face mesh model, device, apparatus and storage medium |
CN111627092A (en) * | 2020-05-07 | 2020-09-04 | 江苏原力数字科技股份有限公司 | Method for constructing high-strength bending constraint from topological relation |
CN112562082A (en) * | 2020-08-06 | 2021-03-26 | 长春理工大学 | Three-dimensional face reconstruction method and system |
CN113366491A (en) * | 2021-04-26 | 2021-09-07 | 华为技术有限公司 | Eyeball tracking method, device and storage medium |
CN114727002A (en) * | 2021-01-05 | 2022-07-08 | 北京小米移动软件有限公司 | Shooting method and device, terminal equipment and storage medium |
US11972527B2 (en) | 2018-11-29 | 2024-04-30 | Bigo Technology Pte. Ltd. | Method and apparatus for reconstructing face mesh model, and storage medium |
Families Citing this family (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108717730B (en) * | 2018-04-10 | 2023-01-10 | 福建天泉教育科技有限公司 | 3D character reconstruction method and terminal |
CN109100731B (en) * | 2018-07-17 | 2022-11-11 | 重庆大学 | Mobile robot positioning method based on laser radar scanning matching algorithm |
CN110826580B (en) * | 2018-08-10 | 2023-04-14 | 浙江万里学院 | Object two-dimensional shape classification method based on thermonuclear characteristics |
US10885702B2 (en) * | 2018-08-10 | 2021-01-05 | Htc Corporation | Facial expression modeling method, apparatus and non-transitory computer readable medium of the same |
CN109325994B (en) * | 2018-09-11 | 2023-03-24 | 合肥工业大学 | Method for enhancing data based on three-dimensional face |
CN110942479B (en) * | 2018-09-25 | 2023-06-02 | Oppo广东移动通信有限公司 | Virtual object control method, storage medium and electronic device |
CN111144180B (en) * | 2018-11-06 | 2023-04-07 | 天地融科技股份有限公司 | Risk detection method and system for monitoring video |
CN109614879B (en) * | 2018-11-19 | 2022-12-02 | 温州大学 | Hopper particle detection method based on image recognition |
CN111382626B (en) * | 2018-12-28 | 2023-04-18 | 广州市百果园信息技术有限公司 | Method, device and equipment for detecting illegal image in video and storage medium |
CN110084259B (en) * | 2019-01-10 | 2022-09-20 | 谢飞 | Facial paralysis grading comprehensive evaluation system combining facial texture and optical flow characteristics |
CN110046543A (en) * | 2019-02-27 | 2019-07-23 | 视缘(上海)智能科技有限公司 | A kind of three-dimensional face identification method based on plane parameter |
CN110020613B (en) * | 2019-03-19 | 2022-12-06 | 广州爱科赛尔云数据科技有限公司 | Front-end face real-time detection method based on Jetson TX1 platform |
CN110276408B (en) * | 2019-06-27 | 2022-11-22 | 腾讯科技(深圳)有限公司 | 3D image classification method, device, equipment and storage medium |
CN110349140B (en) * | 2019-07-04 | 2023-04-07 | 五邑大学 | Traditional Chinese medicine ear diagnosis image processing method and device |
CN111127631B (en) * | 2019-12-17 | 2023-07-28 | 深圳先进技术研究院 | Three-dimensional shape and texture reconstruction method, system and storage medium based on single image |
CN111402403B (en) * | 2020-03-16 | 2023-06-20 | 中国科学技术大学 | High-precision three-dimensional face reconstruction method |
CN113673287B (en) * | 2020-05-15 | 2023-09-12 | 深圳市光鉴科技有限公司 | Depth reconstruction method, system, equipment and medium based on target time node |
CN111754557B (en) * | 2020-05-29 | 2023-02-17 | 清华大学 | Target geographic area face template generation method and device |
CN111681309B (en) * | 2020-06-08 | 2023-07-25 | 北京师范大学 | Edge computing platform for generating voxel data and edge image feature ID matrix |
CN111652974B (en) * | 2020-06-15 | 2023-08-25 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for constructing three-dimensional face model |
CN111951372B (en) * | 2020-06-30 | 2024-01-05 | 重庆灵翎互娱科技有限公司 | Three-dimensional face model generation method and equipment |
CN111968152B (en) * | 2020-07-15 | 2023-10-17 | 桂林远望智能通信科技有限公司 | Dynamic identity recognition method and device |
CN112017230A (en) * | 2020-09-07 | 2020-12-01 | 浙江光珀智能科技有限公司 | Three-dimensional face model modeling method and system |
CN112085117B (en) * | 2020-09-16 | 2022-08-30 | 北京邮电大学 | Robot motion monitoring visual information fusion method based on MTLBP-Li-KAZE-R-RANSAC |
CN112257552B (en) * | 2020-10-19 | 2023-09-05 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and storage medium |
CN112614213B (en) * | 2020-12-14 | 2024-01-23 | 杭州网易云音乐科技有限公司 | Facial expression determining method, expression parameter determining model, medium and equipment |
CN112882666A (en) * | 2021-03-15 | 2021-06-01 | 上海电力大学 | Three-dimensional modeling and model filling-based 3D printing system and method |
CN113254684B (en) * | 2021-06-18 | 2021-10-29 | 腾讯科技(深圳)有限公司 | Content aging determination method, related device, equipment and storage medium |
CN113642545B (en) * | 2021-10-15 | 2022-01-28 | 北京万里红科技有限公司 | Face image processing method based on multi-task learning |
CN116168163B (en) * | 2023-03-29 | 2023-11-17 | 湖北工业大学 | Three-dimensional model construction method, device and storage medium |
CN116109743B (en) * | 2023-04-11 | 2023-06-20 | 广州智算信息技术有限公司 | Digital person generation method and system based on AI and image synthesis technology |
CN117218119B (en) * | 2023-11-07 | 2024-01-26 | 苏州瑞霏光电科技有限公司 | Quality detection method and system for wafer production |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080309662A1 (en) * | 2005-12-14 | 2008-12-18 | Tal Hassner | Example Based 3D Reconstruction |
US20110148868A1 (en) * | 2009-12-21 | 2011-06-23 | Electronics And Telecommunications Research Institute | Apparatus and method for reconstructing three-dimensional face avatar through stereo vision and face detection |
CN102254154A (en) * | 2011-07-05 | 2011-11-23 | 南京大学 | Method for authenticating human-face identity based on three-dimensional model reconstruction |
CN102592309A (en) * | 2011-12-26 | 2012-07-18 | 北京工业大学 | Modeling method of nonlinear three-dimensional face |
CN104598878A (en) * | 2015-01-07 | 2015-05-06 | 深圳市唯特视科技有限公司 | Multi-modal face recognition device and method based on multi-layer fusion of gray level and depth information |
CN104778441A (en) * | 2015-01-07 | 2015-07-15 | 深圳市唯特视科技有限公司 | Multi-mode face identification device and method fusing grey information and depth information |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101404091B (en) * | 2008-11-07 | 2011-08-31 | 重庆邮电大学 | Three-dimensional human face reconstruction method and system based on two-step shape modeling |
CN104008366A (en) * | 2014-04-17 | 2014-08-27 | 深圳市唯特视科技有限公司 | 3D intelligent recognition method and system for biology |
CN103971122B (en) * | 2014-04-30 | 2018-04-17 | 深圳市唯特视科技有限公司 | Three-dimensional face based on depth image describes method |
2016
- 2016-08-31 CN CN201610794122.1A patent/CN106469465A/en active Pending
- 2016-09-05 WO PCT/CN2016/098100 patent/WO2018040099A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
LÁSZLÓ A. JENI et al.: "Dense 3D face alignment from 2D video for real-time use", Image and Vision Computing * |
Also Published As
Publication number | Publication date |
---|---|
WO2018040099A1 (en) | 2018-03-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170301 |