AUSTRALIA
Patents Act 1990
COMPLETE SPECIFICATION
INNOVATION PATENT
Biometric Person Identity Verification Based on Face and Gait Fusion
The following statement is a full description of this invention, including the best method of performing it known to both of us:
1. Description:
Preface:
Human identity verification from arbitrary views is a very challenging problem, especially when the person is walking at a distance. Recognizing identity from gait patterns has recently become a popular area of research in biometrics and computer vision, and one of the most successful applications of image analysis and understanding. Gait recognition is one of the new and important biometric technologies based on behavioural characteristics; it involves identifying individuals by their walking patterns. Gait can be captured at a distance using low-resolution devices, whereas other biometrics need higher resolution. Gait is difficult to disguise, can be captured at a distance or at low resolution, and requires no body-invading equipment to capture. Gait recognition can hence be considered a powerful recognition technology for next-generation surveillance and access control applications, with applicability to many civilian and high-security environments such as airports, banks, military bases, car parks and railway stations. Further, gait is an inherently multimodal biometric: Murray et al. [1] suggested that there are 24 different components to human gait, involving not only the lower body but also upper-body motion, including the head and the hands. If all gait movements from full-body images can be captured, gait can be a truly unique biometric for ascertaining identity. In this paper we propose a novel approach based on learning face and gait features in image transform subspaces.
We show that, even without the inclusion of dynamic gait features, it is possible to obtain a significant improvement in recognition performance, provided that appropriate transform subspaces and fusion strategies are considered. We examined two such multivariate statistical subspaces, based on principal component analysis (PCA) and linear discriminant analysis (LDA), together with the fusion of face and gait features using holistic and hierarchical fusion strategies. Extensive experiments conducted on a publicly available gait database [1] suggest that the proposed approach can capture several inherent multimodal components of the gait and face of a walking person from low-resolution video. Even without dynamic cues, a simple, practical and robust identity verification system can be built in spite of poor-quality surveillance video and significant pose and illumination variations.
The rest of the paper is organized as follows. The next section discusses the background and previous work, followed by our proposed scheme in Section 3. In Section 4 we describe the experimental work carried out and discuss some of the results obtained. The paper concludes in Section 6 with conclusions and plans for further work.
Background:
Current state-of-the-art video surveillance systems cannot reliably recognize the identity of a person in the scene, because of low-quality video or inappropriate processing techniques. Although much progress has been made in the past decade on vision-based automatic person identification using different biometrics, including face recognition, gait analysis, and iris and fingerprint recognition, each of these techniques works satisfactorily only in highly controlled operating environments, such as border control or immigration checkpoints, with constrained illumination, pose and facial expression.
To address next-generation security and surveillance requirements, and to enable the diffusion of biometrics-based security systems into day-to-day civilian access control applications, we need a robust and invariant biometric trait [2] to identify a person in both controlled and uncontrolled operating environments. Face recognition has been the focus of extensive research for the past three decades [2]. The approaches for this task can be broadly divided into two categories:
1) Feature-based methods [3, 4], which first process the input image to identify and extract distinctive facial features such as the eyes, mouth and nose, as well as other fiducial marks, and then compute the geometric relationships among those facial points, thus reducing the input facial image to a vector of geometric features. Standard statistical pattern recognition techniques are then employed to match faces using these measurements.
2) Appearance-based (or holistic) methods [5, 6], which attempt to identify faces using global representations, i.e., descriptions based on the entire image rather than on local features of the face.
Although face recognition methods traditionally operate on static intensity images, in recent years much effort has also been directed towards identifying faces from video [7] as well as from other modalities such as 3D [8] and infrared [9]. Much effort has also been expended on combining various biometrics in a bid to improve upon the recognition accuracy of classifiers based on a single biometric. Biometric combinations that have been tested include face, fingerprint and hand geometry [10]; face, fingerprint and speech [11]; face and iris [12]; face and ear [13]; and face and speech [14, 15, 16]. The potential of gait as a powerful biometric has been explored in some recent works [17, 18], though the inherent multimodal components present in the whole body during walking have not been much exploited by the research community.
In this paper we present preliminary work on how these multimodal aspects can play an important role in differentiating individuals during walking.
On another note, some of the most important challenges for the diffusion of biometrics in day-to-day civilian applications are issues related to invasion of privacy. In [19], an extensive study showed physiological biometrics to have no negative impact on privacy. This is an excellent motivation for us to investigate face, body and gait cues during walking as a powerful biometric with inherent multimodality for establishing the identity of a person. Further, these video-based cues can be captured remotely from a distance, and by using an appropriate biometric identification protocol, such as the one suggested by the authors of [20], sensitive privacy concerns can be addressed as well. An appropriate protocol as in [20] can ensure that the identification system is not misused and that function creep (i.e., use for another purpose) is prevented. In particular, this means that a component should not be able to learn more information than is really needed for a correct result. In fact, our proposed fusion of side-face, body and gait cues captured from low-resolution surveillance videos needs strong algorithms and processing techniques to be of any use for establishing identity, and is of no use without them; it therefore safeguards privacy to some extent automatically. The details of the publicly available gait database used for this research and of the proposed multimodal identification scheme are described in the next section.
Experimental Results and Discussion:
We performed several sets of experiments to examine the discriminating ability of the proposed feature extraction in PCA and LDA subspaces with different classifiers.
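The three classifiers used in the experiments can be sketched as follows. The specification does not define its "Bayesian-linear" and "Bayesian-quadratic" classifiers; this sketch assumes the common reading of a Gaussian class-conditional model with one pooled covariance (linear decision boundaries) or a separate covariance per person (quadratic boundaries), plus a Euclidean 1-NN rule. The function names and the ridge parameter `reg` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def fit_gaussian_classes(X, y, reg=1e-3):
    """Estimate class means, per-class covariances, a pooled covariance and priors.

    The ridge term `reg` is an assumption (not from the source); it keeps the
    covariance matrices invertible when there are few samples per person.
    """
    classes = np.unique(y)
    d = X.shape[1]
    means, covs, priors = [], [], []
    pooled = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        diff = Xc - mu
        means.append(mu)
        covs.append(diff.T @ diff / max(len(Xc) - 1, 1) + reg * np.eye(d))
        pooled += diff.T @ diff
        priors.append(len(Xc) / len(X))
    pooled = pooled / max(len(X) - len(classes), 1) + reg * np.eye(d)
    return classes, np.array(means), np.array(covs), pooled, np.array(priors)


def gaussian_scores(X, model, quadratic=False):
    """Log-posterior scores (up to a constant) for every sample and class.

    quadratic=False uses the pooled covariance (linear decision boundaries);
    quadratic=True uses each class's own covariance (quadratic boundaries).
    """
    classes, means, covs, pooled, priors = model
    scores = np.empty((len(X), len(classes)))
    for k in range(len(classes)):
        cov = covs[k] if quadratic else pooled
        diff = X - means[k]
        # Squared Mahalanobis distance of each sample to the class mean.
        maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
        _, logdet = np.linalg.slogdet(cov)
        scores[:, k] = -0.5 * maha - 0.5 * logdet + np.log(priors[k])
    return scores


def predict_gaussian(X, model, quadratic=False):
    """Assign each sample to the class with the highest Gaussian score."""
    return model[0][np.argmax(gaussian_scores(X, model, quadratic), axis=1)]


def predict_1nn(X_train, y_train, X_test):
    """1-nearest-neighbour rule with Euclidean distance."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[np.argmin(d2, axis=1)]
```

In this sketch, the quadratic variant needs enough samples per person to estimate a full covariance, which is one reason a small ridge is added; with only a handful of video frames per subject the linear (pooled) variant is usually the more stable of the two.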
We also compared the performance of score-level and feature-level fusion (schematics shown in Figure 2(a) and 2(b)). The recognition performance of single-mode face and gait features, and of face and gait features fused at the score level and at the feature level, is discussed in the next few sections.
A. Recognition Performance With PCA Features
For the first set of experiments, we applied the PCA transformation and performed classification with Bayesian (linear/quadratic) and k-nearest neighbour classifiers. Table 1 shows the recognition accuracies achieved with PCA-only features. In face-only mode, we obtained 85% recognition accuracy with the Bayesian-linear classifier, 90% with the Bayesian-quadratic classifier, and 95% with the 1-NN classifier. Though we expected 100% accuracy in face-only mode, the side-face images were of very poor quality, and some of these poor-quality faces could not be recognized.

Classifier Type      Face-Only (PCA)   Gait-Only (PCA)   Face-Gait (Score Fusion)
Bayesian-linear      85%               45%               65%
Bayesian-quadratic   90%               50%               60%
1-NN classifier      95%               50%               55%
Table 1: PCA with Bayesian Classifiers and 1-Nearest Neighbour Classifier

Next, we performed experiments in gait-only mode and achieved a recognition accuracy of 45% for the Bayesian-linear classifier, 50% for the Bayesian-quadratic classifier and 50% for the 1-NN classifier. Once again, PCA features performed poorly in gait-only mode, because the PCA technique cannot capture the gait dynamics of each person. However, when we integrated the face information with the gait information, the performance improved significantly over gait-only mode, resulting in an accuracy of 75%, 65% and 70% for the Bayesian-linear, Bayesian-quadratic and 1-NN classifiers respectively. For all the experiments in this set we used 40 PCA feature dimensions. Figure 3 shows the eigenfaces and eigengaits of one of the data subsets.
B. Recognition Performance With PCA-LDA Features
For this set of experiments, we first obtained the PCA transformation and then projected the PCA features into the LDA space. We achieved 100% accuracy for the face-only data set. For the gait-only data set, we achieved a recognition accuracy of 90% for the Bayesian-linear, 90% for the Bayesian-quadratic, and 80% for the 1-NN classifier. By combining face and gait features in the PCA+LDA subspace, it was possible to achieve a recognition accuracy of 100% for all three types of classifier.
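The PCA-then-LDA transformation described above can be sketched in a few lines of NumPy. This is a minimal sketch assuming the standard Fisher LDA formulation (maximizing between-class scatter relative to within-class scatter, solved by whitening the within-class scatter); the function names, the `reg` ridge and the dimensions in the example are hypothetical and not taken from the specification.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the training mean and the top principal axes (as columns)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components].T


def lda_fit(Z, y, n_components, reg=1e-6):
    """Fisher LDA on already PCA-reduced features Z (at most C - 1 components)."""
    classes = np.unique(y)
    mu = Z.mean(axis=0)
    d = Z.shape[1]
    Sw = np.zeros((d, d))  # within-class (intra-person) scatter
    Sb = np.zeros((d, d))  # between-class (inter-person) scatter
    for c in classes:
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        Sb += len(Zc) * np.outer(mc - mu, mc - mu)
    Sw += reg * np.eye(d)
    # Whiten Sw so that the generalized problem Sb v = lambda Sw v becomes a
    # symmetric eigenproblem: with W = E diag(evals)^(-1/2), W.T @ Sw @ W = I.
    evals, evecs = np.linalg.eigh(Sw)
    W = evecs / np.sqrt(evals)
    lam, U = np.linalg.eigh(W.T @ Sb @ W)
    order = np.argsort(lam)[::-1][:n_components]
    return W @ U[:, order]


def pca_lda_fit(X, y, n_pca, n_lda):
    """PCA first (dimensionality reduction), then LDA in the PCA subspace."""
    mean, P = pca_fit(X, n_pca)
    L = lda_fit((X - mean) @ P, y, n_lda)
    return mean, P @ L  # combined projection matrix


def pca_lda_transform(X, mean, proj):
    return (X - mean) @ proj
```

The PCA step matters here because, with few images per person, the within-class scatter in the raw pixel space is singular; keeping no more than N - C principal components (N samples, C people) is a common way to make the subsequent LDA step well posed.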
Classifier Type      Face-Only PCA-LDA (40)   Gait-Only PCA-LDA (40)   Face-Gait Score Fusion (30)
Bayesian-linear      100%                     90%                      100%
Bayesian-quadratic   100%                     90%                      100%
1-NN classifier      100%                     80%                      100%
Table 2: PCA-LDA with Bayesian Classifiers and 1-Nearest Neighbour Classifier (feature dimensions in parentheses)

Since the face-only classifier in the PCA-LDA subspace achieves 100% accuracy, it might appear that there is no need for fusion with gait features. However, the face-only PCA-LDA classifier needed 40 features to achieve 100% accuracy, whereas fewer features were needed when face and gait features were fused: score-level fusion achieved 100% accuracy with 30 features. As can be seen in Table 2, PCA features in the LDA subspace captured person-specific gait variations well for all three classifiers. The fusion was thus synergistic, with PCA helping to reduce the dimensionality and LDA capturing inter-person and intra-person gait variations accurately.
C. Recognition Performance With Holistic Versus Hierarchical Fusion
In this set of experiments, we compared hierarchical and holistic fusion. Holistic fusion is essentially the same as score-level fusion. While the score-level fusion in Figure 2 uses 30 features, we used 20 features for each modality when comparing hierarchical and holistic fusion. With 20 features, both single-mode classifiers have less than 100% accuracy. With the gait classifier as the first-stage classifier, the hierarchical fusion performance is as shown in Table 3. We set the threshold for invoking the second-stage face classifier to 95%, so that when the gait classifier's confidence is below 95%, the ID accept/reject decision is reinforced by the second-stage classifier using face PCA-LDA features.
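The two-stage (hierarchical) decision with a 95% threshold can be sketched as below. The specification does not state how the second stage combines the two modalities; this sketch assumes that each classifier outputs per-identity posterior probabilities and that, when the gait stage is below threshold, the decision is taken from the average of the gait and face posteriors. The function name and the averaging rule are hypothetical.

```python
import numpy as np


def hierarchical_fusion(gait_post, face_post, threshold=0.95):
    """Two-stage identity decision: gait first, face only when gait is unsure.

    gait_post, face_post: (n_samples, n_identities) posterior probabilities.
    Returns the decided identity index per sample and a boolean mask showing
    where the second (face) stage was invoked.
    """
    gait_conf = gait_post.max(axis=1)
    decision = gait_post.argmax(axis=1)
    needs_face = gait_conf < threshold
    # Assumed second-stage rule: average the two modalities' posteriors.
    combined = 0.5 * (gait_post + face_post)
    decision[needs_face] = combined[needs_face].argmax(axis=1)
    return decision, needs_face
```

A practical advantage of this ordering, under the stated assumptions, is that the face features only need to be extracted and matched for the fraction of samples where the gait stage is not confident.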
Classifier Type      Gait-Only PCA-LDA (20)   Face-Only PCA-LDA (20)   Face-Gait Hierarchical Fusion (20)
Bayesian-linear      55%                      80%                      90%
Bayesian-quadratic   60%                      85%                      95%
1-NN classifier      50%                      80%                      85%
Table 3: PCA-LDA with Bayesian Classifiers and 1-Nearest Neighbour Classifier (feature dimensions in parentheses)

D. Recognition Performance With Feature-Level Fusion
For this set of experiments, we first obtained the PCA transformation and then projected the PCA features into the LDA space; training and testing were performed on the PCA-LDA vectors. As in Section B, we achieved 100% accuracy for the face-only data set, and for the gait-only data set a recognition accuracy of 90% for the Bayesian-linear, 90% for the Bayesian-quadratic, and 80% for the 1-NN classifier. By combining face and gait features in the PCA+LDA subspace, it was possible to achieve a recognition accuracy of 100% for all three types of classifier. Although the face-only classifier alone reaches 100% accuracy, it needs 40 PCA-LDA features to do so, whereas fused face and gait features needed fewer: 20 features with feature-level fusion and 30 features with score-level fusion. As with Table 2, PCA features in the LDA subspace captured person-specific gait variations well for all three classifiers, with PCA reducing the dimensionality and LDA capturing inter-person and intra-person gait variations. Another interesting observation is that, although it is well known in the literature that score-level fusion results in better performance than feature-level fusion, we found that score-level fusion needs more features (30, compared with 20 features per modality before concatenation for feature-level fusion).
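The difference between the two fusion levels compared in Section D can be sketched as follows: feature-level fusion concatenates the (normalized) face and gait feature vectors before a single classifier is trained, while score-level fusion combines the per-identity match scores produced by two separate classifiers. The z-score normalization, the min-max score normalization and the equal-weight sum rule are assumptions chosen for illustration; the specification does not state which normalization or combination rule was used, and all function names are hypothetical.

```python
import numpy as np


def zscore(train, test):
    """Normalize both sets with the training mean and standard deviation."""
    mu = train.mean(axis=0)
    sd = train.std(axis=0) + 1e-12  # guard against zero variance
    return (train - mu) / sd, (test - mu) / sd


def feature_level_fusion(face_tr, gait_tr, face_te, gait_te):
    """Concatenate normalized face and gait vectors into one feature vector."""
    f_tr, f_te = zscore(face_tr, face_te)
    g_tr, g_te = zscore(gait_tr, gait_te)
    return np.hstack([f_tr, g_tr]), np.hstack([f_te, g_te])


def minmax(scores):
    """Rescale each row of match scores to the range [0, 1]."""
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    return (scores - lo) / np.where(hi > lo, hi - lo, 1.0)


def score_level_fusion(face_scores, gait_scores, w_face=0.5):
    """Weighted sum rule on per-identity match scores (higher = better match)."""
    return w_face * minmax(face_scores) + (1 - w_face) * minmax(gait_scores)
```

Because feature-level fusion lets one classifier model the joint face-gait feature space, it can exploit correlations between the modalities that are lost once each modality is reduced to a score vector; this is consistent with, though not proof of, the explanation offered for the smaller feature count it needed.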
The larger number of features needed for score-level fusion could be because score-level fusion does not preserve the inherent multimodality present in face and gait as well as feature-level fusion does.

Classifier Type      Face-Only PCA-LDA (40)   Gait-Only PCA-LDA (40)   Face-Gait Feature Fusion (20)   Face-Gait Score Fusion (30)
Bayesian-linear      100%                     90%                      100%                            100%
Bayesian-quadratic   100%                     90%                      100%                            100%
kNN classifier       100%                     80%                      100%                            100%
Table 4: Recognition Performance With Feature-Level Fusion (feature dimensions in parentheses)

3. Conclusions:
We conclude that our proposed multimodal fusion of face and gait biometric cues for identity verification will be a next-generation solution, enabling the diffusion of biometric security technologies with better user acceptability for day-to-day civilian access control and public surveillance applications. The preceding sections provide the justification for this assurance. We are also confident that our proposed approach would be a pioneering solution for addressing future challenges in security, surveillance and identity assurance for a wide range of application scenarios.