WO2015122789A1 - Facial recognition and user authentication method - Google Patents

Facial recognition and user authentication method

Info

Publication number
WO2015122789A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
facial
processors
user
feature vector
Prior art date
Application number
PCT/RU2014/000089
Other languages
French (fr)
Inventor
Aleksander Sergeevich SHUSHARIN
Konstantin Vasilevich CHERENKOV
Andrey Vladimirovich VALIK
Original Assignee
3Divi Company
Priority date
Filing date
Publication date
Application filed by 3Divi Company filed Critical 3Divi Company
Priority to PCT/RU2014/000089 priority Critical patent/WO2015122789A1/en
Publication of WO2015122789A1 publication Critical patent/WO2015122789A1/en

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/117 Identification of persons
    • A61B5/1171 Identification of persons based on the shapes or appearances of their bodies or parts thereof
    • A61B5/1176 Recognition of faces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G06V20/647 Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/755 Deformable models or variational models, e.g. snakes or active contours
    • G06V10/7553 Deformable models or variational models, e.g. snakes or active contours based on shape, e.g. active shape models [ASM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G06V30/19013 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V30/1902 Shifting or otherwise transforming the patterns to accommodate for positional errors
    • G06V30/1904 Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching
    • G06V30/19053 Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching based on shape statistics, e.g. active shape models of the pattern to be recognised
    • G06V30/1906 Shifting or otherwise transforming the patterns to accommodate for positional errors involving a deformation of the sample or reference pattern; Elastic matching based on shape statistics, e.g. active shape models of the pattern to be recognised based also on statistics of image patches, e.g. active appearance models of the pattern to be recognised

Definitions

  • This disclosure relates generally to object recognition and more particularly to facial recognition and user authentication technologies.
  • Biometrics refers to identification of people by their characteristics and traits.
  • Biometrics is widely used in computer-based user authentication and access control systems, computer vision systems, gaming platforms, and so forth.
  • For example, an individual may activate or otherwise gain access to functionalities or data controlled by a computing device by acquiring and verifying user biometric data.
  • Biometric identifiers are the distinctive and measurable characteristics which can be used to label and describe an individual.
  • Some examples of biometric identifiers may include a retina image, an iris image, an image or shape of individual's face, fingerprints, a handprint, a voice, keystroke dynamics, behavioral characteristics, and so forth.
  • Facial recognition methods or algorithms are typically implemented by retrieving features of samples (e.g. images of human faces), comparing the features with stored reference feature records, and determining whether they match.
  • The reliability of facial recognition methods greatly depends on the quality of the samples, in other words, the images of the individual's face to be identified, as well as the conditions under which the samples are taken.
  • One way to reduce vulnerability to these factors is to maintain a great number of samples for a given individual, where the samples are taken under different conditions as outlined above.
  • However, in most real-life scenarios, security systems are limited to one sample image per individual.
  • Various embodiments of the present disclosure generally provide for significantly improving the quality and reliability of facial recognition and of the corresponding user authentication methods based upon the improved facial recognition techniques, as well as for decreasing the FAR and FRR values.
  • The present technology involves intelligent creation of multiple facial angle shots of an individual based on his or her single facial image. These facial angle shots may optionally be used to train a machine-learning system handling facial recognition and/or user authentication processing.
  • the principles of the present technology can be integrated not only in facial recognition methods, but also in user authentication methods, which are based on facial recognition.
  • a method for user authentication and/or identification comprises a step of acquiring, by one or more processors, an input image associated with a user (the input image shows at least the user face). Further, the method includes determining, by one or more processors, a position of the user face, and locating, by the one or more processors, multiple facial landmarks associated with the user. The method further includes transforming, by the one or more processors and based on the multiple facial landmarks, at least a portion of the input image into a uniform image of the user face.
  • The transformation may include spatial rotation, scaling, cropping, color and/or brightness adjustments, or a combination thereof.
  • The method further includes retrieving, by the one or more processors, a feature vector based upon the uniform image of the user face, comparing the feature vector with at least one stored reference feature vector (using, for example, a machine-learning algorithm), and making an authentication and/or identification decision(s) with respect to the user based on the result of the comparison.
  • A method for registering users includes acquiring one registration image associated with the authorized user and determining a facial position of the authorized user based on the registration image.
  • the method includes locating, based on the position, multiple facial landmarks associated with the authorized user, transforming at least a portion of the one registration image into a texture image of the authorized user face (e.g., a full-face image of the authorized user). Further, the method includes generating multiple angle shots of the authorized user face based upon the texture image. The method also includes creating a biometric pattern associated with the multiple angle shots of the user face and storing the biometric pattern associated with the authorized user in a database.
  • the generation process of the angle shots comprises the steps of finding similarity between the texture image and one of a plurality of reference two-dimensional (2D) facial images, wherein each one of the plurality of reference 2D facial images is associated with a reference three-dimensional (3D) facial image. Further, based on the similarity, the method selects the 3D facial image related to the reference 2D image being the most similar to the texture image. The generation method further includes superposing the texture image of the authorized user with multiple points associated with the pre-selected 3D facial image. Further, the 3D facial image and the texture image can be rotated to generate images of the multiple angle shots based upon the rotated 3D facial image and the texture image.
  • a method for facial recognition comprises the steps of acquiring a facial image of an individual, determining a facial position based on the facial image, locating multiple facial landmarks associated with the individual based on the position, transforming at least a portion of the facial image into a uniform facial image based on the multiple facial landmarks (the uniform facial image represents a full-face image of the individual). Further the method comprises creating a feature vector based upon the uniform facial image, comparing the feature vector with at least one stored reference feature vector and, based on the comparison, determining the identity of the individual.
  • FIG. 1 is a high-level block diagram illustrating an example computing device suitable for implementing methods for facial recognition and user authentication as disclosed herein.
  • FIG. 2 shows a high-level process flow diagram of a method for user registration according to one exemplary embodiment.
  • FIG. 3 shows a high-level process flow diagram of a method for facial recognition according to one exemplary embodiment.
  • FIG. 4 shows a high-level process flow diagram of a method for user authentication according to one exemplary embodiment of the present disclosure.
  • FIG. 5 shows an exemplary face image and several facial landmarks, which can be used in methods for facial recognition and user authentication as described herein.
  • FIG. 6 shows an input image of a user for use in a method for facial recognition, according to one example embodiment, which image may also serve as a registration image in a method for user registration.
  • FIG. 7 shows an exemplary area of interest created by rotating and cropping of the input image shown in FIG. 6.
  • FIG. 8 shows an exemplary uniform image of a user face suitable for use in a method for facial recognition, which image may also be used as a texture image in a method for user registration.
  • FIG. 9 shows an exemplary three-dimensional image (depth map) corresponding to the selected image that is most similar to the texture image shown in FIG. 8.
  • FIG. 10 shows a graphical representation of the result of applying the texture image, as shown in FIG. 8, to a point cloud.
  • FIGs. 11A-11D illustrate exemplary angle shots based upon the registration image shown in FIG. 6.
  • FIG. 12 shows a graphical representation of forty Gabor filters suitable for implementing the methods described herein.
  • the techniques of the embodiments disclosed herein may be implemented using a variety of technologies.
  • the methods described herein may be implemented in software executing on a computer system or in hardware utilizing either a combination of microprocessors, controllers or other specially designed application-specific integrated circuits (ASICs), programmable logic devices, or various combinations thereof.
  • the methods described herein may be implemented by a series of computer-executable instructions residing on a storage medium such as a disk drive, solid-state drive or on a computer-readable medium.
  • facial recognition can be used by a computing device in various scenarios.
  • a computing device may use facial recognition to authenticate or authorize a user who attempts to gain access to one or more functionalities or data of the computing device or functionalities otherwise controlled by the computing device.
  • The computing device may store facial images of one or more pre-authorized users. These facial images are referred to herein as "registration images."
  • The computing device may capture an image of the user's face for authentication purposes. The computing device may then use facial recognition applications to compare the captured facial image to the registration images associated with authorized users.
  • If a match or an acceptable level of similarity is found, the computing device may authenticate the user and grant access to the requested functionalities or data.
  • this approach can be used in security systems to recognize people attempting to enter designated premises or land, crime detection systems, computer vision systems, gesture recognition and control systems, and so forth.
  • Facial recognition is not always accurate and may falsely accept or falsely reject a user when the similarity between the user's image and one or more registration images is determined incorrectly.
  • Unauthorized users may leverage vulnerabilities of facial recognition to cause erroneous authentication.
  • In contrast, an authorized user who should be authenticated may be denied access because the facial recognition system could not find similarity between the present user's image and a registration image captured some time ago.
  • Finding similarities between present user images and registration images greatly depends on the quality and number of registration images. In most scenarios, facial recognition systems maintain only one registration image per authorized user.
  • Accordingly, when lighting conditions and the camera position at recognition time differ significantly from those of the registration image, FAR and FRR values may be significantly high.
  • The present technology decreases FAR and FRR values and improves the quality and reliability of facial recognition systems by, among other things, artificially synthesizing facial foreshortening images with respect to an available registration image of an authorized user, as well as by the intelligent use of Gabor filters and linear discriminant analysis.
  • FIG. 1 is a high-level block diagram illustrating an example computing device 100 suitable for implementing the present technology.
  • the computing device 100 may be used for facial recognition, user authentication, user authorization, and/or user registration as described herein.
  • the computing device 100 may include, be, or be a part of one or more of a variety of types of devices, such as a general purpose computer, desktop computer, laptop computer, tablet computer, netbook, server, mobile phone, a smartphone, personal digital assistant, set-top box, television, door lock, watch, vehicle computer, electronic kiosk, automated teller machine, infotainment system, presence verification device, security device, surveillance device, among others.
  • the computing device 100 may be an integrated part of another multi-component system such as a video surveillance system, access control system, among others.
  • The computing device 100 includes one or more processors 102, memory 104, one or more storage devices 106, one or more input devices 108, one or more output devices 110, network interface 112, and an image sensor 114 (e.g. a camera or charge-coupled device (CCD)).
  • Processors 102 are, in some examples, configured to implement functionality and/or process instructions for execution within the computing device 100.
  • the processors 102 may process instructions stored in memory 104 and/or instructions stored on storage devices 106.
  • Such instructions may include components of an operating system 118, a facial recognition module 120 and/or a user authentication module 122.
  • Computing device 100 may also include one or more additional components not shown in FIG. 1, such as a power supply, a battery, a fan, a global positioning system (GPS) receiver, among others.
  • Memory 104 is configured to store information within the computing device 100 during operation.
  • Memory 104 may refer to a non-transitory computer- readable storage medium or a computer-readable storage device.
  • memory 104 is a temporary memory, meaning that a primary purpose of memory 104 may not be long-term storage.
  • Memory 104 may also refer to a volatile memory, meaning that memory 104 does not maintain stored contents when memory 104 is not receiving power. Examples of volatile memories include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories known in the art.
  • memory 104 is used to store program instructions for execution by the processors 102.
  • Memory 104, in one example, is used by software (e.g., operating system 118) or applications, such as facial recognition 120 and/or user authentication 122, executing on computing device 100 to temporarily store information during program execution.
  • One or more storage devices 106 can also include one or more computer-readable storage media and/or computer-readable storage devices. In some embodiments, storage devices 106 may be configured to store greater amounts of information than memory 104. Storage devices 106 may further be configured for long-term storage of information. In some examples, the storage devices 106 include non-volatile storage elements.
  • the computing device 100 may also include one or more input devices 108.
  • the input devices 108 may be configured to receive input from a user through tactile, audio, video, or biometric channels.
  • Examples of input devices 108 may include a keyboard, mouse, touchscreen, microphone, one or more video cameras, or any other device capable of detecting an input from a user or other source and relaying the input to computing device 100, or components thereof. Though shown separately in FIG. 1, the image sensor 114 may, in some instances, be a part of input devices 108. It should also be noted that the image sensor 114 (e.g., a digital still and/or video camera) may be a peripheral device operatively connected to the computing device 100 via the network interface 112.
  • the output devices 110 may be configured to provide output to a user through visual or auditory channels.
  • Output devices 110 may include a video graphics adapter card, a liquid crystal display (LCD) monitor, a light emitting diode (LED) monitor, a sound card, a speaker, or any other device capable of generating output that may be intelligible to a user.
  • Output devices 110 may also include a touchscreen, presence-sensitive display, or other input/output capable displays known in the art.
  • the computing device 100 also includes network interface 112.
  • the network interface 112 can be utilized to communicate with external devices via one or more networks such as one or more wired, wireless or optical networks including, for example, the Internet, intranet, local area network (LAN), wide area network (WAN), cellular phone networks, Bluetooth radio, an IEEE 802.11-based radio frequency network, among others.
  • the network interface 112 may be a network interface card, such as an Ethernet card, an optical transceiver, a radio frequency transceiver, or any other type of device that can send and receive information.
  • Other examples of such network interfaces may include Bluetooth®, 3G, 4G, and WiFi® radios in mobile computing devices, as well as USB.
  • the operating system 118 may control one or more functionalities of computing device 100 and/or components thereof.
  • the operating system 118 may interact with applications 124, including facial recognition 120 and user authentication 122, and may facilitate one or more interactions between applications 124 and one or more of processors 102, memory 104, storage devices 106, input devices 108, and output devices 110.
  • the operating system 118 may interact with or be otherwise coupled to facial recognition module 120, user authentication module 122, applications 124, and components thereof.
  • facial recognition module 120 and user authentication module 122 may be included in operating system 118.
  • In other embodiments, facial recognition module 120 and user authentication module 122 may be part of applications 124, or may be implemented externally to computing device 100, such as at a network location. In some such instances, computing device 100 may use the network interface 112 to access and implement functionalities provided by facial recognition module 120 and user authentication module 122 through methods commonly known as "cloud computing."
  • FIG. 2 shows a high-level process flow diagram of a method 200 for user registration (in other words, user enrolment) according to one exemplary embodiment.
  • the method 200 may be performed by processing logic that may comprise hardware (e.g., one or more processors, controllers, dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine, firmware), or a combination of both.
  • The method 200 is implemented by the computing device 100 shown in FIG. 1; however, it should be appreciated that the method 200 is just one example operation of the computing device 100.
  • the method 200 commences at step 210 with the computing device 100 acquiring at least one registration image associated with an authorized user.
  • the image may be a 2D image of the authorized user captured by the image sensor 114.
  • the computing device 100 determines a facial position of the authorized user based on the at least one registration image. The determination of facial position may be needed for isolating (e.g. cropping) the authorized user face, which would simplify further processing steps.
  • the computing device 100 locates multiple facial landmarks associated with the authorized user.
  • the computing device 100 transforms at least a portion of the at least one registration image into a texture image of the authorized user face.
  • the computing device 100 generates multiple angle shots of the authorized user face based upon the texture image of the authorized user face.
  • the computing device 100 creates a biometric pattern associated with the multiple angle shots of the user face.
  • The biometric pattern includes a feature vector whose components are associated with the angle shots.
  • the biometric pattern can be utilized for training one or more machine-learning algorithms handling face recognition and/or user authentication based on facial recognition.
  • the computing device 100 stores the biometric pattern associated with the authorized user in the memory 104 and/or storage device 106.
  • The computing device 100 maintains user profiles, which comprise feature vectors and/or biometric patterns of authorized users. The user profiles can later be recalled when a new image is acquired to identify and/or authorize a user.
  • the feature vectors may be based on various parameters such as pixel data, distances between certain facial landmarks, facial angle shots, among other things.
  • FIG. 3 shows a high-level process flow diagram of a method 300 for facial recognition according to one exemplary embodiment of the present disclosure.
  • the method 300 may be performed by processing logic that may comprise hardware (e.g., one or more processors, controllers, dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine, firmware), or a combination of both.
  • The method 300 is implemented by the computing device 100 shown in FIG. 1; however, it should be appreciated that the method 300 is just one example operation of the computing device 100.
  • The method 300 starts at step 310, when the computing device 100 acquires a facial image of an individual from the image sensor 114 or an analogous device.
  • the computing device 100 determines a facial position based on the facial image, which may be needed for cropping or isolating the individual's face.
  • the computing device 100 locates multiple facial landmarks associated with the individual based on the position determined.
  • the computing device 100 transforms, based on the multiple facial landmarks, at least a portion of the facial image into a uniform facial image.
  • The computing device 100 then creates a feature vector based upon the uniform facial image, compares it with at least one stored reference feature vector and, based on the comparison, determines the identity of the individual.
  • FIG. 4 shows a high- level process flow diagram of a method 400 for user authentication according to one exemplary embodiment of the present disclosure.
  • the method 400 may be performed by processing logic that may comprise hardware (e.g., one or more processors, controllers, dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine, firmware), or a combination of both.
  • The method 400 is implemented by the computing device 100 shown in FIG. 1; however, it should be appreciated that the method 400 is just one example operation of the computing device 100.
  • the method 400 starts at step 410, when the computing device 100 acquires an input image associated with a user (whereas the input image shows at least the user face) from the image sensor 114 or similar device.
  • the computing device 100 optionally determines a position of the user face.
  • the computing device 100 locates multiple facial landmarks associated with the user.
  • the computing device 100 transforms, based on the multiple facial landmarks, at least a portion of the input image into a uniform image of the user face.
  • the computing device 100 retrieves a feature vector based upon the uniform image of the user face.
  • the computing device 100 compares the feature vector with a stored reference feature vector associated with the authorized user. The comparison process may be based on the use of one or more machine-learning algorithms, although it is not required.
  • Based on the comparison, the computing device 100 makes an authentication decision with respect to the user. The authentication decision can further be used in providing access for the authorized user to computing device functionalities, data, resources, or to dedicated premises or land.
  • The authentication decision can also be used for generating one or more control commands for the computing device 100, its elements, or any other suitable peripheral device. For example, a control command may "unlock" the computing device.
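  • By way of illustration only, the following is a minimal sketch of such an authentication decision, assuming the feature vectors have already been retrieved as described further below; the cosine similarity measure and the threshold value are illustrative assumptions rather than values prescribed by this disclosure.

```python
from typing import Optional

import numpy as np


def authenticate(feature_vector: np.ndarray,
                 reference_vectors: dict,
                 threshold: float = 0.85) -> Optional[str]:
    """Return the id of the matching authorized user, or None to deny access."""
    best_user, best_score = None, -1.0
    for user_id, reference in reference_vectors.items():
        # Cosine similarity between the probe vector and a stored reference vector.
        score = float(np.dot(feature_vector, reference) /
                      (np.linalg.norm(feature_vector) * np.linalg.norm(reference) + 1e-12))
        if score > best_score:
            best_user, best_score = user_id, score
    # A positive decision could then trigger a control command, e.g. "unlock".
    return best_user if best_score >= threshold else None
```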
  • Facial position is determined at least in steps 220, 320 and 420 based on an input image of a user/individual as received from the image sensor 114.
  • facial position can be located and/or determined by a machine-learning algorithm, pattern recognition algorithms, and/or statistical analysis configured to search objects of a predetermined class (i.e., faces) in input images.
  • machine-learning algorithms include neural networks, heuristic methods, support vector machines, or a combination thereof.
  • Example methods suitable for facial position detection include Principal Components Analysis (PCA) and Linear Discriminant Analysis (LDA), also known as Fisher's linear discriminant method. LDA may be used to find a linear combination of features which characterize or distinguish two or more classes, or species of one class.
  • Another example method suitable for facial position detection is the Viola-Jones object detection framework. This method adapted the idea of using Haar wavelets and developed so-called Haar-like features.
  • a Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in these regions and calculates the difference between them. This difference is then used to categorize subsections of an image. For example, consider an image database with human faces. It is a common observation that among all faces the region of the eyes is darker than the region of the cheeks. Therefore, a common Haar feature for face detection is a set of two adjacent rectangles that lie above the eye and the cheek region.
  • the position of these rectangles is defined relative to a detection window that acts like a bounding box to the target object (the face in this case).
  • a window of the target size is moved over the input image, and for each subsection of the image the Haar-like feature is calculated. This difference is then compared to a learned threshold that separates non-objects from objects. Because such a Haar-like feature is only a weak learner or classifier (its detection quality is slightly better than random guessing) a large number of Haar-like features is necessary to describe an object with sufficient accuracy.
  • the Haar-like features are therefore organized in something called a classifier cascade to form a strong learner or classifier.
  • The Viola-Jones object detection framework may serve as one of the preferred algorithms for the facial position detection performed in the steps 220, 320 and 420 of the methods 200-400.
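  • As an illustration only, a minimal sketch of this detection step using the Viola-Jones cascade classifier bundled with OpenCV is shown below; the stock OpenCV frontal-face cascade file and the detection parameters are assumptions of this sketch, not a model provided by the present disclosure.

```python
import cv2


def detect_face(image_bgr):
    """Return the isolated user face as a grayscale crop, or None if no face is found."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # A cascade of Haar-like features is evaluated over a sliding detection window.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest detection
    return gray[y:y + h, x:x + w]
```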
  • facial landmark points refer to various face elements such as inner/outer corner of eyes, eye centers (e.g., pupils), left/right corner of mouth, nose center, nose corners, ears, chin, eyebrows, among others.
  • FIG. 5 shows an exemplary face image and some facial landmarks, which can be used by the methods for facial recognition and user authentication as described herein.
  • Facial landmarks are located using at least one of an Active Shape Model (ASM) searching process and an Active Appearance Model (AAM) searching process.
  • The ASM searching process uses a statistical model of the shape of various objects, which is iteratively deformed to fit an example of the object in a new image.
  • ASM searching process consists in the use of statistical relations between mutual arrangements of landmarks associated with a new image and at least one reference image.
  • The process generally includes two steps: (a) locating an area associated with initial landmark point coordinates, and (b) iteratively adjusting the statistical model to define the landmark coordinates more precisely. This process is described below in greater detail.
  • The matrix $\Phi = (\phi_1, \ldots, \phi_P)$ (Equation No. 6) includes $P$ principal components, i.e. eigenvectors $\phi_j$, $j = 1, \ldots, P$, which correspond to the $P$ largest eigenvalues, and $b$ is a vector having $P$ coefficients (also known as the parameters of the model).
  • The vector $b$ is defined as follows: $b = \Phi^{T}(s - \bar{s})$ (Equation No. 7).
  • The ASM model is defined by the matrix $\Phi$ and the vector $\bar{s}$. Any image/shape can be approximately described by means of the ASM model and the parameters obtained from the equation $s \approx \bar{s} + \Phi b$ (Equation No. 8).
  • the vector s can represent a common pattern of landmarks arrangement and individual features of specific face shape.
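  • For illustration, a minimal sketch of Equations 7 and 8 is given below: projecting a shape onto the model parameters and reconstructing it from them. The mean shape, the eigenvector matrix and the eigenvalues are assumed to have been obtained beforehand from a training set of aligned shapes; clamping each parameter to roughly three standard deviations is a common ASM conformity check and is an assumption of this sketch.

```python
import numpy as np


def shape_to_params(s, s_mean, phi):
    """Equation No. 7: b = Phi^T (s - s_mean)."""
    return phi.T @ (s - s_mean)


def params_to_shape(b, s_mean, phi, eigenvalues=None, k=3.0):
    """Equation No. 8: s ~ s_mean + Phi b, optionally clamping b for model conformity."""
    if eigenvalues is not None:
        limit = k * np.sqrt(eigenvalues)  # keep each parameter within +/- k standard deviations
        b = np.clip(b, -limit, limit)
    return s_mean + phi @ b
```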
  • The localization of landmarks in a new face image is carried out as follows. First, a face position is determined as described above. For example, a Viola-Jones classifier is utilized, which returns an image with the isolated user face.
  • The average shape defined by the vector $\bar{s}$ is aligned with the center of the isolated user face.
  • The coordinates of the average shape defined by $\bar{s}$ can be scaled, if required.
  • The average shape defines an initial approximation to the landmark point coordinates, which can be considered as the initial iteration.
  • For each landmark with number i, a Viola-Jones cascade classifier is trained; applied to an image, it generates a set of points classified as the landmark with number i.
  • Positive training examples are areas of the image centered at a landmark point, and negative examples are areas that overlap the positive examples.
  • At each iteration, the corresponding cascade classifier is applied to a small area of the image centered at the landmark with coordinates $(s_i, s_{i+N})$.
  • If the classifier generates several points classified as anthropometrical landmarks, the located landmark point is considered to be the one nearest to the landmark with the coordinates $(s_i, s_{i+N})$.
  • Here, c is the shape consisting of the landmark points found by the cascade classifiers at the i-th iteration, with the coordinates of this shape centered and divided by a scale coefficient.
  • This shape may be checked for conformity with the statistical ASM model. The result of the conformity check may define the shape used for the next iteration.
  • The procedure of landmark localization is repeated and terminates when a predetermined number of iterations has been performed (for example, three iterations). Accordingly, landmark points may be represented by three-dimensional (3D) coordinates and associated with a facial image.
  • The face image transformation also refers to image pre-processing required to bring the face image to a uniform image that is in a better condition for further processing.
  • the image transformation may include the following operations. First, a color face image or its part can be transformed into a half-tone image or monotone (single-colored) image.
  • Further, an area of interest can be isolated from the image.
  • This process can include the following steps: (a) rotating at least a portion of the face image until landmarks associated with user pupils are horizontally oriented; (b) scaling the face image (for example, until landmarks associated with user pupils are at a predetermined distance from each other, e.g. 60 pixels between the pupils); and/or (c) cropping the face image or its part to create an image of a predetermined pixel size.
  • FIGs. 6 and 7 show example face images illustrating the above processes. Namely, FIG. 6 shows an exemplary input image of a user. FIG. 7 shows the same image as in FIG. 6, but subjected to the process of rotation and cropping as outlined above to isolate the area of interest, i.e. a user face.
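  • A minimal sketch of this geometric normalization is given below, assuming the pupil landmarks have already been located; the output crop size and the vertical placement of the eyes inside the crop are illustrative assumptions.

```python
import numpy as np
import cv2


def align_face(gray, left_pupil, right_pupil, out_size=(100, 120), eye_dist=60):
    """Rotate until the pupils are level, scale to a fixed inter-pupil distance, and crop."""
    (lx, ly), (rx, ry) = left_pupil, right_pupil
    angle = np.degrees(np.arctan2(ry - ly, rx - lx))           # rotation that levels the pupils
    scale = eye_dist / max(np.hypot(rx - lx, ry - ly), 1e-6)   # e.g. 60 pixels between pupils
    center = ((lx + rx) / 2.0, (ly + ry) / 2.0)
    m = cv2.getRotationMatrix2D(center, angle, scale)
    # Shift so the midpoint between the pupils lands at a fixed location in the output crop.
    m[0, 2] += out_size[0] / 2.0 - center[0]
    m[1, 2] += out_size[1] * 0.4 - center[1]
    return cv2.warpAffine(gray, m, out_size)
```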
  • the image transformation may include adjusting a brightness of at least a portion of the face image.
  • the brightness adjustment may include a number of processes including, but not limited to, Single Scale Retinex (SSR) filtering, homomorphic filtering based normalization technique, Discrete Cosine Transform (DCT) based normalization technique, wavelet based normalization technique, among others.
  • the brightness adjustment may include histogram correcting of the face image.
  • the brightness adjustment may include contrast enhancing.
  • FIG. 8 shows the face image of FIG. 7 after being subjected to the brightness adjustment outlined above.
  • In other words, FIG. 8 shows a uniform image of the user face. Accordingly, the image transformation process described above is used in the steps 240, 340 and 440 of the methods 200-400.
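  • As one illustrative option among those listed above, a minimal sketch of the brightness adjustment using histogram correction (with contrast-limited adaptive equalization as a contrast enhancer) is shown below; SSR, homomorphic or DCT-based normalization could be substituted.

```python
import cv2


def normalize_brightness(gray_face, use_clahe=False):
    """Histogram correction of the face image, optionally with contrast enhancement."""
    if use_clahe:
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        return clahe.apply(gray_face)
    return cv2.equalizeHist(gray_face)
```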
  • Angle shots are artificial images taken or created by a virtual camera at an angle to the horizontal or vertical lines.
  • Angle shots may be created by rotating a 3D representation of a user face to different angles with a fixed virtual camera.
  • The rotation may involve any of a yaw angle, pitch angle, and/or roll angle.
  • Each angle shot may be characterized by yaw, pitch and roll angles.
  • The synthesis (generation) of the angle shots includes the following operations performed by the computing device 100: superposing a texture image of an authorized user, such as the one shown in FIG. 8, with multiple points associated with a pre-selected reference 3D facial image stored in a database such as the memory 104 or storage device 106.
  • the reference 3D facial images are associated with their corresponding reference 2D half-tone or color images, which are also stored in the database such as the memory 104 or storage device 106.
  • the reference 2D facial images are uniform face images as discussed above.
  • Reference 3D images can be represented by depth maps, where each pixel of these images includes information related to the distance between the image sensor 114 and certain parts of the user face or other objects.
  • The association of reference 2D images with the depth maps means that each pixel of a reference 3D image also includes data related to brightness and/or color.
  • the reference 3D facial images associated with reference 2D images may be pre-processed/transformed as described above with reference to the steps 240, 340 and 440.
  • the synthesis process of angle shots includes the following operations.
  • the computing device 100 finds similarity between the texture image and one or more of the plurality of reference 3D and 2D images, and then, based on the similarity, the computing device 100 selects the most similar reference 3D and 2D image to the texture image.
  • This process of finding similarity can be implemented by a machine-learning algorithm or a statistical analysis.
  • Some examples of methods suitable for finding similarity include Principal Components Analysis (PCA) or a discriminant analysis such as Linear Discriminant Analysis (LDA), also known as Fisher's linear discriminant analysis.
  • FIG. 9 shows an exemplary 3D image, which is selected by the computing device 100 as the most similar to the texture image such as the one shown in FIG. 8 (the corresponding 2D image of the selected 3D image is not shown).
  • A homography-based process is utilized to match landmarks associated with the selected 2D image and landmarks associated with the texture image to find conformity therebetween.
  • The homography-based process refers to a perspective transformation of one plane into another. Therefore, having a first set of landmarks associated with one image and a second set of landmarks corresponding to the first set but associated with another image, it is possible to find conformity between these two images in the form of a homographic matrix.
  • One example of homography based process for finding conformity is a Random Sample Consensus (RANSAC) method.
  • RANSAC is an iterative process for estimating parameters from a set of observed data.
  • a basic assumption of this process is that the observed data consists of "inliers," i.e., data whose distribution can be explained by a certain model, though may be subject to noise, and "outliers" which are data that do not fit said model.
  • the outliers can come, for example, from extreme values of noise or from erroneous measurements or incorrect hypotheses about the interpretation of data.
  • RANSAC process also assumes that, given a set of inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data. In this technology, RANSAC makes an iterative estimation of model parameters for randomly selected landmarks.
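  • A minimal sketch of this landmark matching using OpenCV's RANSAC-based homography estimator is given below; both inputs are assumed to be N×2 arrays of corresponding landmark coordinates listed in the same order.

```python
import numpy as np
import cv2


def landmark_homography(reference_landmarks, texture_landmarks):
    """Estimate the homography relating two corresponding landmark sets with RANSAC."""
    src = np.asarray(reference_landmarks, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(texture_landmarks, dtype=np.float32).reshape(-1, 1, 2)
    # RANSAC repeatedly estimates the model from random landmark subsets and
    # separates inliers (points explained by the model) from outliers.
    h, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    return h, inlier_mask
```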
  • Further, the depth map of the most similar 3D image is transformed into a point cloud, or in other words, a vertex set in a 3D coordinate system.
  • the texture image is taken as a texture and applied to the point cloud.
  • FIG. 10 shows the result of applying the texture image, as shown in FIG. 8, to a point cloud.
  • the point cloud along with the "attached" texture is rotated and multiple shots are taken at different angles which constitute angle shots.
  • FIGs. 11A-11D illustrate exemplary angle shots created with respect to the registration image (such as the one shown in FIG. 6) based on the technology described herein.
  • Specifically, FIGs. 11A-11D show angle shots rotated at different angles relative to the horizontal and vertical axes (i.e., yaw and pitch angles): FIG. 11A illustrates an angle shot taken at a yaw angle of +15 degrees, FIG. 11B illustrates an angle shot taken at a yaw angle of -15 degrees, FIG. 11C illustrates an angle shot taken at a pitch angle of +15 degrees, and FIG. 11D illustrates an angle shot taken at a pitch angle of -15 degrees.
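  • A minimal sketch of this synthesis step is given below: the textured point cloud is rotated by a yaw and/or pitch angle and re-projected with a fixed virtual camera. The orthographic projection and the output resolution are simplifying assumptions of this sketch; `points` is an N×3 vertex array and `colors` holds the grey value sampled from the texture image for each vertex.

```python
import numpy as np


def rotation(yaw_deg=0.0, pitch_deg=0.0):
    """Rotation matrix for the given yaw and pitch angles (in degrees)."""
    y, p = np.radians(yaw_deg), np.radians(pitch_deg)
    ry = np.array([[np.cos(y), 0, np.sin(y)], [0, 1, 0], [-np.sin(y), 0, np.cos(y)]])
    rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    return ry @ rx


def render_angle_shot(points, colors, yaw_deg, pitch_deg, size=(120, 100)):
    """Rotate the textured point cloud and project it onto the image plane."""
    rotated = points @ rotation(yaw_deg, pitch_deg).T
    h, w = size
    shot = np.zeros((h, w), dtype=np.uint8)
    xs = np.clip(((rotated[:, 0] - rotated[:, 0].min()) /
                  (np.ptp(rotated[:, 0]) + 1e-6) * (w - 1)).astype(int), 0, w - 1)
    ys = np.clip(((rotated[:, 1] - rotated[:, 1].min()) /
                  (np.ptp(rotated[:, 1]) + 1e-6) * (h - 1)).astype(int), 0, h - 1)
    order = np.argsort(-rotated[:, 2])  # draw farther vertices first (larger z assumed farther)
    shot[ys[order], xs[order]] = colors[order]
    return shot


# The four shots of FIGs. 11A-11D would then correspond to, for example:
# render_angle_shot(points, colors, +15, 0), render_angle_shot(points, colors, -15, 0),
# render_angle_shot(points, colors, 0, +15), render_angle_shot(points, colors, 0, -15).
```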
  • features extracted from facial images may relate to pixel data such as coordinates, brightness, color, a depth value, among other things.
  • Responses of a linear filter, such as responses of two-dimensional Gabor filters applied to the transformed (pre-processed) face images, may be used as the features.
  • The responses can be calculated as follows.
  • The impulse response of a Gabor filter is defined as the product of a Gaussian function and a complex sinusoid. The Gabor filter can be defined, for example, as follows:
  • $g(x, y) = \exp\left(-\frac{x_t^2}{2\sigma_x^2} - \frac{y_t^2}{2\sigma_y^2}\right)\exp\left(j\,2\pi f x_t\right)$, where $x_t = x\cos\theta + y\sin\theta$ and $y_t = -x\sin\theta + y\cos\theta$ (Equation No. 9),
  • where x and y are pixel coordinates, f is the frequency of the complex sine curve, $\theta$ is the filter orientation, $\sigma_x$ is the spatial width of the filter along the sinusoidal wave, and $\sigma_y$ is the spatial width of the filter perpendicular to the wave.
  • Each transformed image is convolved with a set of 2D Gabor filters, which may include, but is not limited to, forty different filters.
  • FIG. 12 shows a graphical representation of forty Gabor filters suitable for implementing the methods described herein.
  • The convolution in the spatial domain can be replaced with multiplication, in the frequency domain, of the Fourier images of the input image and of the filter impulse response, followed by an inverse Fourier transform of the product.
  • As a result of the convolution of the transformed face image with forty Gabor filters, forty images are generated, which are then converted into a single vector.
  • This vector is further considered as the feature vector analyzed by the image recognition method.
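  • A minimal sketch of this feature extraction with a bank of forty Gabor filters (five spatial widths by eight orientations, mirroring FIG. 12) is given below; the kernel size, wavelength and the particular parameter grid are illustrative assumptions.

```python
import numpy as np
import cv2


def gabor_feature_vector(uniform_face, ksize=21):
    """Convolve the uniform face image with forty Gabor filters and flatten the responses."""
    face = uniform_face.astype(np.float32)
    responses = []
    for sigma in (2.0, 3.0, 4.0, 5.0, 6.0):              # five spatial widths
        for theta in np.arange(0.0, np.pi, np.pi / 8):    # eight orientations
            kernel = cv2.getGaborKernel((ksize, ksize), sigma, theta, 10.0, 0.5, 0)
            responses.append(cv2.filter2D(face, cv2.CV_32F, kernel))
    vector = np.concatenate([np.abs(r).ravel() for r in responses])
    return vector / (np.linalg.norm(vector) + 1e-12)     # L2-normalized feature vector
```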
  • The process of retrieving a feature vector as described above can be used in the steps 260, 350, and 450 of the methods 200-400.
  • Comparing feature vectors with reference feature vectors, as required for the steps 360 and 460 of the methods 300 and 400, can be implemented in different ways.
  • statistical algorithms can be utilized; in other examples, machine-learning algorithms can be utilized; and in yet more examples a combination of the foregoing can be utilized.
  • LDA or Fisher's linear discriminant method can be used to find similarity between feature vectors.
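  • A minimal sketch of the machine-learning option using Linear Discriminant Analysis from scikit-learn is given below; scikit-learn, the probability threshold and the classifier settings are assumptions of this sketch rather than requirements of this disclosure. The training set would typically contain feature vectors of the registration images and of the synthesized angle shots, labelled per authorized user.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def train_comparator(reference_vectors, labels):
    """Fit an LDA classifier on reference feature vectors labelled by user."""
    lda = LinearDiscriminantAnalysis()
    lda.fit(np.asarray(reference_vectors), np.asarray(labels))
    return lda


def identify(lda, probe_vector, min_confidence=0.9):
    """Return the most likely user label, or None if the match is not confident enough."""
    probabilities = lda.predict_proba(probe_vector.reshape(1, -1))[0]
    best = int(np.argmax(probabilities))
    return lda.classes_[best] if probabilities[best] >= min_confidence else None
```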

Abstract

The technology described herein provides methods and systems for facial recognition and user authentication. The methods comprise the steps of acquiring an input image, determining a position of a user face on the input image, locating multiple facial landmarks of the user face, transforming at least a portion of the input image into a uniform image of the user face, retrieving a feature vector based upon the uniform image of the user face, comparing the feature vector with at least one reference feature vector, and identifying the user or making an authentication decision with respect to the user based on the result of the comparison. The technology allows for the creation of multiple facial angle shots based on a single facial image of an individual. These facial angle shots are further used to train a machine-learning algorithm handling facial recognition and/or user authentication processing.

Description

FACIAL RECOGNITION AND USER AUTHENTICATION METHOD
TECHNICAL FIELD
[0001] This disclosure relates generally to object recognition and more particularly to facial recognition and user authentication technologies.
DESCRIPTION OF RELATED ART
[0002] The approaches described in this section could be pursued, but are not necessarily approaches that have previously been conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
[0003] Traditionally, biometrics refers to identification of people by their characteristics and traits. Biometrics is widely used in computer-based user authentication and access control systems, computer vision systems, gaming platforms, and so forth. For example, an individual may activate or otherwise gain access to functionalities or data controlled by a computing device by acquiring and verifying user biometric data. Generally, biometric identifiers are the distinctive and measurable characteristics which can be used to label and describe an individual. Some examples of biometric identifiers may include a retina image, an iris image, an image or shape of an individual's face, fingerprints, a handprint, a voice, keystroke dynamics, behavioral characteristics, and so forth.
[0004] Conventionally, facial recognition methods or algorithms are implemented by retrieving features of samples (e.g. human faces), comparing the features with stored reference feature records, and determining whether they match. The reliability of facial recognition methods greatly depends on the quality of the samples, in other words, the images of the individual's face to be identified, as well as the conditions under which the samples are taken. In particular, the level and angle of illumination, the position of a camera relative to an individual's face, facial expression, luminance and color characteristics, among other things, significantly influence the quality and reliability of facial recognition. One way to reduce vulnerability to these factors is to maintain a great number of samples for a given individual, where the samples are taken under different conditions as outlined above. However, in most real-life scenarios, it is not possible to maintain a great number of an individual's samples, and, typically, security systems are limited to one sample image per individual. Thus, facial recognition is often not accurate enough for security and computer vision systems that employ only one or a few sample images of individuals.
[0005] In view of at least the foregoing problems, there is still a need in the art for improvement of facial recognition methods and for eliminating the influence of the above listed parameters on the quality and reliability of these methods. There is a further need to decrease the false acceptance rate (FAR) and false rejection rate (FRR), especially in those circumstances when just one sample image of an individual is available.
SUMMARY
[0006] This summary is provided to introduce a selection of concepts in a simplified form that are further described in the Detailed Description below. This summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
[0007] Various embodiments of the present disclosure generally provide for significantly improving the quality and reliability of facial recognition and of the corresponding user authentication methods based upon the improved facial recognition techniques, as well as for decreasing the FAR and FRR values. Specifically, the present technology involves intelligent creation of multiple facial angle shots of an individual based on his or her single facial image. These facial angle shots may optionally be used to train a machine-learning system handling facial recognition and/or user authentication processing. As will be explained below, the principles of the present technology can be integrated not only in facial recognition methods, but also in user authentication methods which are based on facial recognition.
[0008] According to embodiments of the present disclosure, provided are a method for user authentication and/or identification, as well as a system and a non-transitory processor-readable medium configured to implement the method for user authentication and/or identification. The method comprises a step of acquiring, by one or more processors, an input image associated with a user (the input image shows at least the user face). Further, the method includes determining, by the one or more processors, a position of the user face, and locating, by the one or more processors, multiple facial landmarks associated with the user. The method further includes transforming, by the one or more processors and based on the multiple facial landmarks, at least a portion of the input image into a uniform image of the user face. The transformation may include spatial rotation, scaling, cropping, color and/or brightness adjustments, or a combination thereof. The method further includes retrieving, by the one or more processors, a feature vector based upon the uniform image of the user face, comparing the feature vector with at least one stored reference feature vector (using, for example, a machine-learning algorithm), and making an authentication and/or identification decision(s) with respect to the user based on the result of the comparison.
[0009] According to embodiments of the present disclosure, provided is also a method for registering users, or in other words, creating user profiles to be used in the method for user authentication and/or identification. The method for registering users includes acquiring one registration image associated with the authorized user and determining a facial position of the authorized user based on the registration image. Further, the method includes locating, based on the position, multiple facial landmarks associated with the authorized user, and transforming at least a portion of the one registration image into a texture image of the authorized user face (e.g., a full-face image of the authorized user). Further, the method includes generating multiple angle shots of the authorized user face based upon the texture image. The method also includes creating a biometric pattern associated with the multiple angle shots of the user face and storing the biometric pattern associated with the authorized user in a database.
[0010] In one example embodiment, the generation process of the angle shots comprises the steps of finding similarity between the texture image and one of a plurality of reference two-dimensional (2D) facial images, wherein each one of the plurality of reference 2D facial images is associated with a reference three-dimensional (3D) facial image. Further, based on the similarity, the method selects the 3D facial image related to the reference 2D image being the most similar to the texture image. The generation method further includes superposing the texture image of the authorized user with multiple points associated with the pre-selected 3D facial image. Further, the 3D facial image and the texture image can be rotated to generate images of the multiple angle shots based upon the rotated 3D facial image and the texture image.
[0011] According to yet more embodiments of the present disclosure, provided is a method for facial recognition, a facial recognition system and a non-transitory processor-readable medium configured to implement the method for facial recognition. The method for facial recognition comprises the steps of acquiring a facial image of an individual, determining a facial position based on the facial image, locating multiple facial landmarks associated with the individual based on the position, transforming at least a portion of the facial image into a uniform facial image based on the multiple facial landmarks (the uniform facial image represents a full-face image of the individual). Further the method comprises creating a feature vector based upon the uniform facial image, comparing the feature vector with at least one stored reference feature vector and, based on the comparison, determining the identity of the individual. Other features, aspects, examples, and embodiments are described below.
BRIEF DESCRIPTION OF THE DRAWINGS
[0012] Embodiments are illustrated by way of example, and not by limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:
[0013] FIG. 1 is a high-level block diagram illustrating an example computing device suitable for implementing methods for facial recognition and user authentication as disclosed herein.
[0014] FIG. 2 shows a high-level process flow diagram of a method for user registration according to one exemplary embodiment.
[0015] FIG. 3 shows a high-level process flow diagram of a method for facial recognition according to one exemplary embodiment.
[0016] FIG. 4 shows a high-level process flow diagram of a method for user authentication according to one exemplary embodiment of the present disclosure.
[0017] FIG. 5 shows an exemplary face image and several facial landmarks, which can be used in methods for facial recognition and user authentication as described herein.
[0018] FIG. 6 shows an input image of a user for use in a method for facial recognition, according to one example embodiment, which image may also serve as a registration image in a method for user registration.
[0019] FIG. 7 shows an exemplary area of interest created by rotating and cropping the input image shown in FIG. 6.
[0020] FIG. 8 shows an exemplary uniform image of a user face suitable for use in a method for facial recognition, which image may also be used as a texture image in a method for user registration.
[0021] FIG. 9 shows an exemplary three-dimensional image (depth map) corresponding to the selected image that is most similar to the texture image shown in FIG. 8.
[0022] FIG. 10 shows a graphical representation of the result of applying the texture image, as shown in FIG. 8, to a point cloud.
[0023] FIGs. 11A-11D illustrate exemplary angle shots based upon the registration image shown in FIG. 6.
[0024] FIG. 12 shows a graphical representation of forty Gabor filters suitable for implementing the methods described herein.
DETAILED DESCRIPTION
[0025] The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These example embodiments, which are also referred to herein as "examples," are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other
embodiments can be utilized, or structural, logical, and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is therefore not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents. In this document, the terms "a" and "an" are used, as is common in patent documents, to include one or more than one. In this document, the term "or" is used to refer to a nonexclusive "or," such that "A or B" includes "A but not B," "B but not A," and "A and B," unless otherwise indicated.
[0026] The techniques of the embodiments disclosed herein may be implemented using a variety of technologies. For example, the methods described herein may be implemented in software executing on a computer system or in hardware utilizing either a combination of microprocessors, controllers or other specially designed application-specific integrated circuits (ASICs), programmable logic devices, or various combinations thereof. In particular, the methods described herein may be implemented by a series of computer-executable instructions residing on a storage medium such as a disk drive, solid-state drive or on a computer-readable medium.
[0027] Generally speaking, facial recognition can be used by a computing device in various scenarios. For example, a computing device may use facial recognition to authenticate or authorize a user who attempts to gain access to one or more functionalities or data of the computing device, or to functionalities otherwise controlled by the computing device. In some common scenarios, the computing device may store facial images of one or more pre-authorized users. These facial images are referred to herein as "registration images." When a user attempts to gain access to functionalities or data of the computing device, the computing device may capture an image of the user's face for authentication purposes. The computing device may then use facial recognition applications to compare the captured facial image to the registration images associated with authorized users. If the facial recognition applications determine a match or an acceptable level of similarity between the captured facial image and at least one of the registration images, the computing device may authenticate the user and grant access to the requested functionalities or data. This approach can similarly be used in security systems to recognize people attempting to enter designated premises or land, in crime detection systems, computer vision systems, gesture recognition and control systems, and so forth.
[0028] As outlined above, facial recognition systems are not always accurate and may falsely accept or falsely reject a user when the similarity between the user's image and one or more registration images is determined incorrectly. Unauthorized users may leverage vulnerabilities of facial recognition to cause erroneous authentication. Conversely, an authorized user who should be authenticated may be denied access because the facial recognition system could not find sufficient similarity between the present user's image and a registration image captured some time ago. Finding similarities between present user images and registration images greatly depends on the quality and number of registration images. In most scenarios, facial recognition systems maintain only one registration image per authorized user. Accordingly, when the facial recognition system takes an image of a user under lighting conditions or at a camera position that differ significantly from those of the registration image, FAR and FRR values may be significantly high. The present technology decreases FAR and FRR values and improves the quality and reliability of facial recognition systems by, but not limited to, artificially synthesizing facial foreshortening images with respect to an available registration image of an authorized user, as well as by the intelligent use of Gabor filters and linear discriminant analysis. A detailed description of various embodiments of the present technology is provided below with reference to the drawings.
[0029] FIG. 1 is a high-level block diagram illustrating an example computing device 100 suitable for implementing the present technology. In particular, the computing device 100 may be used for facial recognition, user authentication, user authorization, and/or user registration as described herein. The computing device 100 may include, be, or be a part of one or more of a variety of types of devices, such as a general purpose computer, desktop computer, laptop computer, tablet computer, netbook, server, mobile phone, a smartphone, personal digital assistant, set-top box, television, door lock, watch, vehicle computer, electronic kiosk, automated teller machine, infotainment system, presence verification device, security device, surveillance device, among others. Furthermore, the computing device 100 may be an integrated part of another multi-component system such as a video surveillance system, access control system, among others.
[0030] As shown in FIG. 1, the computing device 100 includes one or more processors 102, memory 104, one or more storage devices 106, one or more input devices 108, one or more output devices 110, network interface 112, and an image sensor 114 (e.g., a camera or charge-coupled device (CCD)). One or more processors 102 are, in some examples, configured to implement functionality and/or process instructions for execution within the computing device 100. For example, the processors 102 may process instructions stored in memory 104 and/or instructions stored on storage devices 106. Such instructions may include components of an operating system 118, a facial recognition module 120, and/or a user authentication module 122. Computing device 100 may also include one or more additional components not shown in FIG. 1, such as a power supply, a battery, a fan, a global positioning system (GPS) receiver, among others.
[0031] Memory 104, according to one example, is configured to store information within the computing device 100 during operation. Memory 104, in some example embodiments, may refer to a non-transitory computer-readable storage medium or a computer-readable storage device. In some examples, memory 104 is a temporary memory, meaning that a primary purpose of memory 104 may not be long-term storage. Memory 104 may also refer to a volatile memory, meaning that memory 104 does not maintain stored contents when memory 104 is not receiving power. Examples of volatile memories include random access memories (RAM), dynamic random access memories (DRAM), static random access memories (SRAM), and other forms of volatile memories known in the art. In some examples, memory 104 is used to store program instructions for execution by the processors 102. Memory 104, in one example, is used by software (e.g., operating system 118) or applications, such as facial recognition 120 and/or user authentication 122, executing on computing device 100 to temporarily store information during program execution. One or more storage devices 106 can also include one or more computer-readable storage media and/or computer-readable storage devices. In some embodiments, storage devices 106 may be configured to store greater amounts of information than memory 104. Storage devices 106 may further be configured for long-term storage of information. In some examples, the storage devices 106 include non-volatile storage elements. Examples of such non-volatile storage elements include magnetic hard discs, optical discs, solid-state discs, flash memories, forms of electrically programmable memories (EPROM) or electrically erasable and programmable memories, and other forms of non-volatile memories known in the art.
[0032] Still referring to FIG. 1, the computing device 100 may also include one or more input devices 108. The input devices 108 may be configured to receive input from a user through tactile, audio, video, or biometric channels. Examples of input devices 108 may include a keyboard, mouse, touchscreen, microphone, one or more video cameras, or any other device capable of detecting an input from a user or other source, and relaying the input to computing device 100, or components thereof. Though shown separately in FIG. 1, the image sensor 114 may, in some instances, be a part of input devices 108. It should also be noted that the image sensor 114 (e.g., a digital still and/or video camera) may be a peripheral device operatively connected to the computing device 100 via the network interface 112.
[0033] The output devices 110, in some examples, may be configured to provide output to a user through visual or auditory channels. Output devices 110 may include a video graphics adapter card, a liquid crystal display (LCD) monitor, a light emitting diode (LED) monitor, a sound card, a speaker, or any other device capable of generating output that may be intelligible to a user. Output devices 110 may also include a touchscreen, presence-sensitive display, or other input/output capable displays known in the art.
[0034] The computing device 100, in some example embodiments, also includes network interface 112. The network interface 112 can be utilized to communicate with external devices via one or more networks such as one or more wired, wireless or optical networks including, for example, the Internet, intranet, local area network (LAN), wide area network (WAN), cellular phone networks, Bluetooth radio, an IEEE 802.11-based radio frequency network, among others. The network interface 112 may be a network interface card, such as an Ethernet card, an optical transceiver, a radio frequency transceiver, or any other type of device that can send and receive information. Other examples of such network interfaces may include Bluetooth®, 3G, 4G, and WiFi® radios in mobile computing devices as well as USB.
[0035] The operating system 118 may control one or more functionalities of computing device 100 and/or components thereof. For example, the operating system 118 may interact with applications 124, including facial recognition 120 and user authentication 122, and may facilitate one or more interactions between applications 124 and one or more of processors 102, memory 104, storage devices 106, input devices 108, and output devices 110. As shown in FIG. 1, the operating system 118 may interact with or be otherwise coupled to facial recognition module 120, user authentication module 122, applications 124, and components thereof. In some embodiments, facial recognition module 120 and user authentication module 122 may be included in operating system 118. In these and other examples, facial recognition module 120 and user authentication module 122 may be part of applications 124. In still other examples, facial recognition module 120 and user authentication module 122 may be implemented externally to computing device 100, such as at a network location. In some such instances, computing device 100 may use the network interface 112 to access and implement functionalities provided by facial recognition module 120 and user authentication module 122, through methods commonly known as "cloud computing."
[0036] FIG. 2 shows a high-level process flow diagram of a method 200 for user registration (in other words, user enrolment) according to one exemplary embodiment. The method 200 may be performed by processing logic that may comprise hardware (e.g., one or more processors, controllers, dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine, firmware), or a combination of both. In some example embodiments, the method 200 is implemented by the computing device 100 shown in FIG. 1, however, it should be appreciated that the method 200 is just one example operation of the computing device 100.
[0037] The method 200 commences at step 210 with the computing device 100 acquiring at least one registration image associated with an authorized user. The image may be a 2D image of the authorized user captured by the image sensor 114. Further, at step 220, the computing device 100 determines a facial position of the authorized user based on the at least one registration image. The determination of facial position may be needed for isolating (e.g., cropping) the authorized user face, which simplifies further processing steps. At step 230, the computing device 100 locates multiple facial landmarks associated with the authorized user. At step 240, the computing device 100 transforms at least a portion of the at least one registration image into a texture image of the authorized user face. At step 250, the computing device 100 generates multiple angle shots of the authorized user face based upon the texture image of the authorized user face. At step 260, the computing device 100 creates a biometric pattern associated with the multiple angle shots of the user face. Notably, the biometric pattern includes a feature vector whose components are associated with the angle shots. As will be explained in detail below, the biometric pattern can be utilized for training one or more machine-learning algorithms handling face recognition and/or user authentication based on facial recognition. At step 270, the computing device 100 stores the biometric pattern associated with the authorized user in the memory 104 and/or storage device 106. In some example embodiments, the computing device 100 maintains user profiles, which comprise feature vectors and/or biometric patterns of authorized users. The user profiles can later be recalled when a new image is acquired to identify and/or authorize a user. The feature vectors may be based on various parameters such as pixel data, distances between certain facial landmarks, and facial angle shots, among other things.
[0038] Once one or more authorized users are registered, the computing device 100 may operate to recognize faces, identify (authorize) users, and optionally provide access to computing device functionalities, data, resources, premises, land, among others. FIG. 3 shows a high-level process flow diagram of a method 300 for facial recognition according to one exemplary embodiment of the present disclosure. The method 300 may be performed by processing logic that may comprise hardware (e.g., one or more processors, controllers, dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine, firmware), or a combination of both. In some example embodiments, the method 300 is implemented by the computing device 100 shown in FIG. 1; however, it should be appreciated that the method 300 is just one example operation of the computing device 100.
[0039] The method 300 of FIG. 3 starts at step 310, when the computing device 100 acquires a facial image of an individual from the image sensor 114 or an analogous device. At step 320, the computing device 100 determines a facial position based on the facial image, which may be needed for cropping or isolating the individual's face. At step 330, the computing device 100 locates multiple facial landmarks associated with the individual based on the position determined. At step 340, the computing device 100 transforms, based on the multiple facial landmarks, at least a portion of the facial image into a uniform facial image. At step 350, the computing device 100 creates a feature vector based upon the uniform facial image. At step 360, the computing device 100 compares the feature vector with at least one stored reference feature vector associated with the authorized users (i.e., those users that registered using the method 200). At step 370, the computing device 100, based on the result of the comparison, determines the identity of the individual.
[0040] Similar to the facial recognition method 300, FIG. 4 shows a high-level process flow diagram of a method 400 for user authentication according to one exemplary embodiment of the present disclosure. The method 400 may be performed by processing logic that may comprise hardware (e.g., one or more processors, controllers, dedicated logic, programmable logic, and microcode), software (such as software run on a general-purpose computer system or a dedicated machine, firmware), or a combination of both. In some example embodiments, the method 400 is implemented by the computing device 100 shown in FIG. 1; however, it should be appreciated that the method 400 is just one example operation of the computing device 100. The method 400 starts at step 410, when the computing device 100 acquires an input image associated with a user (where the input image shows at least the user face) from the image sensor 114 or a similar device. At step 420, the computing device 100 optionally determines a position of the user face. At step 430, the computing device 100 locates multiple facial landmarks associated with the user. At step 440, the computing device 100 transforms, based on the multiple facial landmarks, at least a portion of the input image into a uniform image of the user face. At step 450, the computing device 100 retrieves a feature vector based upon the uniform image of the user face. At step 460, the computing device 100 compares the feature vector with a stored reference feature vector associated with the authorized user. The comparison process may be based on the use of one or more machine-learning algorithms, although this is not required. At step 470, the computing device 100, based on the comparison, makes an authentication decision with respect to the user. The authentication decision can further be used in providing access for the authorized user to computing device functionalities, data, resources, or to dedicated premises or land. The authentication decision can also be used for generating one or more control commands for the computing device 100, its elements, or any other suitable peripheral device. For example, a control command may control one or more devices, such as a command to "unlock" the computing device.
[0041] As shown in FIGs. 2-4, the majority of the operation steps replicate or are similar to one another. Accordingly, the steps 220, 230, 240, 250, 260, 320, 330, 340, 350, 360, 420, 430, 440, 450, and 460 are combined together and described in detail below.
Determining Facial Position
[0042] Facial position is determined at least in steps 220, 320 and 420 based on an input image of a user/individual as received from the image sensor 114. In various example embodiments, facial position can be located and/or determined by machine-learning algorithms, pattern recognition algorithms, and/or statistical analysis configured to search for objects of a predetermined class (i.e., faces) in input images. Some examples of machine-learning algorithms include neural networks, heuristic methods, support vector machines, or a combination thereof.
[0043] One example of statistical analysis suitable for facial position detection includes Principal Components Analysis (PCA), which is a procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. This transformation can be defined in such a way that the first principal component has the largest possible variance, and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to the preceding components.
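As a minimal sketch only (not part of the original disclosure), PCA of flattened image patches can be computed with scikit-learn as follows; the patch size and the number of retained components are illustrative assumptions:

    # Minimal PCA sketch: project flattened image patches onto their principal components.
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    patches = rng.random((100, 24 * 24))      # 100 flattened 24x24 image patches (synthetic)

    pca = PCA(n_components=10)                # keep the 10 components with the largest variance
    scores = pca.fit_transform(patches)       # each row: a patch described by 10 coefficients

    # The first component has the largest possible variance; each succeeding
    # component is orthogonal to the preceding ones.
    print(pca.explained_variance_ratio_)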
[0044] Another example method suitable for facial position detection is Linear Discriminant Analysis (LDA), also known as Fisher's linear discriminant method. LDA may be used to find a linear combination of features that characterizes or distinguishes two or more classes, or subclasses of one class.
[0045] Another example method suitable for facial position detection is the Viola-Jones object detection framework. This method adapted the idea of using Haar wavelets and developed the so-called Haar-like features. A Haar-like feature considers adjacent rectangular regions at a specific location in a detection window, sums up the pixel intensities in these regions, and calculates the difference between them. This difference is then used to categorize subsections of an image. For example, consider an image database with human faces. It is a common observation that among all faces the region of the eyes is darker than the region of the cheeks. Therefore, a common Haar feature for face detection is a set of two adjacent rectangles that lie above the eye and the cheek region. The position of these rectangles is defined relative to a detection window that acts like a bounding box for the target object (the face in this case). In the detection phase of the Viola-Jones object detection framework, a window of the target size is moved over the input image, and for each subsection of the image the Haar-like feature is calculated. This difference is then compared to a learned threshold that separates non-objects from objects. Because such a Haar-like feature is only a weak learner or classifier (its detection quality is slightly better than random guessing), a large number of Haar-like features is necessary to describe an object with sufficient accuracy. In the Viola-Jones object detection framework, the Haar-like features are therefore organized in what is called a classifier cascade to form a strong learner or classifier. One advantage of the Haar-like feature over most other features is its calculation speed. Due to the use of integral images, a Haar-like feature of any size may be calculated in constant time. Accordingly, the Viola-Jones object detection framework may serve as one of the preferred algorithms for facial position detection performed in the steps 220, 320 and 420 of the methods 200-400.
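For illustration only, the sketch below runs the stock OpenCV Haar cascade (a Viola-Jones detector) on an image file; the file name "input.jpg" and the detector parameters are assumptions and not part of the original disclosure:

    # Illustrative Viola-Jones face detection using OpenCV's bundled Haar cascade.
    import cv2

    image = cv2.imread("input.jpg")                      # hypothetical input image
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    # A detection window is moved over the image at several scales; each
    # subsection is evaluated by the cascade of Haar-like features.
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    for (x, y, w, h) in faces:
        face_area = gray[y:y + h, x:x + w]               # isolated face area
        print("face at", x, y, w, h)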
Locating Facial Landmarks
[0046] Generally, facial landmark points refer to various face elements such as inner/outer corner of eyes, eye centers (e.g., pupils), left/right corner of mouth, nose center, nose corners, ears, chin, eyebrows, among others. FIG. 5 shows an exemplary face image and some facial landmarks, which can be used by the methods for facial recognition and user authentication as described herein.
[0047] In example embodiments, facial landmarks are located using at least one of an Active Shape Model (ASM) searching process and an Active Appearance Model (AAM) searching process. In an example, the ASM searching process uses a statistical model of the shape of an object, which is iteratively deformed to fit an example of the object in a new image. In general, the principle of the ASM searching process consists in the use of statistical relations between the mutual arrangements of landmarks associated with a new image and at least one reference image. The process generally includes two steps: (a) locating an area associated with initial landmark point coordinates, and (b) iteratively adjusting the statistical model to define the landmark coordinates more precisely. This process is described below in greater detail.
[0048] Assume that there is a training sample based on L images of faces shot full-face, with N marked landmarks for each face, wherein all landmarks are numbered in the same order. The following equation defines coordinates of landmarks (in the system of coordinates of the image):
$(x_{ij}, y_{ij}), \quad j = 1, \dots, N, \quad i = 1, \dots, L$ (Equation No. 1)
[0049] To bring the point coordinates from all images to a uniform system, Generalized Procrustes Analysis (GPA) is carried out. In another example embodiment, when all face images have an identical scale, it is possible to limit this process to centering. In the following, the coordinates are considered to be centered. Now, L vectors of height 2N describing the "form" or "contour" of the arrangement of landmarks are defined as follows:
$s_i = (x_{i1}, \dots, x_{iN}, y_{i1}, \dots, y_{iN})^T$ (Equation No. 2)
[0050] Assume also that the average of the above vectors, and the corresponding centered vectors, are defined as follows:
$\bar{s} = \frac{1}{L}\sum_{i=1}^{L} s_i$ (Equation No. 3)
$\hat{s}_i = s_i - \bar{s}, \quad i = 1, \dots, L$ (Equation No. 4)
[0051] Further, based on the set of vectors $\hat{s}_i$, the covariance matrix of their coordinates is calculated as:
$K = \frac{1}{L}\sum_{i=1}^{L} \hat{s}_i \hat{s}_i^T$ (Equation No. 5)
[0052] Let $\lambda_j$ denote the eigenvalues of the matrix $K$, ordered in decreasing order, with corresponding eigenvectors $u_j$, $j = 1, 2, \dots, 2N$. The set of vectors $u_j$ forms a basis of the $2N$-dimensional vector space, and thus any vector $\hat{s}_i$ can be represented as a linear combination of them. Based on statistical relations between the landmark point coordinates, the representation can be replaced with the following approximation:
$s_i \approx \bar{s} + b_{i,1} u_1 + b_{i,2} u_2 + \dots + b_{i,P} u_P = \bar{s} + \Phi b_i$ (Equation No. 6)
where the matrix $\Phi$ includes $P$ principal components, i.e. the eigenvectors $u_j$, $j = 1, \dots, P$, corresponding to the $P$ largest eigenvalues, and $b_i$ is a vector of $P$ coefficients (also known as the parameters of the model). The value $P$ is selected based on the largest eigenvalues of the matrix $K$ (Equation No. 7).
[0053] The ASM model is defined by the matrix $\Phi$ and the vector $\bar{s}$. Any image/shape can approximately be described by means of the ASM model and the parameters obtained from the equation:
$b_i = \Phi^T (s_i - \bar{s})$ (Equation No. 8)
[0054] The vector $s$ can represent both a common pattern of landmark arrangement and the individual features of a specific face shape.
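As an illustrative sketch only (not part of the original disclosure), the shape model of Equations No. 2-8 can be built with NumPy as follows; the numbers of training shapes, landmarks, and retained components are assumptions:

    # Illustrative ASM shape-model construction (Equations No. 2-8)
    # for a synthetic training set of shape vectors of height 2N.
    import numpy as np

    rng = np.random.default_rng(0)
    L, N = 50, 68                                    # assumed: 50 training faces, 68 landmarks each
    shapes = rng.random((L, 2 * N))                  # rows are shape vectors s_i

    mean_shape = shapes.mean(axis=0)                 # Equation No. 3
    centered = shapes - mean_shape                   # Equation No. 4

    K = (centered.T @ centered) / L                  # Equation No. 5 (covariance matrix)
    eigvals, eigvecs = np.linalg.eigh(K)             # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                # reorder to decreasing eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    P = 10                                           # assumed number of principal components
    Phi = eigvecs[:, :P]                             # matrix of the P principal components

    b = Phi.T @ (shapes[0] - mean_shape)             # model parameters (Equation No. 8)
    reconstructed = mean_shape + Phi @ b             # approximation per Equation No. 6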
[0055] According to example embodiments, the localization of landmarks in a new face image is carried out as follows. First, a face position is determined as described above. For example, a Viola-Jones classifier is utilized, which returns an image with the user face isolated.
[0056] The average shape defined by the vector $\bar{s}$ is aligned with the center of the isolated user face. In some embodiments, the coordinates of the average shape defined by $\bar{s}$ can be scaled, if required. The average shape defines an initial approximation to the landmark point coordinates, which can be considered as the initial iteration $s^{(0)}$.
[0057] Further, for each landmark point with number i, a Viola-Jones cascade classifier is trained. For the entire image, it generates a set of points classified as landmarks with number i. When the classifier is trained, positive examples are those areas of the image centered at a landmark point, and negative examples are areas that overlap with the positive examples.
[0058] At the $t$-th iteration of the algorithm and for the $i$-th landmark, the corresponding cascade classifier is applied to a small area of the image centered at the landmark with coordinates $(s^{(t)}_i, s^{(t)}_{i+N})$. As the classifier generates several points classified as anthropometrical landmarks, the located landmark point is taken as the one nearest to the landmark with the coordinates $(s^{(t)}_i, s^{(t)}_{i+N})$.
[0059] Assume $c^{(t)}$ is the shape consisting of the landmark points found by the cascade classifiers at the $t$-th iteration, and that the coordinates of this shape are centered and divided by a scale coefficient. This shape may be checked for conformity with the statistical ASM model. The result of the conformity check may define the shape for the next iteration. The procedure of landmark localization is repeated and terminates when a predetermined number of iterations has been performed (for example, three iterations). Accordingly, landmark points may be represented by three-dimensional (3D) coordinates and associated with a facial image. The process of locating landmarks as described above can be used in the steps 230, 330 and 430 of the methods 200-400.
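A minimal sketch of the conformity check with the statistical model (an assumption-based illustration, not the original procedure; the +/- 3*sqrt(lambda) limits on the parameters are a common ASM convention rather than something stated above) is given below:

    # Illustrative conformity check of a candidate shape against the ASM model:
    # project onto the model subspace and constrain the parameters b.
    import numpy as np

    rng = np.random.default_rng(1)
    N, P = 68, 10                                      # assumed landmark and component counts
    mean_shape = rng.random(2 * N)                     # stand-ins for a trained shape model
    Phi = np.linalg.qr(rng.random((2 * N, P)))[0]      # orthonormal columns (principal components)
    eigvals = np.sort(rng.random(P))[::-1]

    candidate = mean_shape + rng.normal(scale=0.05, size=2 * N)   # shape found by the classifiers

    b = Phi.T @ (candidate - mean_shape)               # model parameters (Equation No. 8)
    limit = 3.0 * np.sqrt(eigvals)                     # assumed +/- 3*sqrt(lambda) constraint
    b = np.clip(b, -limit, limit)

    next_shape = mean_shape + Phi @ b                  # shape used for the next iteration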
Face Image Transformation
[0060] The face image transformation also refers to image pre-processing required to bring the face image to a uniform image that is in a better condition for further processing. For example, the image transformation may include the following operations. First, a color face image or a part thereof can be transformed into a half-tone image or a monotone (single-colored) image.
[0061] Second, an area of interest can be isolated from the image. This process can include the following steps: (a) rotating at least a portion of the face image until landmarks associated with user pupils are horizontally oriented; (b) scaling the face image (for example, until landmarks associated with user pupils are at a predetermined distance from each other, e.g. 60 pixels between the pupils); and/or (c) cropping the face image or its part to create an image of a predetermined pixel size. FIGs. 6 and 7 show example face images illustrating the above processes. Namely, FIG. 6 shows an exemplary input image of a user. FIG. 7 shows the same image as in FIG. 6, but subjected to the process of rotation and cropping as outlined above to isolate the area of interest, i.e. a user face.
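A minimal sketch of this rotate-scale-crop normalization is given below (illustrative only; the pupil coordinates, the target inter-pupil distance of 60 pixels, and the crop size are assumptions based on the example above):

    # Illustrative face alignment: rotate so the pupils are horizontal,
    # scale to a fixed inter-pupil distance, and crop a fixed-size area of interest.
    import numpy as np
    import cv2

    face = np.full((480, 640), 128, dtype=np.uint8)             # stand-in grayscale face image
    left_pupil, right_pupil = (260.0, 240.0), (330.0, 260.0)    # assumed landmark coordinates

    dx = right_pupil[0] - left_pupil[0]
    dy = right_pupil[1] - left_pupil[1]
    angle = np.degrees(np.arctan2(dy, dx))                      # rotation making the pupils horizontal
    scale = 60.0 / np.hypot(dx, dy)                             # scale to 60 px between the pupils

    center = ((left_pupil[0] + right_pupil[0]) / 2.0,
              (left_pupil[1] + right_pupil[1]) / 2.0)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    aligned = cv2.warpAffine(face, M, (face.shape[1], face.shape[0]))

    # Crop a fixed-size area of interest around the midpoint between the pupils.
    x, y = int(center[0]) - 64, int(center[1]) - 48
    area_of_interest = aligned[y:y + 128, x:x + 128]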
[0062] Third, the image transformation may include adjusting the brightness of at least a portion of the face image. The brightness adjustment may include a number of processes including, but not limited to, Single Scale Retinex (SSR) filtering, a homomorphic filtering based normalization technique, a Discrete Cosine Transform (DCT) based normalization technique, and a wavelet based normalization technique, among others. In addition, the brightness adjustment may include histogram correction of the face image. Furthermore, the brightness adjustment may include contrast enhancement. It should be noted that the above listed procedures are just examples that can be used for brightness adjustment, that their order can differ from the one listed above, and that these example procedures can be used interchangeably and their parameters may vary in order to achieve better results in image processing. FIG. 8 shows the example face image of FIG. 7 subjected to the brightness adjustment as outlined above; in other words, FIG. 8 shows a uniform image of the user face. Accordingly, the image transformation process as described above is used in the steps 240, 340 and 440 of the methods 200-400.
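For illustration only (a sketch under assumed parameters, not the original implementation), histogram correction and contrast enhancement of the area of interest could look as follows:

    # Illustrative brightness adjustment: global histogram equalization
    # followed by CLAHE-based contrast enhancement.
    import numpy as np
    import cv2

    rng = np.random.default_rng(0)
    area_of_interest = rng.integers(40, 120, size=(128, 128), dtype=np.uint8)  # stand-in face crop

    equalized = cv2.equalizeHist(area_of_interest)           # histogram correction
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    uniform_image = clahe.apply(equalized)                   # contrast enhancement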
Synthesizing Angle Shots
[0063] Generally, angle shots are artificial images taken or created by a virtual camera at an angle from the horizontal or vertical lines. In the present technology, angle shots may be created by rotation of a 3D representation of a user face at different angles with a fixed virtual camera. The rotation may create any of a yaw angle, pitch angle, and/or roll angle. Accordingly, each angle shot may be characterized by yaw, pitch and roll angles. One example of the angle shot may be characterized as {yaw=15; pitch=-15; roll=10}. The synthesis (generation) of the angle shots includes the following operations performed by the computing device 100: superposing a texture image of an authorized user, such as one shown in FIG. 7, with multiple points associated with a reference three-dimensional (3D) facial image of the same or another user; rotating the reference 3D facial image with the texture image in concert with each other; and generating new images corresponding to multiple angle shots based upon the rotated reference 3D facial image with the texture image at different angles. This process is used in the step 250 of method 200 for user registration as described above.
[0064] Notably, for the synthesis of angle shots, it is necessary to maintain a database of 3D images of a group of people (preferably of different sex, age and ethnicity), wherein the reference 3D facial images are associated with their corresponding reference 2D half-tone or color images, which are also stored in the database, such as in the memory 104 or storage device 106. In some examples, the reference 2D facial images are uniform face images as discussed above.
[0065] Reference 3D images can be represented by depth maps, wherein each pixel of these images includes information related to a distance between the image sensor 114 and certain parts of the user face or other objects. The association of reference 2D images with the depth maps means that each pixel of a reference 3D image also includes data related to brightness and/or color. In some example embodiments, the reference 3D facial images associated with reference 2D images may be pre-processed/transformed as described above with reference to the steps 240, 340 and 440.
[0066] More particularly, the synthesis process of angle shots includes the following operations. The computing device 100 finds similarity between the texture image and one or more of the plurality of reference 3D and 2D images, and then, based on the similarity, the computing device 100 selects the reference 3D and 2D image that is most similar to the texture image. This process of finding similarity can be implemented by a machine-learning algorithm or a statistical analysis. Some examples of the methods suitable for finding similarity include Principal Components Analysis (PCA) or a discriminant analysis such as Linear Discriminant Analysis (LDA) or Fisher's linear discriminant analysis. Referring back to the drawings, FIG. 9 shows an exemplary 3D image, which is selected by the computing device 100 as the most similar to a texture image such as the one shown in FIG. 8 (the corresponding 2D image of the selected 3D image is not shown).
[0067] Further, a homography-based process is utilized to match landmarks associated with the selected 2D image and landmarks associated with the texture image to find conformity therebetween. The homography-based process refers to a perspective transformation of one plane into another. Therefore, having a first set of landmarks associated with one image and a second set of landmarks, corresponding to the first set but associated with another image, it is possible to find conformity between these two images in the form of a homographic matrix. One example of a homography-based process for finding conformity is the Random Sample Consensus (RANSAC) method. RANSAC is an iterative process for estimating parameters from a set of observed data. A basic assumption of this process is that the observed data consists of "inliers," i.e., data whose distribution can be explained by a certain model, though possibly subject to noise, and "outliers," which are data that do not fit said model. The outliers can come, for example, from extreme values of noise or from erroneous measurements or incorrect hypotheses about the interpretation of data. The RANSAC process also assumes that, given a set of inliers, there exists a procedure which can estimate the parameters of a model that optimally explains or fits this data. In this technology, RANSAC makes an iterative estimation of model parameters for randomly selected landmarks.
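A hedged illustration (not the original code) of estimating such a homographic matrix from matched landmark points with RANSAC in OpenCV is sketched below; the point coordinates and the warp are synthetic assumptions:

    # Illustrative RANSAC homography estimation between two landmark sets.
    import numpy as np
    import cv2

    rng = np.random.default_rng(0)
    texture_pts = (rng.random((20, 1, 2)) * 100.0).astype(np.float32)   # landmarks in the texture image

    # Synthetic "reference 2D image" landmarks: a known projective warp plus a few outliers.
    H_true = np.array([[1.05, 0.02, 3.0],
                       [0.01, 0.98, -2.0],
                       [0.0005, 0.0002, 1.0]])
    reference_pts = cv2.perspectiveTransform(texture_pts.astype(np.float64), H_true).astype(np.float32)
    reference_pts[:3] += rng.normal(scale=15.0, size=(3, 1, 2)).astype(np.float32)   # outliers

    H, inlier_mask = cv2.findHomography(texture_pts, reference_pts, cv2.RANSAC, 3.0)
    print("inliers:", int(inlier_mask.sum()), "of", len(texture_pts))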
[0068] Further, the depth map of the most similar 3D image located, such as the one shown in FIG. 9, is transformed into a point cloud, or in other words, a vertex set in a 3D coordinate system. Further, the texture image is taken as a texture and applied to the point cloud. FIG. 10 shows the result of applying the texture image, as shown in FIG. 8, to a point cloud. Next, the point cloud along with the "attached" texture is rotated and multiple shots are taken at different angles, which constitute the angle shots. FIGs. 11A-11D illustrate exemplary angle shots created with respect to the registration image (such as the one shown in FIG. 6) and based on the technology described herein. Specifically, FIGs. 11A-11D show angle shots rotated at different angles relative to the horizontal and vertical axes (i.e., yaw and pitch angles): FIG. 11A illustrates an angle shot taken at the yaw angle of +15 degrees, FIG. 11B illustrates an angle shot taken at the yaw angle of -15 degrees, FIG. 11C illustrates an angle shot taken at the pitch angle of +15 degrees, and FIG. 11D illustrates an angle shot taken at the pitch angle of -15 degrees.
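As a simplified sketch under stated assumptions (synthetic depth map, orthographic re-projection, no occlusion handling; not the original renderer), synthesizing one angle shot from a textured point cloud could look as follows:

    # Illustrative angle-shot synthesis: depth map -> point cloud, yaw rotation,
    # orthographic re-projection of the "attached" texture.
    import numpy as np

    H, W = 128, 128
    depth = np.full((H, W), 500.0)                      # stand-in depth map
    texture = np.full((H, W), 128, dtype=np.uint8)      # stand-in texture (uniform face image)

    ys, xs = np.mgrid[0:H, 0:W]
    points = np.stack([xs - W / 2.0, ys - H / 2.0, depth - depth.mean()], axis=-1).reshape(-1, 3)
    colors = texture.reshape(-1)

    yaw = np.radians(15.0)                              # e.g. {yaw=+15; pitch=0; roll=0}
    R = np.array([[np.cos(yaw), 0.0, np.sin(yaw)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(yaw), 0.0, np.cos(yaw)]])
    rotated = points @ R.T

    # Orthographic projection back to the image plane (simplified: no z-buffer).
    shot = np.zeros((H, W), dtype=np.uint8)
    u = np.clip((rotated[:, 0] + W / 2.0).astype(int), 0, W - 1)
    v = np.clip((rotated[:, 1] + H / 2.0).astype(int), 0, H - 1)
    shot[v, u] = colors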
Feature Vector Extraction
[0069] In general, features extracted from facial images may relate to pixel data such as coordinates, brightness, color, a depth value, among other things.
In the present technology, it is preferable to use, as the features, the responses of a linear filter, such as the responses of two-dimensional Gabor filters applied to the transformed (pre-processed) face images.
[0070] In some example embodiments, responses can be calculated as follows. The impulse response of Gabor filter is defined as the product of
Gaussian function by a harmonic function. Accordingly, the Gabor filter can be defined, for example, as follows:
$\psi(x, y) = \frac{f^2}{\pi \gamma \eta} \exp\left(-\frac{f^2}{\gamma^2} x_t^2 - \frac{f^2}{\eta^2} y_t^2\right) \exp\left(i 2 \pi f x_t\right),$
$x_t = x \cos\theta + y \sin\theta,$
$y_t = -x \sin\theta + y \cos\theta$ (Equation No. 9)
where $x$ and $y$ are pixel coordinates, $f$ is the frequency of the complex sine curve, $\theta$ is the filter orientation, $\gamma$ is the spatial width of the filter along the sinusoidal wave, and $\eta$ is the spatial width of the filter perpendicular to the wave.
[0071] In one example, each transformed image is convolved with a set of 2D Gabor filters, which may include, but is not limited to, forty various filters. FIG. 12 shows a graphical representation of forty Gabor filters suitable for implementing the methods described herein.
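For illustration only, the sketch below builds a bank of forty OpenCV Gabor kernels (eight orientations by five scales, an assumed split) and convolves an image with them to form a feature vector; all parameter values are assumptions rather than the original filter settings:

    # Illustrative Gabor filter bank (8 orientations x 5 scales = 40 filters)
    # and conversion of the filter responses into a single feature vector.
    import numpy as np
    import cv2

    rng = np.random.default_rng(0)
    uniform_image = rng.integers(0, 255, size=(128, 128), dtype=np.uint8).astype(np.float32)

    kernels = []
    for scale in range(5):
        wavelength = 4.0 * (2 ** scale)                  # assumed wavelengths
        for k in range(8):
            theta = k * np.pi / 8.0                      # assumed orientations
            kernels.append(cv2.getGaborKernel((31, 31), sigma=wavelength / 2.0,
                                              theta=theta, lambd=wavelength,
                                              gamma=0.5, psi=0.0))

    responses = [cv2.filter2D(uniform_image, cv2.CV_32F, k) for k in kernels]
    # Downsample each response and concatenate all of them into one feature vector.
    feature_vector = np.concatenate([r[::4, ::4].ravel() for r in responses])
    print(feature_vector.shape)                          # 40 * 32 * 32 components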
[0072] In other example embodiments, the convolution in the spatial domain can be replaced with multiplication, in the frequency domain, of the Fourier images of the input image and the filter impulse response, followed by an inverse Fourier transformation of the multiplication product.
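A brief illustrative check of this frequency-domain equivalence (not from the original disclosure; circular convolution of same-size arrays is assumed) with NumPy:

    # Spatial (circular) convolution equals the inverse FFT of the product
    # of the Fourier images of the input image and the filter impulse response.
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((64, 64))
    impulse_response = rng.random((5, 5))              # stand-in filter impulse response

    padded = np.zeros_like(image)
    padded[:5, :5] = impulse_response                  # pad the filter to the image size

    via_fft = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

    direct = np.zeros_like(image)                      # direct circular convolution, for comparison
    for di in range(5):
        for dj in range(5):
            direct += impulse_response[di, dj] * np.roll(image, shift=(di, dj), axis=(0, 1))

    print(np.allclose(via_fft, direct))                # True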
[0073] In one example, as a result of convolving the transformed face image with forty Gabor filters, forty response images are generated, which are then converted into a single vector. This vector is further considered as the feature vector analyzed by the image recognition method. The process of retrieving a feature vector as described above can be used in the steps 260, 350, and 450 of the methods 200-400.
Comparing Feature Vectors
[0074] Comparing feature vectors with reference feature vectors, as required in the steps 360 and 460 of the methods 300 and 400, can be implemented in different ways. In some examples, statistical algorithms can be utilized; in other examples, machine-learning algorithms can be utilized; and in yet other examples, a combination of the foregoing can be utilized. In one embodiment, LDA or Fisher's linear discriminant method can be used to find the similarity between feature vectors.
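An illustrative, assumption-based sketch of such a comparison using scikit-learn's LDA (the class labels, sample counts, feature dimensionality, and acceptance threshold are not from the original disclosure) is shown below:

    # Illustrative comparison of a probe feature vector with enrolled users
    # using Linear Discriminant Analysis (Fisher's method).
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    n_users, shots_per_user, dim = 3, 5, 200          # e.g. a registration image plus 4 angle shots per user
    reference_vectors = rng.random((n_users * shots_per_user, dim))
    labels = np.repeat(np.arange(n_users), shots_per_user)

    lda = LinearDiscriminantAnalysis()
    lda.fit(reference_vectors, labels)                # trained on stored reference feature vectors

    probe = rng.random(dim)                           # feature vector of the captured image
    probabilities = lda.predict_proba(probe.reshape(1, -1))[0]
    best = int(np.argmax(probabilities))

    threshold = 0.9                                   # assumed acceptance threshold
    print("matched user:", best, "accepted:", probabilities[best] >= threshold)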
Conclusion
[0075] Thus, methods and systems for facial recognition and user authentication have been described. The technology described herein has significantly improved the quality and reliability of facial recognition, especially in cases when just a single user registration image is available. In particular, tests performed with respect to 100 individuals using a traditional facial recognition method, which utilized a single full-face image of each user for training a machine-learning algorithm, showed that at a FAR value of 0.1%, the probability of identification was up to 85.60%. In similar tests performed utilizing the present technology, which used not only a single full-face image of each user but also four additional synthesized angle shots for training the machine-learning algorithm, the probability of identification was increased up to 98.80% at a FAR value of 0.1%. These results illustrate the performance of the present technology, enabling its use when user images are captured in conditions significantly different from those under which the registration image was taken.
[0076] Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A method for user authentication, the method comprising:
acquiring, by one or more processors, an input image associated with a user, the input image shows at least the user face;
locating, by the one or more processors, multiple facial landmarks associated with the user;
transforming, by the one or more processors and based on the multiple facial landmarks, at least a portion of the input image into a uniform image of the user face;
retrieving, by the one or more processors, a feature vector based upon the uniform image of the user face;
comparing, by the one or more processors, the feature vector with at least one stored reference feature vector; and
based on the comparison, making, by the one or more processors, an authentication decision with respect to the user.
2. The method of claim 1, further comprising creating, by the one or more processors, at least one user profile, wherein the at least one user profile comprises at least a reference feature vector associated with an authorized user.
3. The method of claim 2, wherein the creating of the at least one user profile comprising:
acquiring, by the one or more processors, at least one registration image associated with the authorized user;
determining, by the one or more processors, a facial position of the authorized user based on the at least one registration image;
locating, by the one or more processors, multiple facial landmarks associated with the authorized user;
transforming, by the one or more processors, at least a portion of the at least one registration image into a texture image of the authorized user face;
generating, by the one or more processors, multiple angle shots of the authorized user face based upon the texture image of the authorized user face;
creating, by the one or more processors, a biometric pattern associated with the multiple angle shots of the user face; and
storing, by the one or more processors, the biometric pattern associated with the authorized user in a database.
4. The method of claim 3, wherein the biometric pattern comprises at least one feature vector associated with the multiple angle shots of the authorized user face.
5. The method of claim 4, wherein the feature vector includes pixel data of the texture image.
6. The method of claim 4, wherein the feature vector includes one or more impulse responses of a linear filter or Gabor filter.
7. The method of claim 3, wherein the generation of the multiple angle shots comprising:
superposing, by the one or more processors, the texture image of the authorized user with multiple points associated with a stored three-dimensional (3D) facial image;
rotating, by the one or more processors, the 3D facial image with the texture image in concert with each other; and
generating, by the one or more processors, images of the multiple angle shots based upon the rotated 3D facial image with the texture image.
8. The method of claim 7, wherein the texture image includes a full-face image of the authorized user.
9. The method of claim 7, further comprising:
finding similarity, by the one or more processors, between the texture image and one of a plurality of reference two-dimensional (2D) facial images, wherein each one of the plurality of reference 2D facial images is associated with a reference 3D facial image; and
based on the similarity, selecting, by the one or more processors, the 3D facial image related to the reference 2D image being the most similar to the texture image.
10. The method of claim 9, wherein the finding similarity is based on a discriminant analysis.
11. The method of claim 10, wherein the finding similarity is based on a Linear Discriminant Analysis (LDA) or Gabor filter analysis.
12. The method of claim 9, wherein the finding similarity comprising matching, by the one or more processors, first facial landmarks associated with the texture image and second facial landmarks associated with the plurality of reference 2D facial images.
13. The method of claim 1, further comprising:
determining, by the one or more processors, a position of the user face; and
isolating, by the one or more processors, an image area associated with the user face from the input image, wherein the transforming is performed with respect to the image area.
14. The method of claim 1, wherein the machine-learning algorithm comprises one or more heuristic algorithms, one or more support vector machines, one or more neural network algorithms, or a combination thereof.
15. The method of claim 1, wherein the determination of the position of the user face comprising a facial object recognition processing.
16. The method of claim 15, wherein the facial object recognition processing includes Viola-Jones facial recognition processing.
17. The method of claim 1, wherein the multiple facial landmarks relate to at least one of the following: a pupil, an eye corner, a nose, a nose corner, and a mouth corner.
18. The method of claim 1, wherein the locating of the multiple facial landmarks comprising at least one of: an Active Shape Model (ASM) searching and an Active Appearance Model (AAM) searching.
19. The method of claim 1, wherein the transforming of the at least a portion of the input image to generate the uniform image of the user face comprising: transforming the at least a portion of the input image into a half-tone image, wherein the input image is a color picture.
20. The method of claim 1, wherein the transforming of the at least a portion of the input image to generate the uniform image of the user face comprising: rotating of the at least a portion of the input image until facial landmarks associated with user pupils are horizontally oriented.
21. The method of claim 1, wherein the transforming of the at least a portion of the input image to generate the uniform image of the user face comprising: scaling of the at least a portion of the input image.
22. The method of claim 21, wherein the scaling is performed until a distance between facial landmarks related to user pupils is of a predetermined value.
23. The method of claim 1, wherein the transforming of the at least a portion of the input image to generate the uniform image of the user face comprising: cropping the at least a portion of the input image to generate an image of a predetermined pixel size.
24. The method of claim 1, wherein the transforming of the at least a portion of the input image to generate the uniform image of the user face comprising: adjusting a brightness of the at least a portion of the input image.
25. The method of claim 24, wherein the adjusting of the brightness comprises Single Scale Retinex (SSR) filtering.
26. The method of claim 24, wherein the adjusting of the brightness comprises histogram correcting of the at least a portion of the input image.
27. The method of claim 24, wherein the adjusting of the brightness comprises contrast enhancing.
28. A system for user authentication, the system comprising: at least one processor and a memory having processor-readable code embodied therein for programming the processor to perform a face recognition method, wherein the method comprises:
acquiring, from an image sensor, an input image associated with a user, the input image shows at least the user face;
locating multiple facial landmarks associated with the user face;
transforming, based on the multiple facial landmarks, at least a portion of the input image into a uniform image of the user;
retrieving a feature vector based upon the uniform image of the user face;
comparing the feature vector with at least one stored reference feature vector using a machine-learning algorithm; and
based on the comparison, making an authentication decision with respect to the user.
29. A non-transitory processor-readable medium having instructions stored thereon, which when executed by one or more processors, cause the one or more processors to implement a method for user authentication, the method comprising:
acquiring, from an image sensor, an input image associated with a user, the input image shows at least the user face;
locating multiple facial landmarks associated with the user face;
transforming, based on the multiple facial landmarks, at least a portion of the input image into a uniform image of the user;
retrieving a feature vector based upon the uniform image of the user face;
comparing the feature vector with at least one stored reference feature vector; and
based on the comparison, making an authentication decision with respect to the user.
30. A method for facial recognition, the method comprising:
acquiring, by one or more processors, a facial image of an individual;
determining, by the one or more processors, a facial position based on the facial image;
locating, by the one or more processors, multiple facial landmarks associated with the individual;
transforming, by the one or more processors and based on the multiple facial landmarks, at least a portion of the facial image into a uniform facial image, wherein the uniform facial image represents a full-face image of the individual;
creating, by the one or more processors, a feature vector based upon the uniform facial image;
comparing, by the one or more processors, the feature vector with at least one stored reference feature vector; and
based on the comparison, determining, by the one or more processors, identity of the individual.
31. A system for facial recognition, the system comprising: at least one processor and a memory having processor-readable code embodied therein for programming the processor to perform a face recognition method, wherein the method comprises:
acquiring, by one or more processors, a facial image of an individual;
determining, by the one or more processors, a facial position based on the facial image;
locating, by the one or more processors, multiple facial landmarks associated with the individual;
transforming, by the one or more processors and based on the multiple facial landmarks, at least a portion of the facial image into a uniform facial image, wherein the uniform facial image represents a full-face image of the individual;
creating, by the one or more processors, a feature vector based upon the uniform facial image;
comparing, by the one or more processors, the feature vector with at least one stored reference feature vector; and
based on the comparison, determining, by the one or more processors, identity of the individual.
32. A non-transitory processor-readable medium having instructions stored thereon, which when executed by one or more processors, cause the one or more processors to implement a method for facial recognition, the method comprising:
acquiring, by one or more processors, a facial image of an individual;
determining, by the one or more processors, a facial position based on the facial image;
locating, by the one or more processors, multiple facial landmarks associated with the individual;
transforming, by the one or more processors and based on the multiple facial landmarks, at least a portion of the facial image into a uniform facial image, wherein the uniform facial image represents a full-face image of the individual;
creating, by the one or more processors, a feature vector based upon the uniform facial image;
comparing, by the one or more processors, the feature vector with at least one stored reference feature vector; and
based on the comparison, determining, by the one or more processors, identity of the individual.
PCT/RU2014/000089 2014-02-11 2014-02-11 Facial recognition and user authentication method WO2015122789A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/RU2014/000089 WO2015122789A1 (en) 2014-02-11 2014-02-11 Facial recognition and user authentication method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/RU2014/000089 WO2015122789A1 (en) 2014-02-11 2014-02-11 Facial recognition and user authentication method

Publications (1)

Publication Number Publication Date
WO2015122789A1 true WO2015122789A1 (en) 2015-08-20

Family

ID=53800427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/RU2014/000089 WO2015122789A1 (en) 2014-02-11 2014-02-11 Facial recognition and user authentication method

Country Status (1)

Country Link
WO (1) WO2015122789A1 (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100189342A1 (en) * 2000-03-08 2010-07-29 Cyberextruder.Com, Inc. System, method, and apparatus for generating a three-dimensional representation from one or more two-dimensional images
US20070086627A1 (en) * 2005-10-18 2007-04-19 Samsung Electronics Co., Ltd. Face identification apparatus, medium, and method
US20130195320A1 (en) * 2006-08-11 2013-08-01 DigitalOptics Corporation Europe Limited Real-Time Face Tracking in a Digital Image Acquisition Device
US20090060290A1 (en) * 2007-08-27 2009-03-05 Sony Corporation Face image processing apparatus, face image processing method, and computer program
WO2011055164A1 (en) * 2009-11-06 2011-05-12 Vesalis Method for illumination normalization on a digital image for performing face recognition
US20130070973A1 (en) * 2011-09-15 2013-03-21 Hiroo SAITO Face recognizing apparatus and face recognizing method

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018064214A1 (en) * 2016-09-30 2018-04-05 Alibaba Group Holding Limited Facial recognition-based authentication
US11551482B2 (en) 2016-09-30 2023-01-10 Alibaba Group Holding Limited Facial recognition-based authentication
US10997445B2 (en) 2016-09-30 2021-05-04 Alibaba Group Holding Limited Facial recognition-based authentication
US10762368B2 (en) 2016-09-30 2020-09-01 Alibaba Group Holding Limited Facial recognition-based authentication
RU2647689C1 (en) * 2017-03-01 2018-03-16 Общество с ограниченной ответственностью "Рилейшн Рейт" Method of the client's portrait construction
WO2019067223A1 (en) * 2017-09-29 2019-04-04 General Electric Company Automatic authentication for access control using facial recognition
CN111133433A (en) * 2017-09-29 2020-05-08 通用电气公司 Automatic authentication for access control using facial recognition
CN111133433B (en) * 2017-09-29 2023-09-05 通用电气公司 Automatic authentication for access control using face recognition
WO2019074240A1 (en) * 2017-10-11 2019-04-18 삼성전자주식회사 Server, method for controlling server, and terminal device
US11552944B2 (en) 2017-10-11 2023-01-10 Samsung Electronics Co., Ltd. Server, method for controlling server, and terminal device
US11138305B1 (en) 2017-11-30 2021-10-05 Wells Fargo Bank, N.A. Pupil dilation response for authentication
US10685101B1 (en) 2017-11-30 2020-06-16 Wells Fargo Bank, N.A. Pupil dilation response for authentication
US11687636B1 (en) 2017-11-30 2023-06-27 Wells Fargo Bank, N.A. Pupil dilation response for authentication
CN108036746A (en) * 2017-12-26 2018-05-15 太原理工大学 A kind of Gabor transformation based on Spectrum Method realizes carbon fibre composite surface texture analysis method
CN108090983A (en) * 2017-12-29 2018-05-29 新开普电子股份有限公司 A kind of device of registering based on recognition of face
CN109001702A (en) * 2018-06-04 2018-12-14 桂林电子科技大学 Carrier-free ultra-wideband radar human body action identification method
US11394705B2 (en) 2018-07-10 2022-07-19 Ademco Inc. Systems and methods for verifying credentials to perform a secured operation in a connected system
WO2020114135A1 (en) * 2018-12-06 2020-06-11 西安光启未来技术研究院 Feature recognition method and apparatus
US20210049291A1 (en) * 2019-08-13 2021-02-18 Caleb Sima Securing Display of Sensitive Content from Ambient Interception
WO2022001097A1 (en) * 2020-06-30 2022-01-06 公安部第三研究所 Algorithm evaluation system and test method for performance test of person and certificate verification device
US11921831B2 (en) 2021-03-12 2024-03-05 Intellivision Technologies Corp Enrollment system with continuous learning and confirmation
CN112990101A (en) * 2021-04-14 2021-06-18 深圳市罗湖医院集团 Facial organ positioning method based on machine vision and related equipment
CN112990101B (en) * 2021-04-14 2021-12-28 深圳市罗湖医院集团 Facial organ positioning method based on machine vision and related equipment

Similar Documents

Publication Publication Date Title
WO2015122789A1 (en) Facial recognition and user authentication method
US11915515B2 (en) Facial verification method and apparatus
CN108073889B (en) Iris region extraction method and device
US11188734B2 (en) Systems and methods for performing fingerprint based user authentication using imagery captured using mobile devices
KR102299847B1 (en) Face verifying method and apparatus
US10956719B2 (en) Depth image based face anti-spoofing
KR102290392B1 (en) Method and apparatus for registering face, method and apparatus for recognizing face
US11625954B2 (en) Method and apparatus with liveness testing
US11869272B2 (en) Liveness test method and apparatus and biometric authentication method and apparatus
US10922399B2 (en) Authentication verification using soft biometric traits
US11238271B2 (en) Detecting artificial facial images using facial landmarks
JP2008015871A (en) Authentication device and authenticating method
KR102380426B1 (en) Method and apparatus for verifying face
Lin et al. A novel framework for automatic 3D face recognition using quality assessment
JP6430987B2 (en) Reference point position determination device
Radu et al. On combining information from both eyes to cope with motion blur in iris recognition
Pavithra et al. Scale Invariant Feature Transform Based Face Recognition from a Single Sample per Person
Hesson Detecting GAN-generated face morphs using human iris characteristics
Shiwani et al. PCA Based Improved Algorithm for Face Recognition
Mahmood et al. MATLAB Implementation of Face Identification Using Principal Component Analysis

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14882241

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14882241

Country of ref document: EP

Kind code of ref document: A1