WO2013079766A1 - Methods, apparatuses and computer program products for generating a new subspace representation for faces that improves discriminant analysis - Google Patents

Methods, apparatuses and computer program products for generating a new subspace representation for faces that improves discriminant analysis

Info

Publication number
WO2013079766A1
Authority
WO
WIPO (PCT)
Prior art keywords
faces
face
clean
person
undesirable
Prior art date
Application number
PCT/FI2012/050948
Other languages
French (fr)
Inventor
Krishna Annasagar Govindarao
Basavaraja S V
Gururaj Gopal Putraya
Pranav Mishra
Original Assignee
Nokia Corporation
Priority date
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Publication of WO2013079766A1 publication Critical patent/WO2013079766A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering

Definitions

  • An example embodiment of the invention relates generally to imaging processing technology and more particularly relates to a method, apparatus, and computer program product for providing an efficient and reliable manner in which to perform face recognition.
  • Face detection and recognition are becoming increasingly important technologies.
  • face recognition may be useful in biometrics, user interface, and other areas such as creating context for accessing communities in the mobile domain. Face recognition may also be important going forward in relation to initiatives such as metadata standardization.
  • the choice of representation for faces (e.g., by using principal component analysis (PCA) and/or linear discriminant analysis (LDA)) and the choice of training data may have a significant effect on the performance for unseen/unexamined test data.
  • one of the main problems in face recognition is intra personal variability in a person's face due to expression, occlusion (e.g., glasses), and lighting variation, among other factors. These intra personal variability features may make it more difficult to perform face recognition in an efficient and accurate manner.
  • an example embodiment may generate a new subspace (e.g., an interim representation) onto which face features are projected for dimensionality reduction before a discriminant analysis (e.g., linear discriminant analysis) is performed.
  • the new subspace may be a clean face space that is generated based in part on selecting training data (e.g., training images).
  • the clean face space may exclude intra person variations (e.g., a person's expression, occlusion, lighting variations, etc.) that may be particular to a person captured in an image of the training data in order to obtain increased face recognition accuracy.
  • the training images may be selected for building an interim representation (e.g., a clean face space) before applying discriminant analysis such that the images do not include any undesirable intra person variations.
  • the discriminant matrix obtained may be highly robust to lighting, emotion, expression, pose, occlusion and other factors in the face recognition.
  • the example embodiments may also be utilized for any object-recognition type of pattern classification task(s).
  • the example embodiments may be applicable to discriminant analysis tasks and may help in obtaining a more useful discriminant matrix such that the functional performance or facial classification accuracy is greatly increased.
  • in addition to being utilized for discriminant analysis and similar mechanisms in pattern recognition, the example embodiments may also be utilized to more efficiently detect/recognize faces as well as to group/cluster a set of detected faces. Face images that include pixel values typically may not be directly used for classification. As such, some example embodiments may transform the pixels to some other features which are better suited for classification. These features may be chosen and utilized for face clustering, in which faces of the same person that are determined to be close to one another (e.g., similar faces of a same person) may be grouped together, whereas faces of different persons that are determined to be distant from each other may be excluded from the group.
  • even in an instance in which an example embodiment analyzes 1000 faces of 10 different people and has no prior indication of which face belongs to which person, the example embodiments may still be able to automatically group faces of the same person.
  • the faces of the same person are very close (e.g., part of the group) and faces of the different people are far apart from each other (e.g., excluded from the group).
  • discriminant techniques are employed by some example embodiments of the invention.
  • a transformation may be computed by an example embodiment such that the within-class scatter is decreased while the between-class scatter is increased. This may improve the discriminability of a face classifier for recognizing a face or a grouping of faces of the same person from different images.
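  • By way of a hedged illustration (not part of the original publication), the within-class/between-class scatter computation described above can be sketched in Python as follows; the feature matrix X, label vector y, and the small ridge term guarding against a singular scatter matrix are illustrative assumptions:

      import numpy as np
      from scipy.linalg import eigh

      def discriminant_matrix(X, y, n_components):
          # Fisher-style LDA sketch: decrease within-class scatter (Sw)
          # while increasing between-class scatter (Sb).
          d = X.shape[1]
          mean_total = X.mean(axis=0)
          Sw = np.zeros((d, d))
          Sb = np.zeros((d, d))
          for c in np.unique(y):
              Xc = X[y == c]
              mc = Xc.mean(axis=0)
              Sw += (Xc - mc).T @ (Xc - mc)
              diff = (mc - mean_total).reshape(-1, 1)
              Sb += len(Xc) * (diff @ diff.T)
          # Generalized eigenproblem Sb v = lambda Sw v; eigh returns
          # eigenvalues in ascending order, so the trailing columns are the
          # most discriminative directions. The small ridge on Sw is one of
          # the regularization techniques mentioned for singularity issues.
          evals, evecs = eigh(Sb, Sw + 1e-6 * np.eye(d))
          return evecs[:, -n_components:]
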
  • the example embodiments may provide robustness against unwanted intra person variations based in part on using a generated clean face subspace.
  • the clean face subspace may be built by selecting training data (e.g., training images) intelligently. Since no special preprocessing may be needed to nullify any unwanted intra person variations, the computational complexity and/or memory complexity of the example embodiments may not necessarily be increased.
  • Some example embodiments may provide advantages/benefits over existing techniques by using a novel representation of new subspace such as for example a clean face space that excludes unwanted intra personal variations while performing PCA and/or discriminant analysis. Additionally, the example embodiments may enable selection of training data to improve performance of the subspace by rejecting a null space (e.g., undesired features (e.g., undesired intra personal features)).
  • a method for generating a new subspace onto which face features are projected for determining a discriminant matrix.
  • the method may include utilizing one or more faces of different people in which the faces may be without intra personal variations for learning discriminability between faces of different persons.
  • the method may also include determining one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces.
  • the data of the clean faces may exclude the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
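  • As a rough, non-authoritative sketch of this flow using off-the-shelf components (the library choice, parameter values and synthetic stand-in data are assumptions for illustration only, not part of the publication):

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      rng = np.random.default_rng(0)
      X_clean = rng.normal(size=(40, 256))   # stand-in clean-face features
      X_train = rng.normal(size=(60, 256))   # labeled (non-clean) training data
      y_train = rng.integers(0, 3, size=60)

      pca = PCA(n_components=0.99)           # keep significant components
      pca.fit(X_clean)                       # clean face space from clean data only
      Z_train = pca.transform(X_train)       # project onto the clean face space
      lda = LinearDiscriminantAnalysis().fit(Z_train, y_train)
      # lda.scalings_ now plays the role of the discriminant matrix
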
  • an apparatus for generating a new subspace onto which face features are projected for determining a discriminant matrix.
  • the apparatus may include a processor and a memory including computer program code.
  • the memory and computer program code are configured to, with the processor, cause the apparatus to at least perform operations including utilizing one or more faces of different people in which the faces may be without intra personal variations for learning discriminability between faces of different persons.
  • the memory and computer program code may also cause the apparatus to determine one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces.
  • the data of the clean faces may exclude the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
  • a computer program product for generating a new subspace onto which face features are projected for determining a discriminant matrix.
  • the computer program product includes at least one computer-readable storage medium having computer-executable program code instruction stored therein.
  • the computer-executable program code instructions may include program code instructions configured to utilize one or more faces of different people in which the faces may be without intra personal variations for learning discriminability between faces of different persons.
  • the program code instructions may also determine one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces. The data of the clean faces may exclude the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
  • an apparatus for generating a new subspace onto which face features are projected for determining a discriminant matrix.
  • the apparatus may include means for utilizing one or more faces of different people in which the faces may be without intra personal variations for learning discriminability between faces of different persons.
  • the apparatus may also include means for determining one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces. The data of the clean faces may exclude the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
  • An example embodiment of the invention may provide a better user experience given the ease, efficiency and accuracy in performing face recognition via a communication device. For example, an example embodiment of the invention may improve the efficiency and accuracy in recognizing faces and in grouping a set of images of a same face associated with a person. As a result, device users may enjoy improved capabilities with respect to face recognition.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
  • FIG. 1 is a schematic block diagram of a system according to an example embodiment of the invention.
  • FIG. 2 is a schematic block diagram of an apparatus according to an example embodiment of the invention.
  • FIGS. 3A & 3B are diagrams of images in a pixel domain according to an example embodiment of the invention.
  • FIGS. 3C & 3D are diagrams of images corresponding to a clean face space according to an example embodiment of the invention.
  • FIG. 4 is a flowchart of an example method for generating a discriminant matrix according to an example embodiment of the invention.
  • FIG. 5 is a flowchart of an example method for generating a discriminant matrix according to another example embodiment of the invention.
  • circuitry refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term herein, including in any claims.
  • the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
  • an Eigen face may denote one or more Eigen vectors utilized for identifying features of a face.
  • a clean face(s) and/or a clean face space(s) or the like may, but need not, refer to a face without undesirable intra person variations.
  • intra person variability, intra personal variability, intra personal variations or similar terms may be referred to interchangeably to denote variability in a person's face of an image.
  • the intra person variability of a person's face may be due to, but is not limited to, expression, emotion, a pose, occlusion, lighting variations and any other suitable factors.
  • a clean face(s) may be a face of a person that does not include intra personal variations (e.g., expression, emotion, a pose, occlusion, lighting variations and any other suitable factors) or in which the effect of the intra personal variations is reduced.
  • intra personal variations e.g., expression, emotion, a pose, occlusion, lighting variations and any other suitable factors
  • FIG. 1 illustrates a block diagram in which a device such as a mobile terminal 10 is shown in an example communication environment.
  • a system in accordance with an example embodiment of the invention may include a first communication device (e.g., mobile terminal 10) and a second communication device 20 capable of communication with each other via a network 30.
  • an embodiment of the invention may further include one or more additional communication devices, one of which is depicted in FIG. 1 as a third communication device 25.
  • not all systems that employ an embodiment of the invention may comprise all the devices illustrated and/or described herein.
  • While an embodiment of the mobile terminal 10 and/or second and third communication devices 20 and 25 may be illustrated and hereinafter described for purposes of example, other types of terminals, such as personal digital assistants (PDAs), pagers, mobile televisions, mobile telephones, gaming devices, laptop computers, cameras, video recorders, audio/video players, radios, global positioning system (GPS) devices, Bluetooth headsets, Universal Serial Bus (USB) devices or any combination of the aforementioned, and other types of voice and text communications systems, can readily employ an embodiment of the invention.
  • the network 30 may include a collection of various different nodes (of which the second and third communication devices 20 and 25 may be examples), devices or functions that may be in communication with each other via corresponding wired and/or wireless interfaces.
  • the illustration of FIG. 1 should be understood to be an example of a broad view of certain elements of the system and not an all inclusive or detailed view of the system or the network 30.
  • the network 30 may be capable of supporting communication in accordance with any one or more of a number of First-Generation (1G), Second-Generation (2G), 2.5G, Third-Generation (3G), 3.5G, 3.9G, Fourth-Generation (4G) mobile communication protocols, Long Term Evolution (LTE), LTE advanced (LTE-A) and/or the like.
  • the network 30 may be a point-to-point (P2P) network.
  • One or more communication terminals such as the mobile terminal 10 and the second and third communication devices 20 and 25 may be in communication with each other via the network 30 and each may include an antenna or antennas for transmitting signals to and for receiving signals from a base site, which could be, for example a base station that is a part of one or more cellular or mobile networks or an access point that may be coupled to a data network, such as a Local Area Network (LAN), a Metropolitan Area Network (MAN), and/or a Wide Area Network (WAN), such as the Internet.
  • other devices such as processing elements (e.g., personal computers, server computers or the like) may be coupled to the mobile terminal 10 and the second and third communication devices 20 and 25 via the network 30.
  • the mobile terminal 10 and the second and third communication devices 20 and 25 may be enabled to communicate with the other devices or each other, for example, according to numerous communication protocols including Hypertext Transfer Protocol (HTTP) and/or the like, to thereby carry out various communication or other functions of the mobile terminal 10 and the second and third communication devices 20 and 25, respectively.
  • the mobile terminal 10 and the second and third communication devices 20 and 25 may communicate in accordance with, for example, radio frequency (RF), near field communication (NFC), Bluetooth (BT), Infrared (IR) or any of a number of different wireline or wireless communication techniques, including Local Area Network (LAN), Wireless LAN (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), Wireless Fidelity (WiFi), Ultra-Wide Band (UWB), Wibree techniques and/or the like.
  • the mobile terminal 10 and the second and third communication devices 20 and 25 may be enabled to communicate with the network 30 and each other by any of numerous different access mechanisms, such as Wideband Code Division Multiple Access (W-CDMA), CDMA2000, Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Wireless Local Area Network (WLAN), WiMAX, Digital Subscriber Line (DSL), Ethernet and/or the like.
  • the first communication device (e.g., the mobile terminal 10) may be a mobile communication device such as, for example, a wireless telephone or other devices such as a personal digital assistant (PDA), mobile computing device, camera, video recorder, audio/video player, positioning device, game device, television device, radio device, or various other like devices or combinations thereof.
  • the second communication device 20 and the third communication device 25 may be mobile or fixed communication devices.
  • the second communication device 20 and the third communication device 25 may be servers, remote computers or terminals such as, for example, personal computers (PCs) or laptop computers.
  • the network 30 may be an ad hoc or distributed network arranged to be a smart space.
  • devices may enter and/or leave the network 30 and the devices of the network 30 may be capable of adjusting operations based on the entrance and/or exit of other devices to account for the addition or subtraction of respective devices or nodes and their corresponding capabilities.
  • one or more of the devices in communication with the network 30 may employ a face recognizer (e.g., face recognizer 78 of FIG. 2).
  • the face recognizer may generate a clean face space that does not include intra person variations of an image(s) of a face(s). This clean face space may be utilized in part to generate a discriminant matrix.
  • the face recognizer may perform face recognition and/or face clustering in part based on the clean face space. For example, with respect to face clustering, the face recognizer may be able to group or cluster a set of images of a same person more efficiently and reliably since the images may not have some intra person variations which may make the grouping/clustering more complex.
  • the intra person variations may include, but are not limited to, lighting variations (e.g., shadows on a face, bright and/or lowly lit lighting on a face, etc.), expression/emotion (e.g., a smile, laughter, a frown, anger, sadness, happiness, blinking, etc.), a pose (e.g., frontal, half profile, full profile, etc.), occlusion (e.g., facial hair, glasses, clothes, etc.) and any other suitable variations.
  • the mobile terminal 10 and the second and third communication devices 20 and 25 may be configured to include the face recognizer.
  • the mobile terminal 10 may include the face recognizer and the second and third communication devices 20 and 25 may be network entities such as servers or the like that may be configured to communicate with each other and/or the mobile terminal 10.
  • the second communication device 20 may be a dedicated server (or server bank) associated with a particular information source or service (e.g., a face recognition service, media provision service, etc.) or the second communication device 20 may be a backend server associated with one or more other functions or services.
  • the second communication device 20 may represent a potential host for a plurality of different services or information sources.
  • the functionality of the second communication device 20 may be provided by hardware and/or software components configured to operate in accordance with techniques for the provision of information to users of communication devices. However, at least some of the functionality provided by the second communication device 20 may be information provided in accordance with an example embodiment of the invention.
  • the second communication device 20 may host an apparatus for providing a localized face recognition service to a device (e.g., mobile terminal 10) practicing an embodiment of the invention.
  • the localized face recognition service may store one or more images of one or more faces and associated metadata identifying individuals corresponding to the faces.
  • the second communication device 20 may match the received face data with a corresponding individual and may provide information to a device (e.g., mobile terminal 10) identifying an individual(s) that corresponds to the face data.
  • the third communication device 25 may also be a server providing a number of functions or associations with various information sources and services (e.g., a face recognition service, a media provision service, etc.).
  • the third communication device 25 may host an apparatus for providing a localized face recognition service that provides information (e.g., face data, etc.) to enable the second communication device 20 to provide face recognition information to a device (e.g., mobile terminal 10) practicing an embodiment of the invention.
  • the face recognition information provided by the third communication device 25 to the second communication device 20 may be used by the second communication device to provide information to a device (e.g., mobile terminal 10) identifying an individual(s) that corresponds to received face data recognized or extracted from an image(s).
  • the mobile terminal 10 may itself perform an example embodiment.
  • the second and third communication devices 20 and 25 may facilitate (e.g., by the provision of face recognition information) operation of an example embodiment at another device (e.g., the mobile terminal 10).
  • the second and third communication devices 20 and 25 may not be included at all.
  • FIG. 2 illustrates a schematic block diagram of an apparatus for recognizing one or more faces of one or more images according to an example embodiment.
  • An example embodiment of the invention will now be described with reference to FIG. 2, in which certain elements of an apparatus 50 are displayed.
  • the apparatus 50 of FIG. 2 may be employed, for example, on the mobile terminal 10 (and/or the second communication device 20 or the third communication device 25).
  • the apparatus 50 may be embodied on a network device of the network 30.
  • the apparatus 50 may alternatively be embodied at a variety of other devices, both mobile and fixed (such as, for example, any of the devices listed above).
  • an embodiment may be employed on a combination of devices.
  • one embodiment of the invention may be embodied wholly at a single device (e.g., the mobile terminal 10), by a plurality of devices in a distributed fashion (e.g., on one or a plurality of devices in a P2P network) or by devices in a client/server relationship.
  • the devices or elements described below may not be mandatory and thus some may be omitted in a certain embodiment.
  • the apparatus 50 may include or otherwise be in communication with a processor 70, a user interface 67, a communication interface 74, a memory device 76, a display 85, a face recognizer 78, and a camera module 36.
  • the display 85 may be a touch screen display.
  • the memory device 76 may include, for example, volatile and/or non-volatile memory.
  • the memory device 76 may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like processor 70).
  • the memory device 76 may be a tangible memory device that is not transitory.
  • the memory device 76 may be configured to store information, data, files, applications, instructions or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the invention.
  • the memory device 76 could be configured to buffer input data for processing by the processor 70.
  • the memory device 76 could be configured to store instructions for execution by the processor 70.
  • the memory device 76 may be one of a plurality of databases that store information and/or media content (e.g., images, pictures, videos, etc.).
  • the memory device 76 may store one or more images which may, but need not, include one or more images of faces of individuals. Features may be extracted from the images by the processor 70 and/or the face recognizer 78 and the extracted features may be evaluated by the processor 70 and/or the face recognizer 78 to determine whether the features relate to one or more faces of individuals.
  • the apparatus 50 may, in one embodiment, be a mobile terminal (e.g., mobile terminal 10) or a fixed communication device or computing device configured to employ an example embodiment of the invention.
  • the apparatus 50 may be embodied as a chip or chip set.
  • the apparatus 50 may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard).
  • the structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon.
  • the apparatus 50 may therefore, in some cases, be configured to implement an embodiment of the invention on a single chip or as a single "system on a chip."
  • a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the chip or chipset may constitute means for enabling user interface navigation with respect to the functionalities and/or services described herein.
  • the processor 70 may be embodied in a number of different ways.
  • the processor 70 may be embodied as one or more of various processing means such as a coprocessor, microprocessor, a controller, a digital signal processor (DSP), processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processor 70 may be configured to execute instructions stored in the memory device 76 or otherwise accessible to the processor 70.
  • the processor 70 may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the invention while configured accordingly.
  • the processor 70 when the processor 70 is embodied as an ASIC, FPGA or the like, the processor 70 may be specifically configured hardware for conducting the operations described herein.
  • the processor 70 when the processor 70 is embodied as an executor of software instructions, the instructions may specifically configure the processor 70 to perform the algorithms and operations described herein when the instructions are executed.
  • the processor 70 may be a processor of a specific device (e.g., a mobile terminal or network device) adapted for employing an embodiment of the invention by further configuration of the processor 70 by instructions for performing the algorithms and operations described herein.
  • the processor 70 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 70.
  • the processor 70 may be configured to operate a connectivity program, such as a browser, Web browser or the like.
  • the connectivity program may enable the apparatus 50 to transmit and receive Web content, such as for example location-based content or any other suitable content, according to a Wireless Application Protocol (WAP), for example.
  • the processor 70 may also be in communication with a display 85 and may instruct the display to illustrate any suitable information, data, content (e.g., media content) or the like.
  • the communication interface 74 may be any means such as a device or circuitry embodied in either hardware, a computer program product, or a combination of hardware and software that is configured to receive and/or transmit data from/to a network and/or any other device or module in communication with the apparatus 50.
  • the communication interface 74 may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network (e.g., network 30).
  • the communication interface 74 may alternatively or also support wired communication.
  • the communication interface 74 may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB), Ethernet or other mechanisms.
  • the user interface 67 may be in communication with the processor 70 to receive an indication of a user input at the user interface 67 and/or to provide an audible, visual, mechanical or other output to the user.
  • the user interface 67 may include, for example, a keyboard, a mouse, a joystick, a display, a touch screen, a microphone, a speaker, or other input/output mechanisms.
  • in an instance in which the apparatus is embodied as a server or some other network device, the user interface 67 may be limited, remotely located, or eliminated.
  • the processor 70 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface, such as, for example, a speaker, ringer, microphone, display, and/or the like.
  • the processor 70 and/or user interface circuitry comprising the processor 70 may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor 70 (e.g., memory device 76, and/or the like).
  • the apparatus 50 may include a media capturing element, such as camera module 36.
  • the camera module 36 may include a camera, video and/or audio module, in communication with the processor 70 and the display 85.
  • the camera module 36 may be any means for capturing an image, video and/or audio for storage, display or transmission.
  • the camera module 36 may include a digital camera capable of forming a digital image file from a captured image.
  • the camera module 36 may include all hardware, such as a lens or other optical component(s), and software necessary for creating a digital image file from a captured image.
  • the camera module 36 may include only the hardware needed to view an image, while a memory device (e.g., memory device 76) of the apparatus 50 stores instructions for execution by the processor 70 in the form of software necessary to create a digital image file from a captured image.
  • the camera module 36 may further include a processing element such as a co-processor which assists the processor 70 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data.
  • the encoder and/or decoder may encode and/or decode according to a Joint Photographic Experts Group, (JPEG) standard format or another like format.
  • the camera module 36 may provide live image data to the display 85.
  • the camera module 36 may facilitate or provide a camera view to the display 85 to show live image data, still image data, video data, or any other suitable data.
  • the display 85 may be located on one side of the apparatus 50 and the camera module 36 may include a lens and/or a viewfinder positioned on the opposite side of the apparatus 50 with respect to the display 85 to enable the camera module 36 to capture images on one side of the apparatus 50 and present a view of such images to the user positioned on the other side of the apparatus 50.
  • the processor 70 may be embodied as, include or otherwise control the face recognizer.
  • the face recognizer 78 may be any means such as a device or circuitry operating in accordance with software or otherwise embodied in hardware or a combination of hardware and software (e.g., processor 70 operating under software control, the processor 70 embodied as an ASIC or FPGA specifically configured to perform the operations described herein, or a combination thereof) thereby configuring the device or circuitry to perform the corresponding functions of the face recognizer 78 as described below.
  • in examples in which software is employed, a device or circuitry (e.g., the processor 70 in one example) executing the software forms the structure associated with such means.
  • the face recognizer 78 may be in communication with the processor 70, the camera module 36 and the memory device 76 (e.g., via the processor 70). In this regard, the face recognizer 78 may extract one or more features from one or more images captured by the camera module 36 and based in part on the extracted features, the face recognizer 78 may determine whether the extracted features relate to a face(s) of an individual(s).
  • the face recognizer 78 may analyze data stored in the memory device 76 to determine whether the memory device 76 stores face data relating to an individual that also corresponds to the same individual identified in an image being newly captured by the camera module 36.
  • the face recognizer 78 may generate a new representation (e.g., a subspace) for images of faces that belong to a subspace category of representations (e.g. an Eigen face).
  • This subspace, referred to herein as the clean face space, may be constructed based on training data including clean faces.
  • a problem in face recognition is the intra personal variability in a person's face due to expression (e.g., a smile, a blink, etc.), occlusion (e.g., glasses, etc.), lighting variations and other factors.
  • the training images may be selected from the memory device 76 by the face recognizer 78 to enable the face recognizer 78 to build an interim representation before applying discriminant analysis (e.g., LDA) such that the images of the interim representation do not have any of these undesirable intra personal variations.
  • the interim representation generated by the face recognizer 78 may be utilized for Principal Component Analysis prior to applying discriminant analysis (e.g., LDA).
  • the PCA may reduce the dimensionality of the selected images.
  • the PCA may be computed by the face recognizer 78 from the clean face space. Therefore, faces without glasses, without facial hair, without expression variation, and without any other suitable intra personal variations may be selected for obtaining the principal component vectors or Eigen vectors that represent the clean face space.
  • the interim representation may not have images of faces with undesirable variations which may cause problems for face recognition/face clustering.
  • the images of the faces of the clean face space may have a neutral expression, good lighting, etc.
  • the clean faces dataset may be stored by the face recognizer 78 in the memory device 76.
  • the clean face dataset may refer to a collection of images of faces which do not have undesirable variations (e.g., intra person variations).
  • this dataset may be used by the face recognizer 78 to obtain the interim representation (e.g., clean face subspace).
  • Using the generated interim representation before applying LDA in performing face recognition and/or face clustering may help in a variety of ways. For instance, it may make the apparatus 50 less sensitive to the exact choice of Eigen vectors chosen after PCA, may make the obtained discriminant matrix extremely robust to unseen test data, and may also reduce the dimensionality of the feature vectors, thus alleviating the singularity problem in LDA to some extent.
  • an LDA matrix generated from clean face PCA may be generated in two modes: (a) an offline mode, and (b) an online mode. Both of these modes are described more fully below.
  • in the offline mode, a labeled set of training data may be projected on the clean face space and used for LDA matrix generation.
  • the online training data may be projected on the LDA vectors and stored as templates for each class (or person). These templates may then be used to classify a given unseen test face.
  • An unseen test data/face may be a face provided for classifying into any of the trained classes.
  • in the online mode, online training data may be used to generate the LDA matrix. The training data may then be projected on the LDA vectors and stored as templates for each class (or person). These templates may then be used to classify a given unseen test face.
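  • A minimal sketch of the template step common to both modes (function and variable names are illustrative assumptions; Z holds training features already projected on the clean face space, and W_lda holds the LDA vectors as columns):

      import numpy as np

      def build_templates(Z, y, W_lda):
          # Project training data on the LDA vectors and store one template
          # (mean transformed vector) per class/person.
          T = Z @ W_lda
          return {c: T[y == c].mean(axis=0) for c in np.unique(y)}

      def classify(z_test, W_lda, templates):
          # Classify an unseen test face into the nearest trained class.
          t = z_test @ W_lda
          return min(templates, key=lambda c: np.linalg.norm(t - templates[c]))
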
  • an appropriate number of eigenvectors may be chosen from the clean face PCA.
  • regularization techniques may also be utilized to overcome the singularity issues during LDA phase.
  • the null space of the Eigen space computed for the features using the clean data may be discarded by the face recognizer 78.
  • the face recognizer 78 may discard the null space by dropping some Eigen vectors (e.g., Eigen vectors having the least Eigen values).
  • This reduced set of Eigen vectors may include or represent the clean face space (also referred to herein as clean face subspace).
  • a different set of training data may be utilized by the face recognizer 78.
  • the different set of training data (e.g., image 3 of FIG. 3A, image 5 of FIG. 3B) may be retrieved by the face recognizer 78 from a memory (e.g., memory device 76).
  • This different set of training data may not be 'clean' and may include all the undesirable intra personal variations, which would typically be undesirable in the clean face space itself.
  • the high dimensionality feature vector computed, by the face recognizer 78, from these training faces may be projected on the clean face space. It should be pointed out that projection of the high dimensionality feature vector onto the clean face space may not necessarily add back the intra personal variations to the image in the clean face space.
  • the face recognizer 78 may perform discriminant analysis (e.g., utilizing LDA or any other discriminant technique). Thus, the face recognizer 78 may obtain a discriminant matrix which transforms one or more feature vectors into a representation that is more suited for face classification/face clustering.
  • the face recognizer 78 may analyze a different set of training images including clean faces selected such that they have none of the undesirable intra personal variations. Therefore, faces without glasses, without facial hair, without expression variation, etc. may be selected, by the face recognizer 78 for computing the interim representation in the form of PCA. By discarding the null space (e.g., feature vectors corresponding to the undesirable intra personal variations) obtained from clean faces, the face recognizer 78 may ensure that the discriminant analysis matrix selects only the most reliable features for discrimination (e.g., not select intra personal variations). In an example embodiment, empirical results show significantly improved performance using this mechanism, for example, for face clustering, as described more fully below.
  • null space e.g., feature vectors corresponding to the undesirable intra personal variations
  • Referring now to FIGS. 3A-3D, diagrams illustrating images of a person with and without intra personal variation features are provided.
  • the images 3, 5 (e.g., training images, for example, for computing the discriminant matrix using LDA) of FIGS. 3A and 3B include undesirable intra person variations (e.g., glasses, a smile, lighting variations).
  • the face recognizer 78 may generate the images of clean face spaces of FIGS. 3C and 3D to exclude the undesirable intra person variations or at least minimize the effects of the intra person variations. For example, the effect of the glasses and the smile of the person of the image 3 of FIG. 3A may be reduced in the corresponding image of FIG. 3C.
  • the clean face spaces of FIGS. 3C and 3D may be useful for face recognition and/or face clustering, as described more fully below.
  • while the images 7, 9 of the clean face spaces of FIGS. 3C and 3D were generated based in part on analyzing Eigen faces pertaining to the faces of the images 3, 5 in FIGS. 3A and 3B in the pixel domain, the clean face space (also referred to herein as clean face Eigen space) of the images of FIGS. 3C and 3D may, in another example embodiment, be determined/generated by the face recognizer 78 based on evaluating training images in a feature domain (e.g., generated based on local binary pattern (LBP) histograms).
  • the face recognizer 78 may include the corresponding Eigen vector(s) related to the unwanted intra person variations in a null space.
  • the null space may, but need not, be discarded by face recognizer 78.
  • a discriminant matrix analysis performed by the face recognizer 78 may be unable to partially or entirely reconstruct any of the undesirable intra person variations. For instance, consider the face of FIG. 3A in which a person is smiling/laughing.
  • the corresponding clean space face may be an interim representation during PCA that is, partially or entirely, unable to properly represent this laughter.
  • in this regard, the smile/laughter of the face of FIG. 3C is degraded with respect to the same face of FIG. 3A and does not entirely represent the smile/laughter.
  • the clean face space(s) may not partially or entirely represent undesirable variations of a face.
  • the image of FIG. 3A relates to a person that is wearing glasses, which may cause problems (e.g., increased dimensionality (e.g., an increase in pixel size of the image)) for face clustering because the glasses are very large and they may significantly change the appearance of the image.
  • the effect of the glasses is significantly reduced. For instance, although the glasses are not completely removed, the effects of the glasses are reduced in this example embodiment.
  • the face recognizer 78 may be able to more efficiently perform face clustering (e.g., grouping a set of images related the same face of a person(s)), since the face recognizer 78 may not need to evaluate these undesirable intra person variations.
  • undesirable intra person variations from a face clustering viewpoint may not be represented well in the clean face space.
  • the face recognizer 78 may not be misled by the undesirable variations such as, for example, the glasses or the smile/laughter or any other undesirable variations.
  • the face recognizer 78 may effectively ignore the glasses and/or the smile since these features are provided to a null space and the corresponding clean face space does not entirely include these undesirable variation features.
  • although the face recognizer 78 may not consider the undesirable intra person variations in the null space while performing PCA, in an instance in which the face recognizer 78 analyzes a corresponding image that includes the glasses, the face recognizer 78 may still be able to detect the person of the image. For instance, the face recognizer 78 may perform face recognition and may be able to recognize that the person of the image relates to a set of corresponding clean trained data (e.g., images 7, 9 of FIGS. 3C and 3D), for example.
  • the face recognizer 78 may implement the discriminant analysis based in part on selecting those features that are least affected by these undesirable intra personal variations. By projecting the faces onto a space where such unwanted variations cannot be represented well, the face recognizer 78 may minimize the chances of these unwanted features being selected and utilized for the discriminant analysis. For example, while the presence of facial hair may be good to discriminate between some men and women, it may not necessarily be an ideal feature since a large percentage of men are clean shaven, and it is generally not applicable to women. Therefore, by ensuring that such features are either not selected or are weighted less, the face recognizer 78 may ensure that truly robust features are selected for the discriminant analysis.
  • the face recognizer 78 may collect training data (e.g., training images) for building a clean face subspace such as, for example, normalized clean faces.
  • the clean faces of the clean face subspace may not include undesirable intra person variations or may have undesirable intra person variations with reduced effects (e.g., degraded areas corresponding to the undesirable intra person variations).
  • the face recognizer 78 may normalize clean faces by aligning the clean faces. For example, the face recognizer 78 may make the face sizes the same and may analyze the eyes of the faces to characterize images of a same face. In one example embodiment, the face recognizer 78 may detect an alignment of the faces based in part on a location of the eyes being calculated from zero degrees.
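  • A hedged sketch of such normalization, assuming the eye centers have already been detected (coordinates are (x, y) = (column, row); the rotation sign may need flipping depending on the image coordinate convention):

      import numpy as np
      from skimage.transform import rotate, resize

      def normalize_face(img, left_eye, right_eye, out_size=(64, 64)):
          # Rotate so the eye line lies at zero degrees (horizontal), then
          # resize so all face images share the same size.
          dx = right_eye[0] - left_eye[0]
          dy = right_eye[1] - left_eye[1]
          angle = np.degrees(np.arctan2(dy, dx))   # eye-line angle
          center = ((left_eye[0] + right_eye[0]) / 2.0,
                    (left_eye[1] + right_eye[1]) / 2.0)
          aligned = rotate(img, angle, center=center)
          return resize(aligned, out_size)
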
  • the face recognizer 78 may perform feature extraction (e.g., LBP histograms) on the clean face subspace.
  • the face recognizer 78 may extract features from one or more images of the clean face subspace and may, for example, utilize Local Binary Patterns (LBPs) to represent faces.
  • the face recognizer 78 may examine shape and texture information (e.g., texture of a pixel(s)) associated with extracted data of images of clean faces.
  • the face recognizer 78 may extract a face feature vector (e.g., a histogram) from an image(s) having a face(s) and the face recognizer 78 may divide an image of the face(s) into small regions from which LBPs are extracted and concatenated into a single feature histogram that may efficiently represent an image of a face(s).
  • the textures of the facial regions may be encoded by face recognizer 78 utilizing the LBPs while the entire shape of the face(s) may be recovered by the construction of the face feature histogram.
  • the face recognizer 78 may view images of faces as compositions of micro-patterns which may be invariant with respect to monotonic grey scale transformation. By combining these micro-patterns, a description of a face image of a clean face subspace may be obtained, by the face recognizer 78.
  • the face recognizer 78 may utilize LBP patterns to obtain or extract a face(s) from an image(s) in one example embodiment, it should be pointed out that the face recognizer 78 may utilize any other suitable mechanisms for obtaining or extracting a face(s) from an image(s), including but not limited to, utilizing pixel values in a pixel domain for extracting features of clean faces.
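  • A minimal sketch of this region-wise LBP histogram extraction (the grid size, radius and neighbour count are illustrative choices, not values from the publication):

      import numpy as np
      from skimage.feature import local_binary_pattern

      def lbp_histogram(face, grid=(8, 8), P=8, R=1.0):
          # Divide the face into small regions, compute a uniform-LBP
          # histogram per region, and concatenate into a single feature
          # histogram representing the face.
          lbp = local_binary_pattern(face, P, R, method='uniform')
          n_bins = P + 2                      # uniform codes + one catch-all
          rows = np.array_split(np.arange(face.shape[0]), grid[0])
          cols = np.array_split(np.arange(face.shape[1]), grid[1])
          hists = []
          for r in rows:
              for c in cols:
                  block = lbp[np.ix_(r, c)]
                  h, _ = np.histogram(block, bins=n_bins, range=(0, n_bins))
                  hists.append(h / max(h.sum(), 1))
          return np.concatenate(hists)        # single face feature histogram
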
  • the face recognizer 78 may compute the Eigen vectors or Eigen faces on the extracted features corresponding to the clean face subspace using Principal Component Analysis.
  • the Eigen vectors may correspond to significant positive Eigen values of extracted features (e.g., LBP histogram features) from the clean faces.
  • the face recognizer 78 may project undesirable intra person features to a null space.
  • the face recognizer 78 may discard the null space, for example, corresponding to Eigen vectors whose corresponding Eigen values are insignificant (e.g., negative values or values of zero). These insignificant values may correspond to undesirable intra person variation features which are part of the null space and which may be discarded.
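  • In code form, this step might look as follows (a sketch only; the numerical threshold standing in for 'insignificant' eigenvalues is an assumption):

      import numpy as np

      def clean_face_eigenspace(F_clean, eps=1e-10):
          # PCA on clean-face features; eigenvectors whose eigenvalues are
          # insignificant (near zero, or negative from numerical error)
          # span the null space and are discarded.
          mu = F_clean.mean(axis=0)
          C = np.cov(F_clean - mu, rowvar=False)
          evals, evecs = np.linalg.eigh(C)    # ascending eigenvalues
          keep = evals > eps
          return mu, evecs[:, keep]           # basis of the clean face space
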
  • the face recognizer 78 may project training data (e.g., for discriminant analysis) on the clean face space.
  • This training data (e.g., the images 3, 5 of FIGS. 3A and 3B) may not be clean.
  • This data may have the available intra person variations of a corresponding face such as, for example, the unwanted intra person variation features.
  • the PCA may be computed from clean faces and training data for LDA may be projected on this space.
  • the intra personal variations in this training data are not reconstructed properly.
  • the LDA may avoid choosing features (e.g., corresponding to intra personal variations) to discriminate between different people.
  • the unwanted features may be noisy (e.g., degraded) once projected onto the clean face space.
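  • Continuing the sketch above, the projection itself is a single matrix product; components along the discarded null-space directions (where the unwanted variations would live) simply cannot be represented:

      # mu, W = clean_face_eigenspace(F_clean); F_train = non-clean features
      Z_train = (F_train - mu) @ W
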
  • the face recognizer 78 may perform discriminant analysis (e.g., linear discriminant analysis) on the transformed vectors.
  • the face recognizer 78 may perform discriminant analysis on the transformed vectors (e.g., transformed feature vectors) to obtain a discriminant matrix.
  • the face recognizer 78 may store the discriminant matrix.
  • the data of the stored discriminant matrix may be based on the PCA and the discriminant analysis (e.g., LDA). This data may be used for face clustering and/or face recognition, as well as for any other suitable reasons, by the face recognizer 78.
  • a test face may be applied to this stored discriminant matrix and the face recognizer 78 may determine where the test face should be grouped. For example, presume that there are 100 faces captured from camera module 36. As such, these 100 faces may be projected or transformed, via the face recognizer 78, by this stored discriminant matrix (e.g., based in part on PCA and/or LDA) which may result in 100 vectors being used for testing. In other words, in an example in which the camera module 36 captures 100 faces, each of the 100 faces may be projected on the discriminant matrix. Additionally, in this regard, each of the 100 faces may be transformed using the discriminant matrix and the transformed data may then be used and stored with the discriminant matrix.
  • faces of the same person may be close to one another (e.g., have less distance between them) as compared to faces of different people, which may be more distant from each other.
  • the face may be normalized and projected onto a space (e.g., obtained by implementing PCA and/or LDA) by the face recognizer 78.
  • This space may enable the face recognizer 78 to determine a feature vector(s) for classification.
  • the face recognizer 78 may utilize the determined feature vector(s) in part to perform face recognition and/or automatic face clustering.
  • the face recognizer may analyze features of faces and recognize/identify faces of the same person as being close to each other and faces of different people as being far apart from each other.
  • faces that are close to each other correspond to faces of a same person that may be grouped (e.g., clustered together) by the face recognizer 78, and those faces that are determined to be far apart denote faces of different people that may be excluded from the group.
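  • A hedged sketch of such threshold-based grouping over transformed face vectors T (the linkage method and distance threshold are illustrative assumptions):

      from scipy.cluster.hierarchy import fcluster, linkage

      def cluster_faces(T, threshold):
          # Faces of the same person are close and merge into one cluster;
          # faces of different people stay apart beyond the threshold.
          Z = linkage(T, method='average', metric='euclidean')
          return fcluster(Z, t=threshold, criterion='distance')
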
  • intra person variations may not necessarily be ideal for performing face clustering.
  • some of the example embodiments may address this issue by excluding or removing the intra person variations of faces of images from the discriminant matrix, as described above.
  • in experiments utilizing an example embodiment, the face recognizer 78 determined that the distance between similar faces, or faces belonging to the same person, is reduced. As a result, the number of faces clustered for a given threshold increased significantly across a wide variety of datasets. Alternately, the number of false positives decreased significantly in an instance in which approximately the same number of faces were clustered.
  • for example, in one test on a dataset of approximately 2,600 faces, a conventional clustering method clusters 1,735 faces forming 31 false clusters, where a false cluster denotes a cluster that does not include/indicate only one person.
  • in contrast, the face clustering technique of an example embodiment clusters 1,766 faces forming only 19 false clusters.
  • as such, this example illustrates that the face clustering technique of an example embodiment clusters more faces and results in fewer false clusters than the conventional/existing clustering method.
  • the smaller number of false clusters produced by the face clustering technique of an example embodiment is preferable, since it results in more accurate and efficient face clustering.
  • in an instance in which incremental updating of the discriminant matrix is desired on the apparatus 50, it may be beneficial for the apparatus 50 to store (for example, in a memory (e.g., memory device 76)) the Eigen vectors that represent the clean face space.
  • the Eigen vectors may be updated and the clean face space may be updated accordingly for usage in the discriminant matrix.
  • while the clean face space may be implemented in the feature domain (e.g., via LBP histograms), the clean face space may alternatively be implemented in a variety of other ways.
  • the clean face space may be implemented by the face recognizer 78 in a pixel domain.
  • the clean face space(s) may include Eigen vectors computed from pixel data associated with images (e.g., images 3, 5 of FIGS. 3A-3B in the pixel domain) of faces.
  • the training data for discriminant analysis may be projected onto this clean face space in a pixel domain and reconstructed by the face recognizer 78.
  • the resulting image may then be used by the face recognizer 78 for discriminant analysis.
  • the face recognizer 78 may generate the clean face by warping an image(s) of a face to a clean face.
  • the face recognizer 78 may warp a training/testing face having intra person variations (e.g., a smile, etc.) to a neutral expression (e.g., an image of the face without the smile) of a frontal face.
  • the face recognizer 78 may use the clean face generated from the warped image to train/test a pattern classifier.
  • to illustrate how the face recognizer 78 may warp an image to generate a clean face (e.g., a clean face space), consider an image of a face in which a person is smiling (e.g., an intra person variation).
  • the face recognizer 78 may warp the image by shrinking the lips of the face such that the face appears to be without expression (e.g., neutral). This warped image with the lips shrunk may become the clean face.
  • an apparatus may utilize one or more faces (e.g., faces of images 7, 9) of different people in which the faces are without intra personal variations (e.g., an expression(s) of a person, an emotion(s) of a person, a pose, occlusion, lighting variation, etc.) for learning discriminability between faces of different persons.
  • an apparatus may determine one or more principal component vectors or Eigen vectors by applying principal component analysis (PCA) on data of the clean faces that excludes the undesirable variation features to obtain a clean face space.
  • the clean face space may represent the clean faces (e.g., images 7, 9 of clean space faces) based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
  • an apparatus may determine a discriminant matrix using discriminant analysis (e.g., LDA) based in part on analyzing a different set of labeled training images (e.g., images 3, 5).
  • the different set of training images may include one or more undesirable intra person variation features.
  • FIGS. 4 and 5 are flowcharts of a system, method and computer program product according to an example embodiment of the invention. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by various means, such as hardware, firmware, and/or a computer program product including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, in an example embodiment, the computer program instructions which embody the procedures described above are stored by a memory device (e.g., memory device 76) and executed by a processor (e.g., processor 70, face recognizer 78).
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the instructions which execute on the computer or other programmable apparatus cause the functions specified in the flowchart blocks to be implemented.
  • the computer program instructions are stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instructions which implement the function(s) specified in the flowchart blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus implement the functions specified in the flowchart blocks.
  • blocks of the flowcharts support combinations of means for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
  • an apparatus for performing the methods of FIGS. 4 and 5 above may comprise a processor (e.g., the processor 70, the face recognizer 78) configured to perform some or each of the operations (400 - 425, 500 - 520) described above.
  • the processor may, for example, be configured to perform the operations (400 - 425, 500 - 520) by performing hardware implemented logical functions, executing stored instructions, or executing algorithms for performing each of the operations.
  • the apparatus may comprise means for performing each of the operations described above.
  • examples of means for performing operations may comprise, for example, the processor 70 (e.g., as means for performing any of the operations described above), the face recognizer 78, and/or a device or circuitry for executing instructions or executing an algorithm for processing information as described above.
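
For purposes of illustration and not of limitation, the threshold-based grouping summarized in the list above may be sketched in the following manner: faces are projected through the discriminant matrix and grouped whenever their pairwise distance falls below a chosen threshold. This is a minimal sketch on synthetic data; the matrix, the dimensionalities and the threshold value are assumptions introduced here and are not taken from this disclosure.

```python
import numpy as np

def cluster_faces(features, W, threshold):
    """Group face feature vectors whose pairwise distance in the
    discriminant space falls below `threshold`."""
    projected = features @ W            # project each face on the discriminant matrix
    n = len(projected)
    labels = -np.ones(n, dtype=int)     # -1 means "not yet assigned"
    next_label = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = next_label
        for j in range(i + 1, n):
            close = np.linalg.norm(projected[i] - projected[j]) < threshold
            if labels[j] < 0 and close:
                labels[j] = next_label  # close faces treated as the same person
        next_label += 1
    return labels

# Example: 100 captured faces as 6,400-dimensional pixel features and an
# assumed discriminant matrix reducing them to 50 dimensions.
rng = np.random.default_rng(0)
faces = rng.normal(size=(100, 6400))
W = rng.normal(size=(6400, 50))
print(cluster_faces(faces, W, threshold=120.0))
```

For a fixed representation, raising the threshold clusters more faces at the cost of more false clusters; the improvement reported above comes instead from the representation itself, which pulls faces of the same person closer together.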

Abstract

An apparatus for generating a new subspace onto which face features are projected for determining a discriminant matrix may include a processor and memory storing executable computer program code causing the apparatus to at least perform operations including utilizing one or more faces of different people in which the faces are without intra personal variations for learning discriminability between faces of different persons. The computer program code may further cause the apparatus to determine principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces that excludes the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis. Corresponding methods and computer program products are also provided.

Description

METHODS, APPARATUSES AND COMPUTER PROGRAM PRODUCTS FOR GENERATING A NEW SUBSPACE REPRESENTATION FOR FACES THAT IMPROVES DISCRIMINANT ANALYSIS
TECHNOLOGICAL FIELD
An example embodiment of the invention relates generally to imaging processing technology and more particularly relates to a method, apparatus, and computer program product for providing an efficient and reliable manner in which to perform face recognition.
BACKGROUND
The modern communications era has brought about a tremendous expansion of wireline and wireless networks. Computer networks, television networks, and telephony networks are experiencing an unprecedented technological expansion, fueled by consumer demand. Wireless and mobile networking technologies have addressed related consumer demands, while providing more flexibility and immediacy of information transfer.
Current and future networking technologies continue to facilitate ease of information transfer and convenience to users. Due to the now ubiquitous nature of electronic communication devices, people of all ages and education levels are utilizing electronic devices to communicate with other individuals or contacts, receive services and/or share information, media and other content. One area in which there is a demand to increase convenience to users relates to improving the ability of a communication device to effectively perform face detection and recognition.
In this regard, face detection and recognition is becoming an increasingly important technology. For example, face recognition may be useful in biometrics, user interface, and other areas such as creating context for accessing communities in the mobile domain. Face recognition may also be important going forward in relation to initiatives such as metadata standardization.
Although face recognition techniques continue to improve, many current methods require a high computation capability (e.g., statistical methods of detecting faces by scanning images in a traversing manner on multiple scales). Furthermore, some statistical face recognition mechanisms such as, for example, statistical pattern recognition techniques may be useful for face recognition and face clustering. In this regard, discriminant analysis is typically used to assist in obtaining a more useful representation of data such that functional performance or face classification accuracy is greatly increased. At present, linear discriminant analysis (LDA) is a commonly used discriminant technique for face recognition accuracy. To utilize LDA, large quantities of training data may be needed. In the context of face recognition, this may mean that a large number of faces of different people along with the labels (e.g., which face belongs to whom) may be needed. The data may be used by an LDA algorithm to compute an optimal transformation for the chosen features such that the intra class separability is decreased while the inter class separability is increased.
Prior to applying discriminant analysis, it may be useful to project the data or features on an Eigen space. For instance, this may help to achieve some dimensionality reduction. On the other hand, there may be singularity problems in LDA, for example, in an instance in which data/features are not projected on an Eigen space prior to applying discriminant analysis. Principal Component Analysis (PCA) is commonly used for an interim representation, and often for dimensionality reduction. Features of a given image of a face may typically be transformed to around MxN pixels after face normalization (for example, 6400 pixels with M = N = 80). However, using these pixels directly may be computationally intensive and, as such, PCA may be utilized to reduce the dimensionality (e.g., reduce 6,400 pixels to 3,000 pixels of the face) to minimize the computation complexity associated with face recognition. In addition, projected feature vectors may be subject to LDA, thus giving rise to a commonly used PCA and/or LDA technique (known as Fisher faces).
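For purposes of illustration, the dimensionality reduction described in the preceding paragraph may be sketched as follows. This is a minimal sketch rather than the implementation of this disclosure; the 80x80 normalization size, the sample count and the number of retained components are assumptions.

```python
import numpy as np

def pca(X, n_components):
    """Return the mean and the top principal component (Eigen) vectors
    of the row-wise face samples in X."""
    mean = X.mean(axis=0)
    # SVD of the centered data: the rows of Vt are the Eigen vectors of
    # the covariance matrix, sorted by decreasing Eigen value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

# 200 normalized training faces of M x N = 80 x 80 = 6,400 pixels each.
rng = np.random.default_rng(0)
faces = rng.normal(size=(200, 80 * 80))
mean, components = pca(faces, n_components=100)
features = (faces - mean) @ components.T  # 6,400-d pixels -> 100-d features
print(features.shape)                     # (200, 100)
```

Feature vectors projected in this manner may then be subjected to LDA, which is the PCA-plus-LDA (Fisher faces) arrangement named above.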
The choice of representing faces (e.g., by using PCA and/or LDA) and the choice of training data may have a significant effect on the performance for unseen/unexamined test data.
Currently, in Fisher faces (e.g., PCA and/or LDA), the choice of Eigen vectors chosen in PCA and/or LDA is typically important and may influence face recognition performance significantly. As such, it may be desirable to have a dimensionality reduction technique and discriminant analysis algorithm that is not as sensitive to the selection of Eigen vectors.
At present, one of the main problems in face recognition is because of intra personal variability in a person's face due to expression, occlusion (e.g., glasses), lighting variation among other factors. These intra personal variability features may make it more difficult to perform face recognition in an efficient and accurate manner.
As such, it may be beneficial to generate a mechanism for defining a subspace onto which face features are projected for dimensionality reduction which does not have these undesirable intra personal variations before applying discriminant analysis, or in which the effect of the intra personal variations is reduced.
BRIEF SUMMARY
A method, apparatus and computer program product are therefore provided for generating a new subspace useful for obtaining a discriminant matrix to enable accurate and efficient face classification and recognition. In this regard, an example embodiment may generate a new subspace (e.g., an interim representation) onto which face features are projected for dimensionality reduction before a discriminant analysis (e.g., linear discriminant analysis) is performed. The new subspace may be a clean face space that is generated based in part on selecting training data (e.g., training images). The clean face space may exclude intra person variations (e.g., a person's expression, occlusion, lighting variations, etc.) that may be particular to a person captured in an image of the training data in order to obtain increased face recognition accuracy.
In this regard, the training images may be selected for building an interim representation (e.g., a clean face space) before applying discriminant analysis such that the images do not include any undesirable intra person variations. As such, the discriminant matrix obtained may be highly robust to lighting, emotion, expression, pose, occlusion and other factors in the face recognition.
The example embodiments may also be utilized for any object recognition or similar pattern classification task(s). For instance, the example embodiments may be applicable to discriminant analysis tasks and may help in obtaining a more useful discriminant matrix such that the functional performance or facial classification accuracy is greatly increased.
Although the example embodiments may be utilized for discriminant analysis and similar mechanisms in pattern recognition, the example embodiments may also be utilized to more efficiently detect/recognize faces as well as to group/cluster a set of detected faces. Face images that include pixel values typically may not be directly used for classification. As such, some example embodiments may transform the pixels to some other features which are better suited for classification. These features may be chosen and utilized for face clustering in which faces of the same person that are determined to be close to one another (e.g., similar faces of a same person) may be grouped together, whereas faces of different persons that are determined to be distant from each other may be excluded from the group. For purposes of illustration and not of limitation, in an instance in which an example embodiment may analyze 1000 faces of 10 different people and may have no prior indication of which face belongs to which person, the example embodiments may still be able to automatically group faces of the same person. In this regard, the faces of the same person are very close (e.g., part of the group) and faces of the different people are far apart from each other (e.g., excluded from the group).
To further reduce the intra class separation and increase inter class separation, discriminant techniques are employed by some example embodiments of the invention. By utilizing training data, a transformation may be computed by an example embodiment such that the within-class scatter is decreased while the between-class scatter is increased. This may improve the discriminability of a face classifier for recognizing a face or a grouping of faces of the same person from different images.
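The within-class and between-class scatter referred to here have a standard formulation, sketched below under the assumption of one feature vector per face and integer person labels; this disclosure does not fix a particular formulation, so the details are illustrative.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter for samples X
    (one row per face) with integer person labels y."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)     # scatter about each class mean
        diff = (mc - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)   # class means about the overall mean
    return Sw, Sb
```

A transformation that decreases the within-class scatter while increasing the between-class scatter may then be taken from the leading eigenvectors of Sw^-1 Sb (or of a regularized variant when Sw is singular).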
The example embodiments may provide robustness against unwanted intra person variations based in part on using a generated clean face subspace. The clean face subspace may be built by selecting training data (e.g., training images) intelligently. Since no special preprocessing may be needed to nullify any unwanted intra person variations, the computational complexity and/or memory complexity of the example embodiments may not necessarily be increased.
Some example embodiments may provide advantages/benefits over existing techniques by using a novel representation of new subspace such as for example a clean face space that excludes unwanted intra personal variations while performing PCA and/or discriminant analysis. Additionally, the example embodiments may enable selection of training data to improve performance of the subspace by rejecting a null space (e.g., undesired features (e.g., undesired intra personal features)).
In one example embodiment, a method is provided for generating a new subspace onto which face features are projected for determining a discriminant matrix. The method may include utilizing one or more faces of different people in which the faces may be without intra personal variations for learning discriminability between faces of different persons. The method may also include determining one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces. The data of the clean faces may exclude the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
In another example embodiment, an apparatus is provided for generating a new subspace onto which face features are projected for determining a discriminant matrix. The apparatus may include a processor and a memory including computer program code. The memory and computer program code are configured to, with the processor, cause the apparatus to at least perform operations including utilizing one or more faces of different people in which the faces may be without intra personal variations for learning discriminability between faces of different persons. The memory and computer program code may further cause the apparatus to determine one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces. The data of the clean faces may exclude the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
In another example embodiment, a computer program product is provided for generating a new subspace onto which face features are projected for determining a discriminant matrix. The computer program product includes at least one computer-readable storage medium having computer-executable program code instructions stored therein. The computer-executable program code instructions may include program code instructions configured to utilize one or more faces of different people in which the faces may be without intra personal variations for learning discriminability between faces of different persons. The program code instructions may also determine one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces. The data of the clean faces may exclude the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
In another example embodiment, an apparatus is provided for generating a new subspace onto which face features are projected for determining a discriminant matrix. The apparatus may include means for utilizing one or more faces of different people in which the faces may be without intra personal variations for learning discriminability between faces of different persons. The apparatus may also include means for determining one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces. The data of the clean faces may exclude the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
An example embodiment of the invention may provide a better user experience given the ease, efficiency and accuracy in performing face recognition via a communication device. For example, an example embodiment of the invention may improve the efficiency and accuracy in recognizing faces and in grouping a set of images of a same face associated with a person. As a result, device users may enjoy improved capabilities with respect to face recognition.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
FIG. 1 is a schematic block diagram of a system according to an example embodiment of the invention;
FIG. 2 is a schematic block diagram of an apparatus according to an example embodiment of the invention;
FIGS. 3A & 3B are diagrams of images in a pixel domain according to an example embodiment of the invention;
FIGS. 3C & 3D are diagrams of images corresponding to a clean face space according to an example embodiment of the invention;
FIG. 4 is a flowchart of an example method for generating a discriminant matrix according to an example embodiment of the invention; and
FIG. 5 is a flowchart of an example method for generating a discriminant matrix according to another example embodiment of the invention.
DETAILED DESCRIPTION
Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Like reference numerals refer to like elements throughout. As used herein, the terms "data," "content," "information" and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Moreover, the term "exemplary", as used herein, is not provided to convey any qualitative assessment, but instead merely to convey an illustration of an example. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
Additionally, as used herein, the term 'circuitry' refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of 'circuitry' applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
As defined herein a "computer-readable storage medium," which refers to a non-transitory, physical or tangible storage medium (e.g., volatile or non-volatile memory device), may be differentiated from a "computer-readable transmission medium," which refers to an electromagnetic signal.
As referred to herein, an Eigen face may denote one or more Eigen vectors utilized for identifying features of a face. Also, as referred to herein, a clean face(s) and/or a clean face space(s) or the like may, but need not, refer to a face without undesirable intra person variations. Additionally, as referred to herein, intra person variability, intra personal variability, intra personal variations or similar terms may be referred to interchangeably to denote variability in a person's face of an image. The intra person variability of a person's face may be due to, but is not limited to, expression, emotion, a pose, occlusion, lighting variations and any other suitable factors.
As referred to herein, a clean face(s) may be a face of a person that does not include intra personal variations (e.g., expression, emotion, a pose, occlusion, lighting variations and any other suitable factors) or in which the effect of the intra personal variations is reduced.
FIG. 1 illustrates a block diagram in which a device such as a mobile terminal 10 is shown in an example communication environment. As shown in FIG. 1, an embodiment of a system in accordance with an example embodiment of the invention may include a first communication device (e.g., mobile terminal 10) and a second communication device 20 capable of communication with each other via a network 30. In some cases, an embodiment of the invention may further include one or more additional communication devices, one of which is depicted in FIG. 1 as a third communication device 25. However, not all systems that employ an embodiment of the invention may comprise all the devices illustrated and/or described herein. While an embodiment of the mobile terminal 10 and/or second and third communication devices 20 and 25 may be illustrated and hereinafter described for purposes of example, other types of terminals, such as personal digital assistants (PDAs), pagers, mobile televisions, mobile telephones, gaming devices, laptop computers, cameras, video recorders, audio/video players, radios, global positioning system (GPS) devices, Bluetooth headsets, Universal Serial Bus (USB) devices or any combination of the aforementioned, and other types of voice and text communications systems, can readily employ an embodiment of the invention. Furthermore, devices that are not mobile, such as servers and personal computers may also readily employ an embodiment of the invention.
The network 30 may include a collection of various different nodes (of which the second and third communication devices 20 and 25 may be examples), devices or functions that may be in communication with each other via corresponding wired and/or wireless interfaces. As such, the illustration of FIG. 1 should be understood to be an example of a broad view of certain elements of the system and not an all-inclusive or detailed view of the system or the network 30. Although not necessary, in one embodiment, the network 30 may be capable of supporting communication in accordance with any one or more of a number of First-Generation (1G), Second-Generation (2G), 2.5G, Third-Generation (3G), 3.5G, 3.9G, Fourth-Generation (4G) mobile communication protocols, Long Term Evolution (LTE), LTE advanced (LTE-A) and/or the like. In one embodiment, the network 30 may be a point-to-point (P2P) network.
One or more communication terminals such as the mobile terminal 10 and the second and third communication devices 20 and 25 may be in communication with each other via the network 30 and each may include an antenna or antennas for transmitting signals to and for receiving signals from a base site, which could be, for example a base station that is a part of one or more cellular or mobile networks or an access point that may be coupled to a data network, such as a Local Area Network (LAN), a Metropolitan Area Network (MAN), and/or a Wide Area Network (WAN), such as the Internet. In turn, other devices such as processing elements (e.g., personal computers, server computers or the like) may be coupled to the mobile terminal 10 and the second and third communication devices 20 and 25 via the network 30. By directly or indirectly connecting the mobile terminal 10 and the second and third communication devices 20 and 25 (and/or other devices) to the network 30, the mobile terminal 10 and the second and third communication devices 20 and 25 may be enabled to communicate with the other devices or each other, for example, according to numerous communication protocols including Hypertext Transfer Protocol (HTTP) and/or the like, to thereby carry out various communication or other functions of the mobile terminal 10 and the second and third communication devices 20 and 25, respectively. Furthermore, although not shown in FIG. 1, the mobile terminal 10 and the second and third communication devices 20 and 25 may communicate in accordance with, for example, radio frequency (RF), near field communication (NFC), Bluetooth (BT), Infrared (IR) or any of a number of different wireline or wireless communication techniques, including Local Area Network (LAN), Wireless LAN (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), Wireless Fidelity (WiFi), Ultra-Wide Band (UWB), Wibree techniques and/or the like. As such, the mobile terminal 10 and the second and third communication devices 20 and 25 may be enabled to communicate with the network 30 and each other by any of numerous different access mechanisms. For example, mobile access mechanisms such as Wideband Code Division Multiple Access (W-CDMA), CDMA2000, Global System for Mobile communications (GSM), General Packet Radio Service (GPRS) and/or the like may be supported as well as wireless access mechanisms such as WLAN, WiMAX, and/or the like and fixed access mechanisms such as Digital Subscriber Line (DSL), cable modems, Ethernet and/or the like.
In an example embodiment, the first communication device (e.g., the mobile terminal 10) may be a mobile communication device such as, for example, a wireless telephone or other devices such as a personal digital assistant (PDA), mobile computing device, camera, video recorder, audio/video player, positioning device, game device, television device, radio device, or various other like devices or combinations thereof. The second communication device 20 and the third communication device 25 may be mobile or fixed communication devices. However, in one example, the second communication device 20 and the third communication device 25 may be servers, remote computers or terminals such as, for example, personal computers (PCs) or laptop computers.
In an example embodiment, the network 30 may be an ad hoc or distributed network arranged to be a smart space. Thus, devices may enter and/or leave the network 30 and the devices of the network 30 may be capable of adjusting operations based on the entrance and/or exit of other devices to account for the addition or subtraction of respective devices or nodes and their corresponding capabilities. In an example embodiment, one or more of the devices in communication with the network 30 may employ a face recognizer (e.g., face recognizer 78 of FIG. 2). The face recognizer may generate a clean face space that does not include intra person variations of an image(s) of a face(s). This clean face space may be utilized in part to generate a discriminant matrix. The face recognizer may perform face recognition and/or face clustering in part based on the clean face space. For example, with respect to face clustering, the face recognizer may be able to group or cluster a set of images of a same person more efficiently and reliably since the images may not have some intra person variations which may make the grouping/clustering more complex. The intra person variations may include, but are not limited to, lighting variations (e.g., shadows on a face, bright and/or dim lighting on a face, etc.), expression/emotion (e.g., a smile, laughter, a frown, anger, sadness, happiness, blinking, etc.), a pose (e.g., frontal, half profile, full profile, etc.), occlusion (e.g., facial hair, glasses, clothes, etc.) and any other suitable variations.
In an example embodiment, the mobile terminal 10 and the second and third communication devices 20 and 25 may be configured to include the face recognizer. However, in an alternative embodiment the mobile terminal 10 may include the face recognizer and the second and third communication devices 20 and 25 may be network entities such as servers or the like that may be configured to communicate with each other and/or the mobile terminal 10. For instance, in an example embodiment, the second communication device 20 may be a dedicated server (or server bank) associated with a particular information source or service (e.g., a face recognition service, media provision service, etc.) or the second communication device 20 may be a backend server associated with one or more other functions or services. As such, the second communication device 20 may represent a potential host for a plurality of different services or information sources. In one example embodiment, the functionality of the second communication device 20 may be provided by hardware and/or software components configured to operate in accordance with techniques for the provision of information to users of communication devices. However, at least some of the functionality provided by the second communication device 20 may be information provided in accordance with an example embodiment of the invention.
In an example embodiment, the second communication device 20 may host an apparatus for providing a localized face recognition service to a device (e.g., mobile terminal 10) practicing an embodiment of the invention. The localized face recognition service may store one or more images of one or more faces and associated metadata identifying individuals corresponding to the faces. In this regard, in an instance in which the second communication device receives face data (e.g., from the mobile terminal 10), the second communication device 20 may match the received face data with a corresponding individual and may provide information to a device (e.g., mobile terminal 10) identifying an individual(s) that corresponds to the face data.
The third communication device 25 may also be a server providing a number of functions or associations with various information sources and services (e.g., a face recognition service, a media provision service, etc.). In this regard, the third communication device 25 may host an apparatus for providing a localized face recognition service that provides information (e.g., face data, etc.) to enable the second communication device 20 to provide face recognition information to a device (e.g., mobile terminal 10) practicing an embodiment of the invention. The face recognition information provided by the third communication device 25 to the second communication device 20 may be used by the second communication device to provide information to a device (e.g., mobile terminal 10) identifying an individual(s) that corresponds to received face data recognized or extracted from an image(s).
As such, in one embodiment, the mobile terminal 10 may itself perform an example embodiment. In another embodiment, the second and third communication devices 20 and 25 may facilitate (e.g., by the provision of face recognition information) operation of an example embodiment at another device (e.g., the mobile terminal 10). In still another example embodiment, the second and third communication devices 20 and 25 may not be included at all.
FIG. 2 illustrates a schematic block diagram of an apparatus for recognizing one or more faces of one or more images according to an example embodiment. An example embodiment of the invention will now be described with reference to FIG. 2, in which certain elements of an apparatus 50 are displayed. The apparatus 50 of FIG. 2 may be employed, for example, on the mobile terminal 10 (and/or the second communication device 20 or the third communication device 25). Alternatively, the apparatus 50 may be embodied on a network device of the network 30. However, the apparatus 50 may alternatively be embodied at a variety of other devices, both mobile and fixed (such as, for example, any of the devices listed above). In some cases, an embodiment may be employed on a combination of devices. Accordingly, one embodiment of the invention may be embodied wholly at a single device (e.g., the mobile terminal 10), by a plurality of devices in a distributed fashion (e.g., on one or a plurality of devices in a P2P network) or by devices in a client/server relationship. Furthermore, it should be noted that the devices or elements described below may not be mandatory and thus some may be omitted in a certain embodiment.
Referring now to FIG. 2, the apparatus 50 may include or otherwise be in communication with a processor 70, a user interface 67, a communication interface 74, a memory device 76, a display 85, a face recognizer 78, and a camera module 36. In one example embodiment, the display 85 may be a touch screen display. The memory device 76 may include, for example, volatile and/or non-volatile memory. For example, the memory device 76 may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like processor 70). In an example embodiment, the memory device 76 may be a tangible memory device that is not transitory. The memory device 76 may be configured to store information, data, files, applications, instructions or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the invention. For example, the memory device 76 could be configured to buffer input data for processing by the processor 70. Additionally or alternatively, the memory device 76 could be configured to store instructions for execution by the processor 70. As yet another alternative, the memory device 76 may be one of a plurality of databases that store information and/or media content (e.g., images, pictures, videos, etc.).
The memory device 76 may store one or more images which may, but need not, include one or more images of faces of individuals. Features may be extracted from the images by the processor 70 and/or the face recognizer 78 and the extracted features may be evaluated by the processor 70 and/or the face recognizer 78 to determine whether the features relate to one or more faces of individuals.
The apparatus 50 may, in one embodiment, be a mobile terminal (e.g., mobile terminal 10) or a fixed communication device or computing device configured to employ an example embodiment of the invention. However, in one embodiment, the apparatus 50 may be embodied as a chip or chip set. In other words, the apparatus 50 may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus 50 may therefore, in some cases, be configured to implement an embodiment of the invention on a single chip or as a single "system on a chip." As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein. Additionally or alternatively, the chip or chipset may constitute means for enabling user interface navigation with respect to the functionalities and/or services described herein.
The processor 70 may be embodied in a number of different ways. For example, the processor 70 may be embodied as one or more of various processing means such as a coprocessor, microprocessor, a controller, a digital signal processor (DSP), processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. In an example embodiment, the processor 70 may be configured to execute instructions stored in the memory device 76 or otherwise accessible to the processor 70. As such, whether configured by hardware or software methods, or by a combination thereof, the processor 70 may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the invention while configured accordingly. Thus, for example, when the processor 70 is embodied as an ASIC, FPGA or the like, the processor 70 may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor 70 is embodied as an executor of software instructions, the instructions may specifically configure the processor 70 to perform the algorithms and operations described herein when the instructions are executed. However, in some cases, the processor 70 may be a processor of a specific device (e.g., a mobile terminal or network device) adapted for employing an embodiment of the invention by further configuration of the processor 70 by instructions for performing the algorithms and operations described herein. The processor 70 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 70.
In an example embodiment, the processor 70 may be configured to operate a connectivity program, such as a browser, Web browser or the like. In this regard, the connectivity program may enable the apparatus 50 to transmit and receive Web content, such as for example location-based content or any other suitable content, according to a Wireless Application Protocol (WAP), for example. It should be pointed out that the processor 70 may also be in communication with a display 85 and may instruct the display to illustrate any suitable information, data, content (e.g., media content) or the like.
Meanwhile, the communication interface 74 may be any means such as a device or circuitry embodied in either hardware, a computer program product, or a combination of hardware and software that is configured to receive and/or transmit data from/to a network and/or any other device or module in communication with the apparatus 50. In this regard, the communication interface 74 may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network (e.g., network 30). In fixed environments, the communication interface 74 may alternatively or also support wired communication. As such, the communication interface 74 may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB), Ethernet or other mechanisms. The user interface 67 may be in communication with the processor 70 to receive an indication of a user input at the user interface 67 and/or to provide an audible, visual, mechanical or other output to the user. As such, the user interface 67 may include, for example, a keyboard, a mouse, a joystick, a display, a touch screen, a microphone, a speaker, or other input/output mechanisms. In an example embodiment in which the apparatus is embodied as a server or some other network devices, the user interface 67 may be limited, remotely located, or eliminated. The processor 70 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface, such as, for example, a speaker, ringer, microphone, display, and/or the like. The processor 70 and/or user interface circuitry comprising the processor 70 may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor 70 (e.g., memory device 76, and/or the like).
The apparatus 50 may include a media capturing element, such as camera module 36. The camera module 36 may include a camera, video and/or audio module, in communication with the processor 70 and the display 85. The camera module 36 may be any means for capturing an image, video and/or audio for storage, display or transmission. For example, the camera module 36 may include a digital camera capable of forming a digital image file from a captured image. As such, the camera module 36 may include all hardware, such as a lens or other optical component(s), and software necessary for creating a digital image file from a captured image. Alternatively, the camera module 36 may include only the hardware needed to view an image, while a memory device (e.g., memory device 76) of the apparatus 50 stores instructions for execution by the processor 70 in the form of software necessary to create a digital image file from a captured image. In an example embodiment, the camera module 36 may further include a processing element such as a co-processor which assists the processor 70 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to a Joint Photographic Experts Group (JPEG) standard format or another like format. In some cases, the camera module 36 may provide live image data to the display 85. In this regard, the camera module 36 may facilitate or provide a camera view to the display 85 to show live image data, still image data, video data, or any other suitable data. Moreover, in an example embodiment, the display 85 may be located on one side of the apparatus 50 and the camera module 36 may include a lens and/or a viewfinder positioned on the opposite side of the apparatus 50 with respect to the display 85 to enable the camera module 36 to capture images on one side of the apparatus 50 and present a view of such images to the user positioned on the other side of the apparatus 50.
In an example embodiment, the processor 70 may be embodied as, include or otherwise control the face recognizer. The face recognizer 78 may be any means such as a device or circuitry operating in accordance with software or otherwise embodied in hardware or a combination of hardware and software (e.g., processor 70 operating under software control, the processor 70 embodied as an ASIC or FPGA specifically configured to perform the operations described herein, or a combination thereof) thereby configuring the device or circuitry to perform the corresponding functions of the face recognizer 78 as described below. Thus, in an example in which software is employed, a device or circuitry (e.g., the processor 70 in one example) executing the software forms the structure associated with such means.
The face recognizer 78 may be in communication with the processor 70, the camera module 36 and the memory device 76 (e.g., via the processor 70). In this regard, the face recognizer 78 may extract one or more features from one or more images captured by the camera module 36 and based in part on the extracted features, the face recognizer 78 may determine whether the extracted features relate to a face(s) of an individual(s).
Additionally, the face recognizer 78 may analyze data stored in the memory device 76 to determine whether the memory device 76 stores face data relating to an individual that also corresponds to the same individual identified in an image being newly captured by the camera module 36.
The face recognizer 78 may generate a new representation (e.g., a subspace) for images of faces that belong to a subspace category of representations (e.g., an Eigen face). This subspace, referred to herein as the clean face space, may be constructed based on training data including clean faces. As described above, a problem in face recognition is due to the intra personal variability in a person's face due to expression (e.g., a smile, a blink, etc.), occlusion (glasses, etc.), lighting variations and other factors. In an example embodiment, the training images may be selected from the memory device 76 by the face recognizer 78 to enable the face recognizer 78 to build an interim representation before applying discriminant analysis (e.g., LDA) such that the images of the interim representation do not have any of these undesirable intra personal variations. The interim representation generated by the face recognizer 78 may be utilized for Principal Component Analysis prior to applying discriminant analysis (e.g., LDA). The PCA may reduce the dimensionality of the selected images. In this regard, the PCA may be computed by the face recognizer 78 from the clean face space. Therefore, faces without glasses, without facial hair, without expression variation, and without any other suitable intra personal variations may be selected for obtaining the principal component vectors or Eigen vectors that represent the clean face space. As such, the interim representation may not have images of faces with undesirable variations which may cause problems for face recognition/face clustering. In this regard, the images of the faces of the clean face space may have neutral expression, good lighting, etc.
The clean faces dataset may be stored by the face recognizer 78 in the memory device 76. The clean face dataset may refer to a collection of images of faces which do not have undesirable variations (e.g., intra person variations). As described above, this dataset may be used by the face recognizer 78 to obtain the interim representation (e.g., clean face subspace). Using the generated interim representation before applying LDA in performing face recognition and/or face clustering may help in a variety of ways. For instance, it may make the apparatus 50 less sensitive to the exact choice of Eigen vectors after PCA, may make the obtained discriminant matrix robust to unseen test data, and may reduce the dimensionality of the feature vectors, thus alleviating the singularity problem in LDA to some extent.
In an example embodiment, an LDA matrix generated from the clean face PCA may be generated in two modes: (a) an offline mode, and (b) an online mode. Both of these modes are described more fully below.
In the offline mode of LDA generation, after the clean face PCA is generated, a labeled set of training data may be projected on the clean face space and this is used for LDA matrix generation. The online training data may be projected on the LDA vectors and stored as templates for each class (or person). These templates may then be used to classify a given unseen test face. An unseen test data/face may be a face provided for classifying into any of the trained classes.
In the online mode of LDA generation, after the clean face PCA is generated, online training data may be used to generate the LDA matrix. Then the training data may be projected on the LDA vectors and stored as templates for each class (or person). These templates may then be used to classify a given unseen test face. An unseen test data/face may be a face provided for classifying into any of the trained classes. In this case, if the number of training samples is less than the dimension of the feature vector, a singularity problem may result during the LDA matrix generation. To overcome this issue, an appropriate number of Eigen vectors may be chosen from the clean face PCA. However, regularization techniques may also be utilized to overcome the singularity issues during the LDA phase. The null space of the Eigen space computed for the features using the clean data may be discarded by the face recognizer 78. For instance, the face recognizer 78 may discard the null space by dropping some Eigen vectors (e.g., Eigen vectors having the least Eigen values). This reduced set of Eigen vectors may include or represent the clean face space (also referred to herein as the clean face subspace).
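In either mode, once the LDA matrix exists, the template and classification steps just described may be sketched as follows; the LDA matrix L, the per-class mean templates and the nearest-template rule are illustrative assumptions rather than requirements of this disclosure.

```python
import numpy as np

def build_templates(train_features, labels, L):
    """Project labeled training faces on the LDA vectors L and store one
    mean template per class (person)."""
    projected = train_features @ L
    return {c: projected[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify(test_face, L, templates):
    """Assign an unseen test face to the class with the nearest template."""
    p = test_face @ L
    return min(templates, key=lambda c: np.linalg.norm(p - templates[c]))
```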
For computing the discriminant matrix using LDA, a different set of training data may be utilized by the face recognizer 78. The different set of training data (e.g., image 3 of FIG. 3A, image 5 of FIG. 3B) may be retrieved by the face recognizer 78 from a memory (e.g., memory device 76). This different set of training data may not be 'clean' and may include all the undesirable intra personal variations that are excluded from the clean face space. The high dimensionality feature vector computed, by the face recognizer 78, from these training faces may be projected on the clean face space. It should be pointed out that projection of the high dimensionality feature vector onto the clean space may not necessarily add back the intra personal variations to the image of the clean space face. Instead, projection of the high dimensionality feature vector onto the clean space may cause the intra personal variations to not be well represented when the LDA training data is projected onto the clean face space. This may ensure that the unreliable features (e.g., corresponding to intra personal variations) are not selected by the discriminant analysis. In response to performing PCA based in part on the clean face space, the face recognizer 78 may perform discriminant analysis (e.g., utilizing LDA or any other discriminant technique). Thus, the face recognizer 78 may obtain a discriminant matrix which transforms one or more feature vectors into a representation that is more suited for face classification/face clustering.
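Putting the pieces together, one possible end-to-end flow may be sketched as follows: PCA computed from clean faces only, projection of the different (non-clean) labeled training set onto that clean face space, and discriminant analysis on the projected features. The synthetic data, the shapes and the retained component counts are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
clean_faces = rng.normal(size=(150, 6400))   # faces without intra person variations
train_faces = rng.normal(size=(300, 6400))   # labeled, non-clean training faces
labels = rng.integers(0, 10, size=300)       # 10 people

# 1. Clean face space: PCA on the clean faces only; the trailing Eigen
#    vectors (the null space) are discarded by keeping 100 components.
mean = clean_faces.mean(axis=0)
_, _, Vt = np.linalg.svd(clean_faces - mean, full_matrices=False)
components = Vt[:100]

# 2. Project the non-clean labeled training features on the clean face
#    space, where intra person variations are poorly represented.
X = (train_faces - mean) @ components.T

# 3. LDA on the projected features via within/between-class scatter.
d = X.shape[1]
Sw = np.zeros((d, d))
Sb = np.zeros((d, d))
m = X.mean(axis=0)
for c in np.unique(labels):
    Xc = X[labels == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)
    diff = (mc - m)[:, None]
    Sb += len(Xc) * (diff @ diff.T)
eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs.real[:, order[:9]]   # discriminant matrix: classes - 1 = 9 columns
print(W.shape)                   # (100, 9)
```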
As described above, the face recognizer 78 may analyze a different set of training images including clean faces selected such that they have none of the undesirable intra personal variations. Therefore, faces without glasses, without facial hair, without expression variation, etc. may be selected, by the face recognizer 78 for computing the interim representation in the form of PCA. By discarding the null space (e.g., feature vectors corresponding to the undesirable intra personal variations) obtained from clean faces, the face recognizer 78 may ensure that the discriminant analysis matrix selects only the most reliable features for discrimination (e.g., not select intra personal variations). In an example embodiment, empirical results show significantly improved performance using this mechanism, for example, for face clustering, as described more fully below.
Referring now to FIGS. 3A-3D, diagrams illustrating images of a person with and without intra personal variation features are provided. The images 3, 5 (e.g., training images (for example, for computing the discriminant matrix using LDA)) of the faces of FIGS. 3A and 3B may be analyzed by the face recognizer 78 and undesirable intra person variations (e.g., glasses, a smile, lighting variations) may be extracted and provided to a null space. On the other hand, the face recognizer 78 may generate the images of clean face spaces of FIGS. 3C and 3D to exclude the undesirable intra person variations or at least minimize the effects of the intra person variations. For example, the effect of the glasses and the smile of the person of the image 3 of FIG. 3A is minimized in the image 7 of the clean face space of FIG. 3C since these intra person variations may be provided by the face recognizer 78 to the null space. The clean face spaces of FIGS. 3C and 3D may be useful for face recognition and/or face clustering, as described more fully below. Although the images 7, 9 of the clean face spaces of FIGS. 3C and 3D were generated based in part on analyzing Eigen faces pertaining to the faces of the images 3, 5 in FIGS. 3A and 3B in the pixel domain, the clean face space (also referred to herein as clean face Eigen space) of the images of FIGS. 3C and 3D may be determined/generated by the face recognizer 78 based on evaluating training images in a feature domain (e.g., generated based on local binary pattern (LBP) histograms) in another example embodiment.
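Where the feature domain is used, this disclosure names local binary pattern (LBP) histograms as one example. The following is a minimal sketch of a basic 8-neighbour, radius-1 LBP histogram; the disclosure does not fix a particular LBP variant, so the details here are assumptions.

```python
import numpy as np

def lbp_histogram(img):
    """256-bin local binary pattern histogram of a 2-D grayscale image."""
    center = img[1:-1, 1:-1]
    # Eight neighbours, clockwise from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int32)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes += (neigh >= center).astype(np.int32) << bit  # set a bit per neighbour
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()  # normalized histogram used as the feature vector
```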
Additionally, in this example embodiment, in an instance in which the face recognizer 78 computes/generates the clean face spaces of FIGS. 3C and 3D, the face recognizer 78 may include the corresponding Eigen vector(s) related to the unwanted intra person variations in a null space. The null space may, but need not, be discarded by the face recognizer 78. In this regard, the faces of the images of FIGS. 3A and 3B may be projected onto the clean face space. As such, a discriminant matrix analysis performed by the face recognizer 78 may be unable to partially or entirely reconstruct any of the undesirable intra person variations. For instance, consider the face of FIG. 3A in which a person is smiling/laughing. The corresponding clean space face may serve as an interim representation during PCA and may be, partially or entirely, unable to properly represent this laughter. For instance, the smile/laughter of the face of FIG. 3C is degraded with respect to the same face of FIG. 3A and does not entirely represent the smile/laughter. As such, in an example embodiment, the clean face space(s) may not partially or entirely represent undesirable variations of a face.
As shown in FIG. 3A, the image relates to a person who is wearing glasses, which may cause problems (e.g., increased dimensionality (e.g., an increase in pixel size of the image)) for face clustering because the glasses are very large and may significantly change the appearance of the image. However, in an instance in which this face is projected onto the clean face space, by the face recognizer 78, the effect of the glasses is significantly reduced. For instance, although the glasses are not completely removed, the effects of the glasses are reduced in this example embodiment. By generating a clean space for faces of images (excluding intra person variations), the face recognizer 78 may be able to more efficiently perform face clustering (e.g., grouping a set of images related to the same face of a person(s)), since the face recognizer 78 may not need to evaluate these undesirable intra person variations. As such, undesirable intra person variations from a face clustering viewpoint may not be represented well in the clean face space. In this regard, the face recognizer 78 may not be misled by the undesirable variations such as, for example, the glasses or the smile/laughter or any other undesirable variations.
In the example embodiment of FIGS. 3A-3D, even though the person is wearing glasses and smiling, the face recognizer 78 may effectively ignore the glasses and/or the smile since these features are provided to a null space and the corresponding clean face space does not entirely include these undesirable variation features.
Although the face recognizer 78 may not consider the undesirable intra person variations in the null space while performing PCA, in an instance in which the face recognizer 78 analyzes a corresponding image that includes the glasses, the face recognizer 78 may still be able to detect the person of the image. For instance, the face recognizer 78 may perform face recognition and may be able to recognize that the person of the image relates to a set of corresponding clean trained data (e.g., images 7, 9 of FIGS. 3C and 3D), for example.
In response to completing PCA, to achieve maximum discrimination and robustness to a wide variety of test data, the face recognizer 78 may implement the discriminant analysis based in part on selecting those features that are least affected by these undesirable intra personal variations. By projecting the faces onto a space where such unwanted variations cannot be represented well, the face recognizer 78 may minimize the chances of these unwanted features being selected and utilized for the discriminant analysis. For example, while the presence of facial hair may be good to discriminate between some men and women, it may not necessarily be an ideal feature since a large percentage of men are clean shaven, and it is generally not applicable to women. Therefore, by ensuring that such features are either not selected or are weighted less, the face recognizer 78 may ensure that truly robust features are selected for the discriminant analysis.
Referring now to FIG. 4, a flowchart of an example method of generating a discriminant matrix according to an example embodiment is provided. At operation 400, the face recognizer 78 may collect training data (e.g., training images) for building a clean face subspace such as, for example, normalized clean faces. The clean faces of the clean face subspace may not include undesirable intra person variations or may have undesirable intra person variations with reduced effects (e.g., degraded areas corresponding to the undesirable intra person variations). The face recognizer 78 may normalize clean faces by aligning the clean faces. For example, the face recognizer 78 may make the face sizes the same and may analyze the eyes of the faces to characterize images of a same face. In one example embodiment, the face recognizer 78 may detect an alignment of the faces based in part on the locations of the eyes, calculated relative to zero degrees (e.g., a horizontal inter-eye line).
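For purposes of illustration and not of limitation, the alignment of operation 400 may be sketched as follows in Python, assuming that a detector has already located the eye centers; the function name align_face and the (x, y) coordinate convention are illustrative assumptions rather than part of the method itself.

import numpy as np
from skimage.transform import rotate

def align_face(face, left_eye, right_eye):
    # Angle of the inter-eye line relative to zero degrees (horizontal),
    # with eye centers given in (x, y) image coordinates.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    # Rotating by the measured angle levels the eyes; resizing the crops
    # to a common face size would follow so all clean faces are comparable.
    return rotate(face, angle)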
At operation 405, the face recognizer 78 may perform feature extraction (e.g., LBP histograms) on the clean face subspace. In this regard, in one example embodiment, the face recognizer 78 may extract features from one or more images of the clean face subspace and may, for example, utilize Local Binary Patterns (LBPs) to represent faces. The face recognizer 78 may examine shape and texture information (e.g., texture of a pixel(s)) associated with extracted data of images of clean faces. In this regard, the face recognizer 78 may extract a face feature vector (e.g., a histogram) from an image(s) having a face(s) and the face recognizer 78 may divide an image of the face(s) into small regions from which LBPs are extracted and concatenated into a single feature histogram that may efficiently represent an image of a face(s). The textures of the facial regions may be encoded by face recognizer 78 utilizing the LBPs while the entire shape of the face(s) may be recovered by the construction of the face feature histogram. By utilizing LBP features, the face recognizer 78 may view images of faces as compositions of micro-patterns which may be invariant with respect to monotonic grey scale transformation. By combining these micro-patterns, a description of a face image of a clean face subspace may be obtained, by the face recognizer 78.
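For purposes of illustration and not of limitation, a minimal sketch of such region-wise LBP histogram extraction is provided below, assuming equally sized grayscale face crops; the grid size and the LBP parameters are illustrative choices and not values required by an example embodiment.

import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature_vector(face, grid=(7, 7), points=8, radius=1):
    # Per-pixel uniform LBP codes over the whole face crop.
    codes = local_binary_pattern(face, points, radius, method="uniform")
    n_bins = points + 2  # uniform patterns plus one "non-uniform" bin
    h, w = codes.shape
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            region = codes[i * h // grid[0]:(i + 1) * h // grid[0],
                           j * w // grid[1]:(j + 1) * w // grid[1]]
            hist, _ = np.histogram(region, bins=n_bins, range=(0, n_bins))
            hists.append(hist / max(hist.sum(), 1))  # normalize each region
    # Concatenating the region histograms yields the single feature
    # histogram that represents the face.
    return np.concatenate(hists)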
Although the face recognizer 78 may utilize LBP patterns to obtain or extract a face(s) from an image(s) in one example embodiment, it should be pointed out that the face recognizer 78 may utilize any other suitable mechanisms for obtaining or extracting a face(s) from an image(s), including but not limited to, utilizing pixel values in a pixel domain for extracting features of clean faces.
At operation 410, the face recognizer 78 may compute the Eigen vectors or Eigen faces on the extracted features corresponding to the clean face subspace using Principal Component Analysis. The Eigen vectors may correspond to significant positive Eigen values of extracted features (e.g., LBP histogram features) from the clean faces. Additionally, as described above, during Principal Component Analysis, the face recognizer 78 may project undesirable intra person features to a null space. In this regard, the face recognizer 78 may discard the null space, for example, corresponding to Eigen vectors whose corresponding Eigen values are insignificant (e.g., negative values or values of zero). These insignificant values may correspond to undesirable intra person variation features which are part of the null space and which may be discarded.
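For purposes of illustration and not of limitation, operation 410 may be sketched as a standard PCA over the clean-face feature vectors, in which only directions with significant positive eigenvalues are kept and the remainder (the null space) is discarded; the threshold eps is an illustrative assumption.

import numpy as np

def clean_face_space(X_clean, eps=1e-8):
    # X_clean: (n_faces, n_features) matrix of clean-face feature vectors.
    mean = X_clean.mean(axis=0)
    Xc = X_clean - mean
    # Singular values relate to covariance eigenvalues by s**2 / (n - 1).
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigvals = (s ** 2) / (len(X_clean) - 1)
    keep = eigvals > eps  # discard insignificant (near-zero) directions
    return mean, Vt[keep]  # rows of Vt[keep] span the clean face space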
At operation 415, the face recognizer 78 may project training data (e.g., for discriminant analysis) on the clean face space. This training data (e.g., the images 3, 5 of FIGS. 3A and 3B) may not be clean. This data may include the full range of intra person variations of a corresponding face such as, for example, the unwanted intra person variation features. In an example embodiment, the PCA may be computed from clean faces and the training data for LDA may be projected on this space. The intra personal variations in this training data are not reconstructed properly. As a result, the LDA may avoid choosing features (e.g., corresponding to intra personal variations) to discriminate between different people. The unwanted features may be noisy (e.g., degraded) once projected onto the clean face space.
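Continuing the sketch above, projecting the (non-clean) training data for discriminant analysis onto the clean face space amounts to a single matrix product, where mean and components are the values returned by the hypothetical clean_face_space function.

def project_to_clean_space(X, mean, components):
    # Coordinates of the faces in the clean face space; intra person
    # variations fall largely in the discarded null space and therefore
    # survive this projection only in degraded (noisy) form.
    return (X - mean) @ components.T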
At operation 420, the face recognizer 78 may perform discriminant analysis (e.g., linear discriminant analysis) on the transformed vectors. The face recognizer 78 may perform the discriminant analysis on the transformed vectors (e.g., transformed feature vectors) to obtain a discriminant matrix. At operation 425, the face recognizer 78 may store the discriminant matrix. The data of the stored discriminant matrix may be based on the PCA and the discriminant analysis (e.g., LDA). This data may be used for face clustering and/or face recognition, as well as for any other suitable reasons, by the face recognizer 78.
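For purposes of illustration and not of limitation, operation 420 may be sketched with scikit-learn's linear discriminant analysis, assuming that X_train holds the feature vectors of the labeled (non-clean) training faces and y their person labels; these names, like the earlier ones, are illustrative assumptions.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Transform the training vectors into the clean face space, then fit LDA.
Z = project_to_clean_space(X_train, mean, components)
lda = LinearDiscriminantAnalysis(solver="eigen")
lda.fit(Z, y)
W = lda.scalings_  # discriminant matrix: one column per discriminant axis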
In this regard, a test face may be applied to this stored discriminant matrix and the face recognizer 78 may determine where the test face should be grouped. For example, presume that there are 100 faces captured from camera module 36. As such, these 100 faces may be projected or transformed, via the face recognizer 78, by this stored discriminant matrix (e.g., based in part on PCA and/or LDA) which may result in 100 vectors being used for testing. In other words, in an example in which the camera module 36 captures 100 faces, each of the 100 faces may be projected on the discriminant matrix. Additionally, in this regard, each of the 100 faces may be transformed using the discriminant matrix and the transformed data may then be used and stored with the discriminant matrix.
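Continuing the example, each captured test face would be normalized, passed through the same feature extraction, projected onto the clean face space, and then transformed by the stored discriminant matrix; a sketch of this last step follows.

def to_test_vector(features, mean, components, W):
    # features: feature vector of one normalized test face.
    z = project_to_clean_space(features[None, :], mean, components)
    return (z @ W).ravel()  # the vector stored and used for testing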
By utilizing this approach, faces of the same person may be close to one another (e.g., have less distance between them) as compared to faces of different people, which may be more distant from each other. Given a test face, the face may be normalized and projected onto a space (e.g., obtained by implementing PCA and/or LDA) by the face recognizer 78. This space may enable the face recognizer 78 to determine a feature vector(s) for classification. In this regard, the face recognizer 78 may utilize the determined feature vector(s) in part to perform face recognition and/or automatic face clustering.
To perform face clustering, the face recognizer 78 may analyze features of faces and recognize/identify faces of the same person as being close to each other and faces of different people as being far apart from each other. In other words, faces that are close to each other correspond to faces of a same person that may be grouped (e.g., clustered together) by the face recognizer 78 and those faces that are determined to be far apart denote faces of different people that may be excluded from the group. As described above, intra person variations may not necessarily be ideal for performing face clustering. As such, some of the example embodiments may address this issue by excluding or removing the intra person variations of faces of images from the discriminant matrix, as described above.
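For purposes of illustration and not of limitation, such distance-based grouping may be sketched with an off-the-shelf agglomerative clustering, where V stacks the transformed test vectors and the distance threshold is an illustrative value that would be tuned per dataset.

from sklearn.cluster import AgglomerativeClustering

def cluster_faces(V, threshold=0.5):
    # Faces of the same person lie close together after the transform, so
    # clusters are formed by merging groups whose average pairwise
    # distance stays below the threshold.
    model = AgglomerativeClustering(n_clusters=None,
                                    distance_threshold=threshold,
                                    linkage="average")
    return model.fit_predict(V)  # one cluster label per face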
Upon using a face clustering technique of an example embodiment, the face recognizer 78 determined that the distance between similar faces, or faces belonging to the same person, is reduced. As a result, the number of faces clustered for a given threshold increased significantly across a wide variety of datasets. Alternatively, the number of false positives decreased significantly in an instance in which approximately the same number of faces were clustered.
For purposes of illustration and not of limitation, given a collection of several databases having approximately 2,600 faces, a conventional clustering method clusters 1,735 faces forming 31 false clusters, where a false cluster denotes a cluster that does not include/indicate only one person. In this example, the conventional clustering method may not cluster 865 faces of the 2,600 faces (e.g., 2,600 faces - 1,735 faces = 865 faces). On the other hand, the face clustering technique of an example embodiment clusters 1,766 faces forming only 19 false clusters. In this example, the face clustering technique of an example embodiment may not cluster 834 faces of the 2,600 faces (e.g., 2,600 faces - 1,766 faces = 834 faces), which is an improvement over the conventional clustering method since the face clustering technique of an example embodiment clusters more of the 2,600 faces with significantly fewer false clusters (19 versus 31).
As such, this example illustrates that the face clustering technique of an example embodiment clusters more faces and results in fewer false clusters than the conventional/existing clustering method. The smaller number of false clusters produced by the face clustering technique of an example embodiment is preferable since this results in more accurate and efficient face clustering.
While the number of false clusters obtained using the conventional method is 1.63 times that of the face clustering technique according to an example embodiment, it should be pointed out that the number of faces clustered with the conventional method (e.g., 1,735 faces clustered from approximately 2,600 faces) is also lower than the number of faces clustered by the proposed method (e.g., 1,766 faces clustered from approximately 2,600 faces).
It should be pointed out that there may not necessarily be a change or increase in the computation complexity in the face clustering and/or face recognition techniques of an example embodiment, since it is the discriminant matrix that is replaced (e.g., a discriminant matrix without undesired intra person variations of faces of an image(s) or an effect of the undesired intra person variations minimized (e.g., degraded)).
In one example embodiment, in an instance in which incremental updating of the discriminant matrix may be desired on the apparatus 50, it may be beneficial for the apparatus 50 to store (for example, in a memory (e.g., memory device 76)) the Eigen vectors that represent the clean face space. In this regard, the Eigen vectors may be updated and the clean face space may be updated accordingly for usage in the discriminant matrix.
Although the clean face space may be implemented in the feature domain (e.g., via an LBP histogram), the clean face space may be implemented in a variety of other ways. For example, in one alternative example embodiment, the clean face space may be implemented by the face recognizer 78 in a pixel domain. In this regard, the clean face space(s) (e.g., images 7, 9 of FIGS. 3C-3D in a clean face space) may include Eigen vectors computed from pixel data associated with images (e.g., images 3, 5 of FIGS. 3A-3B in the pixel domain) of faces. The training data for discriminant analysis may be projected onto this clean face space in a pixel domain and reconstructed by the face recognizer 78. The resulting image may then be used by the face recognizer 78 for discriminant analysis.
In an alternative example embodiment, the face recognizer 78 may generate the clean face by warping an image(s) of a face to a clean face. For example, the face recognizer 78 may warp a training/testing face having intra person variations (e.g., a smile, etc.) to a neutral expression (e.g., an image of the face without the smile) of a frontal face. In addition, the face recognizer 78 may use the clean face generated from the warped image to train/test a pattern classifier.
As an example of the manner in which the face recognizer 78 may warp an image to generate a clean face (e.g., clean face space) consider an image of a face in which a person is smiling (e.g., an intra person variation). In this regard, the face recognizer 78 may warp the image by shrinking the lips of the face such that the face appears to be without expression (e.g., neutral). This warped image with the lips shrunk may become the clean face.
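For purposes of illustration and not of limitation, a toy sketch of this shrinking idea is given below, assuming a grayscale image and a known bounding box around the mouth; a practical system would instead rely on detected facial landmarks and a piecewise warp, so the simple box-based remapping here is purely illustrative.

import numpy as np
from scipy.ndimage import map_coordinates

def shrink_region(img, top, bottom, left, right, factor=1.3):
    # Resample the boxed region from coordinates pushed away from its
    # center, which shrinks the content (e.g., a smiling mouth) in place.
    out = img.copy()
    cy, cx = (top + bottom) / 2.0, (left + right) / 2.0
    ys, xs = np.mgrid[top:bottom, left:right].astype(float)
    src = [cy + (ys - cy) * factor, cx + (xs - cx) * factor]
    out[top:bottom, left:right] = map_coordinates(img, src, order=1)
    return out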
Referring now to FIG. 5, a flowchart of an example method for generating a new subspace onto which face features are projected for determining a discriminant matrix is provided. At operation 500, an apparatus (e.g., apparatus 50) may utilize one or more faces (e.g., faces of images 7, 9) of different people in which the faces are without intra personal variations (e.g., an expression(s) of a person, an emotion(s) of a person, a pose, occlusion, lighting variation, etc.) for learning discriminability between faces of different persons. At operation 505, an apparatus (e.g., apparatus 50) may determine one or more principal component vectors or Eigen vectors by applying principal component analysis (PCA) on data of the clean faces that excludes the undesirable variation features to obtain a clean face space. The clean face space may represent the clean faces (e.g., images 7, 9 of clean space faces) based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
Optionally, at operation 510, an apparatus (e.g., apparatus 50) may determine a discriminant matrix using discriminant analysis (e.g., LDA) based in part on analyzing a different set of labeled training images (e.g., images 3, 5). The different set of training images may include one or more undesirable intra person variation features. Optionally, at operation 515, an apparatus (e.g., apparatus 50) may determine a high dimensionality feature vector based on the different set of training images. Optionally, at operation 520, an apparatus (e.g., apparatus 50) may project the high dimensionality feature vector onto the clean face space to reduce the effect of intra personal variations in the faces of the one or more training images.
It should be pointed out that FIGS. 4 and 5 are flowcharts of a system, method and computer program product according to an example embodiment of the invention. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by various means, such as hardware, firmware, and/or a computer program product including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, in an example embodiment, the computer program instructions which embody the procedures described above are stored by a memory device (e.g., memory device 76) and executed by a processor (e.g., processor 70, face recognizer 78). As will be appreciated, any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the instructions which execute on the computer or other programmable apparatus cause the functions specified in the flowchart blocks to be implemented. In one embodiment, the computer program instructions are stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instructions which implement the function(s) specified in the flowchart blocks. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus implement the functions specified in the flowchart blocks.
Accordingly, blocks of the flowcharts support combinations of means for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, can be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
In an example embodiment, an apparatus for performing the methods of FIGS. 4 and 5 above may comprise a processor (e.g., the processor 70, the face recognizer 78) configured to perform some or each of the operations (400 - 425, 500 - 520) described above. The processor may, for example, be configured to perform the operations (400 - 425, 500 - 520) by performing hardware implemented logical functions, executing stored instructions, or executing algorithms for performing each of the operations. Alternatively, the apparatus may comprise means for performing each of the operations described above. In this regard, according to an example embodiment, examples of means for performing operations (400 - 425, 500 - 520) may comprise, for example, the processor 70 (e.g., as means for performing any of the operations described above), the face recognizer 78, and/or a device or circuitry for executing instructions or executing an algorithm for processing information as described above.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe exemplary embodiments in the context of certain exemplary combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

WE CLAIM:
1. A method comprising:
utilizing one or more faces of different people, the faces without intra personal variations for learning discriminability between faces of different persons; and
determining one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces that excludes the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
2. The method of claim 1, further comprising:
projecting the undesirable variation features of the faces of the training images to a null space; and
discarding the null space to restrict the undesirable features from being considered during the discriminant analysis or to minimize the consideration of the undesirable variation features during the discriminant analysis.
3. The method of claim 2, wherein prior to projecting the undesirable variation features, the method further comprises:
generating the null space based in part on calculating Eigen vectors corresponding to the undesirable variation features, and wherein
the clean face space comprises a reduced set of vectors that excludes Eigen vectors of the undesirable features.
4. The method of claim 1, further comprising:
determining a discriminant matrix using discriminant analysis based on analyzing a different set of labeled training images, the different set of training images comprises one or more undesirable intra person variation features;
determining a high dimensionality feature vector based on the different set of training images; and
projecting the high dimensionality feature vector onto the clean face space to reduce the effect of intra personal variations in the faces of the one or more training images.
5. The method of claim 4, wherein the undesirable variation features and the undesirable intra person variation features comprise variability features pertaining to a face of a person based on at least one of an expression of the person, an emotion of the person, a pose, occlusion, or one or more lighting variations.
6. The method of claim 4, wherein the discriminant analysis comprises linear discriminant analysis.
7. The method of claim 4, further comprising:
receiving a detected image of at least one face;
determining whether the at least one face of a person is close or similar to other faces; and
automatically including the at least one face in a group of the other faces of the person in response to determining that the face of the person is similar to the other faces.
8. The method of claim 7, further comprising:
generating the group of the other faces based in part on data of the applied principal component analysis and the discriminant analysis.
9. An apparatus comprising:
at least one processor; and
at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to perform at least the following:
utilize one or more faces of different people, the faces without intra personal variations for learning discriminability between faces of different persons; and
determine one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces that excludes the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
10. The apparatus of claim 9, wherein the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to:
project the undesirable variation features of the faces of the training images to a null space; and
discard the null space to restrict the undesirable features from being considered during the discriminant analysis or to minimize the consideration of the undesirable features during the discriminant analysis.
11. The apparatus of claim 10, wherein prior to projecting the undesirable variation features, the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to:
generate the null space based in part on calculating Eigen vectors corresponding to the undesirable variation features, and wherein
the clean face space comprises a reduced set of vectors that excludes Eigen vectors of the undesirable variation features.
12. The apparatus of claim 9, wherein the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to:
determine a discriminant matrix using discriminant analysis based on analyzing a different set of labeled training images, the different set of training images comprises one or more undesirable intra person variation features;
determine a high dimensionality feature vector based on the different set of training images; and
project the high dimensionality feature vector onto the clean face space to reduce the effect of intra personal variations in the faces of the one or more training images.
13. The apparatus of claim 12, wherein the undesirable variation features and the undesirable intra person variation features comprise variability features pertaining to a face of a person based on at least one of an expression of the person, an emotion of the person, a pose, occlusion, or one or more lighting variations.
14. The apparatus of claim 12, wherein the discriminant analysis comprises linear discriminant analysis.
15. The apparatus of claim 12, wherein the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to:
receive a detected image of at least one face;
determine whether the at least one face of a person is close or similar to other faces; and
automatically include the at least one face in a group of the other faces of the person in response to determining that the face of the person is similar to the other faces.
16. The apparatus of claim 15, wherein the at least one memory and the computer program code are further configured to, with the processor, cause the apparatus to:
generate the group of the other faces based in part on data of the applied principal component analysis and the discriminant analysis.
17. A computer program product comprising at least one computer-readable storage medium having computer-executable program code instructions stored therein, the computer-executable program code instructions comprising:
program code instructions configured to utilize one or more faces of different people, the faces without intra personal variations for learning discriminability between faces of different persons; and
program code instructions configured to determine one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces that excludes the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
18. The computer program product of claim 17, further comprising:
program code instructions configured to determine a discriminant matrix using discriminant analysis based on analyzing a different set of labeled training images, the different set of training images comprises one or more undesirable intra person variation features;
program code instructions configured to determine a high dimensionality feature vector based on the different set of training images; and
program code instructions configured to project the high dimensionality feature vector onto the clean face space to reduce the effect of intra personal variations in the faces of the one or more training images.
19. An apparatus comprising:
means for utilizing one or more faces of different people, the faces without intra personal variations for learning discriminability between faces of different persons; and
means for determining one or more principal component vectors or Eigen vectors by applying principal component analysis on data of the clean faces that excludes the undesirable variation features to obtain a clean face space representing the clean faces based on the principal component vectors or the Eigen vectors prior to applying discriminant analysis.
20. The apparatus of claim 19, further comprising:
means for determining a discriminant matrix using discriminant analysis based on analyzing a different set of labeled training images, the different set of training images comprises one or more undesirable intra person variation features;
means for determining a high dimensionality feature vector based on the different set of training images; and
means for projecting the high dimensionality feature vector onto the clean face space to reduce the effect of intra personal variations in the faces of the one or more training images.
PCT/FI2012/050948 2011-11-29 2012-10-04 Methods, apparatuses and computer program products for generating a new subspace representation for faces that improves discriminant analysis WO2013079766A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN4107CH2011 2011-11-29
IN4107/CHE/2011 2011-11-29

Publications (1)

Publication Number Publication Date
WO2013079766A1 true WO2013079766A1 (en) 2013-06-06

Family

ID=48534721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2012/050948 WO2013079766A1 (en) 2011-11-29 2012-10-04 Methods, apparatuses and computer program products for generating a new subspace representation for faces that improves discriminant analysis

Country Status (1)

Country Link
WO (1) WO2013079766A1 (en)

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BELHUMEUR, P. N. ET AL.: "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection", IEEE TRANS. PATT. ANAL. MACH. INT., vol. 19, no. 7, 1997, pages 711 - 720, XP000698170, Retrieved from the Internet <URL:http://ftp.idiap.ch/pub/courses/EE-700/material/17-10-2012/fisherface-pami97.pdf> [retrieved on 20130403] *
FUKUI, K. ET AL.: "Face recognition using multi-viewpoint patterns for robot vision", SPRINGER TRACTS IN ADVANCED ROBOTICS, vol. 15, 2005, pages 192 - 201, XP002432511, Retrieved from the Internet <URL:http://www.cvlab.cs.tsukuba.ac.jp/~kfukui/papers/isrrModifiedwithHeaders.pdf> [retrieved on 20130321] *
NISHIYAMA, M. ET AL.: "Face recognition with the multiple constrained mutual subspace method", AVBPA'05 PROC. 5TH INT. CONF. ON AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, pages 71 - 80, XP019013337, Retrieved from the Internet <URL:http://www.cvlab.cs.tsukuba.ac.jp/~kfukui/english/epapers/AVBPA05.pdf> [retrieved on 20130403] *
RAMACHANDRAN, M. ET AL.: "A method for converting a smiling face to a neutral face with applications to face recognition", IEEE INT. CONF. ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2005. PROCEEDINGS. (ICASSP '05)., XP003031507, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1415570> [retrieved on 20130403] *
ZHAO, W. ET AL.: "Discriminant analysis of principal components for facial recognition", FG '98 PROCEEDINGS OF THE 3RD INT. CONF. ON FACE & GESTURE RECOGNITION, pages 336 - 341, XP003031508, Retrieved from the Internet <URL:http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=670971&userType=inst> [retrieved on 20130321] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753850A (en) * 2017-11-03 2019-05-14 富士通株式会社 The training method and training equipment of face recognition model
CN109753850B (en) * 2017-11-03 2022-10-25 富士通株式会社 Training method and training device for face recognition model


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 12854217
Country of ref document: EP
Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: pct application non-entry in european phase
Ref document number: 12854217
Country of ref document: EP
Kind code of ref document: A1