WO2012135979A1 - Method, apparatus and computer program product for providing multi-view face alignment - Google Patents

Method, apparatus and computer program product for providing multi-view face alignment Download PDF

Info

Publication number
WO2012135979A1
Authority
WO
WIPO (PCT)
Prior art keywords
aam
joint
pose
employing
model
Prior art date
Application number
PCT/CN2011/000616
Other languages
French (fr)
Inventor
Tao Xiong
Yong Ma
Yanming Zou
Kongqiao Wang
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/CN2011/000616 priority Critical patent/WO2012135979A1/en
Publication of WO2012135979A1 publication Critical patent/WO2012135979A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Definitions

  • Embodiments of the present invention relate generally to image processing technology and, more particularly, relate to a method, apparatus and computer program product for providing multi-view face alignment.
  • Face detection and recognition is becoming an increasingly important technology.
  • face detection may be useful in biometrics, user interface, gaming and other areas such as creating context for accessing communities in the mobile domain.
  • Face detection may also be important going forward in relation to initiatives such as metadata standardization. Face detection is also important in relation to face alignment.
  • Face alignment has an impact on face-oriented applications and solutions such as, for example, real time face animation for avatar generation or other face recognition applications. Face alignment can also assist in extraction of accurate location determinations for salient facial features to improve face recognition, face expression tracking, age estimation, gender estimation, and/or the like. Many services and devices are currently being developed with face recognition, detection and/or classification functionalities being contemplated as available features for such services and devices.
  • a method, apparatus and computer program product are therefore provided to enable multi-view face alignment.
  • a mechanism is provided for jointly utilizing two active appearance models (AAMs) to improve face alignment.
  • embodiments of the present invention may provide a relatively robust ability for aligning faces even under multi-view conditions.
  • a method of providing multi-view face alignment may include causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and employing a selected joint model application routine.
  • the selected joint model application routine may be selected based on the classification of the pose.
  • the method may further include employing one model among models employed in the selected joint model application routine to perform face alignment.
  • a computer program product for providing multi- view face alignment includes at least one computer- readable storage medium having computer- executable program code instructions stored therein.
  • the computer-executable program code instructions may include program code instructions for causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and employing a selected joint model application routine.
  • the selected joint model application routine may be selected based on the classification of the pose.
  • the program code instructions may further be for employing one model among models employed in the selected joint model application routine to perform face alignment.
  • an apparatus for providing multi-view face alignment may include at least one processor and at least one memory including computer program code.
  • the at least one memory and the computer program code may be configured, with the at least one processor, to cause the apparatus to perform at least causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and employing a selected joint model application routine.
  • the selected joint model application routine may be selected based on the classification of the pose.
  • the apparatus may also be configured for employing one model among models employed in the selected joint model application routine to perform face alignment.
  • an apparatus for providing multi-view face alignment may include means for causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, means for causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and means for employing a selected joint model application routine.
  • the selected joint model application routine may be selected based on the classification of the pose.
  • the apparatus may further include means for employing one model among models employed in the selected joint model application routine to perform face alignment.
  • Embodiments of the invention may provide a method, apparatus and computer program product for employment, for example, in mobile or fixed environments. As a result, for example, computing device users may enjoy an improved capability for face detection and recognition.
  • FIG. 1 illustrates a block diagram of a mobile terminal that may benefit from an example embodiment of the present invention
  • FIG. 2 is a schematic block diagram of a wireless communications system according to an example embodiment of the present invention.
  • FIG. 3 illustrates an apparatus for enabling the provision of multi-view face alignment according to an example embodiment of the present invention
  • FIG. 4 shows a block diagram illustrating one example of an apparatus for enabling the provision of multi-view face alignment according to an example embodiment of the present invention
  • FIG. 5 illustrates an example image with manually labeled feature points in accordance with an example embodiment of the present invention
  • FIG. 6 shows examples of correspondences between a 2D and 3D shape model for frontal, left and right models according to an example embodiment of the present invention
  • FIG. 7 which includes FIGS. 7A and 7B, shows an example of single view face alignment in connection with an example embodiment of the present invention
  • FIG. 8 which includes FIGS. 8A and 8B, shows an example in which two different AAMs are jointed for a multi-view case in accordance with an example embodiment of the present invention
  • FIG. 9 illustrates an example usage of two different joint AAMs to smooth model transitions according to an example embodiment of the present invention
  • FIG. 10, which includes FIGS. 10A, 10B, 10C and 10D, illustrates a comparison of single AAM and joint AAM for an exaggerative expression according to an example embodiment of the present invention
  • FIG. 11, which includes FIGS. 11A, 11B, 11C and 11D, illustrates a comparison of single AAM to joint AAM for an example with incorrect initialization according to an example embodiment of the present invention
  • FIG. 12, which includes FIGS. 12A to 12L, illustrates a comparison of single and joint 2D/2D+3D AAMs according to an example embodiment of the present invention.
  • FIG. 13 is a flowchart according to an example method for providing multi-view face alignment according to an example embodiment of the present invention.
  • circuitry refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term herein, including in any claims.
  • the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and / or other computing device.
  • Face alignment is a technique that is based on face detection. Face alignment involves extracting the shape of facial components in a static image or video stream. Multi-view face alignment is useful in multi-view face recognition and face pose and expression tracking.
  • the eyebrows, eyes, nose and mouth are generally the most meaningful features of the face when considering face alignment. In some cases, the eyebrows, eyes, nose and mouth may be represented by key feature points around or inside them and face alignment may be employed to find accurate locations of those same points in a current image.
  • Active shape model (ASM) and active appearance model (AAM) are two mainstream methods for face alignment.
  • ASM generally involves local searching including learning of the shape variation modes and local texture characteristics around each feature point. Texture descriptors may be very discriminative. However, the local search aspect may cause the model to get stuck in a local extremum.
  • AAM typically involves not only learning shape, but also global texture variation modes of the whole facial area.
  • a Gauss-Newton iteration may be employed to update both pose and shape parameters simultaneously. Since global texture is considered, AAM may be more robust in the presence of noise and occlusion.
  • AAM also employs an inverse compositional fitting to solve AAM extremely efficiently. However, the employment of the Gauss-Newton iteration may tend to make the convergence aperture of AAM relatively small.
  • View-based AAMs may diverge, such as when the initial view is far from the actual or current view. For example, if there are two AAMs that can handle [-20, 20] and [20, 60] degrees in yaw, respectively, and the actual pose in an image is about 20 degrees in yaw, using either AAM alone may cause a large error.
  • The face detection (or eye detection) result may not be very accurate and the initialization of an AAM may have large translational, rotational or scale bias from actual conditions. Thus, it may be difficult for the AAM to converge to a reasonable solution.
  • a view based two dimensional (2D) plus three dimensional (3D) AAM may be a successful approach to realize real time face animation despite large pose and expression variations.
  • The pose information may be obtained and used to guide model selection (left/frontal/right views). However, when the model is changed, the transition may not occur smoothly and the estimated pose may become discontinuous. Accordingly, it may be desirable to provide improved face alignment techniques that may work well for both static images and video streams.
  • FIG. 1, which illustrates one example embodiment of the invention, is a block diagram of a mobile terminal 10 that may benefit from embodiments of the present invention.
  • a mobile terminal as illustrated and hereinafter described is merely illustrative of one type of device that may benefit from embodiments of the present invention and, therefore, should not be taken to limit the scope of embodiments of the present invention. While several embodiments of the mobile terminal 10 may be illustrated and hereinafter described for purposes of example, other types of mobile terminals, such as portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, all types of computers (e.g., laptops or mobile computers), cameras, audio/video players, radio, global positioning system (GPS) devices, or any combination of the aforementioned, and other types of communications systems, may readily employ embodiments of the present invention.
  • the mobile terminal 10 may include an antenna 12 (or multiple antennas) in operable communication with a transmitter 14 and a receiver 16.
  • the mobile terminal 10 may further include an apparatus, such as a controller 20 or other processor, that provides signals to and receives signals from the transmitter 14 and receiver 16, respectively.
  • the signals may include signaling information in accordance with the air interface standard of the applicable cellular system, and/or may also include data corresponding to user speech, received data and/or user generated data.
  • the mobile terminal 10 may be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types.
  • the mobile terminal 10 may be capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like.
  • the mobile terminal 10 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with 3.9G wireless communication protocol such as E-UTRAN (evolved-universal terrestrial radio access network), with fourth-generation (4G) wireless communication protocols or the like.
  • the apparatus may include circuitry implementing, among others, audio and logic functions of the mobile terminal 10.
  • the controller 20 may comprise a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and/or other support circuits. Control and signal processing functions of the mobile terminal 10 are allocated between these devices according to their respective capabilities.
  • the controller 20 thus may also include the functionality to convolutionally encode and interleave messages and data prior to modulation and transmission.
  • the controller 20 may additionally include an internal voice coder, and may include an internal data modem. Further, the controller 20 may include functionality to operate one or more software programs, which may be stored in memory.
  • the controller 20 may be capable of operating a connectivity program, such as a conventional Web browser.
  • the connectivity program may then allow the mobile terminal 10 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like, for example.
  • the mobile terminal 10 may also comprise a user interface including an output device such as an earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, which may be coupled to the controller 20.
  • the user input interface which allows the mobile terminal 10 to receive data, may include any of a number of devices allowing the mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown), a microphone or other input device.
  • the keypad 30 may include numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the mobile terminal 10.
  • the keypad 30 may include a conventional QWERTY keypad arrangement.
  • the keypad 30 may also include various soft keys with associated functions.
  • the mobile terminal 10 may include an interface device such as a joystick or other user input interface.
  • the mobile terminal 10 further includes a battery 34, such as a vibrating battery pack, for powering various circuits that are used to operate the mobile terminal 10, as well as optionally providing mechanical vibration as a detectable output.
  • the mobile terminal 10 includes a media capturing element, such as a camera, video and/or audio module, in communication with the controller 20.
  • the media capturing element may be any means for capturing an image, video and/or audio for storage, display or transmission.
  • the camera module 36 may include a digital camera capable of forming a digital image file from a captured image.
  • the camera module 36 includes all hardware, such as a lens or other optical component(s), and software necessary for creating a digital image file from a captured image.
  • the camera module 36 may include only the hardware needed to view an image, while a memory device of the mobile terminal 10 stores instructions for execution by the controller 20 in the form of software necessary to create a digital image file from a captured image.
  • the camera module 36 may further include a processing element such as a co-processor which assists the controller 20 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data.
  • the encoder and/or decoder may encode and/or decode according to a JPEG standard format or another like format.
  • the camera module 36 may provide live image data to the display 28.
  • the display 28 may be located on one side of the mobile terminal 10 and the camera module 36 may include a lens positioned on the opposite side of the mobile terminal 10 with respect to the display 28 to enable the camera module 36 to capture images on one side of the mobile terminal 10 and present a view of such images to the user positioned on the other side of the mobile terminal 10.
  • the mobile terminal 10 may further include a user identity module (UIM) 38, which may generically be referred to as a smart card.
  • the UIM 38 is typically a memory device having a processor built in.
  • the UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), or any other smart card.
  • the UIM 38 typically stores information elements related to a mobile subscriber.
  • the mobile terminal 10 may be equipped with memory.
  • the mobile terminal 10 may include volatile memory 40, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data.
  • the mobile terminal 10 may also include other non-volatile memory 42, which may be embedded and/or may be removable.
  • the non-volatile memory 42 may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory or the like.
  • the memories may store any of a number of pieces of information, and data, used by the mobile terminal 10 to implement the functions of the mobile terminal 10.
  • FIG. 2 is a schematic block diagram of a wireless communications system according to an example embodiment of the present invention. Referring now to FIG. 2, an illustration of one type of system that would benefit from embodiments of the present invention is provided.
  • a system in accordance with an example embodiment of the present invention includes a first communication device (e.g., mobile terminal 10) and in some cases also a second communication device 48 that may each be capable of communication with a network 50.
  • the first communication device or mobile terminal 10 may be considered to be synonymous with an on-site device associated with an on- site user.
  • the second communication device 48 may be considered to be synonymous with a remote device associated with a remote user.
  • the second communication device 48 may be a remotely located user of another mobile terminal, or a user of a fixed computer or computer terminal (e.g., a personal computer (PC)).
  • multiple devices may collaborate with each other and thus example embodiments are not limited to scenarios where only two devices collaborate or where devices operate completely independently of one another.
  • the communications devices of the system may be able to communicate with network devices or with each other via the network 50.
  • the network devices with which the communication devices of the system communicate may include a service platform 60.
  • the mobile terminal 10 (and/or the second communication device 48) is enabled to communicate with the service platform 60 to provide, request and/or receive information.
  • the service platform 60 may comprise all the devices illustrated and/or described herein.
  • the network 50 includes a collection of various different nodes, devices or functions that are capable of communication with each other via corresponding wired and/or wireless interfaces.
  • the illustration of FIG. 2 should be understood to be an example of a broad view of certain elements of the system and not an all inclusive or detailed view of the system or the network 50.
  • the network 50 may be capable of supporting communication in accordance with any one or more of a number of first-generation (1G), second-generation (2G), 2.5G, third-generation (3G), 3.5G, 3.9G, fourth-generation (4G) mobile communication protocols, Long Term Evolution (LTE), LTE advanced (LTE-A), and/or the like.
  • One or more communication terminals such as the mobile terminal 10 and the second communication device 48 may be capable of communication with each other via the network 50 and each may include an antenna or antennas for transmitting signals to and for receiving signals from a base site, which could be, for example, a base station that is a part of one or more cellular or mobile networks or an access point that may be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN), such as the Internet.
  • the mobile terminal 10 and the second communication device 48 may be enabled to communicate with the other devices (or each other), for example, according to numerous communication protocols including Hypertext Transfer Protocol (HTTP) and/or the like, to thereby carry out various communication or other functions of the mobile terminal 10 and the second communication device 48, respectively.
  • the mobile terminal 10 and the second communication device 48 may communicate in accordance with, for example, radio frequency (RF), Bluetooth (BT), Infrared (IR) or any of a number of different wireline or wireless communication techniques, including LAN, wireless LAN (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), WiFi, ultra-wide band (UWB), Wibree techniques and/or the like.
  • the mobile terminal 10 and the second communication device 48 may be enabled to communicate with the network 50 and each other by any of numerous different access mechanisms.
  • the service platform 60 may be a device or node such as a server or other processing device.
  • the service platform 60 may have any number of functions or associations with various services.
  • the service platform 60 may be a platform such as a dedicated server (or server bank) associated with a particular information source or service (e.g., face detection, face alignment, face recognition and/or the like, etc.), or the service platform 60 may be a backend server associated with one or more other functions or services.
  • the service platform 60 represents a potential host for a plurality of different services or information sources.
  • the functionality of the service platform 60 is provided by hardware and/or software components configured to operate in accordance with known techniques for the provision of information to users of communication devices. However, at least some of the functionality provided by the service platform 60 is information provided in accordance with example embodiments of the present invention.
  • the service platform 60 may host an apparatus for providing face alignment services to a device practicing an embodiment of the present invention.
  • the service platform 60 may itself perform example embodiments, while in other embodiments, the service platform 60 may facilitate (e.g., by the provision of image data or processing of image data) operation of an example embodiment at another device (e.g., the mobile terminal 10 and/or the second communication device 48).
  • the service platform 60 may not be included at all. In other words, in some embodiments, operations in accordance with an example embodiment may be performed at the mobile terminal and/or the second communication device 48 without any interaction with the network 50 and/or the service platform 60.
  • An example embodiment will now be described with reference to FIG. 3, in which certain elements of an apparatus for enabling the provision of multi-view face alignment are displayed.
  • the apparatus of FIG. 3 may be employed, for example, on the service platform 60 or the mobile terminal 10 of FIG. 2.
  • the apparatus of FIG. 3 may also be employed on a variety of other devices. Therefore, example embodiments should not be limited to application on devices such as the service platform 60 or mobile terminal 10 of FIG. 2. Alternatively, embodiments may be employed on a combination of devices including, for example, those listed above.
  • some example embodiments may be embodied wholly at a single device (e.g., the service platform 60, the mobile terminal 10 or the second communication device 48) or by devices in a client/server relationship (e.g., the service platform 60 serving information to the mobile terminal 10 and/or the second communication device 48).
  • the devices or elements described below may not be mandatory and thus some may be omitted in certain embodiments.
  • the apparatus 65 may include or otherwise be in communication with a processor 70, a user interface 72, a communication interface 74 and a memory device 76.
  • the memory device 76 may include, for example, one or more volatile and/or non-volatile memories.
  • the memory device 76 may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processor 70).
  • the memory device 76 may be configured to store information, data, applications, instructions or the like for enabling the apparatus to carry out various functions in accordance with example embodiments of the present invention.
  • the memory device 76 could be configured to buffer input data for processing by the processor 70.
  • the memory device 76 could be configured to store instructions for execution by the processor 70.
  • the apparatus 65 may, in some embodiments, be a network device (e.g., service platform 60) or other devices (e.g., the mobile terminal 10 or the second communication device 48) that may operate independent of or in connection with a network. However, in some embodiments, the apparatus 65 may be instantiated at one or more of the service platform 60, the mobile terminal 10 and the second communication device 48. Thus, the apparatus 65 may be any computing device configured to employ an example embodiment of the present invention. However, in some embodiments, the apparatus 65 may be embodied as a chip or chip set (which may in turn be employed at one of the devices mentioned above).
  • the apparatus 65 may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard).
  • the structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon.
  • the apparatus 65 may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single "system on a chip."
  • a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the processor 70 may be embodied in a number of different ways.
  • the processor 70 may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processor 70 may include one or more processing cores configured to perform independently.
  • a multi-core processor may enable multiprocessing within a single physical package.
  • the processor 70 may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
  • the processor 70 may be configured to execute instructions stored in the memory device 76 or otherwise accessible to the processor 70. Alternatively or additionally, the processor 70 may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor 70 may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor 70 is embodied as an ASIC, FPGA or the like, the processor 70 may be specifically configured hardware for conducting the operations described herein.
  • the instructions may specifically configure the processor 70 to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processor 70 may be a processor of a specific device (e.g., a mobile terminal or network device) adapted for employing an embodiment of the present invention by further configuration of the processor 70 by instructions for performing the algorithms and/or operations described herein.
  • the processor 70 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 70.
  • the communication interface 74 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data from/to a network and/or any other device or module in communication with the apparatus 65.
  • the communication interface 74 may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network.
  • the communication interface 74 may alternatively or also support wired communication.
  • the communication interface 74 may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
  • the user interface 72 may be in communication with the processor 70 to receive an indication of a user input at the user interface 72 and/or to provide an audible, visual, mechanical or other output to the user.
  • the user interface 72 may include, for example, a keyboard, a mouse, a joystick, a display, a touch screen(s), touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms.
  • Where the apparatus 65 is embodied as a server or some other network device, the user interface 72 may be limited, or eliminated.
  • the user interface 72 may include, among other devices or elements, any or all of a speaker, a microphone, a display, and a keyboard or the like.
  • the processor 70 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface, such as, for example, a speaker, ringer, microphone, display, and/or the like.
  • the processor 70 and/or user interface circuitry comprising the processor 70 may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor 70 (e.g., memory device 76, and/or the like).
  • the processor 70 may be embodied as, include or otherwise control a face detector 80, a pose classifier 82, and a joint model manager 84. As such, in some embodiments, the processor 70 may be said to cause, direct or control the execution or occurrence of the various functions attributed to the face detector 80, the pose classifier 82, and the joint model manager 84, respectively, as described herein.
  • the face detector 80, the pose classifier 82, and the joint model manager 84 may each be any means such as a device or circuitry operating in accordance with software or otherwise embodied in hardware or a combination of hardware and software (e.g., processor 70 operating under software control, the processor 70 embodied as an ASIC or FPGA specifically configured to perform the operations described herein, or a combination thereof) thereby configuring the device or circuitry to perform the corresponding functions of the face detector 80, the pose classifier 82, and the joint model manager 84, respectively, as described herein.
  • In examples in which software is employed, a device or circuitry (e.g., the processor 70 in one example) executing the software forms the structure associated with such means.
  • the face detector 80 may be configured to identify and/or isolate faces within an image (or series of images in the case of video analysis). The face detector 80 may therefore be configured to detect face candidates in images without regard to pose. Face detections may be determined based on employing statistical analysis methods of any kind. In an example embodiment, the face detector 80 may employ Adaboost (adaptive boosting) based statistical methods or other statistical methods (e.g., Gentle-Boost, RealBoost, FloatBoost, and/or the like) to provide coarse pose estimation. The coarse pose estimation may be provided to the pose classifier 82 to perform more detailed pose estimation.
  • the pose classifier 82 may be configured to classify the pose (e.g., a coarse pose determined by the face detector 80) into one of a plurality of categories.
  • the pose classifier 82 may be configured to classify the pose into one of the following five categories in terms of degrees in yaw: [-60, -30], [-30, -10], [-10, 10], [10, 30] and [30, 60].
  • An AAM may be trained (e.g., via the joint model manager 84) for each of three respective different ranges (e.g., [-60, -20], [-20, 20] and [20, 60] degrees, respectively).
  • joint AAMs may be run based on which of the categories the pose classifier 82 selects for classifying the pose.
  • the joint model manager 84 may be employed after pose classification to run joint AAMs using selected ones among the trained AAMs based on the classification of the pose relative to the categories above as described in greater detail below.
  • the joint model manager 84 may be configured to select a joint AAM application routine based on the classification of the pose. In some cases, there may be two joint AAM application routines that may be selected for application based on the classification of the pose.
  • a first joint AAM application routine may include running joint AAMs where one (e.g., the same) AAM is used, but the joint AAMs have different initial solutions.
  • a second joint AAM application routine may include running joint AAMs where two different AAMs are employed.
  • the joint model manager 84 may be configured to employ the first joint AAM application routine with respect to the one of the AAMs that corresponds to the pose classification.
  • the second joint AAM application routine may be employed by jointing the first two AAMs (e.g., the AAMs corresponding to [-60, -20] and [-20, 20]) or by jointing the last two AAMs (e.g., the AAMs corresponding to [-20, 20] and [20, 60]). All of the AAMs may be initialized during face detection.
  • the joint model manager 84 may be configured to select one of the jointly run AAMs (e.g., the one with the lowest error) and then continue to run the selected AAM until convergence in order to achieve face alignment.
  • the joint model manager 84 may be configured to perform face alignment using more than one AAM jointed by shape constraint items that may be fitted by employing a project-out inverse compositional (POIC) algorithm. Since POIC fitting uses a Gauss-Newton iteration algorithm to find a minimum, a Hessian matrix may be approximated by the product of a Jacobi matrix and its transpose. When the solution is close to a minimum, the Jacobi matrix may approach zeros, which may lead to large parameter increments for certain parameters.
  • a maximum iteration count may be introduced.
  • The parameters yielding the minimal error across the iterations are considered optimal.
  • The shapes of both AAMs may be close to each other, and the one with less error may be chosen to be run T_S times.
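  • The control flow implied by the preceding bullets might be organized as in the following Python sketch. This is only an illustration: the POIC/Gauss-Newton updates are abstracted behind caller-supplied functions, and the names joint_step, single_step, error_fn, T_J and T_S are hypothetical rather than taken from the patent.

```python
# Hypothetical sketch of the joint fitting control flow: iterate the jointed
# AAMs for at most T_J steps, remember the lowest-error parameters seen for
# each AAM, then refine the better AAM alone for T_S further steps.

def run_joint_fit(joint_step, single_step, error_fn, init_params, T_J=30, T_S=20):
    params = list(init_params)                  # e.g. [(p1, q1), (p2, q2)]
    best = [(float("inf"), p) for p in params]  # (error, params) per AAM

    for _ in range(T_J):                        # bounded joint stage
        params = joint_step(params)             # one coupled parameter update
        for i, p in enumerate(params):
            err = error_fn(i, p)                # appearance error of AAM i
            if err < best[i][0]:
                best[i] = (err, p)

    winner = min(range(len(best)), key=lambda i: best[i][0])
    _, p = best[winner]
    for _ in range(T_S):                        # refine the selected AAM alone
        p = single_step(winner, p)
    return winner, p
```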
  • FIG. 4 illustrates a flowchart indicating operation of the apparatus 65 according to an example embodiment for face alignment.
  • image data may be input into the system for face detection at operation 100.
  • the face detection process may employ Adaboost in order to give rough position and pose of the face.
  • pose classification may be performed (e.g., into one of the five categories described above) at operation 110.
  • joint AAMs may be run with different initial solutions at operation 120 or joint AAMs may be run with different AAMs at operation 130.
  • One AAM may then be selected at operation 140 (e.g., based on having less error).
  • the selected AAM may then continue to be run until an optimal output is achieved.
  • the apparatus 65 may operate to perform face alignment according to example embodiments.
  • One example that employs 2D AAMs will be described hereafter.
  • The shape of a human face can be represented by some key feature points with their locations, as shown in the example of FIG. 5, in which 88 manually labeled feature points are identified.
  • I denotes the image.
  • A_0(x), defined over a normalized template T, denotes the mean appearance.
  • p and q denote the shape and pose parameters, respectively
  • W(x; p) and N(x; q) denote the warps caused by shape variations and pose variations, respectively.
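  • Under the standard AAM formulation that this notation suggests, the single-AAM objective referred to below as equation (1) would take roughly the following form. This is an assumed reconstruction of the conventional cost, not a verbatim copy of the patent's equation.

```latex
% Assumed form of the single-AAM fitting objective (cf. equation (1)):
J(p, q) \;=\; \sum_{\mathbf{x} \in T}
  \Bigl[ A_0(\mathbf{x}) \;-\; I\bigl(N(W(\mathbf{x};\,p);\,q)\bigr) \Bigr]^{2}
```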
  • more than one AAM may be employed.
  • Superscripts, e.g., (1) and (2), may be used to distinguish between the AAMs.
  • The error function may include three parts, for example, the appearance errors of both AAMs and the shape error J_S between them.
  • N(W(v; p^(1)); q^(1)) and N(W(v; p^(2)); q^(2)) may represent the shapes warped by both AAMs, respectively.
  • K is a positive empirical coefficient, which may be related to the scale of the face.
  • K may be set to the quotient of 0.02 and the initial scale.
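  • Combining the appearance errors of both AAMs with the shape-consistency term weighted by K, the joint objective sketched above would plausibly take a form such as the following. This is a reconstruction from the surrounding bullets, not the patent's exact equation.

```latex
% Assumed form of the two-AAM joint objective (cf. equation (5)):
J \;=\; \sum_{i=1}^{2} \sum_{\mathbf{x} \in T}
    \Bigl[ A_0^{(i)}(\mathbf{x}) - I\bigl(N(W(\mathbf{x};p^{(i)});q^{(i)})\bigr) \Bigr]^{2}
  \;+\; K \,\bigl\lVert N(W(\mathbf{v};p^{(1)});q^{(1)})
                      - N(W(\mathbf{v};p^{(2)});q^{(2)}) \bigr\rVert^{2}
```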
  • the dimension of the parameters in equation (8) may be doubled (if p^(1) and p^(2) have the same length, and so do q^(1) and q^(2)).
  • the fitting of equation (8) may be performed in a higher dimension parameter space, which may assist in accessing the optimal solution.
  • POIC technique may be employed.
  • the parameter updating mode may be forward additive: p^(i) ← p^(i) + Δp^(i) (6) and q^(i) ← q^(i) + Δq^(i) (7), for i = 1, 2.
  • AAM^(1) and AAM^(2) from equation (5) may be the same AAM with different initial solutions.
  • a frontal AAM may be chosen.
  • Two initial shapes can be chosen with some differences in rigid or non-rigid shape variation. If K is suitably chosen, at the end of the iteration, p^(1) (and q^(1)) is close or equal to p^(2) (and q^(2)). If the two initial solutions are the same, then the shape error is zero and equation (5) is equivalent to equation (1).
  • AAM^(1) and AAM^(2) may be selected to be two different AAMs.
  • Three AAMs have been trained to handle [-60, -20], [-20, 20] and [20, 60] degrees in yaw, respectively. If the yaw of the input face is roughly located in the range of [-30, -10] degrees, then the first two AAMs may be selected. Due to the small convergence aperture problem, either of the AAMs alone may become divergent, since their initial yaws are -40 and 0 degrees, respectively. However, by using both of the AAMs together, the accuracy may be improved.
  • In order to extend equation (5) to joint M AAMs, the error function can be rewritten as follows:
  • Some of the M AAMs may be the same and others may be different.
  • the POIC fitting technique with Gauss-Newton iteration algorithm can also be used to minimize equation (11) relatively efficiently.
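  • Extending that cost to M jointed AAMs, as the reference to equation (11) indicates, the shape term would plausibly penalize pairwise differences between the warped shapes. One assumed form is shown below; the patent's equation (11) may couple the shapes differently.

```latex
% Assumed generalization to M jointed AAMs (cf. equation (11)),
% with J_a^(i) the appearance error of the i-th AAM:
J \;=\; \sum_{i=1}^{M} J_a^{(i)}
  \;+\; K \sum_{i=1}^{M} \sum_{j=i+1}^{M}
        \bigl\lVert N(W(\mathbf{v};p^{(i)});q^{(i)})
                  - N(W(\mathbf{v};p^{(j)});q^{(j)}) \bigr\rVert^{2}
```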
  • a 3D shape model may be introduced to impose stronger constraints on the shape. Both the robustness and convergence speed may be improved.
  • an extended 3D Candide model may be used as 3D shape constraints.
  • the maps of feature points between 2D and 3D shape models may be designed as shown in FIG. 6.
  • FIG. 6 shows examples of correspondences between a 2D and 3D shape model for frontal, left and right models.
  • a 2D+3D AAM may minimize the error defined in equation (12).
  • A distinguishing mark (shown herein as ′) may be used to indicate the 3D shape model and its parameters.
  • W′(v′; p′) represents the deformation caused by shape and animation units.
  • N′(v′; q′) is the rigid transform with rotation, translation and scale, and q′ is the 3D pose parameter vector containing six independent elements. Similar to equation (5), equation (12) can be minimized efficiently by the POIC fitting algorithm.
  • more than one 2D AAM may be employed. Since a different-view AAM has a different relation with the 3D shape model, a different correspondence v_n may be selected for each AAM.
  • Since the shape variation in the 3D model is caused only by the non-rigid variation of the face, e.g., expression, the parameters of the 3D model, p′ and q′, may be unique. Then the error can be represented as follows:
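  • Since the 3D model acts as a shared constraint on all of the jointed 2D AAMs, the combined 2D+3D objective referred to above would plausibly tie each 2D warped shape to the projected 3D shape, for example as follows. The projection operator P and the exact weighting are assumptions, not the patent's literal equations (12)-(14).

```latex
% Assumed form of the joint 2D+3D objective (cf. equations (12)-(14));
% P(.) projects the rigidly transformed 3D shape into the image plane.
J \;=\; \sum_{i=1}^{M} J_a^{(i)}
  \;+\; K \sum_{i=1}^{M}
        \bigl\lVert N(W(\mathbf{v}^{(i)};p^{(i)});q^{(i)})
                  - P\bigl(N'(W'(\mathbf{v}'_{i};p');q')\bigr) \bigr\rVert^{2}
```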
  • a multi-view face alignment strategy may be developed based on joint-AAM in order to handle the poses in various range categories between [-60, 60] degrees in yaw.
  • Three different AAMs are trained to handle [-60, -20], [-20, 20] and [20, 60] degrees, respectively.
  • Adaboost based multi-view face detection may be employed to give the rough position and pose of the face. Then the face is classified into one of 5 categories, e.g., [-60, -30], [-30, -10], [-10, 10], [10, 30], [30, 60] degrees in yaw.
  • If the face falls into the 1st, 3rd or 5th category (e.g., [-60, -30], [-10, 10], or [30, 60] degrees in yaw), one of the above three AAMs is selected with different initial solutions.
  • If the face falls into one of the other two categories (e.g., [-30, -10] or [10, 30] degrees in yaw), the first two or last two AAMs are jointed. All AAMs are initialized by the face detection box.
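  • To make the selection strategy above concrete, the following Python sketch maps a coarse yaw estimate to one of the five categories and then to the AAM configuration to be jointed. The category boundaries follow the text; the model labels, the handling of exact boundary values and the returned routine descriptions are illustrative assumptions.

```python
# Illustrative mapping from a coarse yaw estimate (degrees) to the joint-AAM
# configuration. Three AAMs are assumed, trained for [-60, -20], [-20, 20] and
# [20, 60] degrees of yaw, labeled "left", "frontal" and "right" here.

def select_joint_configuration(yaw_deg):
    if -60 <= yaw_deg < -30:    # category 1: clearly left
        return ("left", "left"), "same AAM, different initial solutions"
    if -30 <= yaw_deg < -10:    # category 2: between left and frontal
        return ("left", "frontal"), "two different AAMs jointed"
    if -10 <= yaw_deg <= 10:    # category 3: frontal
        return ("frontal", "frontal"), "same AAM, different initial solutions"
    if 10 < yaw_deg <= 30:      # category 4: between frontal and right
        return ("frontal", "right"), "two different AAMs jointed"
    if 30 < yaw_deg <= 60:      # category 5: clearly right
        return ("right", "right"), "same AAM, different initial solutions"
    raise ValueError("yaw outside the supported [-60, 60] degree range")

# A detected yaw of about 20 degrees falls into category 4, so the frontal
# and right AAMs would be jointed:
print(select_joint_configuration(20.0))
```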
  • Some example embodiments may therefore be used in connection with 2D joint- AAM face alignment such that more than one AAM may be jointed by some shape constraint items to be fitted by an efficient POIC algorithm.
  • a 3D shape model may be introduced to impose stronger shape constraints on the joint AAM that can also be fitted by the POIC algorithm.
  • jointed AAMs may be the same, but may have different initial solutions to improve robustness for poor initializations or exaggerative expressions.
  • FIG. 8 which includes FIGS. 8A and 8B, shows an example in which two different AAMs are jointed for a multi-view case.
  • 2D+3D joint AAM may be employed to make a model transition smoother and produce continuous pose variation as shown in the example of FIG. 9.
  • the AAMs may be set with different scales, 0.9·s_0 and 1.1·s_0.
  • The initial scale s_0, as used for a single AAM, may be estimated from the width of the face box.
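  • As a small illustration of that two-initialization variant, the snippet below derives an initial scale from the face-box width and produces two starting scales differing by ±10%, as suggested above. The particular definition of s_0 (face-box width divided by a reference template width) is an assumption made for the sketch.

```python
# Illustrative construction of two initial solutions that differ only in scale.
# The 0.9 / 1.1 factors follow the text; the definition of s0 is assumed.

def two_scale_initializations(face_box_width, template_width=200.0):
    s0 = face_box_width / template_width   # assumed estimate of the initial scale
    return 0.9 * s0, 1.1 * s0

s_small, s_large = two_scale_initializations(face_box_width=180.0)
print(s_small, s_large)   # the two scales used to seed the jointed (identical) AAMs
```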
  • FIG. 10, which includes FIGS. 10A, 10B, 10C and 10D, illustrates a comparison of single AAM and joint AAM for an exaggerative expression.
  • FIG. 11, which includes FIGS. 11A, 11B, 11C and 11D, illustrates a comparison of single AAM to joint AAM for an example with incorrect initialization.
  • In FIG. 11A, the initial shape is too high due to an incorrect face detection box. All the key facial organs are mismatched by the single AAM, as shown in FIG. 11B.
  • By jointing two different shapes, as in FIG. 11C, the iteration process terminates at a good solution in which the closed eyes and open mouth are correctly matched, as shown in FIG. 11D.
  • In the example of FIG. 12, which includes FIGS. 12A to 12L, the yaw is about -22 degrees.
  • the left AAM (e.g., corresponding to [-60, -20]) may be aligned to the image.
  • The initialization may be much better than that of the frontal AAM shown in FIG. 12D, and thus the aligned shape may be relatively closer to the actual shape.
  • From FIGS. 12E and 12F, it may be appreciated that the face contour is still not matched and part of the ear is covered.
  • Equations (5) and (14) may be used to joint both frontal and left AAMs together, without and with the 3D shape constraint, respectively, as shown in FIGS. 12G and 12J.
  • The penalty on the shape differences makes the frontal shape turn left and the left one turn right. They nearly overlap when the optimal solution is reached.
  • In FIGS. 12H and 12K, both shapes can be seen to be aligned almost perfectly to the face.
  • The final results are shown in FIGS. 12I and 12L, respectively.
  • Accuracy of different types of AAMs may be measured by comparing the aligned shape with a prior manually labeled shape. The distance between each pair of aligned and labeled feature points in each testing image may then be recorded. To unify the scale, all the distances may be rescaled into a warped template of each AAM.
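  • The accuracy measurement described above could be realized along the following lines. The normalization into the warped template is represented here simply as multiplication by an AAM-specific scale factor, which is an assumption about the rescaling step; the function names are illustrative.

```python
import numpy as np

# Illustrative point-to-point accuracy measure: mean distance between aligned
# and manually labeled feature points, rescaled into the AAM's template frame.

def alignment_error(aligned_pts, labeled_pts, image_to_template_scale=1.0):
    """aligned_pts, labeled_pts: (N, 2) arrays of corresponding feature points."""
    aligned = np.asarray(aligned_pts, dtype=float)
    labeled = np.asarray(labeled_pts, dtype=float)
    dists = np.linalg.norm(aligned - labeled, axis=1)   # per-point distances
    return float(np.mean(dists) * image_to_template_scale)
```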
  • Some example embodiments may increase the robustness and accuracy of single view face alignment under poor initialization or for exaggerated expressions. Some example embodiments may also increase the robustness and accuracy of multi-view face alignment when the face view is between the views of two AAMs. Some example embodiments may also smooth the model transition in 2D+3D AAM based real time facial animation capture in a video stream.
  • FIG. 13 is a flowchart of a system, method and program product according to example embodiments of the invention. It will be understood that each block of the flowchart, and combinations of blocks in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device of an apparatus employing an embodiment of the present invention and executed by a processor in the apparatus.
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus embody a mechanism for implementing the functions specified in the flowchart block(s).
  • These computer program instructions may also be stored in a computer-readable storage memory (as opposed to a transmission medium such as a carrier wave or electromagnetic signal) that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the function specified in the flowchart block(s).
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart block(s).
  • blocks of the flowchart support combinations of means for performing the specified functions, combinations of operations for performing the specified functions and program instructions for performing the specified functions. It will also be understood that one or more blocks of the flowchart, and combinations of blocks in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions or operations, or combinations of special purpose hardware and computer instructions.
  • one embodiment of a method according to an example embodiment as shown in FIG. 13 may include causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data at operation 200, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories at operation 210 and employing a selected joint model application routine at operation 220.
  • the selected joint model application routine may be selected based on the classification of the pose.
  • the method may further include employing one model among models employed in the selected joint model application routine to perform face alignment at operation 230. This employment may continue until convergence.
  • certain ones of the operations above may be modified or further amplified as described below. Moreover, in some embodiments additional optional operations may also be included (an example of which is shown in dashed lines in FIG. 13). It should be appreciated that each of the modifications, optional additions or amplifications below may be included with the operations above either alone or in combination with any others among the features described herein.
  • the method may further include selecting the one model based on the one model having less error among the models employed in the selected joint model application routine at operation 225.
  • causing performance of pose classification comprises classifying the pose of the face into one of five categories in terms of degrees in yaw.
  • employing the selected joint model application routine comprises employing a first joint active appearance model (AAM) application routine including running joint models where one AAM is used with different initial solutions, or employing a second joint AAM application routine including running joint models where two different AAMs are employed.
  • employing the first joint AAM application routine may include employing one instance of a frontal AAM, a left AAM or a right AAM jointly with another instance of the frontal AAM, the left AAM or the right AAM with a different initial solution in response to the pose being classified in a category corresponding to angles corresponding to frontal, left and right poses (e.g., corresponding to categories with poses in the ranges of [-60, -30], [-10, 10], or [30, 60] degrees of yaw).
  • employing the second joint AAM application routine comprises employing a frontal AAM and either a left AAM or a right AAM in response to the pose being classified into a category corresponding to angles between the frontal, left and right poses (e.g., corresponding to categories with poses in the ranges of [-30, -10] or [10, 30] degrees of yaw).
  • employing the one model may include running a selected active appearance model (AAM) among joint AAMs for a maximum iteration time to perform face alignment.
  • employing the selected joint model application routine may include imposing a shape constraint with respect to joint active appearance models (AAMs).
  • an apparatus for performing the method of FIG. 13 above may comprise a processor (e.g., the processor 70) configured to perform some or each of the operations (200-230) described above.
  • the processor 70 may, for example, be configured to perform the operations (200-230) by performing hardware implemented logical functions, executing stored instructions, or executing algorithms for performing each of the operations.
  • the apparatus may comprise means for performing each of the operations described above.
  • examples of means for performing operations 200-230 may comprise, for example, respective ones of the face detector 80, the pose classifier 82, and the joint model manager 84.
  • the processor 70 may be configured to control or even be embodied as the face detector 80, the pose classifier 82, and the joint model manager 84, the processor 70 and/or a device or circuitry for executing instructions or executing an algorithm for processing information as described above may also form example means for performing operations 200- 230.
  • An example of an apparatus may include at least one processor and at least one memory including computer program code.
  • the at least one memory and the computer program code may be configured to, with the at least one processor, cause the apparatus to perform the operations 200-230 (with or without the modifications and amplifications described above in any combination).
  • An example of a computer program product may include at least one computer-readable storage medium having computer-executable program code portions stored therein.
  • the computer-executable program code portions may include program code instructions for performing operations 200-230 (with or without the modifications and amplifications described above in any combination).
  • the operations (200-230) described above, along with any of the modifications may be implemented in a method that involves facilitating access to at least one interface to allow access to at least one service via at least one network. In such cases, the at least one service may be said to perform at least operations 200 to 230.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

A method for providing multi-view face alignment may include causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and employing a selected joint model application routine. The selected joint model application routine may be selected based on the classification of the pose. The method may further include employing one model among models employed in the selected joint model application routine to perform face alignment. An apparatus and computer program product corresponding to the method are also provided.

Description

METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PROVIDING MULTI-VIEW FACE ALIGNMENT
TECHNOLOGICAL FIELD
[0001] Embodiments of the present invention relate generally to image processing technology and, more particularly, relate to a method, apparatus and computer program product for providing multi-view face alignment.
BACKGROUND
[0002] Face detection and recognition is becoming an increasingly more important technology. In this regard, for example, face detection may be useful in biometrics, user interface, gaming and other areas such as creating context for accessing communities in the mobile domain. Face detection may also be important going forward in relation to initiatives such as metadata standardization. Face detection is also important in relation to face alignment.
[0003] Face alignment has an impact on face-oriented applications and solutions such as, for example, real time face animation for avatar generation or other face recognition applications. Face alignment can also assist in extraction of accurate location determinations for salient facial features to improve face recognition, face expression tracking, age estimation, gender estimation, and/or the like. Many services and devices are currently being developed with face recognition, detection and/or classification functionalities being contemplated as available features for such services and devices.
[0004] Accordingly, the tendency for developing devices with continued increases in their capacity to create content, store content and/or receive content relatively quickly upon request, the trend toward electronic devices (e.g., mobile electronic devices such as mobile phones) becoming increasingly ubiquitous in the modern world, and the drive for continued improvements in interface and access mechanisms to unlock the capabilities of such devices, may make it desirable to provide further improvements in the area of face recognition and detection.
BRIEF SUMMARY OF SOME EXAMPLES
[0005] A method, apparatus and computer program product are therefore provided to enable multi-view face alignment. In this regard, in some example embodiments, a mechanism is provided for jointly utilizing two active appearance models (AAMs) to improve face alignment. As such, embodiments of the present invention may provide a relatively robust ability for aligning faces even under multi-view conditions.
[0006] In an example embodiment, a method of providing multi-view face alignment is provided. The method may include causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and employing a selected joint model application routine. The selected joint model application routine may be selected based on the classification of the pose. The method may further include employing one model among models employed in the selected joint model application routine to perform face alignment.
[0007] In another example embodiment, a computer program product for providing multi- view face alignment is provided. The computer program product includes at least one computer- readable storage medium having computer- executable program code instructions stored therein. The computer-executable program code instructions may include program code instructions for causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and employing a selected joint model application routine. The selected joint model application routine may be selected based on the classification of the pose. The program code instructions may further be for employing one model among models employed in the selected joint model application routine to perform face alignment.
[0008] In another example embodiment, an apparatus for providing multi-view face alignment is provided. The apparatus may include at least one processor and at least one memory including computer program code. The at least one memory and the computer program code may be configured, with the at least one processor, to cause the apparatus to perform at least causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and employing a selected joint model application routine. The selected joint model application routine may be selected based on the classification of the pose. The apparatus may also be configured for employing one model among models employed in the selected joint model application routine to perform face alignment.
[0009] In yet another example embodiment, an apparatus for providing multi-view face alignment is provided. The apparatus may include means for causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data, means for causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories, and means for employing a selected joint model application routine. The selected joint model application routine may be selected based on the classification of the pose. The apparatus may further include means for employing one model among models employed in the selected joint model application routine to perform face alignment.
[0010] Embodiments of the invention may provide a method, apparatus and computer program product for employment, for example, in mobile or fixed environments. As a result, for example, computing device users may enjoy an improved capability for face detection and recognition.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S)
[0011] Having thus described some embodiments of the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
[0012] FIG. 1 illustrates a block diagram of a mobile terminal that may benefit from an example embodiment of the present invention;
[0013] FIG. 2 is a schematic block diagram of a wireless communications system according to an example embodiment of the present invention;
[0014] FIG. 3 illustrates an apparatus for enabling the provision of multi-view face alignment according to an example embodiment of the present invention;
[0015] FIG. 4 shows a block diagram illustrating one example of an apparatus for enabling the provision of multi-view face alignment according to an example embodiment of the present invention;
[0016] FIG. 5 illustrates an example image with manually labeled feature points in accordance with an example embodiment of the present invention;
[0017] FIG. 6 shows examples of correspondences between a 2D and 3D shape model for frontal, left and right models according to an example embodiment of the present invention;
[0018] FIG. 7, which includes FIGS. 7A and 7B, shows an example of single view face alignment in connection with an example embodiment of the present invention;
[0019] FIG. 8, which includes FIGS. 8A and 8B, shows an example in which two different AAMs are jointed for a multi-view case in accordance with an example embodiment of the present invention;
[0020] FIG. 9 illustrates an example usage of two different joint AAMs to smooth model transitions according to an example embodiment of the present invention;
[0021] FIG. 10, which includes FIGS. 10A, 10B, 10C and 10D, illustrates a comparison of single AAM and joint AAM for an exaggerative expression according to an example embodiment of the present invention;
[0022] FIG. 11, which includes FIGS. 11A, 11B, 11C and 11D, illustrates a comparison of single AAM to joint AAM for an example with incorrect initialization according to an example embodiment of the present invention;
[0023] FIG. 12, which includes FIGS. 12A to 12L, illustrates a comparison of single and joint 2D/2D+3D AAM according to an example embodiment of the present invention; and
[0024] FIG. 13 is a flowchart according to an example method for providing multi-view face alignment according to an example embodiment of the present invention.
DETAILED DESCRIPTION OF SOME EMBODIMENTS
[0025] Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms "data." "content," "information" and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
[0026] Additionally, as used herein, the term 'circuitry' refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of 'circuitry1 applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
[0027] As defined herein a "computer-readable storage medium," which refers to a non- transitory, physical storage medium (e.g., volatile or non-volatile memory device), can be differentiated from a "computer-readable transmission medium," which refers to an electromagnetic signal.
[0028] Face alignment is a technique that is based on face detection. Face alignment involves extracting the shape of facial components in a static image or video stream. Multi-view face alignment is useful in multi-view face recognition and face pose and expression tracking. The eyebrows, eyes, nose and mouth are generally the most meaningful features of the face when considering face alignment. In some cases, the eyebrows, eyes, nose and mouth may be represented by key feature points around or inside them and face alignment may be employed to find accurate locations of those same points in a current image. Active shape model (ASM) and active appearance model (AAM) are two mainstream methods for face alignment. ASM generally involves local searching including learning of the shape variation modes and local texture characteristics around each feature point. Texture descriptors may be very discriminative. However, the local search aspect may cause the model to get stuck in a local extremum.
[0029] AAM typically involves not only learning shape, but also global texture variation modes of the whole facial area. A Gauss-Newton iteration may be employed to update both pose and shape parameters simultaneously. Since global texture is considered, AAM may be more robust in the presence of noise and occlusion. AAM also employs an inverse compositional fitting to solve AAM extremely efficiently. However, the employment of the Gauss-Newton iteration may tend to make the convergence aperture of AAM relatively small. For static images, in a multi-view case, even if the face pose can be judged roughly by some classifiers after face detection (e.g., profile, half profile, frontal, etc.), the view-based AAMs may diverge, such as when the initial view is far from the actual or current view. For example, if there are two AAMs that can handle [-20, 20] and [20, 60] degrees in yaw, respectively, and the actual pose in an image is about 20 degrees in yaw, using either AAM may cause a large error. Also for a static image in a single-view case (e.g., where the refined pose is known from prior knowledge), the face detection (or eye detection) result may not be very accurate and the initialization of an AAM may have large translational, rotational or scale bias from actual conditions. Thus, it may be difficult for the AAM to converge to a reasonable solution. For video streams, a view based two dimensional (2D) plus three dimensional (3D) AAM may be a successful approach to realize real time face animation despite large pose and expression variations. The pose information may be obtained and used to guide model selection (left/frontal/right views). However, when the model is changed, the transition may not occur smoothly and the estimated pose may become discontinuous. Accordingly, it may be desirable to provide improved face alignment techniques that may work well for both static images and video streams.
[0030] Some embodiments of the present invention may therefore be employed to, for example, utilize the joint application of models in connection with performing face alignment. By using joint models, face alignment in multi-view embodiments may be improved. In some embodiments, two different AAM models may be jointly run, while in other example embodiments, the same AAM models may be run jointly, but with different initial solutions. In either case, after jointly running AAMs, one may be selected, based on having less error, and the selected AAM may be run multiple times to achieve an optimal or otherwise desirable solution. [0031] FIG. 1, one example embodiment of the invention, illustrates a block diagram of a mobile terminal 10 that may benefit from embodiments of the present invention. It should be understood, however, that a mobile terminal as illustrated and hereinafter described is merely illustrative of one type of device that may benefit from embodiments of the present invention and, therefore, should not be taken to limit the scope of embodiments of the present invention. While several embodiments of the mobile terminal 10 may be illustrated and hereinafter described for purposes of example, other types of mobile terminals, such as portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, all types of computers (e.g., laptops or mobile computers), cameras, audio/video players, radio, global positioning system (GPS) devices, or any combination of the aforementioned, and other types of communications systems, may readily employ embodiments of the present invention.
[0032] The mobile terminal 10 may include an antenna 12 (or multiple antennas) in operable communication with a transmitter 14 and a receiver 16. The mobile terminal 10 may further include an apparatus, such as a controller 20 or other processor, that provides signals to and receives signals from the transmitter 14 and receiver 16, respectively. The signals may include signaling information in accordance with the air interface standard of the applicable cellular system, and/or may also include data corresponding to user speech, received data and/or user generated data. In this regard, the mobile terminal 10 may be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the mobile terminal 10 may be capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like. For example, the mobile terminal 10 may be capable of operating in accordance with second- generation (2G) wireless communication protocols IS- 136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with 3.9G wireless communication protocol such as E-UTRAN (evolved- universal terrestrial radio access network), with fourth-generation (4G) wireless communication protocols or the like. As an alternative (or additionally), the mobile terminal 10 may be capable of operating in accordance with non- cellular communication mechanisms. For example, the mobile terminal 10 may be capable of communication in a wireless local area network (WLAN) or other communication networks.
[0033] It is understood that the apparatus, such as the controller 20, may include circuitry implementing, among others, audio and logic functions of the mobile terminal 10, For example, the controller 20 may comprise a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and/or other support circuits. Control and signal processing functions of the mobile terminal 10 are allocated between these devices according to their respective capabilities. The controller 20 thus may also include the functionality to convolutionally encode and interleave message and data prior to modulation and transmission. The controller 20 may additionally include an internal voice coder, and may include an internal data modem. Further, the controller 20 may include functionality to operate one or more software programs, which may be stored in memory. For example, the controller 20 may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the mobile terminal 10 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like, for example.
[0034] The mobile terminal 10 may also comprise a user interface including an output device such as an earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, which may be coupled to the controller 20. The user input interface, which allows the mobile terminal 10 to receive data, may include any of a number of devices allowing the mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown), a microphone or other input device. In embodiments including the keypad 30, the keypad 30 may include numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the mobile terminal 10. Alternatively, the keypad 30 may include a conventional QWERTY keypad arrangement. The keypad 30 may also include various soft keys with associated functions. In addition, or alternatively, the mobile terminal 10 may include an interface device such as a joystick or other user input interface. The mobile terminal 10 further includes a battery 34, such as a vibrating battery pack, for powering various circuits that are used to operate the mobile terminal 10, as well as optionally providing mechanical vibration as a detectable output.
[0035] In some embodiments, the mobile terminal 10 includes a media capturing element, such as a camera, video and/or audio module, in communication with the controller 20. The media capturing element may be any means for capturing an image, video and/or audio for storage, display or transmission. For example, in an example embodiment in which the media capturing element is a camera module 36, the camera module 36 may include a digital camera capable of forming a digital image file from a captured image. As such, the camera module 36 includes all hardware, such as a lens or other optical components), and software necessary for creating a digital image file from a captured image. Alternatively, the camera module 36 may include only the hardware needed to view an image, while a memory device of the mobile terminal 10 stores instructions for execution by the controller 20 in the form of software necessary to create a digital image file from a captured image. In an example embodiment, the camera module 36 may further include a processing element such as a co-processor which assists the controller 20 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and or decoder may encode and/or decode according to a JPEG standard format or another like format. In some cases, the camera module 36 may provide live image data to the display 28. Moreover, in an example embodiment, the display 28 may be located on one side of the mobile terminal 10 and the camera module 36 may include a lens positioned on the opposite side of the mobile terminal 10 with respect to the display 28 to enable the camera module 36 to capture images on one side of the mobile terminal 10 and present a view of such images to the user positioned on the other side of the mobile terminal 10.
[0036} The mobile terminal 10 may further include a user identity module (UIM) 38, which may generically be referred to as a smart card. The UIM 38 is typically a memory device having a processor built in. The UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), or any other smart card. The UIM 38 typically stores information elements related to a mobile subscriber. In addition to the UIM 38, the mobile terminal 10 may be equipped with memory. For example, the mobile terminal 10 may include volatile memory 40, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data. The mobile terminal 10 may also include other non-volatile memory 42, which may be embedded and/or may be removable. The non-volatile memory 42 may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory or the like. The memories may store any of a number of pieces of information, and data, used by the mobile terminal 10 to implement the functions of the mobile terminal 10.
[0037] FIG. 2 is a schematic block diagram of a wireless communications system according to an example embodiment of the present invention. Referring now to FIG. 2, an illustration of one type of system that would benefit from embodiments of the present invention is provided. As shown in FIG. 2, a system in accordance with an example embodiment of the present invention includes a first communication device (e.g., mobile terminal 10) and in some cases also a second communication device 48 that may each be capable of communication with a network 50. For purposes of the discussion herein, the first communication device or mobile terminal 10 may be considered to be synonymous with an on-site device associated with an on-site user. The second communication device 48 may be considered to be synonymous with a remote device associated with a remote user. The second communication device 48 may be a remotely located user of another mobile terminal, or a user of a fixed computer or computer terminal (e.g., a personal computer (PC)). However, it should be appreciated that, in some examples, multiple devices (either locally or remotely) may collaborate with each other and thus example embodiments are not limited to scenarios where only two devices collaborate or where devices operate completely independently of one another. Thus, there may be multiplicity with respect to instances of other devices that may be included in the network 50 and that may practice example embodiments. The communications devices of the system may be able to communicate with network devices or with each other via the network 50. In some cases, the network devices with which the communication devices of the system communicate may include a service platform 60. In an example embodiment, the mobile terminal 10 (and/or the second communication device 48) is enabled to communicate with the service platform 60 to provide, request and/or receive information. However, in some embodiments, not all systems that employ embodiments of the present invention may comprise all the devices illustrated and/or described herein.
[0038] In an example embodiment, the network 50 includes a collection of various different nodes, devices or functions that are capable of communication with each other via corresponding wired and/or wireless interfaces. As such, the illustration of FIG. 2 should be understood to be an example of a broad view of certain elements of the system and not an all inclusive or detailed view of the system or the network 50. Although not necessary, in some embodiments, the network 50 may be capable of supporting communication in accordance with any one or more of a number of first-generation (1G), second-generation (2G), 2.5G, third-generation (3G), 3.5G, 3.9G, fourth-generation (4G) mobile communication protocols, Long Term Evolution (LTE), LTE advanced (LTE-A), and/or the like.
[0039] One or more communication terminals such as the mobile terminal 10 and the second communication device 48 may be capable of communication with each other via the network 50 and each may include an antenna or antennas for transmitting signals to and for receiving signals from a base site, which could be, for example a base station that is a part of one or more cellular or mobile networks or an access point that may be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN), such as the Internet, In turn, other devices such as processing devices or elements (e.g., personal computers, server computers or the like) may be coupled to the mobile terminal 10 and the second communication device 48 via the network 50. By directly or indirectly connecting the mobile terminal 10, the second communication device 48 and other devices to the network 50, the mobile terminal 10 and the second communication device 48 may be enabled to communicate with the other devices (or each other), for example, according to numerous communication protocols including Hypertext Transfer Protocol (HTTP) and/or the like, to thereby carry out various communication or other functions of the mobile terminal 10 and the second communication device 48, respectively.
[0040J Furthermore, although not shown in FIG. 2, the mobile terminal 10 and the second communication device 48 may communicate in accordance with, for example, radio frequency (RF), Bluetooth (BT), Infrared (IR) or any of a number of different wireline or wireless communication techniques, including LAN, wireless LAN (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), WiFi, ultra-wide band (UWB), Wibree techniques and/or the like. As such, the mobile terminal 10 and the second communication device 48 may be enabled to communicate with the network 50 and each other by any of numerous different access mechanisms. For example, mobile access mechanisms such as wideband code division multiple access (W-CDMA), CDMA2000, global system for mobile communications (GSM), general packet radio service (GPRS) and/or the like may be supported as well as wireless access mechanisms such as WLAN, WiMAX, and/or the like and fixed access mechanisms such as digital subscriber line (DSL), cable modems, Ethernet and/or the like. [0041] In an example embodiment, the service platform 60 may be a device or node such as a server or other processing device. The service platform 60 may have any number of functions or associations with various services. As such, for example, the service platform 60 may be a platform such as a dedicated server (or server bank) associated with a particular information source or service (e.g., face detection, face alignment, face recognition and/or the like, etc.), or the service platform 60 may be a backend server associated with one or more other functions or services. As such, the service platform 60 represents a potential host for a plurality of different services or information sources. In some embodiments, the functionality of the service platform 60 is provided by hardware and/or software components configured to operate in accordance with known techniques for the provision of information to users of communication devices. However, at least some of the functionality provided by the service platform 60 is information provided in accordance with example embodiments of the present invention.
[0042J In an example embodiment, the service platform 60 may host an apparatus for providing face alignment services to a device practicing an embodiment of the present invention. As such, in some embodiments, the service platform 60 may itself perform example embodiments, while in other embodiments, the service platform 60 may facilitate (e.g., by the provision of image data or processing of image data) operation of an example embodiment at another device (e.g., the mobile terminal 10 and/or the second communication device 48). In still other example embodiments, the service platform 60 may not be included at all. In other words, in some embodiments, operations in accordance with an example embodiment may be performed at the mobile terminal and/or the second communication device 48 without any interaction with the network 50 and/or the service platform 60.
[0043] An example embodiment will now be described with reference to FIG. 3, in which certain elements of an apparatus for enabling the provision of multi-view face alignment are displayed. The apparatus of FIG. 3 may be employed, for example, on the service platform 60 or the mobile terminal 10 of FIG. 2. However, it should be noted that the apparatus of FIG. 3, may also be employed on a variety of other devices. Therefore, example embodiments should not be limited to application on devices such as the service platform 60 or mobile terminal 10 of FIG. 2. Alternatively, embodiments may be employed on a combination of devices including, for example, those listed above. Accordingly, some example embodiments may be embodied wholly at a single device (e.g., the service platform 60, the mobile terminal 10 or the second communication device 48) or by devices in a client/server relationship (e.g., the service platform 60 serving information to the mobile terminal 10 and/or the second communication device 48). Furthermore, it should be noted that the devices or elements described below may not be mandatory and thus some may be omitted in certain embodiments.
[0044] Referring now to FIG. 3, an apparatus 65 for enabling the provision of multi-view face alignment is provided. The apparatus 65 may include or otherwise be in communication with a processor 70, a user interface 72, a communication interface 74 and a memory device 76. The memory device 76 may include, for example, one or more volatile and/or non-volatile memories. In other words, for example, the memory device 76 may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processor 70). The memory device 76 may be configured to store information, data, applications, instructions or the like for enabling the apparatus to carry out various functions in accordance with example embodiments of the present invention. For example, the memory device 76 could be configured to buffer input data for processing by the processor 70. Additionally or alternatively, the memory device 76 could be configured to store instructions for execution by the processor 70.
[0045] The apparatus 65 may, in some embodiments, be a network device (e.g., service platform 60) or other devices (e.g., the mobile terminal 10 or the second communication device 48) that may operate independent of or in connection with a network. However, in some embodiments, the apparatus 65 may be instantiated at one or more of the service platform 60, the mobile terminal 10 and the second communication device 48. Thus, the apparatus 65 may be any computing device configured to employ an example embodiment of the present invention. However, in some embodiments, the apparatus 65 may be embodied as a chip or chip set (which may in turn be employed at one of the devices mentioned above). In other words, the apparatus 65 may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus 65 may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single "system on a chip." As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein. [0046] The processor 70 may be embodied in a number of different ways. For example, the processor 70 may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special -purpose computer chip, or the like. As such, in some embodiments, the processor 70 may include one or more processing cores configured to perform independently. A multi-core processor may enable multiprocessing within a single physical package. Additionally or alternatively, the processor 70 may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
[0047] in an example embodiment, the processor 70 may be configured to execute instructions stored in the memory device 76 or otherwise accessible to the processor 70. Alternatively or additionally, the processor 70 may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor 70 may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor 70 is embodied as an ASIC, FPGA or the like, the processor 70 may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, when the processor 70 is embodied as an executor of software instructions, the instructions may specifically configure the processor 70 to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor 70 may be a processor of a specific device (e.g., a mobile terminal or network device) adapted for employing an embodiment of the present invention by further configuration of the processor 70 by instructions for performing the algorithms and/or operations described herein. The processor 70 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 70.
[0048] Meanwhile, the communication interface 74 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data from/to a network and/or any other device or module in communication with the apparatus 50. In this regard, the communication interface 74 may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. In some environments, the communication interface 74 may alternatively or also support wired communication. As such, for example, the communication interface 74 may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
[0049] The user interface 72 may be in communication with the processor 70 to receive an indication of a user input at the user interface 72 and/or to provide an audible, visual, mechanical or other output to the user. As such, the user interface 72 may include, for example, a keyboard, a mouse, a joystick, a display, a touch screen(s), touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms. In an example embodiment in which the apparatus 65 is embodied as a server or some other network devices, the user interface 72 may be limited, or eliminated. However, in an embodiment in which the apparatus 65 is embodied as a communication device (e.g., the mobile terminal 10), the user interface 72 may include, among other devices or elements, any or all of a speaker, a microphone, a display, and a keyboard or the like. In this regard, for example, the processor 70 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface, such as, for example, a speaker, ringer, microphone, display, and/or the like. The processor 70 and/or user interface circuitry comprising the processor 70 may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor 70 (e.g., memory device 76, and or the like).
[0050] In an example embodiment, the processor 70 may be embodied as, include or otherwise control a face detector 80, a pose classifier 82, and a joint model manager 84. As such, in some embodiments, the processor 70 may be said to cause, direct or control the execution or occurrence of the various functions attributed to the face detector 80, the pose classifier 82, and the joint model manager 84, respectively, as described herein. The face detector 80, the pose classifier 82, and the joint model manager 84 may each be any means such as a device or circuitry operating in accordance with software or otherwise embodied in hardware or a combination of hardware and software (e.g., processor 70 operating under software control, the processor 70 embodied as an ASIC or FPGA specifically configured to perform the operations described herein, or a combination thereof) thereby configuring the device or circuitry to perform the corresponding functions of the face detector 80, the pose classifier 82, and the joint model manager 84, respectively, as described herein. Thus, in examples in which software is employed, a device or circuitry (e.g., the processor 70 in one example) executing the software forms the structure associated with such means.
[0051] The face detector 80 may be configured to identify and/or isolate faces within an image (or series of images in the case of video analysis). The face detector 80 may therefore be configured to detect face candidates in images without regard to pose. Face detections may be determined based on employing statistical analysis methods of any kind. In an example embodiment, the face detector 80 may employ Adaboost (adaptive boosting) based statistical methods or other statistical methods (e.g., Gentle-Boost, RealBoost, FloatBoost, and/or the like) to provide coarse pose estimation. The coarse pose estimation may be provided to the pose classifier 82 to perform more detailed pose estimation.
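[0051a] By way of a non-limiting illustration only, the following sketch shows one way such view-specific face detection with a coarse pose output might be arranged. OpenCV Haar cascades are used here merely as stand-ins for the Adaboost-based detectors described above; the cascade files, pose labels and function name are illustrative assumptions rather than part of any embodiment.

```python
# Illustrative sketch only: Haar cascades stand in for the Adaboost-based
# multi-view detectors; file names and pose labels are assumptions.
import cv2

CASCADE_DIR = cv2.data.haarcascades
DETECTORS = {
    "frontal": cv2.CascadeClassifier(CASCADE_DIR + "haarcascade_frontalface_default.xml"),
    "left": cv2.CascadeClassifier(CASCADE_DIR + "haarcascade_profileface.xml"),
}

def detect_face_with_coarse_pose(gray):
    """Return (face_box, coarse_pose_label) or (None, None) for a grayscale image."""
    for pose, detector in DETECTORS.items():
        boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(boxes) > 0:
            return tuple(boxes[0]), pose
    # The profile cascade targets one profile direction; mirror the image to catch the other.
    flipped = cv2.flip(gray, 1)
    boxes = DETECTORS["left"].detectMultiScale(flipped, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) > 0:
        x, y, w, h = boxes[0]
        return (gray.shape[1] - x - w, y, w, h), "right"
    return None, None
```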
[0052] The pose classifier 82 may be configured to classify the pose (e.g., a coarse pose determined by the face detector 80) into one of a plurality of categories. In an example embodiment, the pose classifier 82 may be configured to classify the pose into one of the following five categories in terms of degrees in yaw: [-60, -30], [-30, -10], [-10, 10], [10, 30] and [30, 60]. An AAM may be trained (e.g., via the joint model manager 84) for each of three respective different ranges (e.g., [-60, -20], [-20, 20] and [20, 60] degrees, respectively). After pose classification, joint AAMs may be run based on which of the categories the pose classifier 82 selects for classifying the pose. In this regard, the joint model manager 84 may be employed after pose classification to run joint AAMs using selected ones among the trained AAMs based on the classification of the pose relative to the categories above as described in greater detail below.
[0053] In an example embodiment, the joint model manager 84 may be configured to select a joint AAM application routine based on the classification of the pose. In some cases, there may be two joint AAM application routines that may be selected for application based on the classification of the pose. A first joint AAM application routine may include running joint AAMs where one (e.g., the same) AAM is used, but the joint AAMs have different initial solutions. As an alternative, a second joint AAM application routine may include running joint AAMs where two different AAMs are employed. In this regard, for example, if the pose is classified as [-60, -30], [-10, 10], or [30, 60], the joint model manager 84 may be configured to employ the first joint AAM application routine with respect to the one of the AAMs that corresponds to the pose classification. On the other hand, for example, if the pose is classified as [-30, -10] or [10, 30], then the second joint AAM application routine may be employed by jointing the first two AAMs (e.g., the AAMs corresponding to [-60, -20] and [-20, 20]) or by jointing the last two AAMs (e.g., the AAMs corresponding to [-20, 20] and [20, 60]). All of the AAMs may be initialized during face detection.
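[0053a] A minimal sketch of this routine selection logic follows, assuming the five yaw categories and three trained AAMs described above; the bin labels, model names and function names are hypothetical and serve only to make the selection rule concrete.

```python
# Hypothetical sketch of selecting the joint AAM application routine from a coarse yaw estimate.
YAW_CATEGORIES = [(-60, -30, "far_left"), (-30, -10, "mid_left"), (-10, 10, "frontal"),
                  (10, 30, "mid_right"), (30, 60, "far_right")]

def classify_pose(yaw_deg):
    """Map a coarse yaw estimate (degrees) to one of the five pose categories."""
    for lo, hi, label in YAW_CATEGORIES:
        if lo <= yaw_deg <= hi:
            return label
    raise ValueError("yaw outside the supported [-60, 60] degree range")

def select_joint_routine(category):
    """Return (routine, AAMs to joint) per the first/second joint AAM application routines."""
    if category == "far_left":
        return "same_aam_different_inits", ("left", "left")        # AAM trained on [-60, -20]
    if category == "frontal":
        return "same_aam_different_inits", ("frontal", "frontal")  # AAM trained on [-20, 20]
    if category == "far_right":
        return "same_aam_different_inits", ("right", "right")      # AAM trained on [20, 60]
    if category == "mid_left":
        return "two_different_aams", ("left", "frontal")           # joint the first two AAMs
    return "two_different_aams", ("frontal", "right")              # joint the last two AAMs
```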
[0054] In an example embodiment, the joint model manager 84 may be configured to select one of the jointly run AAMs (e.g., the one with the lowest error) and then continue to run the selected AAM until convergence in order to achieve face alignment. In an example embodiment, the joint model manager 84 may be configured to perform face alignment using more than one AAM jointed by shape constraint items that may be fitted by employing a project-out inverse compositional (POIC) algorithm. Since POIC fitting uses a Gauss-Newton iteration algorithm to find a minimum, a Hessian matrix may be approximated by the product of a Jacobi matrix and its transpose. When the solution is close to a minimum, the Jacobi matrix may approach zero, which may lead to large parameter increments for certain parameters. To avoid having errors increase or oscillate, a maximum iteration time $T_J$ may be introduced. When the maximum iteration time is reached, the parameters with minimal error among each iteration are considered to be optimal. After $T_J$, the shapes of both AAMs may be close to each other and the one with less error may be chosen to be run $T_S$ times.
[0055] FIG. 4 illustrates a flowchart indicating operation of the apparatus 65 according to an example embodiment for face alignment. As shown in FIG. 4, image data may be input into the system for face detection at operation 100. In an example embodiment, the face detection process may employ Adaboost in order to give rough position and pose of the face. After the rough pose is determined, pose classification may be performed (e.g., into one of the five categories described above) at operation 1 10. Based on the category of classification, joint AAMs may be run with different initial solutions at operation 120 or joint AAMs may be run with different AAMs at operation 130. One AAM may then be selected at operation 140 (e.g., based on having less error). At operation 150, the selected AAM may then continue to be run until an optimal output is achieved. [0056] There are a number of situations in which the apparatus 65 may operate to perform face alignment according to example embodiments. One example that employs 2D AAMs will be described hereafter. In this regard, the shape of human face can be represented by some key feature points with their locations
$$ s = (x_1, y_1, x_2, y_2, \ldots, x_n, y_n)^T $$

as shown in the example of FIG. 5, in which 88 manually labeled feature points are identified. In this example, $I$ denotes the image, $A_0(u_k), u_k \in T$, denotes the mean appearance in a normalized template $T$, and $A_i(u_k), i = 1, \ldots, l$, denote the appearance variation modes, which may be obtained by principal component analysis (PCA) in the training process, with $\lambda_i$ denoting the appearance coefficients. $p$ and $q$ denote the shape and pose parameters, respectively, and $W(x; p)$ and $N(x; q)$ denote the warps caused by shape variations and pose variations, respectively. Then the goal of AAM is to find the optimal $p$, $q$ and $\lambda$ to minimize the following error:

$$ J = \sum_{u_k \in T} \Big[ A_0(u_k) + \sum_{i=1}^{l} \lambda_i A_i(u_k) - I\big(N(W(u_k; p); q)\big) \Big]^2 \qquad (1) $$

Due to project-out principles, minimizing equation (1) is equivalent to minimizing equation (2)

$$ J = \Big\| A_0(u) - I\big(N(W(u; p); q)\big) \Big\|^2_{\operatorname{span}(A_i)^{\perp}} \qquad (2) $$

where $\operatorname{span}(A_i)$ denotes the linear subspace spanned by the collection of vectors $\{A_i\}_{i=1}^{l}$ and $\operatorname{span}(A_i)^{\perp}$ denotes the orthogonal complement of $\operatorname{span}(A_i)$. The above problem can be solved very efficiently by POIC fitting techniques with Gauss-Newton iteration as follows, since most of the computation can be moved to a pre-computation step:

$$ \min J = \sum_{u_k \in T} \Big[ A_0\big(N(W(u_k; \Delta p); \Delta q)\big) - I\big(N(W(u_k; p); q)\big) \Big]^2 \qquad (3) $$

$$ N(W(u; p); q) \leftarrow N(W(u; p); q) \circ N(W(u; \Delta p); \Delta q)^{-1} \qquad (4) $$
[0057] In an example embodiment in which 2D joint-AAMs are employed, more than one AAM may be employed. A superscript $(m)$ may be used to distinguish between the AAMs. First, consider two AAMs. The error function may include three parts, for example, the appearance errors $J^{(m)}$ of both AAMs and the shape error $J_s$ between them:

$$ J = J^{(1)} + J^{(2)} + K J_s, \qquad J_s = \sum_{n} \Big\| N\big(W(v_n^{(1)}; p^{(1)}); q^{(1)}\big) - N\big(W(v_n^{(2)}; p^{(2)}); q^{(2)}\big) \Big\|^2 \qquad (5) $$

$N\big(W(v_n^{(1)}; p^{(1)}); q^{(1)}\big)$ and $N\big(W(v_n^{(2)}; p^{(2)}); q^{(2)}\big)$ may represent the shapes warped by both AAMs, respectively. $K$ is a positive empirical coefficient, which may be related to the scale of the face. In some embodiments, $K$ may be set to the quotient of 0.02 and the initial scale. Compared to equation (1), the dimension of the parameters in equation (5) may be doubled (if $p^{(1)}$ and $p^{(2)}$ have the same length, and so do $q^{(1)}$ and $q^{(2)}$). Thus, the fitting of equation (5) may be performed in a higher dimensional parameter space, which may assist in accessing the optimal solution. To solve equation (5), the POIC technique may be employed. For the third item, the parameter updating mode may be forward additive:

$$ J_s = \sum_{n} \Big\| N\big(W(v_n^{(1)}; p^{(1)} + \Delta p^{(1)}); q^{(1)} + \Delta q^{(1)}\big) - N\big(W(v_n^{(2)}; p^{(2)} + \Delta p^{(2)}); q^{(2)} + \Delta q^{(2)}\big) \Big\|^2 \qquad (6) $$

while for the appearance items the updating mode is inverse compositional:

$$ J^{(m)} = \sum_{u_k \in T} \Big[ A_0^{(m)}\big(N(W(u_k; \Delta p^{(m)}); \Delta q^{(m)})\big) - I\big(N(W(u_k; p^{(m)}); q^{(m)})\big) \Big]^2, \quad m = 1, 2 \qquad (7) $$

To translate the inverse compositional parameter increments $\Delta p^{(m)}$ and $\Delta q^{(m)}$ to forward additive parameter increments $\Delta \tilde{p}^{(m)}$ and $\Delta \tilde{q}^{(m)}$, they may be approximated to first order as $J_p^{(m)} \Delta p^{(m)}$ and $J_q^{(m)} \Delta q^{(m)}$, respectively, where $J_p^{(m)}$ and $J_q^{(m)}$ are square matrices. Then

$$ N\big(W(u; p^{(m)} + J_p^{(m)} \Delta p^{(m)}); q^{(m)} + J_q^{(m)} \Delta q^{(m)}\big) \approx N\big(W(u; p^{(m)}); q^{(m)}\big) \circ N\big(W(u; \Delta p^{(m)}); \Delta q^{(m)}\big)^{-1} \qquad (8) $$

so that the optimal $(p^{(m)}, q^{(m)})$ may be obtained efficiently by performing a POIC fitting technique and a Gauss-Newton iteration algorithm on equation (5).
[0058] In an example embodiment, AAM$^{(1)}$ and AAM$^{(2)}$ from equation (5) may be the same AAM with different initial solutions. For example, in a single-view case where the yaw of an input face is estimated roughly in the [-10, 10] degree category, a frontal AAM may be chosen. To improve the convergence, two initial shapes can be chosen with some differences in rigid or non-rigid shape variation. If $K$ is suitably chosen, at the end of the iteration, $p^{(1)}$ ($q^{(1)}$) is close to or equal to $p^{(2)}$ ($q^{(2)}$). If the two initial solutions are the same, then $J_s$ is zero and equation (5) is equivalent to equation (1). On the other hand, in an alternative embodiment, AAM$^{(1)}$ and AAM$^{(2)}$ may be selected to be two different AAMs. In a multi-view case, for example, three AAMs may have been trained to handle [-60, -20], [-20, 20] and [20, 60] degrees in yaw, respectively. If the yaw of the input face is roughly located in the range of [-30, -10] degrees, then the first two AAMs may be selected. Due to the small convergence aperture problem, both of the AAMs may become divergent, since their initial yaws are -40 and 0 degrees, respectively. However, using both of the AAMs together, the accuracy may be improved. Thus, for example, in order to extend equation (5) to joint $M$ AAMs, $J_s$ can be rewritten as follows:

$$ J_s = \sum_{m=1}^{M} \sum_{n} \Big\| N\big(W(v_n^{(m)}; p^{(m)}); q^{(m)}\big) - \bar{s}_n \Big\|^2 \qquad (9) $$

where

$$ \bar{s}_n = \frac{1}{M} \sum_{m=1}^{M} N\big(W(v_n^{(m)}; p^{(m)}); q^{(m)}\big) \qquad (10) $$

is the mean shape of the $M$ AAMs. Thus, if $M$ AAMs are jointed, the error can be formulated as

$$ J = \sum_{m=1}^{M} J^{(m)} + K J_s \qquad (11) $$

In this case, some of the $M$ AAMs may be the same and others may be different. The POIC fitting technique with Gauss-Newton iteration algorithm can also be used to minimize equation (11) relatively efficiently.
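The following numpy sketch illustrates how the joint error of equation (11) might be evaluated for $M$ AAMs, assuming the project-out appearance residuals and the warped shapes (with corresponding points) are computed elsewhere; the helper signature is an assumption for illustration only.

```python
# Sketch of the joint error in equation (11): per-AAM project-out appearance errors
# plus K times the shape-consistency penalty around the mean warped shape (equations (9)-(10)).
import numpy as np

def joint_error(appearance_residuals, warped_shapes, K):
    """appearance_residuals: list of M arrays of project-out residuals, one per AAM;
    warped_shapes: list of M (n, 2) arrays of corresponding warped feature points;
    K: positive empirical coefficient related to the face scale."""
    appearance_term = sum(float(np.sum(r ** 2)) for r in appearance_residuals)
    mean_shape = np.mean(np.stack(warped_shapes), axis=0)                           # equation (10)
    shape_term = sum(float(np.sum((s - mean_shape) ** 2)) for s in warped_shapes)   # equation (9)
    return appearance_term + K * shape_term
```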
[0059] To utilize joint AAMs for 2D and 3D cases, a 3D shape model may be introduced to impose stronger constraints on the shape. Both the robustness and convergence speed may be improved. In an example embodiment, an extended 3D Candide model may be used as 3D shape constraints. In different views, the maps of feature points between 2D and 3D shape models may be designed as shown in FIG. 6. In this regard, FIG. 6 shows examples of correspondences between a 2D and 3D shape model for frontal, left and right models. A 2D+3D AAM may minimize the error defined in equation (12). The prime symbol may be used to indicate the 3D shape model and its parameters.

$$ J = \sum_{u_k \in T} \Big[ A_0(u_k) + \sum_{i=1}^{l} \lambda_i A_i(u_k) - I\big(N(W(u_k; p); q)\big) \Big]^2 + K \sum_{n} \Big\| N\big(W(v_n; p); q\big) - N'\big(W'(v_n'; p'); q'\big) \Big\|^2 \qquad (12) $$

$W'(v_n'; p')$ represents the deformation caused by shape and animation units. $N'(v_n'; q')$ is the rigid transform with rotation, translation and scale, and $q'$ is the 3D pose parameter vector containing six independent elements. Similar to equation (5), equation (12) can be minimized by the POIC fitting algorithm efficiently. In 2D+3D joint AAMs, more than one 2D AAM may be employed. Since a different view AAM has a different relation with the 3D shape model, a different $v_n$ may be selected to correspond to a different AAM. However, since the shape variation in the 3D model is only caused by the non-rigid variation of the face, e.g., expression, the parameters of the 3D model, $p'$ and $q'$, may be unique. Then the error can be represented as follows:

$$ J = \sum_{m=1}^{M} \sum_{u_k \in T} \Big[ A_0^{(m)}(u_k) + \sum_{i=1}^{l} \lambda_i^{(m)} A_i^{(m)}(u_k) - I\big(N(W(u_k; p^{(m)}); q^{(m)})\big) \Big]^2 + K \sum_{m=1}^{M} \sum_{n} \Big\| N\big(W(v_n^{(m)}; p^{(m)}); q^{(m)}\big) - N'\big(W'(v_n'^{(m)}; p'); q'\big) \Big\|^2 \qquad (13) $$

Compared to equation (12), the parameters become $p^{(m)}$, $q^{(m)}$, $p'$ and $q'$ from $p$, $q$, $p'$ and $q'$. Compared to equation (11), the mean shape $\bar{s}_n$ is replaced by the 3D shape $N'\big(W'(v_n'^{(m)}; p'); q'\big)$. This is a stronger shape constraint which helps to generate a valid facial shape during the iterations. Similarly, the 2D AAMs may be different from each other or the same one with different initial solutions. Using a POIC fitting algorithm, both 2D and 3D model parameters may be optimized synchronously. The updating rule of $p'$ and $q'$ is forward additive, for example,

$$ p' \leftarrow p' + \Delta p', \qquad q' \leftarrow q' + \Delta q' \qquad (14) $$

and that of $p^{(m)}$ and $q^{(m)}$ is the same as in equations (7) and (8).
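As a rough illustration of the stronger constraint in equation (13), the sketch below evaluates only the 3D shape penalty term, assuming a weak-perspective rigid transform and per-AAM correspondence indices into a Candide-like 3D model; the names, signature and projection model are assumptions introduced for illustration, not the described embodiment.

```python
# Sketch of the 3D shape constraint term of equation (13): each AAM's warped 2D shape is
# penalized against the rigidly transformed 3D model points instead of the 2D mean shape.
import numpy as np

def shape_constraint_3d(warped_shapes_2d, model_points_3d, correspondences, R, t, s, K):
    """warped_shapes_2d: list of (n_m, 2) arrays; model_points_3d: (N, 3) 3D model vertices;
    correspondences: list of index arrays mapping each AAM's points to 3D vertices;
    R (3x3), t (2,), s (scalar): rotation, translation and scale of the rigid transform N'."""
    penalty = 0.0
    for shape_2d, idx in zip(warped_shapes_2d, correspondences):
        projected = s * (model_points_3d[idx] @ R.T)[:, :2] + t  # weak-perspective projection
        penalty += float(np.sum((shape_2d - projected) ** 2))
    return K * penalty
```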
[0060] Accordingly, in an example embodiment, a multi-view face alignment strategy may be developed based on joint-AAM in order to handle the poses in various range categories between [-60, 60] degrees in yaw. Three different AAMs are trained to handle [-60, -20], [-20, 20] and [20, 60] degrees, respectively. For a given face image, Adaboost based multi-view face detection may be employed to give the rough position and pose of the face. Then the face is classified into one of 5 categories, e.g., [-60, -30], [-30, -10], [-10, 10], [10, 30], [30, 60] degrees in yaw. If the face falls into the 1st, 3rd or 5th category (e.g., [-60, -30], [-10, 10], or [30, 60] degrees in yaw), then one of the above 3 AAMs is selected with different initial solutions. If the face falls into one of the other two categories (e.g., [-30, -10] or [10, 30] degrees in yaw), then the first two or last two AAMs are jointed. All AAMs are initialized by face detection box.
[0061] Since POIC fitting uses a Gauss-Newton iteration algorithm to find the minimum, the Hessian matrix is approximated by the product of the Jacobi matrix and its transpose. When the solution is close to a minimum, the Jacobi matrix approaches zero, which may lead to large parameter increments $\Delta p$ and $\Delta q$. Thus, the error may increase or start to oscillate. Accordingly, a maximal iteration time $T_J$ is assigned so that when the iteration ends at $T_J$, the parameters with minimal error among the iterations may be regarded as being optimal. After $T_J$, the shapes of both AAMs may be close to each other and one AAM may be selected based on having less error, and the selected AAM may be run for $T_S$ times. Similarly, parameters may be chosen with minimal error as the optimal solution $p$ and $q$.
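[0061a] The iteration control just described can be summarized by the following sketch, which runs the joint fit for at most $T_J$ iterations while keeping the best-so-far parameters, then refines only the AAM with less error for $T_S$ further iterations; the step and error callables are hypothetical placeholders for the POIC/Gauss-Newton updates.

```python
# Sketch of the T_J / T_S iteration control: joint fitting with best-so-far tracking,
# then refinement of the single selected AAM. joint_step, single_step and error are
# hypothetical callables wrapping the POIC Gauss-Newton updates and error evaluation.
def fit_joint_then_single(joint_step, single_step, error, params, T_J, T_S):
    M = len(params)
    best = [(error(m, params[m]), params[m]) for m in range(M)]
    for _ in range(T_J):
        params = joint_step(params)                    # one joint Gauss-Newton update
        for m in range(M):
            e = error(m, params[m])
            if e < best[m][0]:
                best[m] = (e, params[m])               # keep the minimal-error parameters
    m_sel = min(range(M), key=lambda m: best[m][0])    # the AAM with less error
    e_sel, p_sel = best[m_sel]
    p_cur = p_sel
    for _ in range(T_S):
        p_cur = single_step(m_sel, p_cur)              # refine the selected AAM alone
        e_cur = error(m_sel, p_cur)
        if e_cur < e_sel:
            e_sel, p_sel = e_cur, p_cur
    return m_sel, p_sel
```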
[0062] Some example embodiments may therefore be used in connection with 2D joint-AAM face alignment such that more than one AAM may be jointed by some shape constraint items to be fitted by an efficient POIC algorithm. For 2D+3D joint AAM face alignment, a 3D shape model may be introduced to impose stronger shape constraints on the joint AAM that can also be fitted by the POIC algorithm. For a single view face alignment in a static image, jointed AAMs may be the same, but may have different initial solutions to improve robustness for poor initializations or exaggerative expressions. FIG. 7, which includes FIGS. 7A and 7B, shows an example of single view face alignment as described above.
[0063] For multi-view face alignment in static images, two jointed AAMs may be trained from different views to improve the convergence rate and accuracy of face alignment when the view of the face is in the middle of the views of both AAMs. FIG. 8, which includes FIGS. 8A and 8B, shows an example in which two different AAMs are jointed for a multi-view case. For video streams, 2D+3D joint AAM may be employed to make a model transition smoother and produce continuous pose variation as shown in the example of FIG. 9.
[0064] Thus, for example, in a single view case, if the face is classified into a category corresponding to [-60, -30], [-10, 10] or [30, 60] degrees in yaw, then a corresponding one of the three trained AAMs is selected. To make the two initial solutions different, for example, the AAMs may be set with different scales, $0.9 s_0$ and $1.1 s_0$. The initial scale $s_0$, used for a single AAM, may be estimated by the width of the face box. FIG. 10, which includes FIGS. 10A, 10B, 10C and 10D, illustrates a comparison of single AAM and joint AAM for an exaggerative expression. In FIG. 10A, the expression is exaggerative with a large open mouth. Using one initial shape shown in FIG. 10A, the eyes and eyebrows are mismatched by the output shape of FIG. 10B. However, using two initial shapes in FIG. 10C, both output shapes are acceptable in FIG. 10D. [0065] FIG. 11, which includes FIGS. 11A, 11B, 11C and 11D, illustrates a comparison of single AAM to joint AAM for an example with incorrect initialization. In FIG. 11A, the initial shape is too high due to an incorrect face detection box. All the key facial organs are mismatched by the single AAM as shown in FIG. 11B. By jointing two different shapes in FIG. 11C, the iteration process terminates at a good solution where the closed eyes and open mouth are correctly matched in FIG. 11D.
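[0065a] A short sketch of the two-initialization scheme follows; the reference shape width used to turn the face box width into an initial scale is an assumption introduced only for illustration.

```python
# Sketch of producing two different initial solutions for the same AAM by offsetting
# the initial scale estimated from the face box width by +/- 10 percent.
import numpy as np

def two_initial_shapes(mean_shape, face_box, reference_width=1.0):
    """mean_shape: (n, 2) AAM mean shape; face_box: (x, y, w, h) from face detection;
    reference_width: width of the mean shape's bounding box (assumed normalization)."""
    x, y, w, h = face_box
    s0 = w / reference_width                          # initial scale s_0 from the face box width
    center = np.array([x + w / 2.0, y + h / 2.0])
    initial_shapes = []
    for scale in (0.9 * s0, 1.1 * s0):
        shape = mean_shape * scale
        shape = shape - shape.mean(axis=0) + center   # center the scaled shape in the box
        initial_shapes.append(shape)
    return initial_shapes
```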
[0066] In a multi-view case, if the face is classified into a category corresponding to [-30, -10] (or [10, 30]) degrees in yaw, then the left (or right) and frontal AAMs are jointed. An example is given in FIG. 12, which includes FIGS. 12A to 12L. The yaw is about -22 degrees. Thus, if the frontal AAM is employed, the shape is initialized as shown in FIG. 12A. After convergence, the shape is skewed and part of the background is covered as shown in FIG. 12B. Even under the shape constraint of a 3D shape model, the solution may still be relatively poor as shown in FIG. 12C. Using two different AAMs according to an example embodiment, the left AAM (e.g., corresponding to [-60, -20] degrees) may be aligned to the image. The initialization may be much better than that of the frontal AAM, as shown in FIG. 12D, and thus the aligned shape may be relatively closer to the actual shape. From FIGS. 12E and 12F, it may be appreciated that the face contour is still not matched and part of the ear is covered. Alternatively, equations (5) and (14) may be used to joint both the frontal and left AAMs together, without and with the 3D shape constraint, respectively, as shown in FIGS. 12G and 12J. The penalty on the shape differences makes the frontal shape turn left and the left one turn right. They nearly overlap when the optimal solution is reached. From FIGS. 12H and 12K, both shapes can be seen to be aligned almost perfectly to the face. Then, using a single AAM selected as having the least error, the final results are shown in FIGS. 12I and 12L, respectively.
[0067] Accuracy of different types of AAMs may be measured by comparing the aligned shape with a manually labeled reference shape. The distance between each pair of aligned and labeled feature points in each testing image may then be recorded. To unify the scale, all the distances may be rescaled into the warped template of each AAM. Some example embodiments may increase the robustness and accuracy of single view face alignment under poor initialization or for exaggerated expressions. Some example embodiments may also increase the robustness and accuracy of multi-view face alignment when the face view is between the views of two AAMs. Some example embodiments may also smooth the model transition in 2D+3D AAM based real time facial animation capture in a video stream.
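The accuracy measure described in paragraph [0067] can be sketched as follows. This Python sketch is illustrative; the warp_to_template callable, which maps image coordinates into an AAM's warped template frame, is an assumption.

import numpy as np

def alignment_errors(aligned_pts, labeled_pts, warp_to_template):
    # Per-landmark distances between the aligned shape and the manually
    # labeled shape, computed in template coordinates so that results from
    # differently sized faces and models are comparable.
    aligned_t = warp_to_template(np.asarray(aligned_pts, dtype=float))
    labeled_t = warp_to_template(np.asarray(labeled_pts, dtype=float))
    return np.linalg.norm(aligned_t - labeled_t, axis=1)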
[0068] FIG. 13 is a flowchart of a system, method and program product according to example embodiments of the invention. It will be understood that each block of the flowchart, and combinations of blocks in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device of an apparatus employing an embodiment of the present invention and executed by a processor in the apparatus. As will be appreciated, any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus embodies a mechanism for implementing the functions specified in the flowchart block(s). These computer program instructions may also be stored in a computer-readable storage memory (as opposed to a transmission medium such as a carrier wave or electromagnetic signal) that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the function specified in the flowchart block(s). The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart block(s).
[0069] Accordingly, blocks of the flowchart support combinations of means for performing the specified functions, combinations of operations for performing the specified functions and program instructions for performing the specified functions. It will also be understood that one or more blocks of the flowchart, and combinations of blocks in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions or operations, or combinations of special purpose hardware and computer instructions.
[0070] In this regard, one embodiment of a method according to an example embodiment as shown in FIG. 13 may include causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data at operation 200, causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories at operation 210 and employing a selected joint model application routine at operation 220. The selected joint model application routine may be selected based on the classification of the pose. The method may further include employing one model among models employed in the selected joint model application routine to perform face alignment at operation 230. This employment may continue until convergence.
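A compact sketch of operations 200-230 as a pipeline follows. This is illustrative Python, not a reference implementation: the detector, classifier, routine and fitted-model objects and their methods are assumptions standing in for the face detector 80, the pose classifier 82 and the joint model manager 84 described earlier.

def align_face(image, detector, classifier, joint_routines):
    face_box, rough_pose = detector.detect(image)        # operation 200
    category = classifier.classify(rough_pose)           # operation 210
    routine = joint_routines[category]                   # routine selected by pose category
    fitted_models = routine.run(image, face_box)         # operation 220
    best = min(fitted_models, key=lambda m: m.error)     # operation 225 (optional selection)
    return best.align(image)                             # operation 230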
[0071] In some embodiments, certain ones of the operations above may be modified or further amplified as described below. Moreover, in some embodiments additional optional operations may also be included (an example of which is shown in dashed lines in FIG. 13). It should be appreciated that each of the modifications, optional additions or amplifications below may be included with the operations above either alone or in combination with any others among the features described herein. In some embodiments, the method may further include selecting the one model based on the one model having less error among the models employed in the selected joint model application routine at operation 225. In an example embodiment, causing performance of pose classification comprises classifying the pose of the face into one of five categories in terms of degrees in yaw. In some embodiments, employing the selected joint model application routine comprises employing a first joint active appearance model (AAM) application routine including running joint models where one AAM is used with different initial solutions, or employing a second joint AAM application routine including running joint models where two different AAMs are employed. In some cases, employing the first joint AAM application routine may include employing one instance of a frontal AAM, a left AAM or a right AAM jointly with another instance of the frontal AAM, the left AAM or the right AAM with a different initial solution in response to the pose being classified in a category corresponding to frontal, left and right poses (e.g., corresponding to categories with poses in the ranges of [-60, -30], [-10, 10], or [30, 60] degrees of yaw). In another example case, employing the second joint AAM application routine comprises employing a frontal AAM and either a left AAM or a right AAM in response to the pose being classified into a category corresponding to angles between the frontal, left and right poses (e.g., corresponding to categories with poses in the ranges of [-30, -10] or [10, 30] degrees of yaw). In an example embodiment, employing the one model may include running a selected active appearance model (AAM) among joint AAMs for a maximum iteration time to perform face alignment. In some embodiments, employing the selected joint model application routine may include imposing a shape constraint with respect to joint active appearance models (AAMs).
[0072] In an example embodiment, an apparatus for performing the method of FIG. 13 above may comprise a processor (e.g., the processor 70) configured to perform some or each of the operations (200-230) described above. The processor 70 may, for example, be configured to perform the operations (200-230) by performing hardware implemented logical functions, executing stored instructions, or executing algorithms for performing each of the operations. Alternatively, the apparatus may comprise means for performing each of the operations described above. In this regard, according to an example embodiment, examples of means for performing operations 200-230 may comprise, for example, respective ones of the face detector 80, the pose classifier 82, and the joint model manager 84. Additionally or alternatively, at least by virtue of the fact that the processor 70 may be configured to control or even be embodied as the face detector 80, the pose classifier 82, and the joint model manager 84, the processor 70 and/or a device or circuitry for executing instructions or executing an algorithm for processing information as described above may also form example means for performing operations 200-230.
[0073] An example of an apparatus according to an example embodiment may include at least one processor and at least one memory including computer program code. The at least one memory and the computer program code may be configured to, with the at least one processor, cause the apparatus to perform the operations 200-230 (with or without the modifications and amplifications described above in any combination).
[0074] An example of a computer program product according to an example embodiment may include at least one computer-readable storage medium having computer-executable program code portions stored therein. The computer-executable program code portions may include program code instructions for performing operations 200-230 (with or without the modifications and amplifications described above in any combination).

[0075] In some cases, the operations (200-230) described above, along with any of the modifications, may be implemented in a method that involves facilitating access to at least one interface to allow access to at least one service via at least one network. In such cases, the at least one service may be said to perform at least operations 200 to 230.
[0076] Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

WHAT IS CLAIMED IS:
1. A method comprising:
causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data;
causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories;
employing a selected joint model application routine, the selected joint model application routine being selected based on the classification of the pose; and
employing one model among models employed in the selected joint model application routine to perform face alignment.
2. The method of claim 1, wherein causing performance of pose classification comprises classifying the pose of the face into one of five categories in terms of degrees in yaw.
3. The method of claim 1, wherein employing the selected joint model application routine comprises employing a first joint active appearance model (AAM) application routine including running joint models where one AAM is used with different initial solutions or employing a second joint AAM application routine including running joint models where two different AAMs are employed.
4. The method of claim 3, wherein employing the first joint AAM application routine comprises employing one instance of a frontal AAM, a left AAM or a right AAM jointly with another instance of the frontal AAM, the left AAM or the right AAM with a different initial solution in response to the pose being classified in a category corresponding to angles corresponding to frontal, left and right poses.
5. The method of claim 3, wherein employing the second joint AAM application routine comprises employing a frontal AAM and either a left AAM or a right AAM in response to the pose being classified into a category corresponding to poses in between frontal, left and right poses.
6. The method of claim 1, further comprising selecting the one model based on the one model having less error among the models employed in the selected joint model application routine.
7. The method of claim 1, wherein employing the one model comprises running a selected active appearance model (AAM) among joint AAMs for a maximum iteration time to perform face alignment.
8. The method of claim 1, wherein employing the selected joint model application routine comprises imposing a shape constraint with respect to joint active appearance models (AAMs).
9. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to at least:
cause performance of face detection with respect to image data to determine a rough position and pose of a face in the image data;
cause performance of pose classification to classify the pose of the face into one of a plurality of pose categories;
employ a selected joint model application routine, the selected joint model application routine being selected based on the classification of the pose; and
employ one model among models employed in the selected joint model application routine to perform face alignment.
10. The apparatus of claim 9, wherein the at least one memory and computer program code are configured to, with the at least one processor, cause the apparatus to cause performance of pose classification by classifying the pose of the face into one of five categories in terms of degrees in yaw.
11. The apparatus of claim 9, wherein the at least one memory and computer program code are configured to, with the at least one processor, cause the apparatus to employ the selected joint model application routine by employing a first joint active appearance model (AAM) application routine including running joint models where one AAM is used with different initial solutions or by employing a second joint AAM application routine including running joint models where two different AAMs are employed.
12. The apparatus of claim 11, wherein the at least one memory and computer program code are further configured to, with the at least one processor, cause the apparatus to employ the first joint AAM application routine by employing one instance of a frontal AAM, a left AAM or a right AAM jointly with another instance of the frontal AAM, the left AAM or the right AAM with a different initial solution in response to the pose being classified in a category corresponding to angles corresponding to frontal, left and right poses.
13. The apparatus of claim 11, wherein the at least one memory and computer program code are further configured to, with the at least one processor, cause the apparatus to employ the second joint AAM application routine by employing a frontal AAM and either a left AAM or a right AAM in response to the pose being classified into a category corresponding to poses in between frontal, left and right poses.
14. The apparatus of claim 9, wherein the at least one memory and computer program code are further configured to, with the at least one processor, cause the apparatus to select the one model based on the one model having less error among the models employed in the selected joint model application routine.
15. The apparatus of claim 9, wherein the at least one memory and computer program code are configured to, with the at least one processor, cause the apparatus to employ the one model by running a selected active appearance model (AAM) among joint AAMs for a maximum iteration time to perform face alignment.
16. The apparatus of claim 9, wherein the at least one memory and computer program code are configured to, with the at least one processor, cause the apparatus to employ the selected joint model application routine by imposing a shape constraint with respect to joint active appearance models (AAMs).
17. A computer program product comprising a computer-readable medium bearing computer program code embodied therein for use with a computer, the computer program code comprising code for:
causing performance of face detection with respect to image data to determine a rough position and pose of a face in the image data;
causing performance of pose classification to classify the pose of the face into one of a plurality of pose categories;
employing a selected joint model application routine, the selected joint model application routine being selected based on the classification of the pose; and
employing one model among models employed in the selected joint model application routine to perform face alignment.
18. The computer program product of claim 17, wherein program code for employing the selected joint model application routine includes instructions for employing a first joint active appearance model (AAM) application routine including running joint models where one AAM is used with different initial solutions or employing a second joint AAM application routine including running joint models where two different AAMs are employed.
19. The computer program product of claim 17, further comprising program code for selecting the one model based on the one model having less error among the models employed in the selected joint model application routine.
20. The computer program product of claim 17, wherein program code for employing the one model includes instructions for running a selected active appearance model (AAM) among joint AAMs for a maximum iteration time to perform face alignment.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/000616 WO2012135979A1 (en) 2011-04-08 2011-04-08 Method, apparatus and computer program product for providing multi-view face alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/000616 WO2012135979A1 (en) 2011-04-08 2011-04-08 Method, apparatus and computer program product for providing multi-view face alignment

Publications (1)

Publication Number Publication Date
WO2012135979A1 true WO2012135979A1 (en) 2012-10-11

Family

ID=46968520

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/000616 WO2012135979A1 (en) 2011-04-08 2011-04-08 Method, apparatus and computer program product for providing multi-view face alignment

Country Status (1)

Country Link
WO (1) WO2012135979A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1731416A (en) * 2005-08-04 2006-02-08 上海交通大学 Method of quick and accurate human face feature point positioning
CN1786980A (en) * 2005-12-08 2006-06-14 上海交通大学 Melthod for realizing searching new position of person's face feature point by tow-dimensional profile
CN100349173C (en) * 2005-12-15 2007-11-14 上海交通大学 Method for searching new position of feature point using support vector processor multiclass classifier
CN100389430C (en) * 2006-06-13 2008-05-21 北京中星微电子有限公司 AAM-based head pose real-time estimating method and system
CN100383807C (en) * 2006-06-22 2008-04-23 上海交通大学 Feature point positioning method combined with active shape model and quick active appearance model
US20100214290A1 (en) * 2009-02-25 2010-08-26 Derek Shiell Object Model Fitting Using Manifold Constraints
CN101593272A (en) * 2009-06-18 2009-12-02 电子科技大学 A kind of human face characteristic positioning method based on the ASM algorithm

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105488371A (en) * 2014-09-19 2016-04-13 中兴通讯股份有限公司 Face recognition method and device
CN104240274A (en) * 2014-09-29 2014-12-24 小米科技有限责任公司 Face image processing method and device
CN104240274B (en) * 2014-09-29 2017-08-25 小米科技有限责任公司 Face image processing process and device
CN104809468A (en) * 2015-04-20 2015-07-29 东南大学 Multi-view classification method based on indefinite kernels
CN107808111A (en) * 2016-09-08 2018-03-16 北京旷视科技有限公司 For pedestrian detection and the method and apparatus of Attitude estimation
CN108898115A (en) * 2018-07-03 2018-11-27 北京大米科技有限公司 data processing method, storage medium and electronic equipment
CN108898115B (en) * 2018-07-03 2021-06-04 北京大米科技有限公司 Data processing method, storage medium and electronic device
WO2022213349A1 (en) * 2021-04-09 2022-10-13 鸿富锦精密工业(武汉)有限公司 Method and apparatus for recognizing face with mask, and computer storage medium
CN113326818A (en) * 2021-08-02 2021-08-31 湖南高至科技有限公司 Method, system, device and medium for identifying massive human faces in video coding
CN113326818B (en) * 2021-08-02 2021-09-24 湖南高至科技有限公司 Method, system, device and medium for identifying massive human faces in video coding

Similar Documents

Publication Publication Date Title
US8917911B2 (en) Method and apparatus for local binary pattern based facial feature localization
WO2012135979A1 (en) Method, apparatus and computer program product for providing multi-view face alignment
US11720994B2 (en) High-resolution portrait stylization frameworks using a hierarchical variational encoder
KR101643573B1 (en) Method for face recognition, recording medium and device for performing the method
US20120321193A1 (en) Method, apparatus, and computer program product for image clustering
Vojir et al. Robust scale-adaptive mean-shift for tracking
US8718324B2 (en) Method, apparatus and computer program product for providing object tracking using template switching and feature adaptation
US9020186B2 (en) Method and apparatus for detecting object using volumetric feature vector and 3D haar-like filters
US9575566B2 (en) Technologies for robust two-dimensional gesture recognition
Tie et al. Automatic landmark point detection and tracking for human facial expressions
US8965051B2 (en) Method and apparatus for providing hand detection
US9196055B2 (en) Method and apparatus for providing a mechanism for gesture recognition
JP2016535353A (en) Object detection and segmentation method, apparatus, and computer program product
US20140294360A1 (en) Methods and systems for action recognition using poselet keyframes
WO2023284182A1 (en) Training method for recognizing moving target, method and device for recognizing moving target
US20220292877A1 (en) Systems, methods, and storage media for creating image data embeddings to be used for image recognition
Chang et al. 2d–3d pose consistency-based conditional random fields for 3d human pose estimation
WO2012140315A1 (en) Method, apparatus and computer program product for providing incremental clustering of faces in digital images
US8610831B2 (en) Method and apparatus for determining motion
WO2013079772A1 (en) Method, apparatus and computer program product for classification of objects
US20140314273A1 (en) Method, Apparatus and Computer Program Product for Object Detection
Ma et al. A local-global coupled-layer puppet model for robust online human pose tracking
US9952671B2 (en) Method and apparatus for determining motion
WO2012131149A1 (en) Method apparatus and computer program product for detection of facial expressions
Zhang et al. Unsupervised segmentation of highly dynamic scenes through global optimization of multiscale cues

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11863040

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11863040

Country of ref document: EP

Kind code of ref document: A1