WO2012139273A1 - Method of detecting facial attributes - Google Patents

Method of detecting facial attributes

Info

Publication number
WO2012139273A1
Authority
WO
WIPO (PCT)
Prior art keywords
facial
image
facial image
component
local
Application number
PCT/CN2011/072597
Other languages
French (fr)
Inventor
Jianguo Li
Tao Wang
Yangzhou Du
Qiang Li
Original Assignee
Intel Corporation
Application filed by Intel Corporation
Priority to CN201180070557.7A (CN103503029B)
Priority to PCT/CN2011/072597 (WO2012139273A1)
Priority to EP20110863314 (EP2697775A4)
Priority to US13/997,310 (US8805018B2)
Priority to TW101112312A (TWI470563B)
Publication of WO2012139273A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/164 Detection; Localisation; Normalisation using holistic features
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships


Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Detection of a facial attribute such as a smile or gender in a human face in an image is performed by embodiments of the present invention in a computationally efficient manner. First, a face in the image is detected to produce a facial image. Facial landmarks are detected in the facial image. The facial image is aligned and normalized based on the detected facial landmarks to produce a normalized facial image. Local features from selected local regions are extracted from the normalized facial image. A facial attribute is predicted in each selected local region by inputting each selected local feature into a weak classifier having a multi-layer perceptron (MLP) structure. Finally, output data is aggregated from each weak classifier component to generate an indication that the facial attribute is detected in the facial image.

Description

METHOD OF DETECTING FACIAL ATTRIBUTES
FIELD
The present disclosure generally relates to the field of image processing. More particularly, an embodiment of the invention relates to facial attributes detection processing executed by a processor in a processing system for analyzing facial images.
BACKGROUND
With the advancement of computing power, face recognition applications are becoming more and more popular, e.g., auto-focus/auto-white-balance/auto-exposure (3A) processing and smile shutter in digital cameras, avatar-based communications on smart phones, face recognition login capabilities on handheld computing devices, and so on. In these facial analysis applications, it may be desirable to detect facial attributes. Facial attributes may include whether a face is smiling or not, whether a face is the face of a man or a woman, whether the eyes are closed or not, or whether a face belongs to a child, young adult, middle-aged adult, or a senior citizen. Other facial attributes may also be detected. Facial attribute detection has many uses. For instance, smile detection can be used as a smile shutter activation in camera imaging, or as a mood detection capability in an automated advertising preference survey. Gender detection can be used in automated advertising selection for smart digital signage. Facial attribute detection can also be used in other areas such as video surveillance, visual search, and content analysis. Thus, fast and efficient techniques for detection of facial attributes in facial images are desired.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is provided with reference to the accompanying figures. The use of the same reference numbers in different figures indicates similar or identical items.
Figure 1 is a diagram of a facial image processing system according to an embodiment of the present invention.
Figure 2 is a diagram of facial analysis processing according to an embodiment of the present invention.
Figure 3 is an example of detected facial landmarks in a facial image according to an embodiment of the present invention.
Figure 4 is an example of alignment and normalization of a selected portion of an example facial image according to an embodiment of the present invention.
Figure 5 is an example of selected local regions for smile detection according to an embodiment of the present invention.
Figure 6 is an example of selected local regions for gender detection according to an embodiment of the present invention.
Figure 7 is a diagram of the structure of a multi-layer perceptron (MLP) according to an embodiment of the present invention.
Figure 8 is a diagram of a computing model at each node of the MLP according to an embodiment of the present invention.
Figures 9 and 10 illustrate block diagrams of embodiments of processing systems, which may be utilized to implement some embodiments discussed herein.
DETAILED DESCRIPTION
Embodiments of the present invention provide for fast and accurate detection of facial attributes (such as smile/mood, gender, age, blink, etc.) in an image, which may be used in camera imaging, digital signage and other applications. Embodiments of the present invention achieve state-of-the-art accuracy (96% in smile detection, 94% in gender detection, etc., in one test), require very small memory usage (less than 400KB), and execute quickly (more than 800 faces per second detection on a processing system having a 1.6GHz Atom processor in one test).
Embodiments of the present invention comprise a unified framework for facial attribute detection. Embodiments provide at least several advantages over known facial attribute detection implementations, such as for smile detection and gender detection. First, embodiments of the present invention provide for a unified and general technique which works well for detection of all facial attributes with high accuracy, while the known techniques are only applicable for one specific facial attribute category. Second, the present method has very small memory and computing resource requirements. Therefore, facial attribute detection according to embodiments may be applied to a variety of computing platforms from a personal computer (PC) to embedded devices.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of various embodiments. However, various embodiments of the invention may be practiced without the specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to obscure the particular embodiments of the invention. Further, various aspects of embodiments of the invention may be performed using various means, such as integrated semiconductor circuits ("hardware"), computer-readable instructions organized into one or more programs stored on a computer readable storage medium ("software"), or some combination of hardware and software. For the purposes of this disclosure reference to "logic" shall mean either hardware, software (including for example micro-code that controls the operations of a processor), firmware, or some combination thereof.
Embodiments of the present invention process face images captured from a camera or previously stored in a processing system. Figure 1 is a diagram of a processing system 100 in accordance with some embodiments of the invention. Processing system 100 includes application 102, camera 104, and display 111. In various embodiments, the processing system may be a personal computer (PC), a laptop computer, a netbook, a tablet computer, a handheld computer, a smart phone, a mobile Internet device (MID), or any other stationary or mobile processing device. In some embodiments, the camera may be integral with the processing system. The camera may be a still camera or a video camera. In other embodiments, the camera may be external to the processing system but communicatively coupled with it. In an embodiment, images captured by a camera may be communicated over a network, or over a wired or wireless interface, to the processing system for analysis. Application 102 may be an application program to be executed on the processing system. In various embodiments, the application program may be a standalone program, or part of another program (such as a plug-in, for example) for a web browser, image processing application, game, or multimedia application. Application 102 may include facial analysis component 106 to analyze images captured by the camera to detect human faces. In an embodiment, application 102 and/or facial analysis component 106 may be implemented as a hardware component, firmware component, software component, or a combination of one or more of hardware, firmware, and/or software components, as part of processing system 100.
In an embodiment, a user may operate processing system 100 to capture one or more images from camera 104. The captured one or more images may be input to application 102 for various purposes. Application 102 may pass the one or more images to facial analysis component 106 for determining facial characteristics in the one or more images. In an embodiment, facial analysis component 106 may detect facial attributes in the one or more images. Results of application processing, including facial analysis, may be shown on display 111.
Figure 2 is a diagram of facial analysis processing 200 performed by facial analysis component 106 according to an embodiment of the present invention. An image 202 may be input to the facial analysis process. At block 204, a face may be detected in the image by performing a face detection process by face detection component 205 to locate a face rectangle region for each detected face in the image. At block 206, facial landmarks may be detected in each face rectangle region by performing a facial landmark detection process by landmark detection component 207. In an embodiment, the facial landmarks comprise six points: the corners of the eyes and mouth. At block 208, the face rectangle region may be aligned and normalized to a predetermined size by a face alignment process in alignment component 209, based at least in part on the detected facial landmarks. In one embodiment, the resulting size of the aligned and normalized image may be less than the size of the facial image. In an embodiment, the size is 64 pixels by 64 pixels. At block 210, local features may be extracted from selected local regions of the normalized face images by extraction component 211. A local region is a subset of a normalized face image. In an embodiment, a local region may be converted to a set of numbers by a transformation process. In one embodiment, a local binary patterns (LBP) process may be used as a feature extractor. In another embodiment, a histogram of oriented gradients (HoG) process may be used as a feature extractor. In other embodiments, other feature extraction processes may be used. In an embodiment, a local feature is the set of these numbers that represents a local region. At block 212, each local region is represented by the extracted local features, and the extracted local features may be input to a weak classifier component based on a multi-layer perceptron (MLP) structure for prediction of a selected facial attribute by prediction component 213. At block 214, output data from the weak classifier for each local feature may be aggregated into a final detection score by aggregation component 215. In an embodiment, the final detection score may be an indication as to whether the facial attribute is detected in the image, the final detection score being in a range of 0.0 to 1.0, where the larger the value, the higher the confidence in detecting the selected facial attribute.
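Purely as a structural illustration, the following C++ sketch wires these stages together in the order of blocks 204-214; the stand-in types and stage function names are hypothetical interfaces assumed for this example, not names taken from the disclosure.

```cpp
#include <vector>

// Minimal stand-in types; a real implementation operates on image buffers.
struct Image { int w = 0, h = 0; std::vector<unsigned char> pixels; };
struct FaceRect { int x, y, w, h; };
struct Landmarks { float pts[6][2]; };  // six points: eye and mouth corners
using LocalFeature = std::vector<float>;

// Assumed stage interfaces, declared here and implemented elsewhere.
std::vector<FaceRect> detectFaces(const Image& img);                 // block 204
Landmarks detectLandmarks(const Image& img, const FaceRect& face);   // block 206
Image alignAndNormalize(const Image& img, const FaceRect& face,
                        const Landmarks& lm, int size);              // block 208
std::vector<LocalFeature> extractLocalFeatures(const Image& norm);   // block 210
float mlpWeakClassifier(const LocalFeature& feature);                // block 212

float detectFacialAttribute(const Image& image) {
  float score = 0.0f;  // final detection score for the last face found
  for (const FaceRect& face : detectFaces(image)) {
    Landmarks lm = detectLandmarks(image, face);
    Image norm = alignAndNormalize(image, face, lm, /*size=*/64);
    float sum = 0.0f;
    int n = 0;
    for (const LocalFeature& f : extractLocalFeatures(norm)) {
      sum += mlpWeakClassifier(f);  // weak prediction in [0.0, 1.0]
      ++n;
    }
    if (n > 0) score = sum / n;     // block 214: aggregate by arithmetic mean
  }
  return score;  // larger value means higher confidence in the attribute
}
```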
Face detection processing 204 may be performed on an input image to detect a face in the image. In an embodiment, any known face detection process may be used as long as the face detection process produces a rectangle image of the detected face. The input data comprises one or more 2D images. In an embodiment, the 2D image is a still image. In another embodiment, the 2D images comprise a sequence of video frames at a certain frame rate (fps), with each video frame having an image resolution (W x H). In an embodiment, an existing face detection approach following the well-known Viola-Jones framework, as shown in "Rapid Object Detection Using a Boosted Cascade of Simple Features," by Paul Viola and Michael Jones, Conference on Computer Vision and Pattern Recognition, 2001, may be used.
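As one hedged example of such a detector, OpenCV ships a Viola-Jones cascade classifier; the sketch below uses that library's API, with the cascade file path and detection parameters chosen only for illustration.

```cpp
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/objdetect.hpp>
#include <vector>

// Locate a face rectangle region for each detected face in the image.
std::vector<cv::Rect> detectFaceRectangles(const cv::Mat& image) {
  // Pretrained frontal-face cascade bundled with OpenCV (path is an assumption).
  static cv::CascadeClassifier cascade("haarcascade_frontalface_default.xml");
  cv::Mat gray;
  cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);
  cv::equalizeHist(gray, gray);
  std::vector<cv::Rect> faces;
  // Scale factor 1.1, 3 neighbors, minimum face size 30x30 (illustrative values).
  cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(30, 30));
  return faces;
}
```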
After locating the face regions, embodiments of the present invention detect accurate positions of facial landmarks, such as the mouth, and corners of the eyes. A landmark is a point of interest within a face. The left eye, right eye, and nose base are all examples of landmarks. In an embodiment, facial landmark detection processing 206 may be performed as disclosed in the co-pending patent application entitled "Method of Facial Landmark Detection," by Ang Liu, Yangzhou Du, Tao Wang, Jianguo Li, Qiang Li, and Yimin Zhang, docket number P37155, commonly assigned to the assignee of the present application. Figure 3 is an example of detected facial landmarks in a facial image according to an embodiment of the present invention. The rectangle region denotes the detected face as a result of facial detection processing 204 and the diamonds indicate the six detected facial landmarks as a result of facial landmark detection processing 206. During face alignment processing 208, the detected faces may be converted to gray-scale, aligned and normalized to a pre-defined size, such as 64 x 64 (64 pixels in width and height). In an embodiment, the alignment may be done in the following steps:
Compute the rotation angle θ between the eye-corner line and the horizontal;
Rotate the image by angle θ to make the eye-corner line parallel to the horizontal;
Compute the distance between the two eye centers (w) and the eye-to-mouth distance (h);
Crop a (2w x 2h) rectangle from the face region, placing the left eye center at (0.5w, 0.5h), the right eye center at (1.5w, 0.5h), and the mouth center at (w, 1.5h);
Scale the cropped rectangle to the pre-defined size (such as 64 x 64).
To alleviate lighting differences among images, the scaled image may be histogram equalized. Figure 4 is an example of alignment and normalization of a selected portion of the facial image according to an embodiment of the present invention.
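A minimal sketch of these alignment steps, assuming OpenCV for the image operations and assuming the eye centers and mouth center have already been derived from the six detected landmarks; the function name and defaults are illustrative, not from the disclosure.

```cpp
#include <cmath>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Align a grayscale face: rotate so the eyes are level, crop a 2w x 2h
// rectangle anchored on the eye and mouth centers, scale, and equalize.
cv::Mat alignFace(const cv::Mat& gray, cv::Point2f leftEye,
                  cv::Point2f rightEye, cv::Point2f mouth, int size = 64) {
  // Steps 1-2: rotation angle between the eye line and the horizontal.
  double theta = std::atan2(rightEye.y - leftEye.y, rightEye.x - leftEye.x);
  cv::Mat rot = cv::getRotationMatrix2D(leftEye, theta * 180.0 / CV_PI, 1.0);
  cv::Mat rotated;
  cv::warpAffine(gray, rotated, rot, gray.size());
  auto transform = [&](cv::Point2f p) {  // apply the same rotation to points
    return cv::Point2f(
        float(rot.at<double>(0, 0) * p.x + rot.at<double>(0, 1) * p.y + rot.at<double>(0, 2)),
        float(rot.at<double>(1, 0) * p.x + rot.at<double>(1, 1) * p.y + rot.at<double>(1, 2)));
  };
  leftEye = transform(leftEye);
  rightEye = transform(rightEye);
  mouth = transform(mouth);
  // Step 3: inter-eye distance w and eye-to-mouth distance h.
  float w = rightEye.x - leftEye.x;
  float h = mouth.y - 0.5f * (leftEye.y + rightEye.y);
  // Step 4: crop 2w x 2h so the left eye lands at (0.5w, 0.5h).
  cv::Rect crop(int(leftEye.x - 0.5f * w), int(leftEye.y - 0.5f * h),
                int(2.0f * w), int(2.0f * h));
  crop &= cv::Rect(0, 0, rotated.cols, rotated.rows);  // clamp to image bounds
  // Step 5: scale to the pre-defined size, then equalize the histogram.
  cv::Mat out;
  cv::resize(rotated(crop), out, cv::Size(size, size));
  cv::equalizeHist(out, out);
  return out;
}
```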
Local feature extraction processing 210 may be performed as follows. Local features may be extracted from local regions of aligned and normalized facial images. In one embodiment, the local features may be represented as a Local Binary Patterns (LBP) histogram. A suitable LBP technique is disclosed in "Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns," by T. Ojala, M. Pietikäinen, and T. Mäenpää, in IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2002, pages 971-987. In another embodiment, the local features may be represented as a Histogram of Oriented Gradients (HoG). A suitable HoG technique is disclosed in "Histograms of Oriented Gradients for Human Detection," by N. Dalal and B. Triggs, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005. In other embodiments, other feature extraction techniques may be used. In various embodiments of the present invention, the technique used for extracting local features may be different for different facial attributes. For instance, when performing smile detection, LBP is preferred over other techniques; when performing gender or age detection, HoG is preferred over other techniques. Figure 5 is an example of selected local regions for smile detection according to an embodiment of the present invention. Figure 6 is an example of selected local regions for gender detection according to an embodiment of the present invention. A local region may be defined as a quadruple (x, y, w, h), where (x, y) is the top-left corner point of the local region and (w, h) are the width and height of the rectangle of the local region. There are many possible local regions within a normalized facial image. In embodiments of the present invention, a boosting process may be used to select differentiated regions for facial attribute detection from a training dataset.
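The 59-dimensional figure quoted later for each weak classifier matches a uniform LBP histogram over eight neighbors (58 uniform patterns plus one catch-all bin). The following dependency-free sketch computes such a histogram for one local region; the exact LBP variant (sampling radius, normalization) used in the disclosure is not specified, so those choices are assumptions.

```cpp
#include <array>
#include <cstdint>

// Number of 0/1 transitions around the circular 8-bit pattern.
static int transitions(unsigned code) {
  int t = 0;
  for (int i = 0; i < 8; ++i)
    t += ((code >> i) & 1u) != ((code >> ((i + 1) % 8)) & 1u);
  return t;
}

// 59-bin uniform-LBP histogram over one local region of a grayscale image
// (row-major, 'stride' bytes per row); border pixels of the region are skipped.
std::array<float, 59> lbpHistogram(const std::uint8_t* img, int stride,
                                   int x0, int y0, int w, int h) {
  // Lookup table: 58 uniform patterns get their own bin, all others share bin 58.
  static const std::array<int, 256> lut = [] {
    std::array<int, 256> m{};
    int next = 0;
    for (unsigned c = 0; c < 256; ++c)
      m[c] = (transitions(c) <= 2) ? next++ : 58;
    return m;
  }();
  static const int dx[8] = {-1, 0, 1, 1, 1, 0, -1, -1};
  static const int dy[8] = {-1, -1, -1, 0, 1, 1, 1, 0};
  std::array<float, 59> hist{};
  for (int y = y0 + 1; y < y0 + h - 1; ++y) {
    for (int x = x0 + 1; x < x0 + w - 1; ++x) {
      const std::uint8_t center = img[y * stride + x];
      unsigned code = 0;
      for (int k = 0; k < 8; ++k)  // threshold the 8 neighbors against the center
        if (img[(y + dy[k]) * stride + (x + dx[k])] >= center) code |= 1u << k;
      hist[lut[code]] += 1.0f;
    }
  }
  const float total = float((w - 2) * (h - 2));  // normalize to sum to 1
  if (total > 0.0f)
    for (float& v : hist) v /= total;
  return hist;
}
```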
In a training procedure, a size-scalable window is slid over the normalized facial image to generate candidate local regions. Taking a 64 x 64 normalized facial image as an example, in an embodiment, one may start with a 16 x 16 window and step every 4 pixels in the face image. The window size may then be increased by 4 pixels (such as from a 16 x 16 window to a 20 x 20 window) when the previous scan is finished.
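A minimal sketch of this enumeration, assuming square windows and a scan that stops when the window reaches the image size; for a 64 x 64 image it yields 819 candidates, consistent with the "several hundred" noted below.

```cpp
#include <vector>

struct Region { int x, y, w, h; };  // (x, y) top-left corner; w x h extent

// Enumerate candidate local regions: start from a 16x16 window, slide it
// every 4 pixels, and grow the window side by 4 pixels after each full scan.
std::vector<Region> candidateRegions(int imageSize = 64) {
  std::vector<Region> regions;
  for (int side = 16; side <= imageSize; side += 4)
    for (int y = 0; y + side <= imageSize; y += 4)
      for (int x = 0; x + side <= imageSize; x += 4)
        regions.push_back({x, y, side, side});
  return regions;  // 819 regions for imageSize = 64
}
```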
There are several hundred candidate windows (local regions) which may be identified according to this scheme when the normalized facial image is 64 pixels by 64 pixels. However, only a few local regions may be useful for the final classification. The boosting algorithm may be adopted to simultaneously select a subset of useful local regions from these candidate local regions and train weak classifiers from the local region based representation. The boosting training procedure is listed in Table 1.
Table 1. Boosting procedure for local region selection and weak classifier training

Input: candidate local regions {R_i}, i = 1..M; a training face image set of N face images and corresponding labels y_j (in smile detection, y_j = 1 for smiling, -1 for non-smiling); suppose there are Na positive faces and Nb negative faces.

Step 1: For the N face images, extract local features (such as LBP for smile) in each local region R_i.

Step 2: Initialize the weight for each training sample:
W(x_j) = 1/Na if y_j = 1; W(x_j) = 1/Nb otherwise.

Step 3: For k = 1..K (boosting rounds):
1) Sample N samples according to the weights W(.) to get a subset.
2) For each local region R_i: (a) represent the face image with the LBP histogram feature in the local region; (b) train an MLP classifier on the formulated subset; (c) output the total prediction error e_i.
3) Pick the local region with the lowest prediction error as the current round classifier C_k(x).
4) Use y(x) ~ C_k(x) to predict all training samples.
5) Aggregate the classifiers of rounds 1 through k together: C(x) = (1/k) Σ_{n=1..k} C_n(x). Suppose c_kj = C_k(x_j) indicates the aggregated classifier output for sample j; the total error of the aggregated classifier on the training set is e_k.
6) Update the sampling weight with a = log((1 - e_k) / e_k):
W(x_j) = W(x_j) exp(-a c_kj) if sample j is correctly predicted;
W(x_j) = W(x_j) exp(a c_kj) otherwise.
7) Check whether e_k has converged or not.

Output: y(x) ~ C_K(x) and the corresponding selected regions.

Given a test sample, the final prediction classifier is the aggregated classifier C(x) = (1/K) Σ_{k=1..K} C_k(x).
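A short sketch of the weight update in step 6. The convention that a sample counts as correctly predicted when the round's score, thresholded at 0.5, matches its label, and the final renormalization before resampling, are assumptions not spelled out in Table 1.

```cpp
#include <cmath>
#include <vector>

// One boosting-round weight update (Table 1, step 6): a = log((1-e)/e);
// weights shrink for correctly predicted samples and grow otherwise.
void updateSamplingWeights(std::vector<double>& weight,
                           const std::vector<double>& score,  // c_kj = C_k(x_j)
                           const std::vector<int>& label,     // y_j in {-1, +1}
                           double errorRate) {                // e_k
  double a = std::log((1.0 - errorRate) / errorRate);
  double total = 0.0;
  for (std::size_t j = 0; j < weight.size(); ++j) {
    bool correct = (label[j] > 0) == (score[j] > 0.5);  // assumed convention
    weight[j] *= std::exp(correct ? -a * score[j] : a * score[j]);
    total += weight[j];
  }
  for (double& w : weight) w /= total;  // renormalize before resampling
}
```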
For each extracted local feature, in an embodiment, a classifier is trained to perform the weak classification. In embodiments of the present invention, the base classifier used is a multi-layer perceptron (MLP), rather than support vector machines (SVM). A multi-layer perceptron (MLP) is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate output data. An MLP consists of multiple layers of nodes in a directed graph, with each layer fully connected to the next one. Except for the input nodes, each node is a neuron (or processing element) with a nonlinear activation function. MLP utilizes a supervised learning technique called back-propagation for training the network. MLP is a modification of the standard linear perceptron and can distinguish data that is not linearly separable.
In embodiments of the present invention, MLP is used due to several aspects. MLP can provide similar performance to state-of-the-art SVM based algorithms. The model size of MLP is much smaller than SVM since MLP only stores network weights as models while SVM stores sparse training samples (i.e., support vectors). The prediction of MLP is very fast since it only contains vector product operations. MLP directly gives probability and score outputs for prediction confidence.
The MLP is the most commonly used type of neural network. In an embodiment of the present invention, the MLP comprises an input layer, an output layer, and one hidden layer. Suppose there are d nodes at the input layer (where d is the dimension of the local features, such as 59 for an LBP histogram) and two nodes at the output layer (for smile detection, the two nodes indicate the prediction for smiling or non-smiling; for gender detection, the two nodes indicate the prediction for male or female), while the number of nodes in the hidden layer is a tuned parameter determined by the training procedure. Figure 7 is a diagram of the structure of a multi-layer perceptron (MLP) according to an embodiment of the present invention.
In an embodiment, all of the nodes (neurons) in the MLP are similar. The MLP takes the output values from several nodes in the previous layer as input, and passes its responses to neurons in the next layer. The values retrieved from the previous layer may be summed with trained weights for each node, plus a bias term, and the sum may be transformed using an activation function f. Figure 8 illustrates the computing architecture of each node.
The activation function f may be a sigmoid function, for instance f(x) = e^(-xα) / (1 + e^(-xα)). The output of this function is in the range 0.0 to 1.0. It is apparent that at each node, the computation is a vector product between a weight vector and an input vector from the previous layer, mathematically y = f(w · x), where w is the weight vector and x is the input vector. These computations can be accelerated by using single instruction, multiple data (SIMD) instructions or other accelerators. Hence, classification by the MLP is very efficient.
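A minimal sketch of the per-node computation just described, using the sigmoid form given above; the function name and the parameter α are illustrative.

```cpp
#include <cmath>
#include <vector>

// Computation at one MLP node: a weighted sum of the previous layer's outputs
// plus a bias, passed through the sigmoid f(x) = e^(-xα) / (1 + e^(-xα)),
// whose output lies in (0, 1). The inner loop is the vector product
// y = f(w · x) that SIMD instructions can accelerate.
float mlpNode(const std::vector<float>& x, const std::vector<float>& w,
              float bias, float alpha) {
  float s = bias;
  for (std::size_t i = 0; i < x.size(); ++i) s += w[i] * x[i];
  float e = std::exp(-s * alpha);
  return e / (1.0f + e);
}
```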
As described above, the MLP may be used as a weak classifier for each selected local feature. Each selected local feature is associated with one MLP classifier. Each MLP classifier outputs a value between 0.0 and 1.0 indicating the likelihood that the selected local feature detects the selected facial attribute (e.g., smile detection, gender detection, etc.).
In an embodiment, the final classification may be determined by aggregation processing 214 based on an aggregating rule:
Given a test sample x: for each selected local region k, extract the local feature x_k at that local region, then use the weak MLP classifier C_k(x_k) to do the prediction. The final output is the aggregated result C(x) = (1/K) Σ_{k=1..K} C_k(x_k).
In an embodiment, arithmetic mean is used for the aggregation, however, in other embodiments other techniques may be used, such as a weighted average. In an embodiment, the final detection score may be shown on the display along with the facial image, and/or used in processing by the application or other components of the processing system.
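A one-function sketch of this aggregating rule; the arithmetic mean follows the text directly, while the empty-input guard is an added assumption.

```cpp
#include <numeric>
#include <vector>

// Aggregate per-region weak classifier scores (each in [0.0, 1.0]) into the
// final detection score C(x) = (1/K) * sum of C_k(x_k), by arithmetic mean.
// A weighted average would be a drop-in alternative, as noted in the text.
float aggregateScores(const std::vector<float>& weakScores) {
  if (weakScores.empty()) return 0.0f;  // guard: no regions selected
  float sum = std::accumulate(weakScores.begin(), weakScores.end(), 0.0f);
  return sum / static_cast<float>(weakScores.size());
}
```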
The applicants implemented an embodiment of the present invention in the C/C++ programming language on an x86-based computing platform for smile/gender/age detection processing. Testing of smile detection was performed on the GENKI public dataset (which contains 2,000 smiling faces and 2,000 non-smiling faces), and compared with existing techniques such as a global-feature based method and a Haar-cascade based method. The classifiers were trained on a collection of 8,000 face images, which contained approximately 3,000 smiling faces. The comparison results are listed in Table 2.
Table 2. Comparison of smile detection performance on a 1.6 GHz Atom CPU with 1 GB memory

[Table 2 appears only as an image in the source document; its data is not reproduced here.]
Note that the accuracy number in Table 2 is the area under the ROC curve, which is a standard metric for measuring detection performance. It is apparent that embodiments of the present invention are considerably more computing/memory efficient and accurate than the two other known techniques. For the smile detector, the applicants also made a live comparison with a digital camera/video recorder on several videos. The results showed that embodiments of the present invention can achieve a 98% hit-rate with only 1.6% false alarms, while a known digital camera/digital video recorder (DC/DVR), tuned to its highest sensitivity level, only achieved an 80% hit-rate with a false-alarm rate of more than 15%.
For gender detection, the classifiers were trained on a collection of 17,700 face images, which contained about 8,100 female and 9,600 male faces. When applied to a FERET dataset with 1,024 faces, embodiments of the present invention achieved 94% accuracy at 140 fps on a processing system including a 1.6 GHz Atom processor. The comparison results are shown in Table 3.
Table 3. Comparison of gender detection performance
[Table 3 appears as an image in the original publication; its values are not reproduced in this text.]
Embodiments of the present invention aggregate local region features instead of a global feature for facial attribute detection. The local regions are automatically selected by a boosting procedure to ensure maximum accuracy. An MLP weak classifier is trained over each local feature, and the results from the weak classifiers of all of the extracted local features are aggregated to produce the final detection result. The number of selected local regions is typically fewer than 20, and the boosted MLP classifier is not only accurate in prediction, but also very fast and small in size.
Embodiments of the present invention overcome disadvantages of the prior art in at least four aspects. First, embodiments of the present invention aggregate results from weak classifiers on local features to produce a strong facial attribute detector. Second, relating to model size, global features (such as Gabor features) are usually of very high dimension (the feature vector may have more than 23,000 dimensions), which makes the trained classifiers quite large in terms of storage (according to the applicants' reimplementation of a known technique, the SVM classifier size is more than 30 MB). In contrast, the local region features used by embodiments of the present invention have low dimension (using LBP, only 59 dimensions for each weak classifier), and the trained classifier size in one embodiment is less than 400 KB. The boosting cascade classifiers of the prior art usually take thousands of features, and the classifier size is also quite large (>= 10 MB in the applicants' reimplementation of a known technique). In contrast, the local-region-based classifier of embodiments of the present invention is stronger than a Haar classifier, and its size is much smaller. Third, the processing speed is better than that of prior approaches: embodiments of the present invention can achieve real-time processing speed even on low-end Atom processors, while other known solutions cannot realize real-time processing speed even on high-end PCs.
Fourth, embodiments of the present invention comprise a unified solution for many facial attribute detection problems, and may be broadly used in tasks such as smile detection, gender detection, blink detection, and age detection. Figure 9 illustrates a block diagram of an embodiment of a processing system 900.
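The model-size figures quoted above can be sanity-checked with back-of-envelope arithmetic. In the editor's sketch below, only the 59-dimension input, the two outputs, the fewer-than-20 regions, and the sub-400 KB total come from the text; the hidden width of 80 is an assumed value chosen for illustration.

    // Editor's illustration: approximate storage for the boosted MLP model
    // with 4-byte float weights. The hidden width is hypothetical.
    constexpr int d = 59, hidden = 80, outputs = 2, regions = 20;
    constexpr int weights_per_mlp = (d + 1) * hidden + (hidden + 1) * outputs; // 4,962 floats
    constexpr int total_bytes = regions * weights_per_mlp * 4;                 // ~397 KB
    static_assert(total_bytes < 400 * 1024, "consistent with the <400 KB figure");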
In various embodiments, one or more of the components of the system 900 may be provided in various electronic computing devices capable of performing one or more of the operations discussed herein with reference to some embodiments of the invention. For example, one or more of the components of the processing system 900 may be used to perform the operations discussed with reference to Figures 1-8, e.g., by processing instructions, executing subroutines, etc. in accordance with the operations discussed herein. Also, various storage devices discussed herein (e.g., with reference to Figure 9 and/or Figure 10) may be used to store data, operation results, etc. In one embodiment, data received over the network 903 (e.g., via network interface devices 930 and/or 1030) may be stored in caches (e.g., L1 caches in an embodiment) present in processors 902 (and/or 1002 of Figure 10). These processors may then apply the operations discussed herein in accordance with various embodiments of the invention. More particularly, the processing system 900 may include one or more processing unit(s) 902 or processors that communicate via an interconnection network 904. Hence, various operations discussed herein may be performed by a processor in some embodiments. Moreover, the processors 902 may include a general purpose processor, a network processor (that processes data communicated over a computer network 903), or other types of processors (including a reduced instruction set computer (RISC) processor or a complex instruction set computer (CISC) processor). Moreover, the processors 902 may have a single or multiple core design. The processors 902 with a multiple core design may integrate different types of processor cores on the same integrated circuit (IC) die. Also, the processors 902 with a multiple core design may be implemented as symmetrical or asymmetrical multiprocessors. Moreover, the operations discussed with reference to Figures 1-8 may be performed by one or more components of the system 900. In an embodiment, a processor (such as processor 1 902-1) may comprise the facial analysis component 106 and/or the application 102 as hardwired logic (e.g., circuitry) or microcode. In an embodiment, multiple components shown in Figure 9 may be included on a single integrated circuit (e.g., a system on a chip (SOC)).
A chipset 906 may also communicate with the interconnection network 904. The chipset 906 may include a graphics and memory control hub (GMCH) 908. The GMCH 908 may include a memory controller 910 that communicates with a memory 912. The memory 912 may store data, such as images 202 from the camera 104. The data may include sequences of instructions that are executed by the processor 902 or any other device included in the processing system 900. Furthermore, the memory 912 may store one or more programs such as the facial analysis component 106, instructions corresponding to executables, mappings, etc. The same data, or at least a portion of it (including instructions, camera images, and temporary storage arrays), may be stored in the disk drive 928 and/or one or more caches within the processors 902. In one embodiment of the invention, the memory 912 may include one or more volatile storage (or memory) devices such as random access memory (RAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), static RAM (SRAM), or other types of storage devices. Nonvolatile memory, such as a hard disk, may also be utilized. Additional devices, such as multiple processors and/or multiple system memories, may communicate via the interconnection network 904.
The GMCH 908 may also include a graphics interface 914 that communicates with a display 916. In one embodiment of the invention, the graphics interface 914 may communicate with the display 916 via an accelerated graphics port (AGP). In an embodiment of the invention, the display 916 may be a flat panel display that communicates with the graphics interface 914 through, for example, a signal converter that translates a digital representation of an image stored in a storage device such as video memory or system memory into display signals that are interpreted and displayed by the display 916. The display signals produced by the interface 914 may pass through various control devices before being interpreted by and subsequently displayed on the display 916. In an embodiment, camera images, facial images, and indicators of smile, gender, age, or other facial attribute detection processed by facial analysis component 106 may be shown on the display to a user.
A hub interface 918 may allow the GMCH 908 and an input/output (I/O) control hub (ICH) 920 to communicate. The ICH 920 may provide an interface to I/O devices that communicate with the processing system 900. The ICH 920 may communicate with a link 922 through a peripheral bridge (or controller) 924, such as a peripheral component interconnect (PCI) bridge, a universal serial bus (USB) controller, or other types of peripheral bridges or controllers. The bridge 924 may provide a data path between the processor 902 and peripheral devices. Other types of topologies may be utilized. Also, multiple links may communicate with the ICH 920, e.g., through multiple bridges or controllers. Moreover, other peripherals in communication with the ICH 920 may include, in various embodiments of the invention, integrated drive electronics (IDE) or small computer system interface (SCSI) hard drive(s), USB port(s), a keyboard, a mouse, parallel port(s), serial port(s), floppy disk drive(s), digital output support (e.g., digital video interface (DVI)), the camera 104, or other devices. The link 922 may communicate with an audio device 926, one or more disk drive(s) 928, and a network interface device 930, which may be in communication with the computer network 903 (such as the Internet, for example). In an embodiment, the device 930 may be a network interface controller (NIC) capable of wired or wireless communication. Other devices may communicate via the link 922. Also, various components (such as the network interface device 930) may communicate with the GMCH 908 in some embodiments of the invention. In addition, the processor 902, the GMCH 908, and/or the graphics interface 914 may be combined to form a single chip. In an embodiment, images 202 and/or the facial analysis component 106 may be received from the computer network 903. In an embodiment, the facial analysis component 106 may be a plug-in for a web browser executed by the processor 902.
Furthermore, the processing system 900 may include volatile and/or nonvolatile memory (or storage). For example, nonvolatile memory may include one or more of the following: read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically EPROM (EEPROM), a disk drive (e.g., 928), a floppy disk, a compact disk ROM (CD-ROM), a digital versatile disk (DVD), flash memory, a magneto-optical disk, or other types of nonvolatile machine-readable media that are capable of storing electronic data (e.g., including instructions).
In an embodiment, components of the system 900 may be arranged in a point-to-point (PtP) configuration such as discussed with reference to Figure 10. For example, processors, memory, and/or input/output devices may be interconnected by a number of point-to-point interfaces. More specifically, Figure 10 illustrates a processing system 1000 that is arranged in a point-to-point (PtP) configuration, according to an embodiment of the invention. In particular, Figure 10 shows a system where processors, memory, and input/output devices are interconnected by a number of point-to-point interfaces. The operations discussed with reference to Figures 1-8 may be performed by one or more components of the system 1000. As illustrated in Figure 10, the system 1000 may include multiple processors, of which only two, processors 1002 and 1004, are shown for clarity. The processors 1002 and 1004 may each include a local memory controller hub (MCH) 1006 and 1008 (which may be the same as or similar to the GMCH 908 of Figure 9 in some embodiments) to couple with memories 1010 and 1012. The memories 1010 and/or 1012 may store various data such as those discussed with reference to the memory 912 of Figure 9.
The processors 1002 and 1004 may be any suitable processors such as those discussed with reference to the processors 902 of Figure 9. The processors 1002 and 1004 may exchange data via a point-to-point (PtP) interface 1014 using PtP interface circuits 1016 and 1018, respectively. The processors 1002 and 1004 may each exchange data with a chipset 1020 via individual PtP interfaces 1022 and 1024 using point-to-point interface circuits 1026, 1028, 1030, and 1032. The chipset 1020 may also exchange data with a high-performance graphics circuit 1034 via a high-performance graphics interface 1036, using a PtP interface circuit 1037.
At least one embodiment of the invention may be provided by utilizing the processors 1002 and 1004. For example, the processors 1002 and/or 1004 may perform one or more of the operations of Figures 1-8. Other embodiments of the invention, however, may exist in other circuits, logic units, or devices within the system 1000 of Figure 10. Furthermore, other embodiments of the invention may be distributed throughout several circuits, logic units, or devices illustrated in Figure 10. The chipset 1020 may be coupled to a link 1040 using a PtP interface circuit 1041.
The link 1040 may have one or more devices coupled to it, such as a bridge 1042 and I/O devices 1043. Via a link 1044, the bridge 1042 may be coupled to other devices such as a keyboard/mouse 1045, the network interface device 1030 discussed with reference to Figure 9 (such as modems, network interface cards (NICs), or the like that may be coupled to the computer network 1003), an audio I/O device 1047, and/or a data storage device 1048. The data storage device 1048 may store, in an embodiment, facial analysis component code 1049 that may be executed by the processors 1002 and/or 1004.
In various embodiments of the invention, the operations discussed herein, e.g., with reference to Figures 1-10, may be implemented as hardware (e.g., logic circuitry), software (including, for example, micro-code that controls the operations of a processor such as the processors discussed with reference to Figures 9 and 10), firmware, or combinations thereof, which may be provided as a computer program product, e.g., including a tangible machine-readable or computer-readable medium having stored thereon instructions (or software procedures) used to program a computer (e.g., a processor or other logic of a computing device) to perform an operation discussed herein. The machine-readable medium may include a storage device such as those discussed herein.
Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least an implementation. The appearances of the phrase "in one embodiment" in various places in the specification may or may not be all referring to the same embodiment.
Also, in the description and claims, the terms "coupled" and "connected," along with their derivatives, may be used. In some embodiments of the invention, "connected" may be used to indicate that two or more elements are in direct physical or electrical contact with each other. "Coupled" may mean that two or more elements are in direct physical or electrical contact. However, "coupled" may also mean that two or more elements may not be in direct contact with each other, but may still cooperate or interact with each other.
Additionally, such computer-readable media may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals, via a communication link (e.g., a bus, a modem, or a network connection).
Thus, although embodiments of the invention have been described in language specific to structural features and/or methodological acts, it is to be understood that claimed subject matter may not be limited to the specific features or acts described. Rather, the specific features and acts are disclosed as sample forms of implementing the claimed subject matter.

Claims

1. A method of detecting a facial attribute in an image comprising: detecting a face in the image to produce a facial image; detecting facial landmarks in the facial image; aligning and normalizing the facial image based at least in part on the detected facial landmarks to produce a normalized facial image; extracting a plurality of local features from selected local regions of the normalized facial image; predicting the facial attribute in each selected local region by inputting each selected local feature into one of a plurality of weak classifier components, each weak classifier component having a multi-layer perceptron (MLP) structure; and aggregating output data from each weak classifier component to generate an indication that the facial attribute is detected in the facial image.
2. The method of claim 1, wherein the facial attribute is a smile.
3. The method of claim 1, wherein the facial attribute is gender.
4. The method of claim 1, wherein the facial attribute is age.
5. The method of claim 1, wherein aligning and normalizing the facial image comprises converting the facial image to gray scale, aligning the facial image using the detected facial landmarks, and normalizing the gray scale aligned facial image to a predetermined size to produce the normalized facial image, the predetermined size being less than the size of the facial image.
6. The method of claim 1, wherein the local features are represented as a Local Binary Patterns (LBP) histogram.
7. The method of claim 1, wherein the local features are represented as a Histogram of oriented Gradients (HoG).
8. The method of claim 1, further comprising sliding a size-scalable window over the normalized facial image to generate candidate local regions, and selecting a subset of the candidate local regions as selected local regions.
9. The method of claim 1, further comprising training a classifier to perform the weak classification for each extracted local feature.
10. A processing system to perform image analysis processing, comprising: a face detection component to analyze an image to detect a face in the image and produce a facial image; a facial landmark detection component to analyze the facial image to detect facial landmarks; an aligning and normalizing component to align and normalize the facial image based at least in part on the detected facial landmarks to produce a normalized facial image; an extraction component to extract a plurality of local features from selected local regions of the normalized facial image; a prediction component to predict a facial attribute in each selected local region by inputting each selected local feature into one of a plurality of weak classifier components, each weak classifier component having a multi-layer perceptron (MLP) structure; and an aggregation component to aggregate output data from each weak classifier component to generate an indication that the facial attribute is detected in the facial image.
11. The processing system of claim 10, wherein the facial attribute is a smile.
12. The processing system of claim 10, wherein the facial attribute is gender.
13. The processing system of claim 10, wherein the aligning and normalizing component is adapted to convert the facial image to gray scale, align the facial image using the detected facial landmarks, and normalize the gray scale aligned facial image to a predetermined size to produce the normalized facial image, the predetermined size being less than the size of the facial image.
14. The processing system of claim 10, wherein the local features are represented as a Local Binary Patterns (LBP) histogram.
15. The processing system of claim 10, wherein the local features are represented as a Histogram of oriented Gradients (HoG).
16. The processing system of claim 10, wherein the extraction component is adapted to slide a size-scalable window over the normalized facial image to generate candidate local regions, and select a subset of the candidate local regions as selected local regions.
17. A processing system to perform image analysis processing, comprising: a camera to capture an image; a face detection component to analyze the image to detect a face in the image and produce a facial image; a facial landmark detection component to analyze the facial image to detect facial landmarks; an aligning and normalizing component to align and normalize the facial image based at least in part on the detected facial landmarks to produce a normalized facial image; an extraction component to extract a plurality of local features from selected local regions of the normalized facial image; a prediction component to predict a facial attribute in each selected local region by inputting each selected local feature into one of a plurality of weak classifier components, each weak classifier component having a multi-layer perceptron (MLP) structure; an aggregation component to aggregate output data from each weak classifier component to generate an indication that the facial attribute is detected in the facial image; and a display to show the image and the indication.
18. The processing system of claim 17, wherein the facial attribute is a smile.
19. The processing system of claim 17, wherein the facial attribute is gender.
20. The processing system of claim 17, wherein the local features are represented as a Local Binary Patterns (LBP) histogram.
21. The processing system of claim 17, wherein the local features are represented as a Histogram of oriented Gradients (HoG).
22. Machine-readable instructions arranged, when executed, to implement a method or realize an apparatus as claimed in any preceding claim.
23. Machine-readable storage storing machine-readable instructions as claimed in claim 22.

Priority Applications (5)

Application Number Priority Date Filing Date Title
CN201180070557.7A CN103503029B (en) 2011-04-11 2011-04-11 The method of detection facial characteristics
PCT/CN2011/072597 WO2012139273A1 (en) 2011-04-11 2011-04-11 Method of detecting facial attributes
EP20110863314 EP2697775A4 (en) 2011-04-11 2011-04-11 Method of detecting facial attributes
US13/997,310 US8805018B2 (en) 2011-04-11 2011-04-11 Method of detecting facial attributes
TW101112312A TWI470563B (en) 2011-04-11 2012-04-06 Method of detecting attributes in an image, processing system to perform image analysis processing, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/072597 WO2012139273A1 (en) 2011-04-11 2011-04-11 Method of detecting facial attributes

Publications (1)

Publication Number Publication Date
WO2012139273A1 true WO2012139273A1 (en) 2012-10-18

Family

ID=47008778

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/072597 WO2012139273A1 (en) 2011-04-11 2011-04-11 Method of detecting facial attributes

Country Status (5)

Country Link
US (1) US8805018B2 (en)
EP (1) EP2697775A4 (en)
CN (1) CN103503029B (en)
TW (1) TWI470563B (en)
WO (1) WO2012139273A1 (en)


Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9268995B2 (en) * 2011-04-11 2016-02-23 Intel Corporation Smile detection techniques
JP5913940B2 (en) * 2011-12-01 2016-05-11 キヤノン株式会社 Image recognition apparatus, image recognition apparatus control method, and program
WO2014088125A1 (en) * 2012-12-04 2014-06-12 엘지전자 주식회사 Image photographing device and method for same
TWI716344B (en) * 2014-02-24 2021-01-21 日商花王股份有限公司 Aging analyzing method, aging care counselling method using aging analyzing method, aging analyzing device and computer readable recording medium
US9444999B2 (en) 2014-08-05 2016-09-13 Omnivision Technologies, Inc. Feature detection in image capture
EP3183689A4 (en) * 2014-08-22 2017-08-23 Microsoft Technology Licensing, LLC Face alignment with shape regression
US9762834B2 (en) 2014-09-30 2017-09-12 Qualcomm Incorporated Configurable hardware for computing computer vision features
US9838635B2 (en) 2014-09-30 2017-12-05 Qualcomm Incorporated Feature computation in a sensor element array
US9554100B2 (en) 2014-09-30 2017-01-24 Qualcomm Incorporated Low-power always-on face detection, tracking, recognition and/or analysis using events-based vision sensor
US10515284B2 (en) 2014-09-30 2019-12-24 Qualcomm Incorporated Single-processor computer vision hardware control and application execution
US10728450B2 (en) 2014-09-30 2020-07-28 Qualcomm Incorporated Event based computer vision computation
US9923004B2 (en) 2014-09-30 2018-03-20 Qualcomm Incorporated Hardware acceleration of computer vision feature detection
US20170132466A1 (en) 2014-09-30 2017-05-11 Qualcomm Incorporated Low-power iris scan initialization
US9940533B2 (en) 2014-09-30 2018-04-10 Qualcomm Incorporated Scanning window for isolating pixel values in hardware for computer vision operations
US9830503B1 (en) * 2014-12-31 2017-11-28 Morphotrust Usa, Llc Object detection in videos
US9268465B1 (en) 2015-03-31 2016-02-23 Guguly Corporation Social media system and methods for parents
CN107924452B (en) 2015-06-26 2022-07-19 英特尔公司 Combined shape regression for face alignment in images
US10037456B2 (en) * 2015-09-04 2018-07-31 The Friedland Group, Inc. Automated methods and systems for identifying and assigning attributes to human-face-containing subimages of input images
US10366277B2 (en) * 2015-09-22 2019-07-30 ImageSleuth, Inc. Automated methods and systems for identifying and characterizing face tracks in video
US9858498B2 (en) 2015-09-23 2018-01-02 Qualcomm Incorporated Systems and methods for incremental object detection using dual-threshold local binary pattern operators
US10860887B2 (en) * 2015-11-16 2020-12-08 Samsung Electronics Co., Ltd. Method and apparatus for recognizing object, and method and apparatus for training recognition model
US10579860B2 (en) * 2016-06-06 2020-03-03 Samsung Electronics Co., Ltd. Learning model for salient facial region detection
US10375317B2 (en) 2016-07-07 2019-08-06 Qualcomm Incorporated Low complexity auto-exposure control for computer vision and imaging systems
US10614332B2 (en) 2016-12-16 2020-04-07 Qualcomm Incorportaed Light source modulation for iris size adjustment
US10984235B2 (en) 2016-12-16 2021-04-20 Qualcomm Incorporated Low power data generation for iris-related detection and authentication
KR102331651B1 (en) * 2017-03-20 2021-11-30 후아웨이 테크놀러지 컴퍼니 리미티드 Method and apparatus for recognizing descriptive properties of apparent features
US10552474B2 (en) 2017-08-16 2020-02-04 Industrial Technology Research Institute Image recognition method and device thereof
CN110395260B (en) * 2018-04-20 2021-12-07 比亚迪股份有限公司 Vehicle, safe driving method and device
WO2020113326A1 (en) 2018-12-04 2020-06-11 Jiang Ruowei Automatic image-based skin diagnostics using deep learning
TWI772627B (en) 2019-03-19 2022-08-01 財團法人工業技術研究院 Person re-identification method, person re-identification system and image screening method
CN111191569A (en) * 2019-12-26 2020-05-22 深圳市优必选科技股份有限公司 Face attribute recognition method and related device thereof


Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5642431A (en) * 1995-06-07 1997-06-24 Massachusetts Institute Of Technology Network-based system and method for detection of faces and the like
US7127087B2 (en) * 2000-03-27 2006-10-24 Microsoft Corporation Pose-invariant face recognition system and process
KR100442834B1 (en) * 2002-07-19 2004-08-02 삼성전자주식회사 Method and system for face detecting using classifier learned decision boundary with face/near-face images
US7505621B1 (en) * 2003-10-24 2009-03-17 Videomining Corporation Demographic classification using image components
JP2005190400A (en) * 2003-12-26 2005-07-14 Seiko Epson Corp Face image detection method, system, and program
CA2590227A1 (en) * 2004-12-07 2006-06-07 Clean Earth Technologies, Llc Method and apparatus for standoff detection of liveness
TWI318108B (en) * 2005-11-30 2009-12-11 Univ Nat Kaohsiung Applied Sci A real-time face detection under complex backgrounds
US7689011B2 (en) * 2006-09-26 2010-03-30 Hewlett-Packard Development Company, L.P. Extracting features from face regions and auxiliary identification regions of images for person recognition and other applications
CN101038686B (en) * 2007-01-10 2010-05-19 北京航空航天大学 Method for recognizing machine-readable travel certificate
TW200842733A (en) * 2007-04-17 2008-11-01 Univ Nat Chiao Tung Object image detection method
WO2012139273A1 (en) 2011-04-11 2012-10-18 Intel Corporation Method of detecting facial attributes

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6031539A (en) * 1997-03-10 2000-02-29 Digital Equipment Corporation Facial image method and apparatus for semi-automatically mapping a face on to a wireframe topology
US6016148A (en) * 1997-06-06 2000-01-18 Digital Equipment Corporation Automated mapping of facial images to animation wireframes topologies
KR20070117922A (en) * 2006-06-09 2007-12-13 삼성전자주식회사 Method and system for fast and accurate face detection and face detection learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP2697775A4 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8781221B2 (en) 2011-04-11 2014-07-15 Intel Corporation Hand gesture recognition system
US8805018B2 (en) 2011-04-11 2014-08-12 Intel Corporation Method of detecting facial attributes
CN109844761A (en) * 2016-10-19 2019-06-04 斯纳普公司 The neural net model establishing of face
CN109844761B (en) * 2016-10-19 2023-08-25 斯纳普公司 Method and system for facial modeling
EP3529747B1 (en) * 2016-10-19 2023-10-11 Snap Inc. Neural networks for facial modeling
EP4266249A3 (en) * 2016-10-19 2024-01-17 Snap Inc. Neural networks for facial modeling
CN109657582A (en) * 2018-12-10 2019-04-19 平安科技(深圳)有限公司 Recognition methods, device, computer equipment and the storage medium of face mood
CN109657582B (en) * 2018-12-10 2023-10-31 平安科技(深圳)有限公司 Face emotion recognition method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
TWI470563B (en) 2015-01-21
US8805018B2 (en) 2014-08-12
TW201305923A (en) 2013-02-01
EP2697775A4 (en) 2015-03-04
CN103503029B (en) 2016-08-17
EP2697775A1 (en) 2014-02-19
CN103503029A (en) 2014-01-08
US20140003663A1 (en) 2014-01-02

Similar Documents

Publication Publication Date Title
US8805018B2 (en) Method of detecting facial attributes
Saypadith et al. Real-time multiple face recognition using deep learning on embedded GPU system
US10534957B2 (en) Eyeball movement analysis method and device, and storage medium
US9471829B2 (en) Method of facial landmark detection
WO2016138838A1 (en) Method and device for recognizing lip-reading based on projection extreme learning machine
Lu et al. Face detection and recognition algorithm in digital image based on computer vision sensor
Rahimpour et al. Person re-identification using visual attention
Ban et al. Tiny and blurred face alignment for long distance face recognition
Vadlapati et al. Facial recognition using the OpenCV Libraries of Python for the pictures of human faces wearing face masks during the COVID-19 pandemic
Allaert et al. Optical flow techniques for facial expression analysis: Performance evaluation and improvements
Alshaikhli et al. Face-Fake-Net: The Deep Learning Method for Image Face Anti-Spoofing Detection: Paper ID 45
Lu et al. A smart system for face detection with spatial correlation improvement in IoT environment
Fung-Lung et al. An image acquisition method for face recognition and implementation of an automatic attendance system for events
US11138417B2 (en) Automatic gender recognition utilizing gait energy image (GEI) images
CN111753583A (en) Identification method and device
Venkata Kranthi et al. Real-time facial recognition using deep learning and local binary patterns
CN114140718A (en) Target tracking method, device, equipment and storage medium
Li An improved face detection method based on face recognition application
Hutagalung et al. The Effectiveness Of OpenCV Based Face Detection In Low-Light Environments
TWI632509B (en) Face recognition apparatus and method thereof, method for increasing image recognition accuracy, and computer-readable storage medium
Thomas et al. Real Time Face Mask Detection and Recognition using Python
Shanmuhappriya Automatic attendance monitoring system using deep learning
Srivastava et al. Face Verification System with Liveness Detection
Charran et al. Real-Time Identity Censorship of Videos to Enable Live Telecast Using NVIDIA Jetson Nano
Hbali et al. Object detection based on HOG features: Faces and dual-eyes augmented reality

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11863314

Country of ref document: EP

Kind code of ref document: A1

DPE1 Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)
WWE Wipo information: entry into national phase

Ref document number: 13997310

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2011863314

Country of ref document: EP