WO2012131149A1 - Method apparatus and computer program product for detection of facial expressions - Google Patents


Info

Publication number
WO2012131149A1
Authority
WO
WIPO (PCT)
Prior art keywords
eye
face
locations
facial expression
sample faces
Prior art date
Application number
PCT/FI2012/050135
Other languages
French (fr)
Inventor
Veldandi Muninder
Shivaprasad ACHARYA
Original Assignee
Nokia Corporation
Priority date
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Publication of WO2012131149A1 publication Critical patent/WO2012131149A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • G06V40/176Dynamic expression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Definitions

  • Various implementations relate generally to method, apparatus, and computer program product for detecting facial expressions in media content.
  • Media content such as video and still pictures is widely accessed in a variety of multimedia and other electronic devices.
  • Such media content may feature a variety of subject faces and their various facial expressions.
  • the facial expressions may include emotions such as happiness, anger, sorrow, shock, and joy.
  • at times, a user may desire to access certain frames of a video, or to access pictures having particular facial expressions.
  • for example, a user may desire to access scenes of interest such as scenes including smiling faces.
  • it may be useful if different facial expressions can be determined in the images and/or videos; this may enable images and/or videos to be further sorted and categorized according to the facial expressions of subjects.
  • a method comprising: detecting a first eye location and a second eye location of a face in a media file; determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
  • an apparatus comprising: at least one processor; and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: detecting a first eye location and a second eye location of a face in a media file; determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
  • a computer program product comprising at least one computer-readable storage medium, the computer-readable storage medium comprising a set of instructions, which, when executed by one or more processors, cause an apparatus at least to perform: detecting a first eye location and a second eye location of a face in a media file; determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
  • an apparatus comprising: means for detecting a first eye location and a second eye location of a face in a media file; means for determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and means for generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
  • a computer program comprising program instructions which when executed by an apparatus, cause the apparatus to: detecting a first eye location and a second eye location of a face in a media file; determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
  • FIGURE 1 illustrates a device in accordance with an example embodiment
  • FIGURE 2 illustrates an apparatus configured to detect facial expressions, in accordance with an example embodiment
  • FIGURE 3 is a schematic diagram representing an example of generating multiple sample faces corresponding to a face
  • FIGURE 4 is a plot 400 illustrative of detection of a facial expression, in accordance with an example embodiment
  • FIGURE 5 is a plot 500 illustrative of rejection of a presence state of the facial expression, in accordance with an example embodiment
  • FIGURE 6 is a flowchart depicting an example method 600 for detecting presence of a facial expression, in accordance with an example embodiment.
  • FIGURE 7 is a flowchart depicting an example method 700 for detecting presence of a facial expression, in accordance with another example embodiment.
  • Example embodiments and their potential effects are understood by referring to FIGURES 1 through 7 of the drawings.
  • FIGURE 1 illustrates a device 100 in accordance with an example embodiment. It should be understood, however, that the device 100 as illustrated and hereinafter described is merely illustrative of one type of device that may benefit from various embodiments and, therefore, should not be taken to limit the scope of the embodiments. As such, it should be appreciated that at least some of the components described below in connection with the device 100 may be optional and in an example embodiment may include more, less or different components than those described in connection with the example embodiment of FIGURE 1.
  • the device 100 could be any of a number of types of mobile electronic devices, for example, portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, cellular phones, all types of computers (for example, laptops, mobile computers or desktops), cameras, audio/video players, radios, global positioning system (GPS) devices, media players, mobile digital assistants, or any combination of the aforementioned, and other types of communications devices.
  • the device 100 may include an antenna 102 (or multiple antennas) in operable communication with a transmitter 104 and a receiver 106.
  • the device 100 may further include an apparatus, such as a controller 108 or other processing device that provides signals to and receives signals from the transmitter 104 and receiver 106, respectively.
  • the signals may include signaling information in accordance with the air interface standard of the applicable cellular system, and/or may also include data corresponding to user speech, received data and/or user generated data.
  • the device 100 may be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types.
  • the device 100 may be capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like.
  • the device 100 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with 3.9G wireless communication protocol such as evolved-universal terrestrial radio access network (E-UTRAN), with fourth-generation (4G) wireless communication protocols, or the like.
  • computer networks such as the Internet, local area networks, wide area networks, and the like; short range wireless communication networks such as Bluetooth® networks, Zigbee® networks, Institute of Electrical and Electronics Engineers (IEEE) 802.11x networks, and the like; and wireline telecommunication networks such as the public switched telephone network (PSTN).
  • the controller 108 may include circuitry implementing, among others, audio and logic functions of the device 100.
  • the controller 108 may include, but is not limited to, one or more digital signal processor devices, one or more microprocessor devices, one or more processor(s) with accompanying digital signal processor(s), one or more processor(s) without accompanying digital signal processor(s), one or more special-purpose computer chips, one or more field-programmable gate arrays (FPGAs), one or more controllers, one or more application-specific integrated circuits (ASICs), one or more computer(s), various analog to digital converters, digital to analog converters, and/or other support circuits. Control and signal processing functions of the device 100 are allocated between these devices according to their respective capabilities.
  • the controller 108 may also include the functionality to convolutionally encode and interleave messages and data prior to modulation and transmission.
  • the controller 108 may additionally include an internal voice coder, and may include an internal data modem.
  • the controller 108 may include functionality to operate one or more software programs, which may be stored in a memory.
  • the controller 108 may be capable of operating a connectivity program, such as a conventional Web browser.
  • the connectivity program may then allow the device 100 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like.
  • the controller 108 may be embodied as a multi-core processor such as a dual or quad core processor. However, any number of processors may be included in the controller 108.
  • the device 100 may also comprise a user interface including an output device such as a ringer 110, an earphone or speaker 112, a microphone 114, a display 116, and a user input interface, which may be coupled to the controller 108.
  • the user input interface which allows the device 100 to receive data, may include any of a number of devices allowing the device 100 to receive data, such as a keypad 118, a touch display, a microphone or other input device.
  • the keypad 118 may include numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the device 100.
  • the keypad 118 may include a conventional QWERTY keypad arrangement.
  • the keypad 118 may also include various soft keys with associated functions.
  • the device 100 may include an interface device such as a joystick or other user input interface.
  • the device 100 further includes a battery 120, such as a vibrating battery pack, for powering various circuits that are used to operate the device 100, as well as optionally providing mechanical vibration as a detectable output.
  • the device 100 includes a media capturing element, such as a camera, video and/or audio module, in communication with the controller 108.
  • the media capturing element may be any means for capturing an image, video and/or audio for storage, display or transmission.
  • the camera module 122 may include a digital camera capable of forming a digital image file from a captured image.
  • the camera module 122 includes all hardware, such as a lens or other optical component(s), and software for creating a digital image file from a captured image.
  • the camera module 122 may include only the hardware needed to view an image, while a memory device of the device 100 stores instructions for execution by the controller 108 in the form of software to create a digital image file from a captured image.
  • the camera module 122 may further include a processing element such as a co-processor, which assists the controller 108 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data.
  • the encoder and/or decoder may encode and/or decode according to a JPEG standard format or another like format.
  • the encoder and/or decoder may employ any of a plurality of standard formats such as, for example, standards associated with H.261 , H.262/ MPEG-2, H.263, H.264, H.264/MPEG-4, MPEG-4, and the like.
  • the camera module 122 may provide live image data to the display 116.
  • the display 116 may be located on one side of the device 100 and the camera module 122 may include a lens positioned on the opposite side of the device 100 with respect to the display 116 to enable the camera module 122 to capture images on one side of the device 100 and present a view of such images to the user positioned on the other side of the device 100.
  • the device 100 may further include a user identity module (UIM) 124.
  • the UIM 124 may be a memory device having a processor built in.
  • the UIM 124 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), or any other smart card.
  • the UIM 124 typically stores information elements related to a mobile subscriber.
  • the device 100 may be equipped with memory.
  • the device 100 may include volatile memory 126, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data.
  • the device 100 may also include other non-volatile memory 128, which may be embedded and/or may be removable.
  • the non-volatile memory 128 may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory, hard drive, or the like.
  • the memories may store any number of pieces of information, and data, used by the device 100 to implement the functions of the device 100.
  • FIGURE 2 illustrates an apparatus 200 configured to detect facial expression(s), in accordance with an example embodiment.
  • the apparatus 200 may be employed, for example, in the device 100 of FIGURE 1. However, it should be noted that the apparatus 200 may also be employed on a variety of other devices, both mobile and fixed, and therefore, embodiments should not be limited to application on devices such as the device 100 of FIGURE 1. Alternatively or additionally, embodiments may be employed on a combination of devices including, for example, those listed above. Accordingly, various embodiments may be embodied wholly at a single device, for example, the device 100 or in a combination of devices. It should be noted that some devices or elements described below may not be mandatory and some may be omitted in certain embodiments.
  • the apparatus 200 includes or otherwise is in communication with at least one processor 202 and at least one memory 204.
  • the at least one memory 204 may include, but is not limited to, volatile and/or non-volatile memories.
  • Examples of volatile memory include random access memory, dynamic random access memory, static random access memory, and the like.
  • Examples of non-volatile memory include hard disks, magnetic tapes, optical disks, programmable read only memory, erasable programmable read only memory, electrically erasable programmable read only memory, flash memory, and the like.
  • the memory 204 may be configured to store information, data, applications, instructions or the like for enabling the apparatus 200 to carry out various functions in accordance with various example embodiments.
  • the memory 204 may be configured to buffer input data for processing by the processor 202. Additionally or alternatively, the memory 204 may be configured to store instructions for execution by the processor 202. In an example embodiment, the memory 204 may be configured to store content, such as a media file.
  • An example of processor 202 may include the controller 108.
  • the processor 202 may be embodied in a number of different ways. The processor 202 may be embodied as a multi-core processor, a single core processor; or combination of multi-core processors and single core processors.
  • the processor 202 may be embodied as one or more of various processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the multi-core processor may be configured to execute instructions stored in the memory 204 or otherwise accessible to the processor 202.
  • the processor 202 may be configured to execute hard coded functionality.
  • the processor 202 may represent an entity, for example, physically embodied in circuitry, capable of performing operations according to various embodiments while configured accordingly.
  • the processor 202 may be specifically configured hardware for conducting the operations described herein.
  • the processor 202 is embodied as an executor of software instructions, the instructions may specifically configure the processor 202 to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processor 202 may be a processor of a specific device, for example, a mobile terminal or network device adapted for employing embodiments by further configuration of the processor 202 by instructions for performing the algorithms and/or operations described herein.
  • the processor 202 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 202.
  • a user interface 206 may be in communication with the processor 202. Examples of the user interface 206 include, but are not limited to, an input interface and/or an output user interface.
  • the input interface is configured to receive an indication of a user input.
  • the output user interface provides an audible, visual, mechanical or other output and/or feedback to the user. Examples of the input interface may include, but are not limited to, a keyboard, a mouse, a joystick, a keypad, a touch screen, soft keys, and the like.
  • the output interface may include, but is not limited to, a display such as a light emitting diode display, thin-film transistor (TFT) display, liquid crystal display, active-matrix organic light-emitting diode (AMOLED) display, a microphone, a speaker, ringers, vibrators, and the like.
  • the user interface 206 may include, among other devices or elements, any or all of a speaker, a microphone, a display, and a keyboard, touch screen, or the like.
  • the processor 202 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface 206, such as, for example, a speaker, ringer, microphone, display, and/or the like.
  • the processor 202 and/or user interface circuitry comprising the processor 202 may be configured to control one or more functions of one or more elements of the user interface 206 through computer program instructions, for example, software and/or firmware, stored on a memory, for example, the at least one memory 204, and/or the like, accessible to the processor 202.
  • An image sensor 208 may be in communication with the processor 202 and/or other components of the apparatus 200.
  • the image sensor 208 may be in communication with other imaging circuitries and/or software, and is configured to capture digital images or to make a video or other graphic media files.
  • the image sensor 208 and other circuitries, in combination, may be an example of the camera module 122 of the device 100.
  • the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to detect a pair of eye locations for a face in a media file.
  • the pair of eye locations comprises a first eye location and a second eye location.
  • the first eye location and the second eye location may correspond to the left eye and right eye of a face.
  • the media file may be an image, a video, or any other graphic content that can feature faces.
  • the media file may be received from internal memory such as hard drive, random access memory (RAM) of the apparatus 200, or from the memory 204, or from external storage medium such as digital versatile disk (DVD), compact disk (CD), flash drive, memory card, or from external storage locations through the Internet, local area network, Bluetooth ® , and the like.
  • the media file, such as the image or the video, may be instantaneously captured by the image sensor 208 and other circuitries.
  • a processing means may be configured to detect the first eye location and the second eye location for the face in the media file.
  • An example of the processing means may include the processor 202, which may be an example of the controller 108.
  • the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to determine a first set of eye locations and a second set of eye locations.
  • the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location
  • the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location.
  • the plurality of locations neighbouring the first eye location and the second eye location may be determined based on a distance of a threshold number of pixels from the first eye location and the second eye location.
  • the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to generate a set of sample faces corresponding to the face.
  • eye locations of a sample face comprise an eye location from the first set of eye locations and an eye location from the second set of eye locations.
  • various combinations of eye pairs may be formed; for example, the left eye may be selected from the five eye locations of the first set of eye locations and the right eye may be selected from the five eye locations of the second set of eye locations.
  • in this manner, a total of 25 different eye pairs may be formed.
  • from the 25 eye pairs, an equal number of sample faces may also be generated, as illustrated in the sketch below.
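For illustration only, the following Python sketch (not part of the patent text; the function names and the offset of Dx = 3 pixels are assumptions) shows how five candidate locations per eye could be generated and combined into the 25 eye pairs from which the sample faces are derived.

```python
from itertools import product

def neighbouring_locations(eye, dx):
    """Return the detected eye location plus four neighbours at a distance of dx pixels."""
    x, y = eye
    return [(x, y), (x - dx, y), (x + dx, y), (x, y - dx), (x, y + dx)]

def eye_pairs(left_eye, right_eye, dx=3):
    """Form every combination of candidate left-eye and right-eye locations (5 x 5 = 25 pairs)."""
    left_set = neighbouring_locations(left_eye, dx)
    right_set = neighbouring_locations(right_eye, dx)
    return list(product(left_set, right_set))

# Example with assumed detected eye locations (Lx0, Ly0) and (Rx0, Ry0).
pairs = eye_pairs((120, 200), (180, 198), dx=3)
print(len(pairs))  # 25 candidate eye pairs, one sample face per pair
```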
  • the processor 202 may be configured to, with the content of the memory 204, and optionally with other components described herein, to generate the set of sample faces based on the eye locations from the first set of eye locations and the second set of eye locations, and at least one point corresponding to a portion of the face.
  • other points corresponding to other portion(s) of the face may be determined.
  • An example of the portion of the face may include a nose of the face.
  • Other examples of the portion may include, but are not limited to, chin, mouth, centre of forehead, front teeth, ear and remaining portion of the face.
  • the set of sample faces may be generated.
  • the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to determine a presence of a facial expression in the set of sample faces.
  • the facial expression may be a smile expression.
  • the facial expression may also be grief, anger, or any other emotional or behavioral expression.
  • the presence of the facial expression in a sample face may be determined based on processing of the sample face by a facial expression classifier.
  • the sample faces may be provided to a smile classifier which can detect the presence of the smile expression in the sample faces.
  • the smile classifier may be a classifier that is trained on attributes of smiling face samples.
  • the smile classifier may be a pattern recognition based classifier that is trained by a number of smiling face samples for an expression.
  • the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to count a number of sample faces in which the facial expression is present.
  • the processor 202 is configured to detect a presence of the facial expression in the face if the number of sample faces is greater than a first threshold value. For example, assume the media file is an image comprising a face, and 25 sample faces corresponding to the face are generated. In an example, consider there are 18 sample faces for which smile expressions are detected. In this example, if the first threshold value is assumed to be 17, the presence of the smile expression is determined in the face in the image, as the number of sample faces determined to comprise a smile (18) is greater than the first threshold value (17). A sketch of this counting and thresholding step is given below.
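The counting and thresholding just described can be sketched as follows; the classifier is only a placeholder for any trained expression classifier (the patent does not mandate a particular one), and the threshold of 17 mirrors the example above.

```python
def detect_expression_in_face(sample_faces, classifier, first_threshold=17):
    """Detect a facial expression (e.g. a smile) in a face from its sample faces.

    sample_faces: iterable of normalized face crops generated from the eye pairs.
    classifier:   callable returning True if the expression is present in one sample face.
    """
    presence_score = sum(1 for face in sample_faces if classifier(face))
    return presence_score > first_threshold, presence_score

# Example with a stand-in classifier (always returns True, for illustration only):
smiling, score = detect_expression_in_face(range(25), lambda face: True)
print(smiling, score)  # True, 25
```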
  • Various example embodiments may be utilized to detect the presence of the facial expression in a face in a media file where the face might appear in consecutive frames such as in a video, or in any other graphic media file.
  • the number of sample faces in which the presence of the facial expression is determined is counted for some of the consecutive frames, and the presence of the facial expression is detected based on the counted number of the sample faces for the consecutive frames.
  • the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to determine presence of a facial expression in the set of sample faces in a plurality of frames of the media file.
  • the plurality of frames may be consecutive frames of the media file, such as video.
  • the processor 202 is configured to cause the apparatus 200 to count, for the plurality of frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present.
  • the apparatus 200 is caused to calculate an average of the number of sample faces for a first threshold number of consecutive frames. In an example, consider the first threshold number of consecutive frames to be 10.
  • a processing means may be configured to determine presence of a facial expression in the set of sample faces in a plurality of frames of the media file.
  • the processing means may also be configured to count, for the plurality of frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present.
  • the processing means may also be configured to calculate an average of the number of sample faces for a first threshold number of consecutive frames.
  • An example of the processing means may include the processor 202, which may be an example of the controller 108.
  • the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to detect a presence of the facial expression in the face if the average of the number of sample faces is greater than a first threshold value. For example, if the first threshold value is 17, the calculated average (17.6) is greater than 17, and the face in a current frame of the video may be detected as comprising the smile expression. In an alternate example embodiment, presence of a facial expression in the set of sample faces may be determined in a set of consecutive frames of the media file. In this example embodiment, the apparatus 200 is caused to count, for the set of consecutive frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present.
  • the apparatus 200 is caused to detect the presence of the facial expression in the face if the number of sample faces is greater than a first threshold value for each frame of the set of consecutive frames. For example, if it is detected that for each of the last seven consecutive frames the number of sample faces is greater than 17, the face in the current frame may be determined as comprising the facial expression.
  • a face is detected as comprising the facial expression, such as the smile expression
  • a smiling state may be assigned to the face in the current frame of the video.
  • the smiling state for the face persists until a smiling state rejection criterion is satisfied.
  • the apparatus 200 is caused to calculate the average of the number of sample faces for the second threshold number of consecutive frames. For example, if the second threshold number of consecutive frames is 20, the apparatus 200 may be caused to calculate the average of number of sample faces for 20 consecutive frames in the video.
  • the apparatus 200 may include at least one buffer or array that can store, in a frame-wise manner, the number of sample faces in which the facial expression is determined. In an example embodiment, the apparatus 200 may include a buffer/array that can store the number of sample faces for the last first threshold number of consecutive frames, for example, the last 10 consecutive frames.
  • the stored numbers of sample faces for the last 10 consecutive frames may be utilized to calculate the average of the number of sample faces for the last 10 consecutive frames at any current frame of the video.
  • the apparatus may include a buffer/array that can store the number of sample faces for the last second threshold number of consecutive frames, for example, the last 20 consecutive frames.
  • the stored numbers of sample faces for the last 20 consecutive frames may be utilized to calculate the average of the number of sample faces for the last 20 consecutive frames at any current frame of the video.
  • the apparatus 200 may include a single buffer/array that can store the number of sample faces for the last N frames, where N can be any integer value.
  • the one or more buffers/arrays for storing the numbers of sample faces may be stored in the memory 204.
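A single rolling buffer of per-frame presence scores is enough to compute both running averages (the last 10 and the last 20 frames in the examples above). The sketch below is illustrative only; the class name and window sizes are assumptions, not terminology from the patent.

```python
from collections import deque

class PresenceScoreBuffer:
    """Store the presence score (number of smiling sample faces) for the last N frames."""

    def __init__(self, n_frames=20):
        self.scores = deque(maxlen=n_frames)

    def push(self, presence_score):
        self.scores.append(presence_score)

    def average(self, window):
        """Average of the presence scores over (up to) the last `window` frames."""
        recent = list(self.scores)[-window:]
        return sum(recent) / len(recent) if recent else 0.0

buf = PresenceScoreBuffer(n_frames=20)
for score in [12, 14, 18, 19, 20, 21, 17, 18, 19, 20]:
    buf.push(score)
print(buf.average(10))  # average over the last 10 frames
print(buf.average(20))  # average over up to the last 20 frames seen so far
```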
  • the apparatus 200 may comprise a communication device.
  • the communication device may include, but is not limited to, a mobile phone, a personal digital assistant (PDA), a notebook, a tablet personal computer (PC), and a global positioning device (GPS).
  • the communication device may comprise an image sensor.
  • the image sensor, along with other components may be configured to facilitate a user to capture images or videos of human faces.
  • An example of the image sensor and the other components may be the camera module 122.
  • the communication device may comprise a user interface circuitry and user interface software configured to facilitate a user to control at least one function of the communication device through use of a display and further configured to respond to user inputs.
  • the user interface circuitry may be similar to the user interface explained in FIGURE 1 and the description is not included herein for sake of brevity of description.
  • the communication device may include a display circuitry configured to display at least a portion of a user interface of the communication device, the display and display circuitry configured to facilitate the user to control at least one function of the communication device.
  • the communication device may include typical components such as a transceiver (such as transmitter 104 and a receiver 106), volatile and non-volatile memory (such as volatile memory 126 and non-volatile memory 128), and the like. The various components of the communication device are not included herein for the sake of brevity of description.
  • FIGURE 3 is a schematic diagram representing an example of generating multiple sample faces corresponding to a face, in accordance with an example embodiment. As discussed below, FIGURE 3 provides an example of a manner in which various sample faces corresponding to a particular face may be generated.
  • a processing means, for example, the processor 202 or the controller 108, may be configured to detect eyes of a face.
  • the processing means may be configured to detect the eyes of the face that may be present in a media file, such as an image, or a frame of a video.
  • a first eye location and second eye location corresponding to the left eye and the right eye, respectively, of the face may be detected.
  • a first eye location 310 and a second eye location 320 of a particular face are detected.
  • a first set of eye locations and a second set of eye locations are generated.
  • the first set of eye locations comprises the first eye location 310 and a plurality of neighbouring locations 312, 314, 316 and 318.
  • the second set of eye locations comprises the second eye location 320 and a plurality of neighbouring locations 322, 324, 326 and 328.
  • there are five possible eye locations in the first set of eye locations, which represent five possible locations for the left eye.
  • N sample faces may be generated using the N eye pairs; for example, 25 sample faces may be generated from 25 eye pairs.
  • the N sample faces may be normalized to generate N normalized sample faces.
  • coordinates of the first eye location and the second eye location may be stored in a memory, such as the memory 204 or any other storage contained in or in communication with the apparatus 200.
  • the coordinates corresponding to the first eye location may be represented as (Lx0, Ly0), and the coordinates corresponding to the second eye location may be represented as (Rx0, Ry0), in a frame of a media file.
  • four points are considered at a distance of Dx pixels from both eye locations (Lx0, Ly0) and (Rx0, Ry0), as below.
  • the coordinates would be (Lx0-Dx, Ly0) for one neighbouring eye location, and for the eye location 316, the coordinates would be (Lx0, Ly0-Dx).
  • similarly, the coordinates would be (Rx0-Dx, Ry0) for one neighbouring eye location, and for the eye location 326, the coordinates would be (Rx0, Ry0-Dx).
  • more or fewer than four neighbouring eye locations may be determined to generate various eye pairs.
  • the processing means may be configured to generate sample faces corresponding to these eye pairs.
  • Some examples of outlines of sample faces are shown in FIGURE 3, for example, an outline 330 of a sample face is shown that may be generated from the pair of eye locations 310 and 320.
  • an outline 332 of a sample face is shown that may be generated from the pair of eye locations 312 and 324, and in a similar manner 25 sample faces may be generated from 25 different pairs of eye locations.
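The text states that the N sample faces are normalized before classification but does not specify how. One common choice, shown here purely as an assumed illustration, is a similarity transform that maps each eye pair onto fixed canonical positions; the resulting 2x3 matrix could then be applied, for example, with cv2.warpAffine to produce a normalized sample face crop.

```python
import numpy as np

def similarity_transform(left_eye, right_eye, out_size=64):
    """Return a 2x3 affine matrix mapping the eye pair to canonical positions.

    This normalization scheme is an assumption for illustration; the patent only
    states that the sample faces are normalized before classification.
    """
    src_l = np.array(left_eye, dtype=float)
    src_r = np.array(right_eye, dtype=float)
    dst_l = np.array([0.3 * out_size, 0.4 * out_size])
    dst_r = np.array([0.7 * out_size, 0.4 * out_size])

    # Rotation and uniform scale that align the source eye vector with the target eye vector.
    src_v = src_r - src_l
    dst_v = dst_r - dst_l
    scale = np.linalg.norm(dst_v) / np.linalg.norm(src_v)
    angle = np.arctan2(dst_v[1], dst_v[0]) - np.arctan2(src_v[1], src_v[0])
    c, s = scale * np.cos(angle), scale * np.sin(angle)
    rot = np.array([[c, -s], [s, c]])

    # Translation so the left eye lands exactly on its canonical position.
    t = dst_l - rot @ src_l
    return np.hstack([rot, t.reshape(2, 1)])

M = similarity_transform((120, 200), (180, 198))
print(M.shape)  # (2, 3); usable e.g. as cv2.warpAffine(image, M, (64, 64))
```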
  • FIGURE 4 is a plot 400 illustrative of detection of a facial expression, in accordance with an example embodiment.
  • the plot 400 illustrates frame numbers of the video (on X-axis) and number of sample faces in which presence of a facial expression is determined (on Y-axis).
  • the plot 400 may correspond to the example of 25 sample faces as generated in FIGURE 3.
  • the plot 400 comprises a graph 410 and a graph 420.
  • the graph 410 comprises variation of number of sample faces that are determined as comprising the facial expression with respect to frames of the video.
  • the number of sample faces that are detected as comprising the facial expression in a frame is also referred to as 'presence score' of the frame.
  • the facial expression is a smile expression in a face
  • the presence score of the face in a frame refers to the number of sample faces that are determined as smiling in the frame.
  • the frame number refers to the index of a frame of the media file, such as a video file.
  • the presence score for a face between frame numbers 350 and 650 of the video is plotted.
  • the graph 420 represents an average of presence scores for a first threshold number of consecutive frames of the video.
  • the graph 420 may represent average of the presence scores of last 10 frames of the video.
  • if the average of the presence scores exceeds a first threshold value, presence of the smile expression is detected in the face in the current frame.
  • in this example, the first threshold value is equal to 15.
  • the value of the average of the presence scores exceeds the first threshold value at frame number 472 (shown by reference numeral 422).
  • accordingly, the presence of the smile expression may be detected in the face at frame number 472 of the video.
  • the average of the presence scores over the last 10 consecutive frames is considered for example purposes only, and as such, any other number of frames may be considered for calculating the average.
  • if the presence scores remain greater than the first threshold value for a threshold number of consecutive frames, the facial expression may be detected in the face in the current frame. For example, if the presence scores for seven consecutive frames are greater than the first threshold value (15), the presence of the facial expression may be detected. For example, as shown in FIGURE 4, between the frames 465 and 472, the smile confidence value remains greater than 15, and accordingly, the face in the current frame may be determined as a smiling face at the frame 472.
  • once the number of sample faces exceeds a threshold value, such as the first threshold value, for a threshold number of consecutive frames, a presence state of the facial expression may be assigned to the face; for example, a smiling state may be assigned at the frame 472.
  • a smiling state of the face is rejected if the average of the number of sample faces for a second threshold number of consecutive frames is less than a second threshold value. Such rejection of the smiling state is explained with a plot illustrated in FIGURE 5.
  • FIGURE 5 is a plot 500 illustrative of rejection of a presence state of a facial expression, in accordance with an example embodiment.
  • the plot 500 illustrates frame numbers of the video (on X-axis) and number of sample faces (presence score) in which presence of a facial expression is determined (on Y-axis).
  • the plot 500 may correspond to the example embodiment of 25 sample faces as generated in FIGURE 3.
  • it may be determined that a particular face is not smiling if the presence score for the second threshold number of consecutive frames becomes less than the second threshold value.
  • in this example, the second threshold value is assumed to be equal to a predetermined value.
  • the plot 500 comprises a graph 510 and a graph 520.
  • the graph 510 comprises variation of the presence score with respect to frames of the video.
  • the graph 520 represents variation of the average of the presence scores for the second threshold number of consecutive frames, with respect to frames of the video. For example, any value on the graph 520 may represent average of the presence scores for the last 20 frames of the video, at a current frame of the video.
  • the smiling state of the face is rejected, if the average of the presence score becomes less than the second threshold value.
  • the value of the average of the presence scores becomes less than the second threshold value at frame number 490.
  • the smiling state of the face is rejected at the frame number 490.
  • the first threshold value is utilized for detecting the presence of the facial expression, and the face persists in the smiling state until the average of the number of sample faces becomes less than the second threshold value.
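Putting the detection threshold of FIGURE 4 and the rejection threshold of FIGURE 5 together gives a simple hysteresis, sketched below. The concrete second threshold value is not stated in this excerpt, so the value 8 is only a placeholder; the 10- and 20-frame windows follow the examples above.

```python
from collections import deque

class SmileStateTracker:
    """Track a per-face smiling state with hysteresis, as in FIGURES 4 and 5.

    The state is set when the average presence score over the last `detect_window`
    frames exceeds `first_threshold`, and rejected when the average over the last
    `reject_window` frames falls below `second_threshold` (8 is a placeholder value).
    """

    def __init__(self, first_threshold=15, second_threshold=8,
                 detect_window=10, reject_window=20):
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.detect_window = detect_window
        self.scores = deque(maxlen=reject_window)
        self.smiling = False

    def update(self, presence_score):
        """Feed the presence score of the current frame and return the smiling state."""
        self.scores.append(presence_score)
        recent = list(self.scores)
        detect_avg = sum(recent[-self.detect_window:]) / min(len(recent), self.detect_window)
        reject_avg = sum(recent) / len(recent)
        if not self.smiling and detect_avg > self.first_threshold:
            self.smiling = True
        elif self.smiling and reject_avg < self.second_threshold:
            self.smiling = False
        return self.smiling
```

Feeding the per-frame presence scores into update() would set the smiling state once the 10-frame average exceeds 15 (frame 472 in FIGURE 4) and reject it once the 20-frame average drops below the second threshold (frame 490 in FIGURE 5).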
  • FIGURE 6 is a flowchart depicting an example method 600 for detecting presence of a facial expression in a media file in accordance with an example embodiment.
  • the method 600 depicted in flow chart may be executed by, for example, the apparatus 200 of FIGURE 2.
  • Operations of the flowchart, and combinations of operation in the flowchart may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions.
  • one or more of the procedures described in various embodiments may be embodied by computer program instructions.
  • the computer program instructions, which embody the procedures, described in various embodiments may be stored by at least one memory device of an apparatus and executed by at least one processor in the apparatus.
  • Any such computer program instructions may be loaded onto a computer or other programmable apparatus (for example, hardware) to produce a machine, such that the resulting computer or other programmable apparatus embody means for implementing the operations specified in the flowchart.
  • These computer program instructions may also be stored in a computer-readable storage memory (as opposed to a transmission medium such as a carrier wave or electromagnetic signal) that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the operations specified in the flowchart.
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions, which execute on the computer or other programmable apparatus provide operations for implementing the operations in the flowchart.
  • the operations of the method 600 are described with the help of the apparatus 200. However, the operations of the method 600 can be described and/or practiced by using any other apparatus.
  • the flowchart diagrams that follow are generally set forth as logical flowchart diagrams.
  • the depicted operations and sequences thereof are indicative of at least one embodiment. While various arrow types, line types, and formatting styles may be employed in the flowchart diagrams, they are understood not to limit the scope of the corresponding method.
  • some arrows, connectors and other formatting features may be used to indicate the logical flow of the methods. For instance, some arrows or connectors may indicate a waiting or monitoring period of an unspecified duration. Accordingly, the specifically disclosed operations, sequences, and formats are provided to explain the logical flow of the method and are understood not to limit the scope of the present disclosure.
  • a first eye location and a second eye location of a face are detected in an image.
  • the first eye location may correspond to the left eye of the face and the second eye location may correspond to the right eye of the face.
  • a first set of eye locations and a second set of eye locations are determined.
  • the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location
  • the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location.
  • the plurality of locations neighbouring the first eye location and the second eye location may be determined based on a distance of a threshold number of pixels from the first eye location and the second eye location, as explained in FIGURES 2 and 3.
  • a set of sample faces corresponding to the face is generated.
  • eyes of a sample face correspond to an eye location from the first set of eye locations and an eye location from the second set of eye locations.
  • various different combinations of eyes may be generated, for example, left eyes may be selected from the first set of eye locations and the right eyes may be selected from the second set of eye locations, and sample faces may be generated from these various combinations of eyes.
  • presence of a facial expression for example, a smile expression is determined in the set of sample faces. As described in FIGURE 3, in an example embodiment, the presence of the smile expression in a sample face may be determined based on processing of the sample face by a smile classifier.
  • a number of sample faces is counted in which the presence of the facial expression is determined.
  • a presence of the facial expression in the face is detected if the number of sample faces is greater than a first threshold value. For example, consider that, of 25 sample faces corresponding to a face in an image, presence of the smile expression is determined in 17 sample faces. In this example, if the first threshold value is 15 sample faces, the number of sample faces in which the smile expression is determined (17) is greater than 15. In this example, the face in the image may be detected as comprising the smile expression.
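An end-to-end sketch of this single-image flow is shown below. The eye detector, sample-face construction and expression classifier are placeholders, since the patent does not mandate particular implementations; the offset dx and the threshold of 15 mirror the examples above.

```python
from itertools import product

def detect_smile_in_image(image, detect_eyes, build_sample_face, classifier,
                          dx=3, first_threshold=15):
    """Outline of method 600: detect eyes, enumerate eye pairs, classify sample faces, threshold.

    detect_eyes(image)                    -> ((lx, ly), (rx, ry))
    build_sample_face(image, left, right) -> normalized face crop for one eye pair
    classifier(face)                      -> True if the expression is present
    """
    (lx, ly), (rx, ry) = detect_eyes(image)
    left_set = [(lx, ly), (lx - dx, ly), (lx + dx, ly), (lx, ly - dx), (lx, ly + dx)]
    right_set = [(rx, ry), (rx - dx, ry), (rx + dx, ry), (rx, ry - dx), (rx, ry + dx)]
    sample_faces = [build_sample_face(image, left, right)
                    for left, right in product(left_set, right_set)]
    presence_score = sum(1 for face in sample_faces if classifier(face))
    return presence_score > first_threshold
```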
  • Various example embodiments may be utilized to detect the presence of the facial expression in a face in a media file where the face appears in consecutive frames, such as in a video, or in any other graphic file. Some of these example embodiments are described with reference to FIGURE 7.
  • FIGURE 7 is a flowchart depicting an example method 700 for detecting presence of a facial expression, in accordance with an example embodiment.
  • the method 700 depicted in flow chart may be executed by, for example, the apparatus 200 of FIGURE 2.
  • Operations of the flowchart, and combinations of operation in the flowchart may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions.
  • the different functions discussed in FIGURE 7 may be performed in a different order and/or concurrently with each other.
  • one or more of these functions may be optional or may be combined.
  • a first eye location and a second eye location of a face is detected in a video.
  • the first eye location and the second eye location of the face may be detected in a current frame of the video.
  • a first set of eye locations and a second set of eye locations are determined, in the current frame of the video.
  • at block 706, at least one point corresponding to a portion of the face is determined.
  • An example of the portion of the face may include a nose of the face.
  • Other examples of the portion may include, but are not limited to, chin, mouth, centre of forehead, front teeth, ear and remaining portion of the face.
  • a set of sample faces corresponding to the face are generated, in a current frame of the video.
  • the sample faces are generated based on the eye locations from the first set of eye locations and the second set of eye locations, and the at least one point such as nose of the face.
  • the sample faces corresponding to the face may be generated based on the eye locations from the first set of eye locations and the second set of eye locations, only.
  • presence of a facial expression in the set of sample faces is determined.
  • the presence of the facial expression is determined in a plurality of frames of the media file.
  • a number of sample faces of the set of sample faces may be counted in which the facial expression is determined as present.
  • the number of sample faces corresponding to each frame may be stored in a memory.
  • an average of the number of sample faces for a first threshold number of consecutive frames is calculated.
  • the average may be calculated based on the numbers of sample faces for the last first threshold number of consecutive frames that are stored in the memory.
  • if the average of the number of sample faces is greater than the first threshold value, the presence of the facial expression in the face is detected in the current frame of the video. If the average of the number of sample faces is less than or equal to the first threshold value, it may be determined that the facial expression is absent in the face in the current frame of the video, at block 720.
  • a face is detected as comprising the facial expression, such as the smile expression, a smiling state may be assigned to the face in the current frame of the video.
  • an average of the number of sample faces for a second threshold number of consecutive frames is calculated.
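For the video flow of FIGURE 7, the per-frame steps can be combined with the rolling averages sketched earlier. The generator below is illustrative only; frames, detect_eyes_and_nose, build_sample_faces and classifier stand in for components the patent does not define, and the second threshold value of 8 is again a placeholder.

```python
def process_video(frames, detect_eyes_and_nose, build_sample_faces, classifier,
                  first_threshold=15, second_threshold=8,
                  detect_window=10, reject_window=20):
    """Per-frame expression detection with hysteresis over consecutive video frames."""
    scores, smiling = [], False
    for frame in frames:
        left_eye, right_eye, nose = detect_eyes_and_nose(frame)
        sample_faces = build_sample_faces(frame, left_eye, right_eye, nose)
        scores.append(sum(1 for face in sample_faces if classifier(face)))
        detect_avg = sum(scores[-detect_window:]) / len(scores[-detect_window:])
        reject_avg = sum(scores[-reject_window:]) / len(scores[-reject_window:])
        if not smiling and detect_avg > first_threshold:
            smiling = True
        elif smiling and reject_avg < second_threshold:
            smiling = False
        yield smiling  # smiling state of the face in the current frame
```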
  • a technical effect of one or more of the example embodiments disclosed herein is to detect facial expressions in faces in media files such as images, and videos.
  • Various embodiments generate multiple sample faces corresponding to a single face, and detect presence of the facial expression in the face on the basis of determining presence of the facial expression in the multiple sample faces.
  • Such techniques used in various embodiments significantly enhance the accuracy of the detection of facial expressions.
  • the first threshold value, the second threshold value, first threshold number of consecutive frames, and the second threshold number of consecutive frames may be selectively chosen.
  • Various embodiments described above may be implemented in software, hardware, application logic or a combination of software, hardware and application logic.
  • the software, application logic and/or hardware may reside on at least one memory, at least one processor, an apparatus or, a computer program product.
  • the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media.
  • a "computer-readable medium" may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer, with one example of an apparatus described and depicted in FIGURES 1 and/or 2.
  • a computer-readable medium may comprise a computer-readable storage medium that may be any media or means that can contain or store the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer. If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined.

Abstract

In accordance with an example embodiment a method and apparatus are provided. The method comprises detecting a first eye location and a second eye location of a face in a media file, and determining a first set of eye locations and a second set of eye locations. The first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location. The method also comprises generating a set of sample faces corresponding to the face, where the set of sample faces comprises eye locations from the first set of eye locations and the second set of eye locations.

Description

METHOD APPARATUS AND COMPUTER PROGRAM PRODUCT FOR DETECTION
OF FACIAL EXPRESSIONS
TECHNICAL FIELD
Various implementations relate generally to method, apparatus, and computer program product for detecting facial expressions in media content.
BACKGROUND
Media content such as video and still pictures is widely accessed in a variety of multimedia and other electronic devices. Such media content may feature a variety of subject faces and their various facial expressions. Examples of the facial expressions may include emotions such as happiness, anger, sorrow, shock, and joy. At times, a user may desire to access certain frames of a video, or to access pictures having particular facial expressions. For example, a user may desire to access scenes of interest such as scenes including smiling faces. Further, it may be useful if different facial expressions can be determined in the images and/or videos; this may enable images and/or videos to be further sorted and categorized according to the facial expressions of subjects.
SUMMARY OF SOME EMBODIMENTS
Various aspects of example embodiments are set out in the claims. In a first aspect, there is provided a method comprising: detecting a first eye location and a second eye location of a face in a media file; determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
In a second aspect, there is provided an apparatus comprising: at least one processor; and at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform: detecting a first eye location and a second eye location of a face in a media file; determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
In a third aspect, there is provided a computer program product comprising at least one computer-readable storage medium, the computer-readable storage medium comprising a set of instructions, which, when executed by one or more processors, cause an apparatus at least to perform: detecting a first eye location and a second eye location of a face in a media file; determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
In a fourth aspect, there is provided an apparatus comprising: means for detecting a first eye location and a second eye location of a face in a media file; means for determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and means for generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
In a fifth aspect, there is provided a computer program comprising program instructions which, when executed by an apparatus, cause the apparatus to perform: detecting a first eye location and a second eye location of a face in a media file; determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
BRIEF DESCRIPTION OF THE FIGURES
For more understanding of example embodiments, reference is now made to the following descriptions taken in connection with the accompanying drawings in which:
FIGURE 1 illustrates a device in accordance with an example embodiment;
FIGURE 2 illustrates an apparatus configured to detect facial expressions, in accordance with an example embodiment;
FIGURE 3 is a schematic diagram representing an example of generating multiple sample faces corresponding to a face;
FIGURE 4 is a plot 400 illustrative of detection of a facial expression, in accordance with an example embodiment;
FIGURE 5 is a plot 500 illustrative of rejection of a presence state of the facial expression, in accordance with an example embodiment;
FIGURE 6 is a flowchart depicting an example method 600 for detecting presence of a facial expression, in accordance with an example embodiment; and
FIGURE 7 is a flowchart depicting an example method 700 for detecting presence of a facial expression, in accordance with another example embodiment.
DETAILED DESCRIPTION
Example embodiments and their potential effects are understood by referring to FIGURES 1 through 7 of the drawings.
FIGURE 1 illustrates a device 100 in accordance with an example embodiment. It should be understood, however, that the device 100 as illustrated and hereinafter described is merely illustrative of one type of device that may benefit from various embodiments and, therefore, should not be taken to limit the scope of the embodiments. As such, it should be appreciated that at least some of the components described below in connection with the device 100 may be optional, and an example embodiment may include more, fewer or different components than those described in connection with the example embodiment of FIGURE 1. The device 100 could be any of a number of types of mobile electronic devices, for example, portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, cellular phones, all types of computers (for example, laptops, mobile computers or desktops), cameras, audio/video players, radios, global positioning system (GPS) devices, media players, mobile digital assistants, or any combination of the aforementioned, and other types of communications devices.
The device 100 may include an antenna 102 (or multiple antennas) in operable communication with a transmitter 104 and a receiver 106. The device 100 may further include an apparatus, such as a controller 108 or other processing device that provides signals to and receives signals from the transmitter 104 and receiver 106, respectively. The signals may include signaling information in accordance with the air interface standard of the applicable cellular system, and/or may also include data corresponding to user speech, received data and/or user generated data. In this regard, the device 100 may be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the device 100 may be capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like. For example, the device 100 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with a 3.9G wireless communication protocol such as evolved-universal terrestrial radio access network (E-UTRAN), with fourth-generation (4G) wireless communication protocols, or the like. As an alternative (or additionally), the device 100 may be capable of operating in accordance with non-cellular communication mechanisms, for example, computer networks such as the Internet, local area networks, wide area networks, and the like; short range wireless communication networks such as Bluetooth® networks, Zigbee® networks, Institute of Electrical and Electronics Engineers (IEEE) 802.11x networks, and the like; and wireline telecommunication networks such as a public switched telephone network (PSTN).
The controller 108 may include circuitry implementing, among others, audio and logic functions of the device 100. For example, the controller 108 may include, but is not limited to, one or more digital signal processor devices, one or more microprocessor devices, one or more processor(s) with accompanying digital signal processor(s), one or more processor(s) without accompanying digital signal processor(s), one or more special-purpose computer chips, one or more field-programmable gate arrays (FPGAs), one or more controllers, one or more application-specific integrated circuits (ASICs), one or more computer(s), various analog to digital converters, digital to analog converters, and/or other support circuits. Control and signal processing functions of the device 100 are allocated between these devices according to their respective capabilities. The controller 108 may also include the functionality to convolutionally encode and interleave messages and data prior to modulation and transmission. The controller 108 may additionally include an internal voice coder, and may include an internal data modem. Further, the controller 108 may include functionality to operate one or more software programs, which may be stored in a memory. For example, the controller 108 may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the device 100 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like. In an example embodiment, the controller 108 may be embodied as a multi-core processor such as a dual or quad core processor. However, any number of processors may be included in the controller 108.
The device 100 may also comprise a user interface including an output device such as a ringer 110, an earphone or speaker 112, a microphone 114, a display 116, and a user input interface, which may be coupled to the controller 108. The user input interface, which allows the device 100 to receive data, may include any of a number of devices allowing the device 100 to receive data, such as a keypad 118, a touch display, a microphone or other input device. In embodiments including the keypad 118, the keypad 118 may include numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the device 100. Alternatively or additionally, the keypad 118 may include a conventional QWERTY keypad arrangement. The keypad 118 may also include various soft keys with associated functions. In addition, or alternatively, the device 100 may include an interface device such as a joystick or other user input interface. The device 100 further includes a battery 120, such as a vibrating battery pack, for powering various circuits that are used to operate the device 100, as well as optionally providing mechanical vibration as a detectable output.
In an example embodiment, the device 100 includes a media capturing element, such as a camera, video and/or audio module, in communication with the controller 108. The media capturing element may be any means for capturing an image, video and/or audio for storage, display or transmission. In an example embodiment in which the media capturing element is a camera module 122, the camera module 122 may include a digital camera capable of forming a digital image file from a captured image. As such, the camera module 122 includes all hardware, such as a lens or other optical component(s), and software for creating a digital image file from a captured image. Alternatively or additionally, the camera module 122 may include only the hardware needed to view an image, while a memory device of the device 100 stores instructions for execution by the controller 108 in the form of software to create a digital image file from a captured image. In an example embodiment, the camera module 122 may further include a processing element such as a co-processor, which assists the controller 108 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to a JPEG standard format or another like format. For video, the encoder and/or decoder may employ any of a plurality of standard formats such as, for example, standards associated with H.261, H.262/MPEG-2, H.263, H.264, H.264/MPEG-4, MPEG-4, and the like. In some cases, the camera module 122 may provide live image data to the display 116. Moreover, in an example embodiment, the display 116 may be located on one side of the device 100 and the camera module 122 may include a lens positioned on the opposite side of the device 100 with respect to the display 116 to enable the camera module 122 to capture images on one side of the device 100 and present a view of such images to the user positioned on the other side of the device 100.
The device 100 may further include a user identity module (UIM) 124. The UIM 124 may be a memory device having a processor built in. The UIM 124 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), or any other smart card. The UIM 124 typically stores information elements related to a mobile subscriber. In addition to the UIM 124, the device 100 may be equipped with memory. For example, the device 100 may include volatile memory 126, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data. The device 100 may also include other non-volatile memory 128, which may be embedded and/or may be removable. The non-volatile memory 128 may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory, hard drive, or the like. The memories may store any number of pieces of information, and data, used by the device 100 to implement the functions of the device 100.

FIGURE 2 illustrates an apparatus 200 configured to detect facial expression(s), in accordance with an example embodiment. The apparatus 200 may be employed, for example, in the device 100 of FIGURE 1. However, it should be noted that the apparatus 200 may also be employed on a variety of other devices, both mobile and fixed, and therefore, embodiments should not be limited to application on devices such as the device 100 of FIGURE 1. Alternatively or additionally, embodiments may be employed on a combination of devices including, for example, those listed above. Accordingly, various embodiments may be embodied wholly at a single device, for example, the device 100, or in a combination of devices. It should be noted that some devices or elements described below may not be mandatory and some may be omitted in certain embodiments.
The apparatus 200 includes or otherwise is in communication with at least one processor 202 and at least one memory 204. Examples of the at least one memory 204 include, but are not limited to, volatile and/or non-volatile memories. Some examples of the volatile memory include random access memory, dynamic random access memory, static random access memory, and the like. Some examples of the non-volatile memory include hard disks, magnetic tapes, optical disks, programmable read only memory, erasable programmable read only memory, electrically erasable programmable read only memory, flash memory, and the like. The memory 204 may be configured to store information, data, applications, instructions or the like for enabling the apparatus 200 to carry out various functions in accordance with various example embodiments. For example, the memory 204 may be configured to buffer input data for processing by the processor 202. Additionally or alternatively, the memory 204 may be configured to store instructions for execution by the processor 202. In an example embodiment, the memory 204 may be configured to store content, such as a media file. An example of the processor 202 may include the controller 108. The processor 202 may be embodied in a number of different ways. The processor 202 may be embodied as a multi-core processor, a single-core processor, or a combination of multi-core processors and single-core processors. For example, the processor 202 may be embodied as one or more of various processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. In an example embodiment, the multi-core processor may be configured to execute instructions stored in the memory 204 or otherwise accessible to the processor 202. Alternatively or additionally, the processor 202 may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor 202 may represent an entity, for example, physically embodied in circuitry, capable of performing operations according to various embodiments while configured accordingly. For example, if the processor 202 is embodied as two or more of an ASIC, FPGA or the like, the processor 202 may be specifically configured hardware for conducting the operations described herein. Alternatively, as another example, if the processor 202 is embodied as an executor of software instructions, the instructions may specifically configure the processor 202 to perform the algorithms and/or operations described herein when the instructions are executed. In some cases, the processor 202 may be a processor of a specific device, for example, a mobile terminal or network device adapted for employing embodiments by further configuration of the processor 202 by instructions for performing the algorithms and/or operations described herein. The processor 202 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 202.
A user interface 206 may be in communication with the processor 202. Examples of the user interface 206 include, but are not limited to, an input interface and/or an output user interface. The input interface is configured to receive an indication of a user input. The output user interface provides an audible, visual, mechanical or other output and/or feedback to the user. Examples of the input interface may include, but are not limited to, a keyboard, a mouse, a joystick, a keypad, a touch screen, soft keys, and the like. Examples of the output interface may include, but are not limited to, a display such as a light emitting diode display, a thin-film transistor (TFT) display, a liquid crystal display or an active-matrix organic light-emitting diode (AMOLED) display, a microphone, a speaker, ringers, vibrators, and the like. In an example embodiment, the user interface 206 may include, among other devices or elements, any or all of a speaker, a microphone, a display, and a keyboard, touch screen, or the like. In this regard, for example, the processor 202 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface 206, such as, for example, a speaker, ringer, microphone, display, and/or the like. The processor 202 and/or user interface circuitry comprising the processor 202 may be configured to control one or more functions of one or more elements of the user interface 206 through computer program instructions, for example, software and/or firmware, stored on a memory, for example, the at least one memory 204, and/or the like, accessible to the processor 202. An image sensor 208 may be in communication with the processor 202 and/or other components of the apparatus 200. The image sensor 208 may be in communication with other imaging circuitries and/or software, and is configured to capture digital images or to make a video or other graphic media files. The image sensor 208 and other circuitries, in combination, may be an example of the camera module 122 of the device 100.
In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to detect a pair of eye locations for a face in a media file. In an example embodiment, the pair of eye locations comprises a first eye location and a second eye location. In an example embodiment, the first eye location and the second eye location may correspond to the left eye and right eye of a face. The media file may be any image, video, or any other graphic content that can feature faces. The media file may be received from internal memory such as a hard drive or random access memory (RAM) of the apparatus 200, or from the memory 204, or from an external storage medium such as a digital versatile disk (DVD), compact disk (CD), flash drive or memory card, or from external storage locations through the Internet, a local area network, Bluetooth®, and the like. In an example embodiment, the media file such as the image or the video may be instantaneously captured by the image sensor 208 and other circuitries. In an example embodiment, a processing means may be configured to detect the first eye location and the second eye location for the face in the media file. An example of the processing means may include the processor 202, which may be an example of the controller 108.
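The embodiments do not prescribe any particular eye detection technique. Purely as an illustration, the following sketch locates a pair of eye centres using OpenCV's stock Haar cascades; the function name, the cascade choice and the return convention are assumptions made for this example and are not part of the disclosure.

import cv2

def detect_eye_locations(image_bgr):
    # Locate the first face, then the two most prominent eye regions inside it,
    # and return their centre coordinates ordered left-to-right.
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    eye_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_eye.xml")
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w])
    if len(eyes) < 2:
        return None
    centres = sorted((x + ex + ew // 2, y + ey + eh // 2)
                     for (ex, ey, ew, eh) in eyes[:2])
    return centres[0], centres[1]  # (first eye location, second eye location)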
In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to determine a first set of eye locations and a second set of eye locations. In an example embodiment, the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location. In an example embodiment, the plurality of locations neighbouring the first eye location and the second eye location may be determined based on a distance of a threshold number of pixels from the first eye location and the second eye location. For example, four neighbouring locations, each at a distance of 3 pixels from the first eye location in different directions, may be selected as the neighbouring locations of the first eye location. The selection of the locations neighbouring the first eye location and the second eye location is further described in FIGURE 3. In an example, if there are four neighbouring locations of each of the first eye location and the second eye location, the first set of eye locations comprises five different eye locations (the first eye location and its four neighbouring locations) and the second set of eye locations comprises five different eye locations (the second eye location and its four neighbouring locations). In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to generate a set of sample faces corresponding to the face. In an example embodiment, eye locations of a sample face comprise an eye location from the first set of eye locations and an eye location from the second set of eye locations. In an example embodiment, various different combinations of eyes may be formed, for example, left eyes may be selected from the five eye locations of the first set of eye locations and right eyes may be selected from the five eye locations of the second set of eye locations. In this example, a total of 25 different eye pairs may be formed. In an example embodiment, from the 25 different eye pairs, an equal number of sample faces may be generated.
In some example embodiments, the processor 202 may be configured to, with the content of the memory 204, and optionally with other components described herein, to generate the set of sample faces based on the eye locations from the first set of eye locations and the second set of eye locations, and at least one point corresponding to a portion of the face. In these example embodiments, in addition to the eye locations, other points corresponding to other portion(s) of the face may be determined. An example of the portion of the face may include a nose of the face. Other examples of the portion may include, but are not limited to, chin, mouth, centre of forehead, front teeth, ear and remaining portion of the face. In these example embodiments, based on the information of the eye locations and the at least one point corresponding to the portion such as the nose of the face, the set of sample faces may be generated.
In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to determine a presence of a facial expression in the set of sample faces. In an example embodiment, the facial expression may be a smile expression. In other example embodiments, the facial expression may also be sorrow, anger, or any other emotional or behavioral expression. In an example embodiment, the presence of the facial expression in a sample face may be determined based on processing of the sample face by a facial expression classifier. For example, the sample faces may be provided to a smile classifier which can detect the presence of the smile expression in the sample faces. In an example embodiment, the smile classifier may be a classifier that is trained on attributes of smiling face samples. For example, in an example embodiment, the smile classifier may be a pattern recognition based classifier that is trained by a number of smiling face samples for an expression. In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to count a number of sample faces in which the facial expression is present. In an example embodiment, the processor 202 is configured to detect a presence of the facial expression in the face if the number of sample faces is greater than a first threshold value. For example, assume the media file is an image comprising a face, and that 25 sample faces corresponding to the face are generated. In an example, consider that there are 18 sample faces for which the smile expression is detected. In this example, if the first threshold value is assumed to be 17, the presence of the smile expression is determined in the face in the image, as the number of sample faces determined to comprise the smile expression (18) is greater than the first threshold value (17).
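As a rough illustration of the counting step described above, the following sketch assumes that smile_classifier is any callable returning True when a sample face is classified as smiling; the disclosure does not mandate a specific classifier interface, so the function name and default threshold are assumptions for this example.

def detect_smile_in_face(sample_faces, smile_classifier, first_threshold=17):
    # Count the sample faces that the classifier labels as smiling and compare
    # the count against the first threshold value (18 > 17 in the example above).
    presence_count = sum(1 for face in sample_faces if smile_classifier(face))
    return presence_count > first_threshold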
Various example embodiments may be utilized to detect the presence of the facial expression in a face in a media file where the face might appear in consecutive frames such as in a video, or in any other graphic media file. In these example embodiments, the number of sample faces in which the presence of the facial expression is determined, is counted for some of the consecutive frames, and the presence of the facial expression is detected based on the counted number of the sample faces for the consecutive frames.
For example, in an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to determine presence of a facial expression in the set of sample faces in a plurality of frames of the media file. In an example embodiment, the plurality of frames may be consecutive frames of the media file, such as a video. In an example embodiment, the processor 202 is configured to cause the apparatus 200 to count, for the plurality of frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present. In an example embodiment, the apparatus 200 is caused to calculate an average of the number of sample faces for a first threshold number of consecutive frames. In an example, consider the first threshold number of consecutive frames to be 10. In this example, consider that 18, 20, 14, 22, 18, 23, 12, 15, 16 and 18 sample faces comprising the smile expression are determined in the consecutive 10 frames of the video, respectively. An average of the number of sample faces for these 10 consecutive frames may be calculated by dividing the sum of the number of sample faces determined as comprising the smile expression in these consecutive frames, by 10 (for example, (18+20+14+22+18+23+12+15+16+18)/10=17.6). In an example embodiment, a processing means may be configured to determine presence of a facial expression in the set of sample faces in a plurality of frames of the media file. In an example embodiment, the processing means may also be configured to count, for the plurality of frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present. In an example embodiment, the processing means may also be configured to calculate an average of the number of sample faces for a first threshold number of consecutive frames. An example of the processing means may include the processor 202, which may be an example of the controller 108.
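A minimal sketch of the frame-wise averaging described above, assuming the per-frame counts are simply kept in a fixed-length buffer; the numbers reproduce the worked example (an average of 17.6 over 10 frames).

from collections import deque

first_threshold_frames = 10
recent_counts = deque(maxlen=first_threshold_frames)  # counts for the last 10 frames

for count in (18, 20, 14, 22, 18, 23, 12, 15, 16, 18):
    recent_counts.append(count)

average = sum(recent_counts) / len(recent_counts)
print(average)  # 17.6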
In an example embodiment, the processor 202 is configured to, with the content of the memory 204, and optionally with other components described herein, to cause the apparatus 200 to detect a presence of the facial expression in the face if the average of the number of sample faces is greater than a first threshold value. For example, if the first threshold value is 17, the calculated average (17.6) is greater than 17, and the face in a current frame of the video may be detected as comprising the smile expression. In an alternate example embodiment, presence of a facial expression in the set of sample faces may be determined in a set of consecutive frames of the media file. In this example embodiment, the apparatus 200 is caused to count, for the set of consecutive frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present. In this example embodiment, the apparatus 200 is caused to detect the presence of the facial expression in the face if the number of sample faces is greater than a first threshold value for each frame of the set of consecutive frames. For example, if it is detected that for each of the last seven consecutive frames, the number of sample faces is greater than 17, the face in the current frame may be determined as comprising the facial expression. In an example embodiment, in a video, if a face is detected as comprising the facial expression, such as the smile expression, a smiling state may be assigned to the face in the current frame of the video. In an example embodiment, if the smiling state is assigned to the face, the smiling state for the face persists until a smiling state rejection criterion is satisfied. In an example embodiment, if the smiling state is assigned to the face in the video, the detection of the smile expression is stopped and the face is tested only for rejection of the smile expression. In an example embodiment, the smiling state rejection criterion may be satisfied for the face if an average of the number of sample faces (that are determined as comprising the smile expression) for a second threshold number of consecutive frames is less than a second threshold value. For example, in an example embodiment, the apparatus 200 is caused to calculate the average of the number of sample faces for the second threshold number of consecutive frames. For example, if the second threshold number of consecutive frames is 20, the apparatus 200 may be caused to calculate the average of the number of sample faces for 20 consecutive frames in the video. If the calculated average is 9 and the second threshold value is 10, it may be determined that the face does not comprise the smile expression. In an example embodiment, the presence of the smile expression is rejected in the face in a current frame. In an example embodiment, the presence of the smile expression may be rejected by rejecting/discarding the smiling state of the face. In an example embodiment, if the smiling state of the face is rejected, detection of the presence of the smile expression in the face is re-started. In an example embodiment, the apparatus 200 may include at least one buffer or array that can store the number of sample faces in which the facial expression is determined in a frame-wise manner. In an example embodiment, the apparatus 200 may include a buffer/array that can store the number of sample faces for the last first threshold number of consecutive frames, for example, the last 10 consecutive frames.
In this example embodiment, the stored numbers of sample faces for the last 10 consecutive frames may be utilized to calculate the average of the number of sample faces for the last 10 consecutive frames at any current frame of the video. In an example embodiment, the apparatus 200 may include a buffer/array that can store the number of sample faces for the last second threshold number of consecutive frames, for example, the last 20 consecutive frames. In this example embodiment, the stored numbers of sample faces for the last 20 consecutive frames may be utilized to calculate the average of the number of sample faces for the last 20 consecutive frames at any current frame of the video. In an example embodiment, the apparatus 200 may include a single buffer/array that can store the number of sample faces for the last N frames, where N can be any integer value. In an example embodiment, the one or more buffers/arrays for storing the numbers of sample faces may be stored in the memory 204.
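The detection and rejection logic described above can be summarized as a small state tracker holding one buffer per averaging window. This is only an illustrative sketch; the class and parameter names, and the default values (taken from the examples above), are assumptions rather than part of the disclosure.

from collections import deque

class SmileStateTracker:
    # Keeps one buffer per averaging window, mirroring the buffers/arrays above.
    def __init__(self, detect_window=10, detect_threshold=17,
                 reject_window=20, reject_threshold=10):
        self.detect_counts = deque(maxlen=detect_window)
        self.reject_counts = deque(maxlen=reject_window)
        self.detect_threshold = detect_threshold
        self.reject_threshold = reject_threshold
        self.smiling = False

    def update(self, smiling_sample_count):
        # Feed the number of smiling sample faces counted in the current frame.
        self.detect_counts.append(smiling_sample_count)
        self.reject_counts.append(smiling_sample_count)
        if not self.smiling:
            # Detection: average over the first threshold number of frames.
            if (len(self.detect_counts) == self.detect_counts.maxlen
                    and sum(self.detect_counts) / len(self.detect_counts)
                    > self.detect_threshold):
                self.smiling = True
        else:
            # Rejection: average over the second threshold number of frames.
            if (len(self.reject_counts) == self.reject_counts.maxlen
                    and sum(self.reject_counts) / len(self.reject_counts)
                    < self.reject_threshold):
                self.smiling = False
        return self.smiling

Once the smiling state has been assigned, only the rejection average is decisive, which is why the tracker applies one criterion per state rather than both at once.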
In an example embodiment, the apparatus 200 may comprise a communication device. An example of the communication device may include, but is not limited to, a mobile phone, a personal digital assistant (PDA), a notebook, a tablet personal computer (PC), and a global positioning device (GPS). The communication device may comprise an image sensor. The image sensor, along with other components may be configured to facilitate a user to capture images or videos of human faces. An example of the image sensor and the other components may be the camera module 122. The communication device may comprise a user interface circuitry and user interface software configured to facilitate a user to control at least one function of the communication device through use of a display and further configured to respond to user inputs. The user interface circuitry may be similar to the user interface explained in FIGURE 1 and the description is not included herein for sake of brevity of description. Additionally or alternatively, the communication device may include a display circuitry configured to display at least a portion of a user interface of the communication device, the display and display circuitry configured to facilitate the user to control at least one function of the communication device. Additionally or alternatively, the communication device may include typical components such as a transceiver (such as transmitter 104 and a receiver 106), volatile and non-volatile memory (such as volatile memory 126 and non-volatile memory 128), and the like. The various components of the communication device are not included herein for the sake of brevity of description.
FIGURE 3 is a schematic diagram representing an example of generating multiple sample faces corresponding to a face, in accordance with an example embodiment. As discussed below, FIGURE 3 provides an example of a manner in which various sample faces corresponding to a particular face may be generated. In an example embodiment, a processing means, for example, the processor 202 or the controller 108, may be configured to detect eyes of a face. The processing means may be configured to detect the eyes of the face that may be present in a media file, such as an image, or a frame of a video. In an example embodiment, a first eye location and a second eye location corresponding to the left eye and the right eye, respectively, of the face may be detected. For example, as shown in FIGURE 3, a first eye location 310 and a second eye location 320 of a particular face are detected. In an example embodiment, a first set of eye locations and a second set of eye locations are generated. In an example representation, the first set of eye locations comprises the first eye location 310 and a plurality of neighbouring locations 312, 314, 316 and 318. In an example representation, the second set of eye locations comprises the second eye location 320 and a plurality of neighbouring locations 322, 324, 326 and 328. In this example embodiment, there are five possible eye locations in the first set of eye locations that may be five possible locations for the left eye. Similarly, there are five possible eye locations in the second set of eye locations that may be five possible locations for the right eye. In an example embodiment, the five left eye points and the five right eye points may form 25 possible different eye pairs. In an example embodiment, N sample faces may be generated using the N eye pairs, for example, 25 sample faces may be generated from 25 eye pairs. In an example embodiment, the N sample faces may be normalized to generate N normalized sample faces. In an example embodiment, coordinates of the first eye location and the second eye location may be stored in a memory, such as the memory 204 or any other storage contained in or in communication with the apparatus 200. The coordinates corresponding to the first eye location may be represented as Lx0 and Ly0, and the coordinates corresponding to the second eye location may be represented as Rx0 and Ry0, in a frame of a media file. In an example embodiment, around each eye location, 4 points are considered at a distance of Dx pixels from the eye locations (Lx0, Ly0) and (Rx0, Ry0), as below:
{Lxi} = {Lx0 +/- Dx}, where Dx = max(face_width/32, 3);
{Lyi} = {Ly0 +/- Dx}, where Dx = max(face_width/32, 3);
{Rxi} = {Rx0 +/- Dx}, where Dx = max(face_width/32, 3); and
{Ryi} = {Ry0 +/- Dx}, where Dx = max(face_width/32, 3).
For example, for the eye location 314, the coordinates would be (Lx0 - Dx, Ly0), and for the eye location 316, the coordinates would be (Lx0, Ly0 - Dx). Similarly, for the eye location 324, the coordinates would be (Rx0 - Dx, Ry0), and for the eye location 326, the coordinates would be (Rx0, Ry0 - Dx).
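The neighbour construction and pairing set out above may be sketched as follows; the helper names are assumptions for this illustration only, and face_width is assumed to be available from a prior face detection step.

from itertools import product

def neighbouring_locations(x, y, face_width):
    # The detected location plus four neighbours offset by Dx pixels,
    # with Dx = max(face_width / 32, 3) as in the expressions above.
    dx = max(face_width // 32, 3)
    return [(x, y), (x + dx, y), (x - dx, y), (x, y + dx), (x, y - dx)]

def eye_pairs(first_eye, second_eye, face_width):
    # Every combination of one location from the first set and one from the
    # second set: 5 x 5 = 25 pairs, each of which yields one sample face.
    first_set = neighbouring_locations(first_eye[0], first_eye[1], face_width)
    second_set = neighbouring_locations(second_eye[0], second_eye[1], face_width)
    return list(product(first_set, second_set))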
In an example embodiment, more or fewer than 4 neighbouring eye locations may be determined to generate various eye pairs. In an example embodiment, the processing means may be configured to generate sample faces corresponding to these eye pairs. Some examples of outlines of sample faces are shown in FIGURE 3, for example, an outline 330 of a sample face is shown that may be generated from the pair of eye locations 310 and 320. Similarly, an outline 332 of a sample face is shown that may be generated from the pair of eye locations 312 and 324, and in a similar manner 25 sample faces may be generated from 25 different pairs of eye locations.

FIGURE 4 is a plot 400 illustrative of detection of a facial expression, in accordance with an example embodiment. The plot 400 illustrates frame numbers of the video (on the X-axis) and the number of sample faces in which presence of a facial expression is determined (on the Y-axis). In an example representation, the plot 400 may correspond to the example of 25 sample faces as generated in FIGURE 3.
The plot 400 comprises a graph 410 and a graph 420. The graph 410 comprises variation of number of sample faces that are determined as comprising the facial expression with respect to frames of the video. Herein, the number of sample faces that are detected as comprising the facial expression in a frame is also referred to as 'presence score' of the frame. If the facial expression is a smile expression in a face, the presence score of the face in a frame refers to the number of sample faces that are determined as smiling in the frame. In an example embodiment, the frame number refers to number of frames of a media file, such as a video file. In an example embodiment shown in FIGURE 4, presence score for a face between frame numbers 350 to 650 of the video is plotted.
In the example plot 400 shown in FIGURE 4, the graph 420 represents an average of presence scores for a first threshold number of consecutive frames of the video. For example, the graph 420 may represent the average of the presence scores of the last 10 frames of the video. In an example embodiment, if the average of the presence scores exceeds a first threshold value, presence of the smile expression is detected in the face in the current frame. For example, if the first threshold value is equal to 15, the value of the average of the presence scores exceeds the first threshold value at frame number 472 (shown by reference numeral 422). In this example, the presence of the smile expression may be detected in the face at frame number 472 of the video. Herein, the average of the presence scores over the last 10 consecutive frames is considered for example purposes, and as such, any other number of frames may be considered for calculating the average.
In another example embodiment, if it is determined that the presence score is greater than a threshold value such as the first threshold value for a threshold number of consecutive frames, the facial expression may be detected in the face in the current frame. For example, if the presence scores for seven consecutive frames are greater than the first threshold value (15), the presence of the facial expression may be detected. For example, as shown in FIGURE 4, between the frames 465 and 472, the presence score remains greater than 15, and accordingly, the face in the current frame may be determined as a smiling face at the frame 472.
In an example embodiment, once the presence of the smile expression is detected in a face, a presence state of the facial expression may be assigned to the face, for example, a smiling state may be assigned to the face at the frame 472. In an example embodiment, the smiling state of the face is rejected if the average of the number of sample faces for a second threshold number of consecutive frames is less than a second threshold value. Such rejection of the smiling state is explained with a plot illustrated in FIGURE 5.
FIGURE 5 is a plot 500 illustrative of rejection of a presence state of a facial expression, in accordance with an example embodiment. The plot 500 illustrates frame numbers of the video (on the X-axis) and the number of sample faces (presence score) in which presence of a facial expression is determined (on the Y-axis). In an example representation, the plot 500 may correspond to the example embodiment of 25 sample faces as generated in FIGURE 3. In an example embodiment, it may be determined that a particular face is not smiling if the average of the presence scores for the second threshold number of consecutive frames becomes less than the second threshold value. For the example shown in FIGURE 5, the second threshold value is assumed to be equal to '10'. The plot 500 comprises a graph 510 and a graph 520. The graph 510 comprises variation of the presence score with respect to frames of the video. The graph 520 represents variation of the average of the presence scores for the second threshold number of consecutive frames, with respect to frames of the video. For example, any value on the graph 520 may represent the average of the presence scores for the last 20 frames of the video, at a current frame of the video.
In an example embodiment, the smiling state of the face is rejected if the average of the presence scores becomes less than the second threshold value. For example, as shown by reference numeral 522, the value of the average of the presence scores becomes less than the second threshold value at frame number 490. In this example representation, the smiling state of the face is rejected at the frame number 490. As such, it should be understood that for some frames, the average of the number of sample faces may be less than the first threshold value, but the presence state of the facial expression for the face is still retained. In an example embodiment, the first threshold value is utilized for detecting the presence of the facial expression, and the face persists in the smiling state until the average of the number of sample faces becomes less than the second threshold value.
FIGURE 6 is a flowchart depicting an example method 600 for detecting presence of a facial expression in a media file in accordance with an example embodiment. The method 600 depicted in flow chart may be executed by, for example, the apparatus 200 of FIGURE 2. Operations of the flowchart, and combinations of operation in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described in various embodiments may be embodied by computer program instructions. In an example embodiment, the computer program instructions, which embody the procedures, described in various embodiments may be stored by at least one memory device of an apparatus and executed by at least one processor in the apparatus. Any such computer program instructions may be loaded onto a computer or other programmable apparatus (for example, hardware) to produce a machine, such that the resulting computer or other programmable apparatus embody means for implementing the operations specified in the flowchart. These computer program instructions may also be stored in a computer-readable storage memory (as opposed to a transmission medium such as a carrier wave or electromagnetic signal) that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the operations specified in the flowchart. The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions, which execute on the computer or other programmable apparatus provide operations for implementing the operations in the flowchart. The operations of the method 600 are described with help of apparatus 200. However, the operations of the method 600 can be described and/or practiced by using any other apparatus.
The flowchart diagrams that follow are generally set forth as logical flowchart diagrams. The depicted operations and sequences thereof are indicative of at least one embodiment. While various arrow types, line types, and formatting styles may be employed in the flowchart diagrams, they are understood not to limit the scope of the corresponding method. In addition, some arrows, connectors and other formatting features may be used to indicate the logical flow of the methods. For instance, some arrows or connectors may indicate a waiting or monitoring period of an unspecified duration. Accordingly, the specifically disclosed operations, sequences, and formats are provided to explain the logical flow of the method and are understood not to limit the scope of the present disclosure.
At block 602 of the method 600, a first eye location and a second eye location of a face are detected in an image. In an example embodiment, the first eye location may correspond to the left eye of the face and the second eye location may correspond to the right eye of the face. At block 604, a first set of eye locations and a second set of eye locations are determined. In an example embodiment, the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location. In an example embodiment, the plurality of locations neighbouring the first eye location and the second eye location may be determined based on a distance of a threshold number of pixels from the first eye location and the second eye location, as explained in FIGURES 2 and 3.
At block 606, a set of sample faces corresponding to the face are generated. In an example embodiment, eyes of a sample face correspond to an eye location from the first set of eye locations and an eye location from the second set of eye locations. In an example embodiment, various different combinations of eyes may be generated, for example, left eyes may be selected from the first set of eye locations and the right eyes may be selected from the second set of eye locations, and sample faces may be generated from these various combinations of eyes. At block 608, presence of a facial expression, for example, a smile expression is determined in the set of sample faces. As described above, in an example embodiment, the presence of the smile expression in a sample face may be determined based on processing of the sample face by a smile classifier. At block 610 of the method 600, a number of sample faces is counted in which the presence of the facial expression is determined. At block 612, a presence of the facial expression in the face is detected if the number of sample faces is greater than a first threshold value. For example, out of 25 sample faces corresponding to a face in an image, the presence of the smile expression may be determined in 17 sample faces. In this example, if the first threshold value is 15 sample faces, the number of sample faces in which the smile expression is determined (17) is greater than 15. In this example, the face in the image may be detected as comprising the smile expression.
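Tying the blocks of the method 600 together, a compact sketch of the single-image flow might look as follows, reusing the illustrative helpers from the earlier sketches; extract_sample_face stands for a hypothetical routine that crops and normalizes a sample face for a given eye pair, and all names are assumptions rather than part of the disclosure.

def detect_facial_expression_in_image(image_bgr, face_width, smile_classifier,
                                      extract_sample_face, first_threshold=15):
    eyes = detect_eye_locations(image_bgr)                   # block 602
    if eyes is None:
        return False
    pairs = eye_pairs(eyes[0], eyes[1], face_width)          # blocks 604 and 606
    sample_faces = [extract_sample_face(image_bgr, left_eye, right_eye)
                    for (left_eye, right_eye) in pairs]
    smiling_count = sum(1 for face in sample_faces
                        if smile_classifier(face))            # blocks 608 and 610
    return smiling_count > first_threshold                    # block 612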
Various example embodiments may be utilized to detect the presence of the facial expression in a face in a media file where the face appears in consecutive frames, such as in a video or in any other graphic file. Some of these example embodiments are described in FIGURE 7.
FIGURE 7 is a flowchart depicting an example method 700 for detecting presence of a facial expression, in accordance with an example embodiment. The method 700 depicted in flow chart may be executed by, for example, the apparatus 200 of FIGURE 2. Operations of the flowchart, and combinations of operation in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. If desired, the different functions discussed in FIGURE 7 may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of these functions may be optional or may be combined.
At block 702, a first eye location and a second eye location of a face are detected in a video. In an example embodiment, the first eye location and the second eye location of the face may be detected in a current frame of the video. At block 704, a first set of eye locations and a second set of eye locations are determined, in the current frame of the video. In some example embodiments, at block 706, at least one point corresponding to a portion of the face is determined. An example of the portion of the face may include a nose of the face. Other examples of the portion may include, but are not limited to, chin, mouth, centre of forehead, front teeth, ear and remaining portion of the face. At block 708, a set of sample faces corresponding to the face are generated, in a current frame of the video. In some example embodiments, the sample faces are generated based on the eye locations from the first set of eye locations and the second set of eye locations, and the at least one point such as the nose of the face. However, in some other example embodiments such as the embodiments explained in FIGURE 6, the sample faces corresponding to the face may be generated based only on the eye locations from the first set of eye locations and the second set of eye locations.
At block 710, presence of a facial expression in the set of sample faces is determined. In an example embodiment, the presence of the facial expression is determined in a plurality of frames of the media file. In an example embodiment, at block 712, for the plurality of frames, a number of sample faces of the set of sample faces may be counted in which the facial expression is determined as present. In an example embodiment, the number of sample faces corresponding to each frame may be stored in a memory.
At block 714 of the method 700, an average of the number of sample faces for a first threshold number of consecutive frames is calculated. In an example embodiment, the average may be calculated based on the numbers of sample faces for the last first threshold number of consecutive frames that are stored in the memory. At block 716, it is determined whether, at a current frame, the average of the number of sample faces is greater than a first threshold value. At block 718, if it is determined that the average of the number of sample faces is greater than the first threshold value, the presence of the facial expression in the face is detected in the current frame of the video. If the average of the number of sample faces is less than or equal to the first threshold value, it may be determined that the facial expression is absent in the face in the current frame of the video, at block 720. At block 722, if a face is detected as comprising the facial expression, such as the smile expression, a smiling state may be assigned to the face in the current frame of the video. At block 724, an average of the number of sample faces for a second threshold number of consecutive frames is calculated. At block 726, it is checked whether the average of the number of sample faces for the second threshold number of consecutive frames is less than a second threshold value. If it is checked that the average is less than the second threshold value, it may be determined that the face in the current frame no longer comprises the facial expression. In this case, in an example embodiment, at block 728, the presence state of the facial expression for the face is rejected. If it is checked that the average is greater than or equal to the second threshold value, the presence state of the facial expression is maintained.
Without in any way limiting the scope, interpretation, or application of the claims appearing below, a technical effect of one or more of the example embodiments disclosed herein is to detect facial expressions in faces in media files such as images and videos. Various embodiments generate multiple sample faces corresponding to a single face, and detect presence of the facial expression in the face on the basis of determining presence of the facial expression in the multiple sample faces. Such techniques used in various embodiments significantly enhance the accuracy of the detection of the facial expressions. For detecting a facial expression in media files such as videos, the first threshold value, the second threshold value, the first threshold number of consecutive frames, and the second threshold number of consecutive frames may be selectively chosen.
Various embodiments described above may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on at least one memory, at least one processor, an apparatus or a computer program product. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a "computer-readable medium" may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer, with one example of an apparatus described and depicted in FIGURES 1 and/or 2. A computer-readable medium may comprise a computer-readable storage medium that may be any media or means that can contain or store the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer. If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined.
Although various aspects of the embodiments are set out in the independent claims, other aspects comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes example embodiments of the invention, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present disclosure as defined in the appended claims.

Claims

CLAIMS:
1. A method comprising:
detecting a first eye location and a second eye location of a face in a media file;
determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and
generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
2. The method as claimed in claim 1, wherein generating the set of sample faces comprises:
determining at least one point corresponding to a portion of the face; and
generating the set of sample faces based on the eye locations from the first set of eye locations and the second set of eye locations, and the at least one point.
3. The method as claimed in claim 2, wherein the portion of the face comprises a nose of the face.
4. The method as claimed in claims 1 or 2, further comprising:
determining presence of a facial expression in the set of sample faces;
counting a number of sample faces of the set of sample faces in which the facial expression is determined as present; and
detecting a presence of the facial expression in the face if the number of sample faces is greater than a first threshold value.
5. The method as claimed in claims 1 or 2, further comprising:
determining presence of a facial expression in the set of sample faces in a plurality of frames of the media file;
counting, for the plurality of frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present;
calculating an average of the number of sample faces for a first threshold number of consecutive frames; and
detecting a presence of the facial expression in the face if the average of the number of sample faces is greater than a first threshold value.
6. The method as claimed in claim 5, further comprising assigning a presence state of the facial expression corresponding to the face if the presence of the facial expression is detected in the face.
7. The method as claimed in claim 6, further comprising:
calculating an average of the number of sample faces for a second threshold number of consecutive frames; and
rejecting the presence state of the facial expression corresponding to the face if the average of the number of sample faces for the second threshold number of consecutive frames is less than a second threshold value.
8. The method as claimed in claims 1 or 2, further comprising:
determining presence of a facial expression in the set of sample faces in a set of consecutive frames of the media file;
counting, for the set of consecutive frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present; and
detecting a presence of the facial expression in the face if the number of sample faces is greater than a first threshold value for each frame of the set of consecutive frames.
9. The method as claimed in any of the claims 1 to 8, wherein presence of the facial expression in a sample face is detected based on processing of the sample face by a facial expression classifier.
10. The method as claimed in claim 9, wherein the facial expression classifier is a smile classifier.
11. The method as claimed in claims 1 or 2, wherein the plurality of locations neighbouring the first eye location and the second eye location are determined based on a distance of a threshold number of pixels from the first eye location and the second eye location.
12. An apparatus comprising:
at least one processor; and
at least one memory comprising computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus at least to perform:
detecting a first eye location and a second eye location of a face in a media file;
determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and
generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
13. The apparatus as claimed in claim 12, wherein the apparatus is further caused, at least in part, to generate the set of sample faces by:
determining at least one point corresponding to a portion of the face; and
generating the set of sample faces based on the eye locations from the first set of eye locations and the second set of eye locations, and the at least one point.
14. The apparatus as claimed in claim 13, wherein the portion of the face comprises a nose of the face.
15. The apparatus as claimed in claims 12 or 13, wherein the apparatus is further caused, at least in part, to:
determine presence of a facial expression in the set of sample faces;
count a number of sample faces of the set of sample faces in which the facial expression is determined as present; and
detect a presence of the facial expression in the face if the number of sample faces is greater than a first threshold value.
16. The apparatus as claimed in claims 12 or 13, wherein the apparatus is further caused, at least in part, to:
determine presence of a facial expression in the set of sample faces in a plurality of frames of the media file;
count, for the plurality of frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present;
calculate an average of the number of sample faces for a first threshold number of consecutive frames; and
detect a presence of the facial expression in the face if the average of the number of sample faces is greater than a first threshold value.
17. The apparatus as claimed in claim 16, wherein the apparatus is further caused, at least in part, to assign a presence state of the facial expression corresponding to the face if the presence of the facial expression is detected in the face.
18. The apparatus as claimed in claim 17, wherein the apparatus is further caused, at least in part, to:
calculate an average of the number of sample faces for a second threshold number of consecutive frames; and
reject the presence state of the facial expression corresponding to the face if the average of the number of sample faces for the second threshold number of consecutive frames is less than a second threshold value.
19. The apparatus as claimed in claims 12 or 13, wherein the apparatus is further caused, at least in part, to:
determine presence of a facial expression in the set of sample faces in a set of consecutive frames of the media file;
count, for the set of consecutive frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present; and
detect a presence of the facial expression in the face if the number of sample faces is greater than a first threshold value for each frame of the set of consecutive frames.
20. The apparatus as claimed in any of the claims 12 to 19, wherein the apparatus is further caused, at least in part, to detect the presence of the facial expression in a sample face based on processing of the sample face by a facial expression classifier.
21. The apparatus as claimed in claim 20, wherein the facial expression classifier is a smile classifier.
22. The apparatus as claimed in claims 12 or 13, wherein the apparatus is further caused, at least in part, to determine the plurality of locations neighbouring the first eye location and the second eye location based on a distance of a threshold number of pixels from the first eye location and the second eye location.
23. The apparatus as claimed in claims 12 or 13, wherein the apparatus comprises a communication device comprising:
a user interface circuitry and user interface software configured to facilitate a user to control at least one function of the communication device through use of a display and further configured to respond to user inputs; and
a display circuitry configured to display at least a portion of a user interface of the communication device, the display and display circuitry configured to facilitate the user to control at least one function of the communication device.
24. The apparatus as claimed in claim 23, wherein the communication device comprises an image sensor configured to capture images and videos.
25. A computer program comprising a set of program instructions which, when executed by one or more processors, cause an apparatus at least to perform:
detecting a first eye location and a second eye location of a face in a media file;
determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and
generating a set of sample faces corresponding to the face, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
26. The computer program as claimed in claim 25, wherein the apparatus is further caused, at least in part, to generate the set of sample faces by:
determining at least one point corresponding to a portion of the face; and
generating the set of sample faces based on the eye locations from the first set of eye locations and the second set of eye locations, and the at least one point.
27. The computer program as claimed in claim 26, wherein the portion of the face comprises a nose of the face.
28. The computer program as claimed in claims 25 or 26, wherein the apparatus is further caused, at least in part, to further perform:
determining presence of a facial expression in the set of sample faces;
counting a number of sample faces of the set of sample faces in which the facial expression is determined as present; and
detecting a presence of the facial expression in the face if the number of sample faces is greater than a first threshold value.
29. The computer program as claimed in claims 25 or 26, wherein the apparatus is further caused, at least in part, to further perform:
determining presence of a facial expression in the set of sample faces in a plurality of frames of the media file;
counting, for the plurality of frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present;
calculating an average of the number of sample faces for a first threshold number of consecutive frames; and
detecting a presence of the facial expression in the face if the average of the number of sample faces is greater than a first threshold value.
30. The computer program as claimed in claim 29, wherein the apparatus is further caused, at least in part, to further perform assigning a presence state of the facial expression corresponding to the face if the presence of the facial expression is detected in the face.
31. The computer program as claimed in claim 30, wherein the apparatus is further caused, at least in part, to further perform:
calculating an average of the number of sample faces for a second threshold number of consecutive frames; and
rejecting the presence state of the facial expression corresponding to the face if the average of the number of sample faces for the second threshold number of consecutive frames is less than a second threshold value.
32. The computer program as claimed in claims 25 or 26, wherein the apparatus is further caused, at least in part, to further perform:
determining presence of a facial expression in the set of sample faces in a set of consecutive frames of the media file;
counting, for the set of consecutive frames, a number of sample faces of the set of sample faces in which the facial expression is determined as present; and
detecting a presence of the facial expression in the face if the number of sample faces is greater than a first threshold value for each frame of the set of consecutive frames.
33. The computer program as claimed in any of the claims 25 to 32, wherein the apparatus is further caused, at least in part, to further perform detecting the presence of the facial expression in a sample face based on processing of the sample face by a facial expression classifier.
34. The computer program as claimed in claim 33, wherein the facial expression classifier is a smile classifier.
35. The computer program as claimed in claims 25 or 26, wherein the apparatus is further caused, at least in part, to determine the plurality of locations neighbouring the first eye location and the second eye location based on a distance of a threshold number of pixels from the first eye location and the second eye location.
36. An apparatus comprising:
means for detecting a first eye location and a second eye location of a face in a media file;
means for determining a first set of eye locations and a second set of eye locations, wherein the first set of eye locations comprises the first eye location and a plurality of locations neighbouring the first eye location, and wherein the second set of eye locations comprises the second eye location and a plurality of locations neighbouring the second eye location; and
means for generating a set of sample faces corresponding to the face based on the first set of eye locations and the second set of eye locations, the set of sample faces comprising eye locations from the first set of eye locations and the second set of eye locations.
PCT/FI2012/050135 2011-03-25 2012-02-14 Method apparatus and computer program product for detection of facial expressions WO2012131149A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN937/CHE/2011 2011-03-25
IN937CH2011 2011-03-25

Publications (1)

Publication Number Publication Date
WO2012131149A1 (en) 2012-10-04

Family

ID=46929540

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2012/050135 WO2012131149A1 (en) 2011-03-25 2012-02-14 Method apparatus and computer program product for detection of facial expressions

Country Status (1)

Country Link
WO (1) WO2012131149A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070014433A1 (en) * 2005-07-13 2007-01-18 Canon Kabushiki Kaisha Image processing apparatus and image processing method
EP1768058A2 (en) * 2005-09-26 2007-03-28 Canon Kabushiki Kaisha Information processing apparatus and control method therefor
WO2010133661A1 (en) * 2009-05-20 2010-11-25 Tessera Technologies Ireland Limited Identifying facial expressions in acquired digital images

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9355366B1 (en) 2011-12-19 2016-05-31 Hello-Hello, Inc. Automated systems for improving communication at the human-machine interface
CN104112131A (en) * 2013-04-19 2014-10-22 Zhejiang Dahua Technology Co., Ltd. Method and device for generating training samples used for face detection
CN104112131B (en) * 2013-04-19 2017-03-22 Zhejiang Dahua Technology Co., Ltd. Method and device for generating training samples used for face detection
WO2015142936A1 (en) * 2014-03-17 2015-09-24 Meggitt Training Systems Inc. Method and apparatus for rendering a 3-dimensional scene
US9875573B2 (en) 2014-03-17 2018-01-23 Meggitt Training Systems, Inc. Method and apparatus for rendering a 3-dimensional scene

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12765829

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12765829

Country of ref document: EP

Kind code of ref document: A1