US20170061202A1 - Human identity verification via automated analysis of facial action coding system features


Info

Publication number
US20170061202A1
US20170061202A1 (application number US14/840,745)
Authority
US
United States
Prior art keywords
frame
video
individuals
different
query
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US14/840,745
Other versions
US9594949B1
Inventor
Matthew Adam Shreve
Jayant Kumar
Qun Li
Edgar A. Bernal
Raja Bala
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xerox Corp
Original Assignee
Xerox Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xerox Corp filed Critical Xerox Corp
Priority to US14/840,745
Assigned to XEROX CORPORATION reassignment XEROX CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: BALA, RAJA, BERNAL, EDGAR A., KUMAR, JAYANT, LI, QUN, SHREVE, MATTHEW ADAM
Publication of US20170061202A1
Application granted
Publication of US9594949B1
Status: Expired - Fee Related
Adjusted expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • G06V40/176Dynamic expression
    • G06K9/00315
    • G06K9/00288
    • G06K9/52
    • G06K9/6215
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • G06F18/2113Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences

Definitions

  • the present disclosure relates generally to human identity verification and, more particularly, to a method and apparatus for identifying an individual based upon facial expressions of a query video.
  • the human face serves as an important interface to convey nonverbal emotional information.
  • the human face is probably the most natural characteristic that humans use to identify each other.
  • the study on the potential use of the face as a biometrics trait has received significant attention in the search of new biometrics modalities in the past decades.
  • One disclosed feature of the embodiments is a method that receives a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals, receives the query video, calculates a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of facial gesture encoders extracted from at least one frame of the query video and provides a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
  • Another disclosed feature of the embodiments is a non-transitory computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform operations that receives a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals, receive the query video, calculate a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of facial gesture encoders extracted from at least one frame of the query video and provide a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
  • Another disclosed feature of the embodiments is an apparatus comprising a processor and a computer-readable medium storing a plurality of instructions which, when executed by the processor, cause the processor to perform operations that receive a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals, receive the query video, calculate a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of facial gesture encoders extracted from at least one frame of the query video and provide a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
  • FIG. 1 illustrates an example block diagram of a system of the present disclosure
  • FIG. 2 illustrates a high-level block diagram of a method of the present disclosure
  • FIG. 3 illustrates an example flowchart of a method for verifying an identity of an individual based upon facial expressions as exhibited in a query video
  • FIG. 4 illustrates a high-level block diagram of a computer suitable for use in performing the functions described herein.
  • the present disclosure broadly discloses a method and apparatus for verifying an identity of an individual based upon facial expressions as exhibited in a query video.
  • Embodiments of the present disclosure provide a method for automatically verifying the identity of an individual based upon the similarities in facial expressions exhibited by the individual between a reference video and a query video using facial gesture encoders.
  • An example facial gesture encoder that describes the various aspects of facial expression is the Facial Action Coding System (FACS), which provides a discrete scale of values that denote region and intensity of facial movements called Action Units (AU).
  • An enrollment period may be used to collect reference videos of a plurality of different individuals.
  • AU values may be extracted from at least one frame of each one of the reference videos.
  • a query video may then be captured and AU values may be extracted from at least one frame of the query video.
  • the AU values from the query video may be compared to each one of the AU values from the reference videos to obtain a similarity score of an identity verification of the individual.
  • the ranking of the top N individuals associated with a similarity score above a threshold value may be presented.
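The enrollment-and-verification flow outlined above can be sketched as follows. This is a minimal sketch of the data flow only: `extract_au_values` and `similarity` are hypothetical stand-ins (a real system would use a FACS extractor such as FaceReader and the histogram/DTW scoring described later in the disclosure), and the "frames" here are just integer seeds for toy data.

```python
import random

# Hypothetical stand-in for a FACS extractor: returns one intensity
# value (on the 0-6 discrete scale) per Action Unit for one frame.
def extract_au_values(frame, n_aus=20):
    rng = random.Random(frame)  # deterministic per toy "frame"
    return [rng.randint(0, 6) for _ in range(n_aus)]

# Placeholder similarity in [0, 1]: fraction of matching AU values
# across frame pairs (the disclosure uses histogram + DTW scores).
def similarity(query_aus, ref_aus):
    matches = sum(q == r for qf, rf in zip(query_aus, ref_aus)
                  for q, r in zip(qf, rf))
    total = sum(len(qf) for qf in query_aus[:len(ref_aus)])
    return matches / total if total else 0.0

def enroll(reference_videos):
    """Build a gallery: person id -> per-frame AU value sequences."""
    return {pid: [extract_au_values(f) for f in frames]
            for pid, frames in reference_videos.items()}

def verify(query_frames, gallery, top_n=3):
    """Rank enrolled individuals by similarity to the query video."""
    q = [extract_au_values(f) for f in query_frames]
    scores = {pid: similarity(q, ref) for pid, ref in gallery.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

gallery = enroll({"alice": [1, 2, 3], "bob": [7, 8, 9]})
ranking = verify([1, 2, 3], gallery, top_n=2)
print(ranking[0][0])  # "alice": identical frame seeds give a perfect match
```

Because the query frames reuse alice's seeds, her AU sequences match exactly and she ranks first with a score of 1.0.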
  • both the query and reference videos are captured while individuals are performing a specific task.
  • the method would continuously authenticate a user over the duration of specific task(s) during a single session, or specific task(s) that take place over separate sessions. Examples of such task(s) could include problem solving tasks (e.g., an exam, puzzle, game, questionnaire, and the like).
  • the similarity score may be based on a fusion of multiple different scores.
  • two different scores may be used.
  • the two different scores may be a distance metric between AU histograms of the query video and a reference video, or the two different scores may be a temporal similarity or dissimilarity score based on a temporal alignment of the sequence of AUs of the frames of the query video and the AUs of the frames of the reference video.
  • the temporal alignment can comprise a dynamic time warping (DTW) algorithm.
  • FIG. 1 illustrates an example system 100 of the present disclosure.
  • the system 100 includes an Internet Protocol (IP) network 102 .
  • the IP network 102 may be any type of network including, for example, a cellular network, a broadband network, and the like.
  • the IP network 102 has been simplified for ease of explanation.
  • the IP network 102 may include additional access networks or network elements that are not shown.
  • the additional network elements may include a gateway, a router, a switch, a firewall, an application server, and the like.
  • the IP network 102 may include an application server (AS) 104 and a database (DB) 106.
  • the AS 104 may be deployed as a computer or a server having a processor and computer readable memory for storing instructions, which when executed by the processor perform the functions described herein.
  • One example of a computer is discussed below and described in FIG. 4 .
  • the DB 106 may store the various functions, parameters and values used for performing the automated method for identifying an individual based upon facial expression of a query video described herein.
  • the DB 106 may also store all of the reference videos and annotated AU values for at least one frame of each of the reference videos of different individuals, as discussed below.
  • Although only a single AS 104 and a single DB 106 are illustrated in FIG. 1, it should be noted that any number of application servers and databases may be deployed. In addition, the application servers and databases may be co-located or located remotely from one another.
  • the system 100 may include a video camera 108 .
  • the video camera 108 may be located remotely from the AS 104 and the DB 106 or may be co-located with the AS 104 and the DB 106 .
  • the video camera 108 may be coupled to an endpoint device (not shown) that is in communication with the AS 104 over the IP network 102 .
  • the video camera 108 may capture a reference video for each one of a plurality of different individuals.
  • each reference video may comprise a plurality of frames and AU values may be extracted from at least one frame of the reference video.
  • the AUs may be in accordance with a FACS coding of facial expressions.
  • the AUs may be detected using a program such as FaceReader® 6, or any other program that extracts FACS features.
  • the AUs may be annotated by a trained human expert. Table 1 below illustrates an example of AUs associated with the FACS coding of facial expressions.
  • the present disclosure also assigns a value based on a range of values of a discrete scale.
  • a discrete scale may be a numerical scale, an alphabetical scale, and the like.
  • the discrete scale may be a range of 0-6 where 0 is off and 6 is a maximum amount of activation.
  • the values for the detected AUs may be assigned by an expert.
  • the sequence of AUs extracted from the reference videos may then be sent to the AS 104 and stored in the DB 106 . It should be noted that although AU values from a FACS coding is used, the embodiments of the present disclosure may work with any facial gesture encoders.
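As an illustration of the data sent to the AS 104, the per-frame annotations might be represented as simple mappings from AU label to intensity on the 0-6 discrete scale. The dictionary structure and `validate_sequence` helper below are hypothetical representations, not ones prescribed by the disclosure:

```python
# Each frame is annotated as {AU label: intensity}, where intensity is
# on the discrete 0-6 scale described above (0 = off, 6 = maximum
# activation). This representation is an assumption for illustration.
INTENSITY_MIN, INTENSITY_MAX = 0, 6

def validate_sequence(frames):
    """Check that every frame's AU values fall on the 0-6 scale."""
    for i, frame in enumerate(frames):
        for au, value in frame.items():
            if not (INTENSITY_MIN <= value <= INTENSITY_MAX):
                raise ValueError(f"frame {i}: {au}={value} out of range")
    return frames

reference_sequence = validate_sequence([
    {"AU1": 0, "AU2": 3, "AU12": 5},   # frame 1
    {"AU1": 1, "AU2": 2, "AU12": 6},   # frame 2
])
print(len(reference_sequence))  # 2
```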
  • an endpoint 112 may be located remotely from the camera 108 or co-located with the camera 108 .
  • the endpoint 112 may include a display and be used to verify the identity of an individual based upon a query video.
  • the camera 108 may be used to also capture the query video.
  • another camera (not shown), similar to the camera 108 , may be used to capture the query video.
  • the reference videos may be collected during an enrollment phase at a first location using the camera 108 .
  • the query video may be taken at a second location that is being monitored with a second camera associated with the endpoint 112 .
  • the endpoint 112 may be any computing device with a display.
  • the endpoint 112 may be a desktop computer, a laptop computer, a tablet computer, a mobile telephone, a smart phone, and the like.
  • the query video may also be sent to the AS 104 for processing to determine if the individual in the query video matches an individual in one of the reference videos.
  • AU values can be extracted from at least one frame of the query video. Then the sequence of AUs extracted from the frames of a query video may be compared to the sequence of AUs extracted from the frames of the reference video for each one of the plurality of different individuals.
  • a similarity score may be calculated based upon the comparison.
  • the similarity score may have a value between 0 and 1.
  • a ranking of a top N individuals may be displayed on the endpoint 112 based upon the similarity score. In one embodiment, the ranking of the top N individuals may be based on a number of the plurality of different individuals who have a similarity score above a threshold (e.g., 0.90).
  • the similarity score may be based on a distance score that is calculated based upon a distance between the AU values of at least one frame of the reference video compared to the AU values of at least one frame of the query video.
  • the similarity score may be based upon a score-level fusion that fuses the distance score to a temporal score.
  • the temporal score may be based upon a temporal analysis of a sequence of the AUs in at least one frame of the query video compared to a sequence of the AUs in at least one frame of the reference video.
  • FIG. 2 illustrates a high-level block diagram of a method of the present disclosure.
  • FIG. 2 illustrates a reference video 202 and a query video 204 .
  • the reference video 202 and the query video 204 may comprise a plurality of frames 206_1 to 206_n (herein referred to individually as frame 206 or collectively as frames 206) and a plurality of frames 208_1 to 208_n (herein referred to individually as frame 208 or collectively as frames 208), respectively.
  • each one of the frames 206 and 208 may be analyzed to extract AUs, and a value of the amount of activation of the AUs may be assigned for each one of the detected AUs.
  • a histogram 210 of the values of each one of the AUs for at least one frame 206_1 to 206_n and a histogram 212 of the values of each one of the AUs for at least one frame 208_1 to 208_n may be created.
  • a distance function may be applied to the histograms 210 and 212 to calculate a distance score between the AU histogram for the frames 206_1 to 206_n and the AU histogram for the frames 208_1 to 208_n.
  • a 6-bin frequency histogram may be generated for each AU over all n frames as shown in Equations (1) and (2):
  • H_AU_i(j) = Σ_{f=1}^{n} δ(AU_i(f), j), for j = 1, . . . , 6   Equations (1) and (2)
  • where AU_i(f) is the value of the i-th AU (as per Table 1 above) corresponding to frame f, and j is the histogram bin index that corresponds to one of the 6 AU intensity values. The set of histograms over all 20 AUs is given by Equation (3):
  • H = {H_AU_1, H_AU_2, . . . , H_AU_20}   Equation (3)
  • a chi squared (χ2) distance function may be applied. It should be noted that any distance function may be applied. For example, a cosine distance function or any other distance or divergence function may also be applied.
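Under one plausible reading of Equations (1)-(3) and the χ2 comparison, the per-AU histogram pipeline can be sketched as below. The binning of the discrete intensity scale into 6 bins (intensities 1-6, with 0/off frames dropped) and the 1/2 factor in the χ2 form are assumptions for illustration, not details fixed by the disclosure:

```python
from collections import Counter

N_BINS = 6  # 6-bin frequency histogram per Equations (1) and (2);
            # mapping the 0-6 scale to bins 1..6 (0 = off, ignored)
            # is an assumed reading of the disclosure.

def au_histogram(au_values):
    """Normalized 6-bin histogram of one AU's intensities over all frames."""
    counts = Counter(v for v in au_values if 1 <= v <= N_BINS)
    n = sum(counts.values())
    return [counts[b] / n if n else 0.0 for b in range(1, N_BINS + 1)]

def chi_squared_distance(h1, h2, eps=1e-12):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

# Per-AU intensity sequences over the frames of two videos (toy data).
ref_au12   = [3, 3, 4, 5, 3, 4]
query_au12 = [3, 4, 4, 5, 3, 4]

d = chi_squared_distance(au_histogram(ref_au12), au_histogram(query_au12))
print(round(d, 4))  # 0.0333
```

A cosine distance or another divergence could be substituted for `chi_squared_distance` without changing the rest of the pipeline, consistent with the note above.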
  • the similarity score may be based on only the distance function between the histograms. However, in one embodiment, the accuracy of the identity verification may be improved by fusing the distance score to a temporal score. In one embodiment, a temporal analysis may also be applied to the value of the AUs in the frames 206_1 to 206_n and the AUs in the frames 208_1 to 208_n.
  • the temporal analysis may be a dynamic time warping (DTW) function.
  • the DTW function may apply a temporal warping of a sequence of the plurality of different AUs of at least one frame of the query video to align with a sequence of the plurality of different AUs of at least one frame of the reference video.
  • An example is illustrated by the aligned sequence of AUs 216 illustrated in FIG. 2 .
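A minimal dynamic-programming DTW sketch over per-frame AU vectors follows. The Euclidean local cost between frames and the toy data are assumptions; the disclosure specifies only that a DTW-based temporal alignment is applied:

```python
import math

def frame_distance(a, b):
    """Euclidean distance between two per-frame AU intensity vectors
    (an assumed local cost; the disclosure does not fix this choice)."""
    return math.dist(a, b)

def dtw_cost(seq_a, seq_b):
    """Classic O(n*m) dynamic time warping over two AU sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

# Toy sequences: the query is a time-stretched copy of the reference,
# so the warped alignment reaches zero total cost.
ref   = [[0, 1], [2, 3], [4, 5]]
query = [[0, 1], [0, 1], [2, 3], [4, 5], [4, 5]]
print(dtw_cost(ref, query))  # 0.0
```

A low DTW cost indicates that the query's expression dynamics can be warped onto the reference's, which is what the temporal score rewards.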
  • the distance score and the temporal score may be fused 218 to obtain the similarity score.
  • the distance score (S_FACS-H) may be fused with the temporal score (S_DTW) in accordance with Equation (4) below:
  • S = α·S_FACS-H + β·S_DTW   Equation (4)
  • the values of α and β were found to be 0.38 and 0.62, respectively.
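The score-level fusion of Equation (4) can be sketched with the α = 0.38 and β = 0.62 weights reported above. The `to_similarity` mapping from a raw distance or DTW cost to a [0, 1] similarity is an assumption, since the disclosure does not specify how the two scores are normalized before fusion:

```python
ALPHA, BETA = 0.38, 0.62  # fusion weights reported in the disclosure

def fused_similarity(s_facs_h, s_dtw):
    """Score-level fusion per Equation (4): S = alpha*S_FACS-H + beta*S_DTW.
    Both inputs are assumed to already be similarities in [0, 1]."""
    return ALPHA * s_facs_h + BETA * s_dtw

def to_similarity(distance):
    # Assumed normalization from a distance/cost to a [0, 1] similarity;
    # the disclosure does not specify this mapping.
    return 1.0 / (1.0 + distance)

# Fuse a small histogram distance with a moderate DTW cost (toy values).
s = fused_similarity(to_similarity(0.0333), to_similarity(2.5))
print(round(s, 3))  # 0.545
```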
  • Equations (1)-(4) are only example equations or functions that may be used.
  • the human identity verification may be performed with only a single distance score as the confidence score. In other words, embodiments of the present disclosure do not require that score-level fusion 218 be performed.
  • N may be based on a number of reference videos having a similarity score above a threshold.
  • the threshold may be 0.95 and ten reference videos may have a similarity score above 0.95.
  • N may be the top 10 reference videos associated with ten different individuals.
  • N may be 1. In other words, it may be assumed that the top similarity score is the reference video having the identity of the individual in the query video.
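The thresholded top-N ranking described in the bullets above can be sketched as follows; the function name and toy scores are illustrative only:

```python
def top_n(scores, threshold=0.95, n_max=None):
    """Rank individuals whose similarity score exceeds the threshold.
    With threshold=0.95 and ten qualifying reference videos, N is 10;
    n_max=1 reduces this to picking the single best match."""
    ranked = sorted(((s, pid) for pid, s in scores.items() if s > threshold),
                    reverse=True)
    ranked = [(pid, s) for s, pid in ranked]
    return ranked[:n_max] if n_max else ranked

scores = {"alice": 0.97, "bob": 0.91, "carol": 0.96, "dave": 0.99}
print(top_n(scores))           # dave, alice, carol (bob is below threshold)
print(top_n(scores, n_max=1))  # [('dave', 0.99)]
```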
  • embodiments of the present disclosure automatically perform human identity verification based on facial expressions captured in a query video.
  • Values may be assigned to AUs in each frame 206 and 208 of the reference videos 202 and the query video 204 , respectively, and the AU values may be compared between the reference videos 202 and the query video 204 to obtain a similarity score that identifies the individual in the query video based on a facial expression match with an individual in the reference video.
  • FIG. 3 illustrates a flowchart of a method 300 for verifying the identity of an individual based upon facial expressions as exhibited in a query video.
  • one or more steps, or operations, of the method 300 may be performed by the application server 104 or a computer as illustrated in FIG. 4 and discussed below.
  • the method 300 begins.
  • the method 300 receives a reference video for each one of a plurality of different individuals. For example, an enrollment period may allow a reference video for each one of the plurality of different individuals to be captured.
  • Each reference video may comprise a plurality of frames. At least one frame of the reference video may be annotated with a plurality of different AUs. The AUs may be in accordance with a FACS encoding.
  • a value may be assigned for each one of the different AUs based on a discrete scale comprising three or more incremental values.
  • the discrete scale does not include only on or off. Rather, the AU may be assigned a value that indicates an amount or a degree of activation of a particular AU. The process may be repeated for each reference video of each one of the plurality of different users.
  • the method 300 receives a query video.
  • For example, an entrance to a building or a room may be monitored to ensure that only authorized personnel are allowed to enter.
  • a device may be secured to allow only authorized individuals to access the device.
  • a camera at the location or the device may capture the query video. The same camera that captured the reference video may be used or a different camera may be used.
  • the query video may comprise a plurality of frames.
  • AU values corresponding to a plurality of different AUs may be extracted from at least one frame of the query video.
  • a value may be assigned for each one of the different AUs for at least one frame of the query video based on a discrete scale comprising three or more incremental values.
  • the method 300 calculates a similarity score based on an analysis that compares the reference video of one of the plurality of different individuals and the query video.
  • the similarity score may be a value between 0 and 1 (e.g., a decimal value).
  • the similarity score may be based on a similarity of the values of the AUs in at least one frame of the reference video compared to the values of the AUs in at least one frame of the query video. In one embodiment, the similarity may be measured based upon a distance score calculated by applying a distance function between a first histogram of the values for each one of the plurality of AUs of at least one frame of the reference video and a second histogram of the values for each one of the plurality of AUs of at least one frame of the query video.
  • any type of distance function may be used (e.g., a chi squared (χ2) distance function, a cosine distance function, a divergence function, and the like).
  • the Equations (1)-(3) described above may be used to create the first and second histograms.
  • the distance score may be fused with a temporal score to obtain the similarity score.
  • a score level fusion may be applied for the distance score and the temporal score as described in Equation (4) above.
  • the temporal score may be obtained by applying a temporal analysis between a sequence of the plurality of different AUs of at least one frame of the reference video and a sequence of the plurality of different AUs of at least one frame of the query video.
  • the temporal analysis may include a DTW function.
  • the DTW function applies a temporal warping of the sequence of the plurality of different AUs of at least one frame of the query video to align with the sequence of the plurality of different AUs of at least one frame of the reference video.
  • fusing the distance score and the temporal score may provide a higher similarity score or a more accurate human identity verification.
  • the method 300 determines if the reference video of each one of the plurality of different individuals was compared to the query video. In other words, the sequence of AU values extracted from the query video is compared against each sequence of AU values extracted from the reference video of each one of the plurality of different individuals. Thus, if there are 100 reference videos that each corresponds to a different one of 100 individuals, then the sequence of AU values extracted from the query video are compared to each one of the sequence of AU values extracted from the 100 reference videos.
  • Blocks 308 and 310 may be repeated until all of the reference videos have been compared to the query video to calculate a similarity score for each reference video when compared to the query video.
  • the method 300 provides a ranking of a top N individuals of the plurality of different individuals based upon the similarity score.
  • N may be based on a number of reference videos having a similarity score above a threshold.
  • the threshold may be 0.95 and ten reference videos may have a similarity score above 0.95.
  • N may be the top 10 reference videos associated with ten different individuals.
  • N may be 1. In other words, it may be assumed that the top similarity score is the reference video having the identity of the individual in the query video.
  • one or more steps, functions, or operations of the method 300 described above may include a storing, displaying and/or outputting step as required for a particular application.
  • any data, records, fields, and/or intermediate results discussed in the methods can be stored, displayed, and/or outputted to another device as required for a particular application.
  • steps, functions, or operations in FIG. 3 that recite a determining operation, or involve a decision do not necessarily require that both branches of the determining operation be practiced. In other words, one of the branches of the determining operation can be deemed as an optional step.
  • the embodiments of the present disclosure improve the functioning of a computer or a device.
  • the functioning of a computer may be improved to automatically identify an individual based upon facial expressions of a query video.
  • biometric identity verification may be performed using facial expressions, as described herein.
  • the embodiments of the present disclosure transform video data into annotated sequences of AU values that are used for identity verification of an individual, as discussed above.
  • no previous machine or computer was capable of performing the functions described herein as the present disclosure provides an improvement in the technological arts of biometric identity verification.
  • FIG. 4 depicts a high-level block diagram of a computer that can be transformed into a machine that is dedicated to performing the functions described herein. Notably, no computer or machine currently exists that performs the functions as described herein. As a result, the embodiments of the present disclosure improve the operation and functioning of the computer to provide automatic identification of an individual based upon facial expressions of a query video, as disclosed herein.
  • the computer 400 comprises one or more hardware processor elements 402 (e.g., a central processing unit (CPU), a microprocessor, or a multi-core processor), a memory 404 , e.g., random access memory (RAM) and/or read only memory (ROM), a module 405 for verifying an identity of an individual based upon facial expressions as exhibited in a query video, and various input/output devices 406 (e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, an input port and a user input device (such as a keyboard, a keypad, a mouse, a microphone and the like)).
  • the computer may employ a plurality of processor elements.
  • if the method(s) as discussed above are implemented in a distributed or parallel manner for a particular illustrative example, i.e., the steps of the above method(s) or the entire method(s) are implemented across multiple or parallel computers, then the computer of this figure is intended to represent each of those multiple computers.
  • one or more hardware processors can be utilized in supporting a virtualized or shared computing environment.
  • the virtualized computing environment may support one or more virtual machines representing computers, servers, or other computing devices. In such virtualized environments, hardware components such as hardware processors and computer-readable storage devices may be virtualized or logically represented.
  • the present disclosure can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a programmable logic array (PLA), including a field-programmable gate array (FPGA), or a state machine deployed on a hardware device, a computer or any other hardware equivalents, e.g., computer readable instructions pertaining to the method(s) discussed above can be used to configure a hardware processor to perform the steps, functions and/or operations of the above disclosed methods.
  • instructions and data for the present module or process 405 for verifying an identity of an individual based upon facial expressions as exhibited in a query video can be loaded into memory 404 and executed by hardware processor element 402 to implement the steps, functions or operations as discussed above in connection with the exemplary method 300 .
  • a hardware processor executes instructions to perform “operations,” this could include the hardware processor performing the operations directly and/or facilitating, directing, or cooperating with another hardware device or component (e.g., a co-processor and the like) to perform the operations.
  • the processor executing the computer readable or software instructions relating to the above described method(s) can be perceived as a programmed processor or a specialized processor.
  • the present module 405 for verifying an identity of an individual based upon facial expressions as exhibited in a query video (including associated data structures) of the present disclosure can be stored on a tangible or physical (broadly non-transitory) computer-readable storage device or medium, e.g., volatile memory, non-volatile memory, ROM memory, RAM memory, magnetic or optical drive, device or diskette and the like.
  • the computer-readable storage device may comprise any physical devices that provide the ability to store information such as data and/or instructions to be accessed by a processor or a computing device such as a computer or an application server.

Abstract

A method, computer readable medium and apparatus for verifying an identity of an individual based upon facial expressions as exhibited in a query video of the individual are disclosed. The method includes receiving a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals, receiving the query video, calculating a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of facial gesture encoders extracted from at least one frame of the query video, and providing a ranking of a top N individuals of the plurality of different individuals based upon the similarity score.

Description

  • The present disclosure relates generally to human identity verification and, more particularly, to a method and apparatus for identifying an individual based upon facial expressions of a query video.
  • BACKGROUND
  • The human face serves as an important interface to convey nonverbal emotional information. The human face is probably the most natural characteristic that humans use to identify each other. Thus, the potential use of the face as a biometric trait has received significant attention in the search for new biometric modalities over the past decades.
  • SUMMARY
  • According to aspects illustrated herein, there are provided a method, non-transitory computer readable medium and apparatus for verifying an identity of an individual based upon facial expressions as exhibited in a query video of the individual. One disclosed feature of the embodiments is a method that receives a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals, receives the query video, calculates a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of facial gesture encoders extracted from at least one frame of the query video and provides a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
  • Another disclosed feature of the embodiments is a non-transitory computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform operations that receive a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals, receive the query video, calculate a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of facial gesture encoders extracted from at least one frame of the query video and provide a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
  • Another disclosed feature of the embodiments is an apparatus comprising a processor and a computer-readable medium storing a plurality of instructions which, when executed by the processor, cause the processor to perform operations that receive a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals, receive the query video, calculate a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of facial gesture encoders extracted from at least one frame of the query video and provide a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The teaching of the present disclosure can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
  • FIG. 1 illustrates an example block diagram of a system of the present disclosure;
  • FIG. 2 illustrates a high-level block diagram of a method of the present disclosure;
  • FIG. 3 illustrates an example flowchart of a method for verifying an identity of an individual based upon facial expressions as exhibited in a query video; and
  • FIG. 4 illustrates a high-level block diagram of a computer suitable for use in performing the functions described herein.
  • To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.
  • DETAILED DESCRIPTION
  • The present disclosure broadly discloses a method and apparatus for verifying an identity of an individual based upon facial expressions as exhibited in a query video. As discussed above, the human face serves as an important interface to convey nonverbal emotional information. The human face is probably the most natural characteristic that humans use to identify each other. Thus, the potential use of the face as a biometric trait has received significant attention in the search for new biometric modalities over the past decades.
  • Embodiments of the present disclosure provide a method for automatically verifying the identity of an individual based upon the similarities in facial expressions exhibited by the individual between a reference video and a query video using facial gesture encoders. An example facial gesture encoder that describes the various aspects of facial expression is the Facial Action Coding System (FACS), which provides a discrete scale of values that denote region and intensity of facial movements called Action Units (AU). An enrollment period may be used to collect reference videos of a plurality of different individuals. AU values may be extracted from at least one frame of each one of the reference videos. A query video may then be captured and AU values may be extracted from at least one frame of the query video. Then the AU values from the query video may be compared to each one of the AU values from the reference videos to obtain a similarity score of an identity verification of the individual. In one example, the ranking of the top N individuals associated with a similarity score above a threshold value may be presented.
  • In one embodiment, both the query and reference videos are captured while individuals are performing a specific task. In one example case, the method would continuously authenticate a user over the duration of specific task(s) during a single session, or specific task(s) that take place over separate sessions. Examples of such task(s) could include problem solving tasks (e.g., an exam, puzzle, game, questionnaire, and the like).
  • In one embodiment, the similarity score may be based on a fusion of multiple different scores. In one embodiment, two different scores may be used: a distance metric between AU histograms of the query video and a reference video, and a temporal similarity or dissimilarity score based on a temporal alignment of the sequence of AUs of the frames of the query video and the AUs of the frames of the reference video. In one embodiment, the temporal alignment can comprise a dynamic time warping (DTW) algorithm.
  • FIG. 1 illustrates an example system 100 of the present disclosure. In one embodiment, the system 100 includes an Internet Protocol (IP) network 102. The IP network 102 may be any type of network including, for example, a cellular network, a broadband network, and the like.
  • It should be noted that the IP network 102 has been simplified for ease of explanation. The IP network 102 may include additional access networks or network elements that are not shown. For example, the additional network elements may include a gateway, a router, a switch, a firewall, an application server, and the like.
  • In one embodiment, the IP network 102 may include an application server (AS) 104 and a database (DB) 106. In one embodiment, the AS 104 may be deployed as a computer or a server having a processor and computer readable memory for storing instructions, which when executed by the processor perform the functions described herein. One example of a computer is discussed below and described in FIG. 4.
  • In one embodiment, the DB 106 may store the various functions, parameters and values used for performing the automated method for identifying an individual based upon facial expression of a query video described herein. The DB 106 may also store all of the reference videos and annotated AU values for at least one frame of each of the reference videos of different individuals, as discussed below.
  • Although only a single AS 104 and a single DB 106 are illustrated in FIG. 1, it should be noted that any number of application servers and databases may be deployed. In addition, the application servers and databases may be co-located or located remotely from one another.
  • In one embodiment, the system 100 may include a video camera 108. The video camera 108 may be located remotely from the AS 104 and the DB 106 or may be co-located with the AS 104 and the DB 106. In one embodiment, the video camera 108 may be coupled to an endpoint device (not shown) that is in communication with the AS 104 over the IP network 102. The video camera 108 may capture a reference video for each one of a plurality of different individuals.
  • In one embodiment, each reference video may comprise a plurality of frames and AU values may be extracted from at least one frame of the reference video. In one embodiment, the AUs may be in accordance with a FACS coding of facial expressions. In one embodiment, the AUs may be detected using a program such as FaceReader® 6, or any other program that extracts FACS features. In another embodiment, the AUs may be annotated by a trained human expert. Table 1 below illustrates an example of AUs associated with the FACS coding of facial expressions.
  • TABLE 1
    AUS FOR FACS CODING
    AU DESCRIPTION
    1 Inner Brow Raiser
    2 Outer Brow Raiser
    4 Brow Lowerer
    5 Upper Lid Raiser
    6 Cheek Raiser
    7 Lid Tightener
    9 Nose Wrinkler
    10 Upper Lip Raiser
    12 Lip Corner Puller
    14 Dimpler
    15 Lip Corner Depressor
    17 Chin Raiser
    18 Lip Pucker
    20 Lip Stretcher
    23 Lip Tightener
    24 Lip Pressor
    25 Lips Part
    26 Jaw Drop
    27 Mouth Stretch
    43 Eyes Closed
  • In one embodiment, the present disclosure also assigns a value based on a range of values of a discrete scale. In other words, previous applications of the AUs would only assign a binary value (e.g., either the AU is on or off). In contrast, the present disclosure uses a degree of activation of each AU. For example, the discrete scale may be a numerical scale, an alphabetical scale, and the like. In one embodiment, the discrete scale may be a range of 0-6 where 0 is off and 6 is a maximum amount of activation. In another embodiment, the discrete scale may be a range of A-E where A=trace, B=slight, C=marked, D=severe, and E=maximum. In one embodiment, the values for the detected AUs may be assigned by an expert. The sequence of AUs extracted from the reference videos may then be sent to the AS 104 and stored in the DB 106. It should be noted that although AU values from a FACS coding are used, the embodiments of the present disclosure may work with any facial gesture encoders.
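To make the per-frame representation concrete, the FACS intensities on the 0-6 discrete scale can be collected into a fixed-order vector, one entry per AU of Table 1. The sketch below is illustrative only: the AU index list follows Table 1, but the function name and the dictionary input format are assumptions, not part of the disclosure.

```python
# AU indices per Table 1 (FACS coding).
AU_INDICES = [1, 2, 4, 5, 6, 7, 9, 10, 12, 14, 15, 17, 18, 20, 23, 24, 25, 26, 27, 43]

def frame_au_vector(detections):
    """Map {au_index: intensity on the 0-6 scale} to a fixed-order 20-element vector.

    AUs absent from `detections` are treated as inactive (value 0).
    """
    return [detections.get(au, 0) for au in AU_INDICES]

# Example frame: slight inner-brow raise (AU 1) and marked lip-corner pull (AU 12).
vec = frame_au_vector({1: 2, 12: 4})
```

A sequence of such vectors, one per frame, is what the histogram and temporal analyses below would consume.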
  • In one embodiment, an endpoint 112 may be located remotely from the camera 108 or co-located with the camera 108. The endpoint 112 may include a display and be used to verify the identity of an individual based upon a query video. In one embodiment, the camera 108 may be used to also capture the query video. Alternatively, another camera (not shown), similar to the camera 108, may be used to capture the query video.
  • For example, the reference videos may be collected during an enrollment phase at a first location using the camera 108. The query video may be taken at a second location that is being monitored with a second camera associated with the endpoint 112.
  • In one embodiment, the endpoint 112 may be any computing device with a display. For example, the endpoint 112 may be a desktop computer, a laptop computer, a tablet computer, a mobile telephone, a smart phone, and the like.
  • In one embodiment, the query video may also be sent to the AS 104 for processing to determine if the individual in the query video matches an individual in one of the reference videos. In one embodiment, AU values can be extracted from at least one frame of the query video. Then the sequence of AUs extracted from the frames of a query video may be compared to the sequence of AUs extracted from the frames of the reference video for each one of the plurality of different individuals.
  • In one embodiment, a similarity score may be calculated based upon the comparison. In one embodiment, the similarity score may have a value between 0 and 1.
  • In one embodiment, a ranking of a top N individuals may be displayed on the endpoint 112 based upon the similarity score. In one embodiment, the ranking of the top N individuals may be based on a number of the plurality of different individuals who have a similarity score above a threshold (e.g., 0.90).
  • In one embodiment, the similarity score may be based on a distance score that is calculated based upon a distance between the AU values of at least one frame of the reference video compared to the AU values of at least one frame of the query video.
  • In one embodiment, the similarity score may be based upon a score-level fusion that fuses the distance score to a temporal score. The temporal score may be based upon a temporal analysis of a sequence of the AUs in at least one frame of the query video compared to a sequence of the AUs in at least one frame of the reference video.
  • FIG. 2 illustrates a high-level block diagram of a method of the present disclosure. FIG. 2 illustrates a reference video 202 and a query video 204. As discussed above, the reference video 202 and the query video 204 may comprise a plurality of frames 206-1 to 206-n (herein referred to individually as frame 206 or collectively as frames 206) and a plurality of frames 208-1 to 208-n (herein referred to individually as frame 208 or collectively as frames 208), respectively.
  • As discussed above, each one of the frames 206 and 208 may be analyzed to extract AUs, and a value of the amount of activation of the AUs may be assigned for each one of the detected AUs. In one embodiment, a histogram 210 of the values of each one of the AUs for at least one frame 206-1 to 206-n and a histogram 212 of the values of each one of the AUs for at least one frame 208-1 to 208-n may be created. In one embodiment, a distance function may be applied to the histograms 210 and 212 to calculate a distance score between the AU histogram for the frames 206-1 to 206-n and the AU histogram for the frames 208-1 to 208-n.
  • In one embodiment, a 6-bin frequency histogram may be generated for each AU over all n frames as shown in Equations (1) and (2):

  • H_AU_i(j) = Σ_{f=1}^{n} ψ(AU_i(f), j),  Equation 1:
  • where i = 1, 2, . . . , 20 is the AU index, AU_i(f) is the value of the i-th AU (as per Table 1 above) corresponding to frame f,

  • ψ(AU_i(f), j) = 1 if AU_i(f) = j, and 0 otherwise,  Equation 2:
  • and j is the histogram bin index that corresponds to one of the 6 AU intensity values. For a given video sequence of a given subject, all histograms are concatenated to create an overall histogram H of length 120 (e.g., 6×20) according to Equation (3):

  • H = {H_AU_1, H_AU_2, . . . , H_AU_20}  Equation 3:
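The histogram construction of Equations (1)-(3) can be sketched as follows. The disclosure's 0-6 activation scale and 6-bin histogram leave the exact quantization open, so this sketch assumes the per-frame intensities have already been mapped to integer bin indices 0..5; the function name and input layout are likewise assumptions.

```python
def au_histograms(au_sequence, n_aus=20, n_bins=6):
    """Build the concatenated histogram H of Equations (1)-(3).

    `au_sequence` is a list of frames, each a list of n_aus values assumed
    to be integer bin indices 0 .. n_bins-1. Returns a flat list of length
    n_aus * n_bins (120 for 20 AUs x 6 bins), i.e. H = {H_AU_1, ..., H_AU_20}.
    """
    H = [0] * (n_aus * n_bins)
    for frame in au_sequence:
        for i, value in enumerate(frame):
            # psi(AU_i(f), j) = 1 exactly when AU_i(f) = j, so each frame
            # increments one bin per AU.
            H[i * n_bins + value] += 1
    return H
```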
  • In one embodiment, a chi squared (χ2) distance function may be applied. It should be noted that any distance function may be applied. For example, a cosine distance function or any other distance or divergence function may be also applied.
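As one concrete choice of distance function, a symmetric chi-squared distance between two concatenated AU histograms could look like the sketch below. The 0.5·Σ (a−b)²/(a+b) form is a common variant; the disclosure names the chi-squared function without fixing one, so this particular formulation is an assumption.

```python
def chi_squared_distance(h1, h2, eps=1e-10):
    """Symmetric chi-squared distance between two equal-length histograms.

    `eps` guards against division by zero when both bins are empty.
    """
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```

Swapping in a cosine distance or a divergence measure, as the text allows, would only change this one function.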
  • In one embodiment, the similarity score may be based on only the distance function between the histograms. However, in one embodiment, the accuracy of the identity verification may be improved by fusing the distance score with a temporal score. In one embodiment, a temporal analysis may also be applied to the value of the AUs in the frames 206-1 to 206-n and the AUs in the frames 208-1 to 208-n.
  • In one embodiment, the temporal analysis may be a dynamic time warping (DTW) function. The DTW function may apply a temporal warping of a sequence of the plurality of different AUs of at least one frame of the query video to align with a sequence of the plurality of different AUs of at least one frame of the reference video. An example is illustrated by the aligned sequence of AUs 216 illustrated in FIG. 2.
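A textbook DTW alignment between the two AU-vector sequences can be sketched as below. The disclosure does not specify the local cost or step pattern it uses, so the L1 per-frame cost and the standard three-way recurrence here are assumptions.

```python
def dtw_distance(seq_a, seq_b, dist):
    """Classic dynamic time warping between two sequences of AU vectors.

    `dist` is a per-frame distance between AU vectors (e.g. L1). Returns the
    cumulative cost of the optimal warping path.
    """
    INF = float("inf")
    n, m = len(seq_a), len(seq_b)
    # D[i][j] = minimal cost of aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def l1(u, v):
    """L1 distance between two AU intensity vectors (an assumed local cost)."""
    return sum(abs(a - b) for a, b in zip(u, v))
```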
  • The distance score and the temporal score may be fused 218 to obtain the similarity score. In one embodiment, the distance score (SFACS-H) may be fused with the temporal score (SDTW) in accordance with Equation (4) below:

  • F(i, j) = α·S_FACS-H(i, j) + β·S_DTW(i, j),  Equation 4:
  • where α and β are weighting values having a real value between 0 and 1, where α+β=1. In one embodiment, the values of α and β were found to be 0.38 and 0.62, respectively.
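The score-level fusion of Equation (4) is a weighted sum, sketched below. The 0.38/0.62 defaults are the weights reported in the disclosure; the assumption that both input scores are already normalized to comparable ranges is this sketch's, not the disclosure's.

```python
def fused_score(s_facs_h, s_dtw, alpha=0.38, beta=0.62):
    """Equation (4): F = alpha * S_FACS-H + beta * S_DTW, with alpha + beta = 1.

    Both scores are assumed normalized to comparable ranges before fusion.
    """
    assert abs(alpha + beta - 1.0) < 1e-9
    return alpha * s_facs_h + beta * s_dtw
```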
  • It should be noted that the above Equations (1)-(4) are only example equations or functions that may be used. In addition, the human identity verification may be performed with only a single distance score as the confidence score. In other words, embodiments of the present disclosure do not require that score-level fusion 218 be performed.
  • As discussed above, once the similarity score is calculated for each reference video compared to the query video, the ranking of the top N reference videos having the highest similarity scores may be presented to a user on the endpoint 112. In one embodiment, N may be based on a number of reference videos having a similarity score above a threshold. For example, the threshold may be 0.95 and ten reference videos may have a similarity score above 0.95. Thus, N may be the top 10 reference videos associated with ten different individuals.
  • In one embodiment, N may be 1. In other words, it may be assumed that the top similarity score is the reference video having the identity of the individual in the query video.
  • Thus, embodiments of the present disclosure automatically perform human identity verification based on facial expressions captured in a query video. Values may be assigned to AUs in each frame 206 and 208 of the reference videos 202 and the query video 204, respectively, and the AU values may be compared between the reference videos 202 and the query video 204 to obtain a similarity score that identifies the individual in the query video based on a facial expression match with an individual in the reference video.
  • FIG. 3 illustrates a flowchart of a method 300 for verifying the identity of an individual based upon facial expressions as exhibited in a query video. In one embodiment, one or more steps, or operations, of the method 300 may be performed by the application server 104 or a computer as illustrated in FIG. 4 and discussed below.
  • At block 302 the method 300 begins. At block 304, the method 300 receives a reference video for each one of a plurality of different individuals. For example, an enrollment period may allow a reference video for each one of the plurality of different individuals to be captured.
  • Each reference video may comprise a plurality of frames. At least one frame of the reference video may be annotated with a plurality of different AUs. The AUs may be in accordance with a FACS encoding.
  • In one embodiment, a value may be assigned for each one of the different AUs based on a discrete scale comprising three or more incremental values. In other words, the discrete scale does not include only on or off. Rather, the AU may be assigned a value that indicates an amount or a degree of activation of a particular AU. The process may be repeated for each reference video of each one of the plurality of different users.
  • For example, the discrete scale may be a numerical scale, an alphabetical scale, and the like. In one embodiment, the discrete scale may have a range of three or more values. In one embodiment, the discrete scale may be a range of 0-6 where 0 is off and 6 is a maximum amount of activation. In another embodiment, the discrete scale may be a range of A-E where A=trace, B=slight, C=marked, D=severe, and E=maximum. In one embodiment, the values for the detected AUs may be assigned by an expert.
  • At block 306, the method 300 receives a query video. For example, an entrance to a building or a room may be monitored to ensure only authorized personnel are allowed to enter. Alternatively, a device may be secured to allow only authorized individuals to access the device. A camera at the location or the device may capture the query video. The same camera that captured the reference video may be used or a different camera may be used.
  • In one embodiment, the query video may comprise a plurality of frames. AU values corresponding to a plurality of different AUs may be extracted from at least one frame of the query video. In one embodiment, a value may be assigned for each one of the different AUs for at least one frame of the query video based on a discrete scale comprising three or more incremental values.
  • At block 308, the method 300 calculates a similarity score based on an analysis that compares the reference video of one of the plurality of different individuals and the query video. In one embodiment, the similarity score may be a value between 0 and 1 (e.g., a decimal value).
  • In one embodiment, the similarity score may be based on a similarity between the values of the AUs in at least one frame of the reference video and the values of the AUs in at least one frame of the query video. In one embodiment, the similarity may be measured based upon a distance score calculated by applying a distance function between a first histogram of the values for each one of the plurality of AUs of at least one frame of the reference video and a second histogram of the values for each one of the plurality of AUs of at least one frame of the query video.
  • In one embodiment, any type of distance function may be used (e.g., a chi squared (χ2) distance function, a cosine distance function, a divergence function and the like). In one embodiment, the Equations (1)-(3) described above may be used to create the first and second histograms.
  • In one embodiment, the distance score may be fused with a temporal score to obtain the similarity score. For example, a score level fusion may be applied for the distance score and the temporal score as described in Equation (4) above. As discussed above, the temporal score may be obtained by applying a temporal analysis between a sequence of the plurality of different AUs of at least one frame of the reference video and a sequence of the plurality of different AUs of at least one frame of the query video.
  • In one embodiment, the temporal analysis may include a DTW function. In one embodiment, the DTW function applies a temporal warping of the sequence of the plurality of different AUs of at least one frame of the query video to align with the sequence of the plurality of different AUs of at least one frame of the reference video. In one embodiment, fusing the distance score and the temporal score may provide a higher similarity score or a more accurate human identity verification.
  • At block 310, the method 300 determines if the reference video of each one of the plurality of different individuals was compared to the query video. In other words, the sequence of AU values extracted from the query video is compared against each sequence of AU values extracted from the reference video of each one of the plurality of different individuals. Thus, if there are 100 reference videos that each corresponds to a different one of 100 individuals, then the sequence of AU values extracted from the query video are compared to each one of the sequence of AU values extracted from the 100 reference videos.
  • If the answer to block 310 is no, then the method 300 returns to block 308. Blocks 308 and 310 may be repeated until all of the reference videos have been compared to the query video to calculate a similarity score for each reference video when compared to the query video.
  • If the answer to block 310 is yes, then the method 300 proceeds to block 312. At block 312, the method 300 provides a ranking of a top N individuals of the plurality of different individuals based upon the similarity score. In one embodiment, N may be based on a number of reference videos having a similarity score above a threshold. For example, the threshold may be 0.95 and ten reference videos may have a similarity score above 0.95. Thus, N may be the top 10 reference videos associated with ten different individuals.
  • In one embodiment, N may be 1. In other words, it may be assumed that the top similarity score is the reference video having the identity of the individual in the query video. At block 314 the method 300 ends.
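The top-N selection of block 312 (with N determined by how many enrolled individuals clear the threshold) can be sketched as follows. The dictionary format and function name are illustrative, not from the disclosure.

```python
def top_n_matches(scores, threshold=0.95):
    """Rank enrolled individuals whose similarity score exceeds a threshold.

    `scores` maps an individual identifier to a similarity score in [0, 1].
    Returns (id, score) pairs sorted from highest to lowest score; N is
    simply the number of scores above the threshold. Setting N = 1 amounts
    to taking only the first element of the returned list.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(who, s) for who, s in ranked if s > threshold]
```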
  • It should be noted that although not explicitly specified, one or more steps, functions, or operations of the method 300 described above may include a storing, displaying and/or outputting step as required for a particular application. In other words, any data, records, fields, and/or intermediate results discussed in the methods can be stored, displayed, and/or outputted to another device as required for a particular application. Furthermore, steps, functions, or operations in FIG. 3 that recite a determining operation, or involve a decision, do not necessarily require that both branches of the determining operation be practiced. In other words, one of the branches of the determining operation can be deemed as an optional step.
  • As a result, the embodiments of the present disclosure improve the functioning of a computer or a device. For example, the functioning of a computer may be improved to automatically identify an individual based upon facial expressions of a query video. In other words, biometric identity verification may be performed using facial expressions, as described herein. In addition, the embodiments of the present disclosure transform video data into annotated sequences of AU values that are used for identity verification of an individual, as discussed above. Notably, no previous machine or computer was capable of performing the functions described herein as the present disclosure provides an improvement in the technological arts of biometric identity verification.
  • FIG. 4 depicts a high-level block diagram of a computer that can be transformed into a machine that is dedicated to perform the functions described herein. Notably, no computer or machine currently exists that performs the functions as described herein. As a result, the embodiments of the present disclosure improve the operation and functioning of the computer to provide automatic identification of an individual based upon facial expressions of a query video, as disclosed herein.
  • As depicted in FIG. 4, the computer 400 comprises one or more hardware processor elements 402 (e.g., a central processing unit (CPU), a microprocessor, or a multi-core processor), a memory 404, e.g., random access memory (RAM) and/or read only memory (ROM), a module 405 for verifying an identity of an individual based upon facial expressions as exhibited in a query video, and various input/output devices 406 (e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, an input port and a user input device (such as a keyboard, a keypad, a mouse, a microphone and the like)). Although only one processor element is shown, it should be noted that the computer may employ a plurality of processor elements. Furthermore, although only one computer is shown in the figure, if the method(s) as discussed above is implemented in a distributed or parallel manner for a particular illustrative example, i.e., the steps of the above method(s) or the entire method(s) are implemented across multiple or parallel computers, then the computer of this figure is intended to represent each of those multiple computers. Furthermore, one or more hardware processors can be utilized in supporting a virtualized or shared computing environment. The virtualized computing environment may support one or more virtual machines representing computers, servers, or other computing devices. In such virtualized virtual machines, hardware components such as hardware processors and computer-readable storage devices may be virtualized or logically represented.
  • It should be noted that the present disclosure can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a programmable logic array (PLA), including a field-programmable gate array (FPGA), or a state machine deployed on a hardware device, a computer or any other hardware equivalents, e.g., computer readable instructions pertaining to the method(s) discussed above can be used to configure a hardware processor to perform the steps, functions and/or operations of the above disclosed methods. In one embodiment, instructions and data for the present module or process 405 for verifying an identity of an individual based upon facial expressions as exhibited in a query video (e.g., a software program comprising computer-executable instructions) can be loaded into memory 404 and executed by hardware processor element 402 to implement the steps, functions or operations as discussed above in connection with the exemplary method 300. Furthermore, when a hardware processor executes instructions to perform “operations,” this could include the hardware processor performing the operations directly and/or facilitating, directing, or cooperating with another hardware device or component (e.g., a co-processor and the like) to perform the operations.
  • The processor executing the computer readable or software instructions relating to the above described method(s) can be perceived as a programmed processor or a specialized processor. As such, the present module 405 for verifying an identity of an individual based upon facial expressions as exhibited in a query video (including associated data structures) of the present disclosure can be stored on a tangible or physical (broadly non-transitory) computer-readable storage device or medium, e.g., volatile memory, non-volatile memory, ROM memory, RAM memory, magnetic or optical drive, device or diskette and the like. More specifically, the computer-readable storage device may comprise any physical devices that provide the ability to store information such as data and/or instructions to be accessed by a processor or a computing device such as a computer or an application server.
  • It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations, or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.

Claims (20)

What is claimed is:
1. A method for verifying an identity of an individual based upon facial expressions as exhibited in a query video of the individual, comprising:
receiving, by a processor, a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals;
receiving, by the processor, the query video;
calculating, by the processor, a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares a plurality of different action unit (AU) values of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of different AU values extracted from at least one frame of the query video; and
providing, by the processor, a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
2. The method of claim 1, wherein the reference video and the query video are captured while the plurality of different individuals is performing at least one task.
3. The method of claim 2, wherein the at least one task comprises a problem solving task.
4. The method of claim 1, wherein each one of the plurality of facial gesture encoders comprises different action unit values in accordance with a Facial Action Coding System (FACS) of facial expressions.
5. The method of claim 1, wherein the analysis to calculate the similarity score comprises applying a distance function between a first feature representation of the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of individuals and a feature representation of the plurality of facial gesture encoders of the at least one frame of the query video.
6. The method of claim 5, wherein the feature representation of the plurality of facial gesture encoders of the at least one frame of the reference video and the feature representation of the facial gesture encoders of the at least one frame of the query video comprise a histogram of the plurality of facial gesture encoders.
7. The method of claim 5, wherein the distance function comprises at least one of: a chi squared (χ2) distance function, a cosine distance function or a divergence metric.
8. The method of claim 5, wherein the analysis further comprises a temporal analysis between a sequence of the plurality of different action units of the at least one frame of the reference video for the each one of the plurality of individuals and a sequence of the plurality of different action units of the at least one frame of the query video.
9. The method of claim 8, wherein the temporal analysis comprises a dynamic time warping function that applies a temporal warping of the sequence of the plurality of different action units of the at least one frame of the query video to align with the sequence of the plurality of different action units of the at least one frame of the reference video for the each one of the plurality of different individuals.
10. The method of claim 8, wherein the similarity score comprises a fusion between a weighted score from the distance function and a weighted score from the temporal analysis.
11. The method of claim 1, wherein the ranking of the top N individuals comprises the plurality of individuals having the similarity score above a threshold value.
12. A non-transitory computer readable medium for storing instructions, which when executed by a processor, perform operations for verifying an identity of an individual based upon facial expressions as exhibited in a query video of the individual, the operations comprising:
receiving a reference video for each one of a plurality of different individuals, wherein a plurality of facial gesture encoders is extracted from at least one frame of the reference video describing one or more facial expressions of each one of the plurality of different individuals;
receiving the query video;
calculating a similarity score for the reference video for the each one of the plurality of different individuals based on an analysis that compares the plurality of facial gesture encoders of the at least one frame of the reference video for the each one of the plurality of different individuals to a plurality of facial gesture encoders extracted from at least one frame of the query video; and
providing a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
13. The non-transitory computer readable medium of claim 12, wherein the reference video and the query video are captured while the plurality of different individuals is performing at least one task.
14. The non-transitory computer readable medium of claim 13, wherein the at least one task comprises a problem solving task.
15. The non-transitory computer readable medium of claim 12, wherein each one of the plurality of facial gesture encoders comprises different action unit values in accordance with a Facial Action Coding System (FACS) of facial expressions.
16. The non-transitory computer readable medium of claim 12, wherein the analysis to calculate the similarity score comprises applying a distance function between a first feature representation of the plurality of different AU values of the at least one frame of the reference video for the each one of the plurality of individuals and a feature representation of the plurality of facial gesture encoders of the at least one frame of the query video.
17. The non-transitory computer readable medium of claim 16, wherein the distance function comprises at least one of: a chi squared (χ2) distance function, a cosine distance function or a divergence metric.
18. The non-transitory computer readable medium of claim 16, wherein the analysis further comprises a temporal analysis between a sequence of the plurality of different action units of the at least one frame of the reference video for the each one of the plurality of individuals and a sequence of the plurality of different action units of the at least one frame of the query video.
19. The non-transitory computer readable medium of claim 18, wherein the temporal analysis comprises a dynamic time warping function that applies a temporal warping of the sequence of the plurality of different action units of the at least one frame of the query video to align with the sequence of the plurality of different action units of the at least one frame of the reference video for the each one of the plurality of different individuals.
20. A method for verifying an identity of an individual based upon facial expressions as exhibited in a query video of the individual, comprising:
creating, by a processor, a reference video describing one or more facial expressions of each one of a plurality of different individuals, wherein at least one frame of the reference video for the each one of the plurality of different individuals is annotated with a reference value for each one of a plurality of different action units;
receiving, by the processor, the query video;
annotating, by the processor, at least one frame of the query video with a query value for each one of the plurality of action units of the at least one frame of the query video;
calculating, by the processor, a chi squared distance score between a first histogram of the reference values of the plurality of different action units of the at least one frame of the reference video for the each one of the plurality of individuals and a second histogram of the query values of the plurality of different action units of the at least one frame of the query video;
calculating, by the processor, a temporal score based upon a dynamic time warping of a sequence of the plurality of different action units of the at least one frame of the query video to align with a sequence of the plurality of different action units of the at least one frame of the reference video for the each one of the plurality of different individuals;
applying, by the processor, a first weight to the chi squared distance score to obtain a weighted chi squared distance score and a second weight to the temporal score to obtain a weighted temporal score;
fusing, by the processor, the weighted chi squared distance score and the weighted temporal score to calculate a similarity score; and
providing, by the processor, a ranking of a top N individuals of the plurality of different individuals based upon the similarity score that is calculated for the each one of the plurality of different individuals.
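The pipeline recited in claim 20 can be sketched as follows. This is a minimal illustration, not the patent's reference implementation: per-frame FACS action unit (AU) values are pooled into normalized histograms, compared with a chi-squared distance, temporally aligned with dynamic time warping (DTW), and the two scores are fused with weights to rank the top-N candidates. All function names, bin settings, and the 0.5/0.5 weights are illustrative assumptions.

```python
import numpy as np

def au_histogram(au_frames, bins=10, value_range=(0.0, 5.0)):
    """Normalized histogram of AU intensity values pooled over all frames."""
    hist, _ = np.histogram(np.asarray(au_frames).ravel(),
                           bins=bins, range=value_range)
    return hist / max(hist.sum(), 1)

def chi_squared(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def dtw_distance(seq_a, seq_b):
    """Classic O(n*m) dynamic time warping over per-frame AU vectors."""
    n, m = len(seq_a), len(seq_b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(np.asarray(seq_a[i - 1]) -
                                  np.asarray(seq_b[j - 1]))
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def rank_top_n(query_aus, references, w_hist=0.5, w_dtw=0.5, top_n=3):
    """Fuse weighted chi-squared and DTW scores; lower score = more similar."""
    q_hist = au_histogram(query_aus)
    scored = []
    for name, ref_aus in references.items():
        score = (w_hist * chi_squared(q_hist, au_histogram(ref_aus))
                 + w_dtw * dtw_distance(query_aus, ref_aus))
        scored.append((score, name))
    return [name for score, name in sorted(scored)[:top_n]]
```

Under these assumptions, a reference whose AU sequence matches the query exactly receives a fused score of zero (both the histogram distance and the DTW cost vanish) and therefore ranks first, which matches the intuition behind the claimed similarity score.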
US14/840,745 2015-08-31 2015-08-31 Human identity verification via automated analysis of facial action coding system features Expired - Fee Related US9594949B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US14/840,745 US9594949B1 (en) 2015-08-31 2015-08-31 Human identity verification via automated analysis of facial action coding system features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/840,745 US9594949B1 (en) 2015-08-31 2015-08-31 Human identity verification via automated analysis of facial action coding system features

Publications (2)

Publication Number Publication Date
US20170061202A1 true US20170061202A1 (en) 2017-03-02
US9594949B1 US9594949B1 (en) 2017-03-14

Family

ID=58096658

Family Applications (1)

Application Number Title Priority Date Filing Date
US14/840,745 Expired - Fee Related US9594949B1 (en) 2015-08-31 2015-08-31 Human identity verification via automated analysis of facial action coding system features

Country Status (1)

Country Link
US (1) US9594949B1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10282530B2 (en) * 2016-10-03 2019-05-07 Microsoft Technology Licensing, Llc Verifying identity based on facial dynamics
US10534955B2 (en) * 2016-01-22 2020-01-14 Dreamworks Animation L.L.C. Facial capture analysis and training system
US10747859B2 (en) * 2017-01-06 2020-08-18 International Business Machines Corporation System, method and computer program product for stateful instruction-based dynamic man-machine interactions for humanness validation
US20200344238A1 (en) * 2017-11-03 2020-10-29 Sensormatic Electronics, LLC Methods and System for Controlling Access to Enterprise Resources Based on Tracking
US11290447B2 (en) * 2016-10-27 2022-03-29 Tencent Technology (Shenzhen) Company Limited Face verification method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4486594B2 (en) * 2002-11-07 2010-06-23 本田技研工業株式会社 Video-based face recognition using probabilistic appearance aggregates
EP3358501B1 (en) * 2003-07-18 2020-01-01 Canon Kabushiki Kaisha Image processing device, imaging device, image processing method


Also Published As

Publication number Publication date
US9594949B1 (en) 2017-03-14

Similar Documents

Publication Publication Date Title
US10275672B2 (en) Method and apparatus for authenticating liveness face, and computer program product thereof
US20220079325A1 (en) Techniques for identifying skin color in images having uncontrolled lighting conditions
US10776470B2 (en) Verifying identity based on facial dynamics
US9594949B1 (en) Human identity verification via automated analysis of facial action coding system features
US20180308107A1 (en) Living-body detection based anti-cheating online research method, device and system
US11443551B2 (en) Facial recognitions based on contextual information
US11151385B2 (en) System and method for detecting deception in an audio-video response of a user
EP3229177A2 (en) Methods and systems for authenticating users
KR20170000128A (en) Mobile electric document system of multiple biometric
US8941741B1 (en) Authentication using a video signature
US20210287472A1 (en) Attendance management system and method, and electronic device
EP3001343B1 (en) System and method of enhanced identity recognition incorporating random actions
Smith-Creasey et al. Continuous face authentication scheme for mobile devices with tracking and liveness detection
CN111898413A (en) Face recognition method, face recognition device, electronic equipment and medium
US9760767B1 (en) Rating applications based on emotional states
CN110612530A (en) Method for selecting a frame for use in face processing
CN111738199B (en) Image information verification method, device, computing device and medium
JP2021520015A (en) Image processing methods, devices, terminal equipment, servers and systems
CN113435362A (en) Abnormal behavior detection method and device, computer equipment and storage medium
KR102215522B1 (en) System and method for authenticating user
CN110633677A (en) Face recognition method and device
Bouras et al. An online real-time face recognition system for police purposes
CN107390864B (en) Network investigation method based on eyeball trajectory tracking, electronic equipment and storage medium
WO2016139655A1 (en) Method and system for preventing uploading of faked photos
CN113449543B (en) Video detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
AS Assignment

Owner name: XEROX CORPORATION, CONNECTICUT

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHREVE, MATTHEW ADAM;KUMAR, JAYANT;LI, QUN;AND OTHERS;REEL/FRAME:036459/0636

Effective date: 20150827

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20210314