WO2024044185A1 - Face image matching based on feature comparison - Google Patents


Info

Publication number
WO2024044185A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image portion
image
feature values
generate
Prior art date
Application number
PCT/US2023/030821
Other languages
French (fr)
Inventor
Akash James
Aastha SINGH
Sridhar Sudarsan
Original Assignee
SparkCognition, Inc.
Sparkcognition India Private Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SparkCognition, Inc., Sparkcognition India Private Limited filed Critical SparkCognition, Inc.
Publication of WO2024044185A1 publication Critical patent/WO2024044185A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/162Detection; Localisation; Normalisation using pixel segmentation or colour matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation

Definitions

  • the present disclosure is generally related to determining feature values from image portions that correspond to faces and determining whether the faces match based on a comparison of the feature values.
  • a camera captures an image of a user and an image of an identification card.
  • the image of the user and the image of the identification card are included in a single image frame.
  • the image of the user and the image of the identification card are included in distinct image frames.
  • a face comparison engine performs face detection on image data received from the camera. For example, the face comparison engine determines that the image data includes a first image portion corresponding to a first face detected in the image of the identification card and a second image portion corresponding to a second face detected in the image of the user.
  • An identification card is provided as an illustrative example. In other examples, a face can be detected in an image of any type of identification token.
  • the face comparison engine pre-processes the first image portion, the second image portion, or both. For example, the pre-processing can include image enhancement, face alignment, or both.
  • the image enhancement can include increasing resolution, brightening, de-blurring, one or more additional image enhancements, or a combination thereof.
  • the face alignment can include adjusting the first image portion, the second image portion, or both, such that the first face in the first image portion is aligned with the second face in the second image portion.
  • the face comparison engine uses a neural network to process the first image portion to generate first feature values.
  • the face comparison engine uses the neural network to process the second image portion to generate second feature values.
  • the face comparison engine performs a comparison of the first feature values and the second feature values to generate a result indicating whether the first face matches the second face.
  • the neural network is trained to generate matching feature values independently of elements that can vary for the same person, such as hairstyle, eyewear, makeup, facial hair, etc.
  • the neural network is trained to generate feature values that are based on (e.g., give more weight to) facial elements that do not typically change significantly over time for the same person, such as shape of skull, shape of ears, nose width, etc.
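  • The following is a minimal sketch, not the patented implementation, of the comparison flow described above; detect_faces, embed_face, and the cosine-similarity threshold are illustrative assumptions standing in for the face detection, neural-network feature extraction, and feature comparison steps.

```python
import numpy as np

def match_faces(image, detect_faces, embed_face, threshold=0.8):
    """Return True if the identification-card face and the live face appear to match."""
    regions = detect_faces(image)          # e.g., [id_card_face, live_face]
    id_face, live_face = regions[0], regions[1]

    v1 = embed_face(id_face)               # first feature values
    v2 = embed_face(live_face)             # second feature values

    # Cosine similarity is used here as one possible comparison of feature values.
    sim = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return sim >= threshold
```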
  • a device includes one or more processors configured to determine that image data includes a first image portion corresponding to a first face.
  • the one or more processors are also configured to determine that the image data includes a second image portion corresponding to a second face.
  • the one or more processors are further configured to generate, based on the first image portion, first feature values representing the first face.
  • the one or more processors are also configured to generate, based on the second image portion, second feature values representing the second face.
  • the one or more processors are further configured to determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face.
  • a method includes determining, at a device, that image data includes a first image portion corresponding to a first face. The method also includes determining, at the device, that the image data includes a second image portion corresponding to a second face. The method further includes generating, based on the first image portion, first feature values representing the first face. The method also includes generating, based on the second image portion, second feature values representing the second face. The method further includes determining, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face. The method also includes generating, at the device, an output indicating whether the first face matches the second face.
  • a computer-readable medium stores instructions that, when executed by one or more processors, cause the one or more processors to determine that image data includes a first image portion corresponding to a first face.
  • the instructions when executed by the one or more processors, also cause the one or more processors to determine that the image data includes a second image portion corresponding to a second face.
  • the instructions when executed by the one or more processors, further cause the one or more processors to generate, based on the first image portion, first feature values representing the first face.
  • the instructions, when executed by the one or more processors also cause the one or more processors to generate, based on the second image portion, second feature values representing the second face.
  • the instructions when executed by the one or more processors, further cause the one or more processors to determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face.
  • the instructions when executed by the one or more processors, also cause the one or more processors to generate an output indicating whether the first face matches the second face.
  • FIG. 1 is a block diagram illustrating a particular implementation of a system that is operable to perform face image matching based on feature comparison.
  • FIG. 2 is a diagram illustrating a non-limiting example of operations associated with face image matching that can be performed by the system of FIG. 1 in accordance with some examples of the present disclosure.
  • FIG. 3 is a diagram illustrating a non-limiting example of a neural network of the system of FIG. 1 that is configured to perform resolution enhancement in accordance with some examples of the present disclosure.
  • FIG. 4 is a diagram illustrating a non-limiting example of a neural network of the system of FIG. 1 that is configured to perform feature extraction in accordance with some examples of the present disclosure.
  • FIG. 5 is a diagram illustrating an example of a face image match and an example of a face image mismatch in accordance with some examples of the present disclosure.
  • FIG. 6 is a flow chart of an example of a method of face image matching in accordance with some examples of the present disclosure.
  • in the present disclosure, multiple instances of a feature may share a base reference number and be distinguished by a letter, such as reference numbers 440A, 440B, 440C, and 440D. When referring to a particular one of these layers, such as the layer 440A, the distinguishing letter “A” is used. However, when referring to any arbitrary one of these layers or to these layers as a group, the reference number 440 is used without a distinguishing letter.
  • as used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term).
  • the term “set” refers to a grouping of one or more elements, and the term “plurality” refers to multiple elements.
  • terms such as “determining” may be used to describe how one or more operations are performed. Such terms are not to be construed as limiting, and other techniques may be utilized to perform similar operations. Additionally, as referred to herein, “generating,” “calculating,” “estimating,” “using,” “selecting,” “accessing,” and “determining” may be used interchangeably. For example, “generating,” “calculating,” “estimating,” or “determining” a parameter (or a signal) may refer to actively generating, estimating, calculating, or determining the parameter (or the signal) or may refer to using, selecting, or accessing the parameter (or signal) that is already generated, such as by another component or device.
  • the term “coupled” may include “communicatively coupled,” “electrically coupled,” or “physically coupled,” and may also (or alternatively) include any combinations thereof.
  • Two devices (or components) may be coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) directly or indirectly via one or more other devices, components, wires, buses, networks (e.g., a wired network, a wireless network, or a combination thereof), etc.
  • Two devices (or components) that are electrically coupled may be included in the same device or in different devices and may be connected via electronics, one or more connectors, or inductive coupling, as illustrative, non-limiting examples.
  • two devices may send and receive electrical signals (digital signals or analog signals) directly or indirectly, such as via one or more wires, buses, networks, etc.
  • the term “directly coupled” may include two devices that are coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) without intervening components.
  • machine learning should be understood to have any of its usual and customary meanings within the fields of computer science and data science, such meanings including, for example, processes or techniques by which one or more computers can learn to perform some operation or function without being explicitly programmed to do so.
  • machine learning can be used to enable one or more computers to analyze data to identify patterns in data and generate a result based on the analysis.
  • the results that are generated include data that indicates an underlying structure or pattern of the data itself.
  • Such techniques, for example, include so-called “clustering” techniques, which identify clusters (e.g., groupings of data elements of the data).
  • the results that are generated include a data model (also referred to as a “machine-learning model” or simply a “model”).
  • a model is generated using a first data set to facilitate analysis of a second data set. For example, a first portion of a large body of data may be used to generate a model that can be used to analyze the remaining portion of the large body of data.
  • a set of historical data can be used to generate a model that can be used to analyze future data.
  • a model can be used to evaluate a set of data that is distinct from the data used to generate the model
  • the model can be viewed as a type of software (e.g., instructions, parameters, or both) that is automatically generated by the computer(s) during the machine learning process.
  • the model can be portable (e.g., can be generated at a first computer, and subsequently moved to a second computer for further training, for use, or both).
  • a model can be used in combination with one or more other models to perform a desired analysis.
  • first data can be provided as input to a first model to generate first model output data, which can be provided (alone, with the first data, or with other data) as input to a second model to generate second model output data indicating a result of a desired analysis.
  • first model output data can be provided (alone, with the first data, or with other data) as input to a second model to generate second model output data indicating a result of a desired analysis.
  • different combinations of models may be used to generate such results.
  • multiple models may provide model output that is input to a single model.
  • a single model provides model output to multiple models as input.
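  • As a rough illustration of the model-chaining arrangement described above (one model's output provided, together with the original data, as input to a second model), the following sketch uses hypothetical first_model and second_model callables that accept and return NumPy arrays.

```python
import numpy as np

def chained_analysis(first_data, first_model, second_model):
    # The first model produces intermediate output data.
    first_model_output = first_model(first_data)
    # The second model receives the first model's output together with the original data.
    second_input = np.concatenate([first_data, first_model_output])
    return second_model(second_input)
```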
  • Examples of machine-learning models include, without limitation, perceptrons, neural networks, support vector machines, regression models, decision trees, Bayesian models, Boltzmann machines, adaptive neuro-fuzzy inference systems, as well as combinations, ensembles and variants of these and other types of models.
  • Variants of neural networks include, for example and without limitation, prototypical networks, autoencoders, transformers, self-attention networks, convolutional neural networks, deep neural networks, deep belief networks, etc.
  • Variants of decision trees include, for example and without limitation, random forests, boosted decision trees, etc.
  • since machine-learning models are generated by computer(s) based on input data, machine-learning models can be discussed in terms of at least two distinct time windows - a creation/training phase and a runtime phase.
  • a model is created, trained, adapted, validated, or otherwise configured by the computer based on the input data (which in the creation/training phase, is generally referred to as “training data”).
  • the trained model corresponds to software that has been generated and/or refined during the creation/training phase to perform particular operations, such as classification, prediction, encoding, or other data analysis or data synthesis operations.
  • during the runtime phase (or “inference” phase), the model is used to analyze input data to generate model output.
  • a model can be trained to perform classification tasks or regression tasks, as non-limiting examples.
  • a model may be continuously, periodically, or occasionally updated, in which case training time and runtime may be interleaved or one version of the model can be used for inference while a copy is updated, after which the updated copy may be deployed for inference.
  • a previously generated model is trained (or re-trained) using a machine-learning technique.
  • “training” refers to adapting the model or parameters of the model to a particular data set.
  • the term “training” as used herein includes “re-training” or refining a model for a specific data set.
  • training may include so-called “transfer learning.”
  • in transfer learning, a base model may be trained using a generic or typical data set, and the base model may be subsequently refined (e.g., re-trained or further trained) using a more specific data set.
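  • A hedged sketch of transfer learning as described above, using PyTorch only for illustration: a base model assumed to have already been trained on a generic data set is refined for a more specific task by freezing its earlier layers and re-training a new output layer. The layer sizes and class counts are arbitrary assumptions.

```python
import torch
import torch.nn as nn

base_model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 32), nn.ReLU(),
    nn.Linear(32, 10),             # generic 10-class head
)
# ... assume base_model has already been trained on the generic data set ...

for param in base_model.parameters():
    param.requires_grad = False    # freeze the base weights

base_model[-1] = nn.Linear(32, 3)  # new head for a more specific 3-class task
optimizer = torch.optim.Adam(base_model[-1].parameters(), lr=1e-3)
# Subsequent training updates only the new head using the more specific data set.
```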
  • a data set used during training is referred to as a “training data set” or simply “training data.”
  • the data set may be labeled or unlabeled.
  • Labeled data refers to data that has been assigned a categorical label indicating a group or category with which the data is associated
  • unlabeled data refers to data that is not labeled.
  • generally, supervised machine-learning processes use labeled data to train a machine-learning model, and unsupervised machine-learning processes use unlabeled data to train a machine-learning model; however, it should be understood that a label associated with data is itself merely another data element that can be used in any appropriate machine-learning process.
  • many clustering operations can operate using unlabeled data; however, such a clustering operation can use labeled data by ignoring labels assigned to data or by treating the labels the same as other data elements.
  • Machine-learning models can be initialized from scratch (e.g., by a user, such as a data scientist) or using a guided process (e.g., using a template or previously built model). Initializing the model includes specifying parameters and hyperparameters of the model.
  • “Hyperparameters” are characteristics of a model that are not modified during training, and “parameters” of the model are characteristics of the model that are modified during training.
  • the term “hyperparameters” may also be used to refer to parameters of the training process itself, such as a learning rate of the training process.
  • the hyperparameters of the model are specified based on the task the model is being created for, such as the type of data the model is to use, the goal of the model (e.g., classification, regression, anomaly detection), etc.
  • the hyperparameters may also be specified based on other design goals associated with the model, such as a memory footprint limit, where and when the model is to be used, etc.
  • Model type and model architecture of a model illustrate a distinction between model generation and model training.
  • the model type of a model, the model architecture of the model, or both can be specified by a user or can be automatically determined by a computing device. However, neither the model type nor the model architecture of a particular model is changed during training of the particular model.
  • the model type and model architecture are hyperparameters of the model and specifying the model type and model architecture is an aspect of model generation (rather than an aspect of model training).
  • a “model type” refers to the specific type or sub-type of the machine-learning model.
  • model architecture refers to the number and arrangement of model components, such as nodes or layers, of a model, and which model components provide data to or receive data from other model components.
  • the architecture of a neural network may be specified in terms of nodes and links.
  • a neural network architecture may specify the number of nodes in an input layer of the neural network, the number of hidden layers of the neural network, the number of nodes in each hidden layer, the number of nodes of an output layer, and which nodes are connected to other nodes (e.g., to provide input or receive output).
  • the architecture of a neural network may be specified in terms of layers.
  • the neural network architecture may specify the number and arrangement of specific types of functional layers, such as long short-term memory (LSTM) layers, fully connected (FC) layers, convolution layers, etc.
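  • A brief, illustrative example of specifying a neural network architecture in terms of functional layers, as described above; PyTorch is used only for illustration, and the assumed 3x32x32 input size and layer sizes are arbitrary.

```python
import torch.nn as nn

# Architecture specified as an arrangement of functional layers, assuming 3x32x32 input images.
architecture = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # convolution layer
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 64),   # fully connected (FC) layer
    nn.ReLU(),
    nn.Linear(64, 10),             # output layer
)
# The number and arrangement of these layers are hyperparameters: they are fixed
# when the model is generated and are not changed during training.
```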
  • a data scientist selects the model type before training begins.
  • a user may specify one or more goals (e.g., classification or regression), and automated tools may select one or more model types that are compatible with the specified goal(s).
  • more than one model type may be selected, and one or more models of each selected model type can be generated and trained.
  • a best performing model (based on specified criteria) can be selected from among the models representing the various model types. Note that in this process, no particular model type is specified in advance by the user, yet the models are trained according to their respective model types. Thus, the model type of any particular model does not change during training.
  • the model architecture is specified in advance (e.g., by a data scientist); whereas in other implementations, a process that both generates and trains a model is used.
  • Generating (or generating and training) the model using one or more machine-learning techniques is referred to herein as “automated model building”.
  • in automated model building, an initial set of candidate models is selected or generated, and then one or more of the candidate models are trained and evaluated.
  • one or more of the candidate models may be selected for deployment (e.g., for use in a runtime phase).
  • Certain aspects of an automated model building process may be defined in advance (e.g., based on user settings, default values, or heuristic analysis of a training data set) and other aspects of the automated model building process may be determined using a randomized process. For example, the architectures of one or more models of the initial set of models can be determined randomly within predefined limits.
  • a termination condition may be specified by the user or based on configuration settings. The termination condition indicates when the automated model building process should stop. To illustrate, a termination condition may indicate a maximum number of iterations of the automated model building process, in which case the automated model building process stops when an iteration counter reaches a specified value.
  • a termination condition may indicate that the automated model building process should stop when a reliability metric associated with a particular model satisfies a threshold.
  • a termination condition may indicate that the automated model building process should stop if a metric that indicates improvement of one or more models over time (e.g., between iterations) satisfies a threshold.
  • multiple termination conditions such as an iteration count condition, a time limit condition, and a rate of improvement condition can be specified, and the automated model building process can stop when one or more of these conditions is satisfied.
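  • The following sketch illustrates how multiple termination conditions (an iteration count, a time limit, and a rate of improvement) might be checked in an automated model building loop; the function name and threshold values are assumptions and are not part of the disclosure.

```python
import time

def should_stop(iteration, start_time, best_scores,
                max_iterations=50, time_limit_s=3600, min_improvement=1e-3):
    """Return True when any of the configured termination conditions is satisfied."""
    if iteration >= max_iterations:
        return True                                   # iteration count condition
    if time.time() - start_time >= time_limit_s:
        return True                                   # time limit condition
    if len(best_scores) >= 2:
        improvement = best_scores[-1] - best_scores[-2]
        if improvement < min_improvement:
            return True                               # rate-of-improvement condition
    return False
```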
  • Transfer learning refers to initializing a model for a particular data set using a model that was trained using a different data set.
  • a “general purpose” model can be trained to detect anomalies in vibration data associated with a variety of types of rotary equipment, and the general purpose model can be used as the starting point to train a model for one or more specific types of rotary equipment, such as a first model for generators and a second model for pumps.
  • a general-purpose natural-language processing model can be trained using a large selection of natural-language text in one or more target languages.
  • the general-purpose natural-language processing model can be used as a starting point to train one or more models for specific natural-language processing tasks, such as translation between two languages, question answering, or classifying the subject matter of documents. Often, transfer learning can converge to a useful model more quickly than building and training the model from scratch.
  • Training a model based on a training data set generally involves changing parameters of the model with a goal of causing the output of the model to have particular characteristics based on data input to the model.
  • model training may be referred to herein as optimization or optimization training.
  • optimization refers to improving a metric, and does not mean finding an ideal (e.g., global maximum or global minimum) value of the metric.
  • optimization trainers include, without limitation, backpropagation trainers, derivative free optimizers (DFOs), and extreme learning machines (ELMs).
  • when an input data sample is provided to the model, the model generates output data, which is compared to the label associated with the input data sample to generate an error value. Parameters of the model are modified in an attempt to reduce (e.g., optimize) the error value.
  • for example, to train an autoencoder, a data sample is provided as input to the autoencoder, and the autoencoder reduces the dimensionality of the data sample (which is a lossy operation) and attempts to reconstruct the data sample as output data.
  • the output data is compared to the input data sample to generate a reconstruction loss, and parameters of the autoencoder are modified in an attempt to reduce (e.g., optimize) the reconstruction loss.
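  • A minimal sketch of the autoencoder training scenario described above, assuming PyTorch and a hypothetical data_loader that yields batches of 64-feature samples; the narrow 8-unit bottleneck is what makes the dimensionality reduction lossy.

```python
import torch
import torch.nn as nn

autoencoder = nn.Sequential(
    nn.Linear(64, 8),   # encoder: reduce dimensionality (lossy)
    nn.ReLU(),
    nn.Linear(8, 64),   # decoder: attempt to reconstruct the input
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_epoch(data_loader):
    for batch in data_loader:
        reconstruction = autoencoder(batch)
        loss = loss_fn(reconstruction, batch)   # reconstruction loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                        # modify parameters to reduce the loss
```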
  • to train a model to perform a classification task, each data element of a training data set may be labeled to indicate a category or categories to which the data element belongs.
  • data elements are input to the model being trained, and the model generates output indicating categories to which the model assigns the data elements.
  • the category labels associated with the data elements are compared to the categories assigned by the model.
  • the computer modifies the model until the model accurately and reliably (e.g., within some specified criteria) assigns the correct labels to the data elements.
  • the model can subsequently be used (in a runtime phase) to receive unknown (e.g., unlabeled) data elements, and assign labels to the unknown data elements.
  • in an unsupervised training scenario, the labels may be omitted.
  • model parameters may be tuned by the training algorithm in use such that during the runtime phase, the model is configured to determine which of multiple unlabeled “clusters” an input data sample is most likely to belong to.
  • to train a model to perform a regression task, during the creation/training phase, one or more data elements of the training data are input to the model being trained, and the model generates output indicating a predicted value of one or more other data elements of the training data.
  • the predicted values of the training data are compared to corresponding actual values of the training data, and the computer modifies the model until the model accurately and reliably (e.g., within some specified criteria) predicts values of the training data.
  • the model can subsequently be used (in a runtime phase) to receive data elements and predict values that have not been received.
  • the model can analyze time series data, in which case, the model can predict one or more future values of the time series based on one or more prior values of the time series.
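  • A hedged sketch of the regression and time-series scenario above: a model is trained to predict the next value of a series from prior values. The synthetic sine series, the window length, and the use of scikit-learn's LinearRegression are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

series = np.sin(np.linspace(0, 20, 200))          # example time series
window = 5
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]                               # value to be predicted from the prior window

model = LinearRegression().fit(X, y)              # creation/training phase
next_value = model.predict(series[-window:].reshape(1, -1))   # runtime phase: predict a future value
```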
  • the output of a model can be subjected to further analysis operations to generate a desired result.
  • for example, a classification model (e.g., a model trained to perform classification tasks) may generate a score for each of a set of categories, where each score is indicative of a likelihood (based on the model’s analysis) that the particular input data should be assigned to the respective category.
  • the output of the model may be subjected to a softmax operation to convert the output to a probability distribution indicating, for each category label, a probability that the input data should be assigned the corresponding label.
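  • A minimal sketch of the softmax operation mentioned above, which converts per-category scores into a probability distribution; the example scores are arbitrary.

```python
import numpy as np

def softmax(scores):
    shifted = scores - np.max(scores)        # shift for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

scores = np.array([2.0, 1.0, 0.1])           # raw per-category scores
probabilities = softmax(scores)              # approximately [0.66, 0.24, 0.10]
print(probabilities.sum())                   # 1.0
```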
  • FIG. 1 illustrates an example of a system 100 that is configured to perform face image matching based on feature comparison.
  • the system 100 can be implemented as or incorporated into one or more of various other devices, such as a personal computer (PC), a tablet PC, a server computer, a cloud-based computing system, a control system, an internet of things device, a personal digital assistant (PDA), a laptop computer, a desktop computer, a communications device, a wireless telephone, or any other machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
  • the term “system” includes any collection of systems or sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions.
  • FIG. 1 illustrates one example of the system 100
  • the system 100 includes one or more processors 110.
  • Each processor of the one or more processors 110 can include a single processing core or multiple processing cores that operate sequentially, in parallel, or sequentially at times and in parallel at other times.
  • Each processor of the one or more processors 110 includes circuitry defining a plurality of logic circuits 112, working memory 114 (e.g., registers and cache memory), communication circuits, etc., which together enable the one or more processors 110 to control the operations performed by the system 100 and enable the one or more processors 110 to generate a useful result based on analysis of particular data and execution of specific instructions.
  • the one or more processors 110 are configured to interact with other components or subsystems of the system 100 via a bus 160.
  • the bus 160 is illustrative of any interconnection scheme serving to link the subsystems of the system 100, external subsystems or devices, or any combination thereof.
  • the bus 160 includes a plurality of conductors to facilitate communication of electrical and/or electromagnetic signals between the components or subsystems of the system 100. Additionally, the bus 160 includes one or more bus controllers or other circuits (e.g., transmitters and receivers) that manage signaling via the plurality of conductors and that cause signals sent via the plurality of conductors to conform to particular communication protocols.
  • the system 100 also includes the one or more memory devices 142.
  • the one or more memory devices 142 include any suitable computer-readable storage device depending on, for example, whether data access needs to be bi-directional or unidirectional, speed of data access required, memory capacity required, other factors related to data access, or any combination thereof.
  • the one or more memory devices 142 include some combination of volatile memory devices and non-volatile memory devices, though in some implementations, only one or the other may be present. Examples of volatile memory devices and circuits include registers, caches, latches, many types of random-access memory (RAM), such as dynamic random-access memory (DRAM), etc.
  • Examples of non-volatile memory devices and circuits include hard disks, optical disks, flash memory, and certain types of RAM, such as resistive random-access memory (ReRAM).
  • Other examples of both volatile and non-volatile memory devices can be used as well, or in the alternative, so long as such memory devices store information in a physical, tangible medium.
  • the one or more memory devices 142 include circuits and structures and are not merely signals or other transitory phenomena (i.e., are non-transitory media).
  • the one or more memory devices 142 store instructions 146 that are executable by the one or more processors 110 to perform various operations and functions.
  • the instructions 146 include instructions to enable the various components and subsystems of the system 100 to operate, interact with one another, and interact with a user, such as a basic input/output system (BIOS) 152 and an operating system (OS) 154.
  • the instructions 146 include one or more applications 156, scripts, or other program code to enable the one or more processors 110 to perform the operations described herein.
  • the instructions 146 include a face comparison engine 158 that is configured to perform face image matching based on feature comparison, as further described with reference to FIG. 2.
  • in FIG. 1, the system 100 also includes one or more output devices 130, one or more input devices 120, and one or more interface devices 132.
  • Each of the one or more output devices 130, the one or more input devices 120, and the one or more interface devices 132 can be coupled to the bus 160 via a port or connector, such as a Universal Serial Bus port, a digital visual interface (DVI) port, a serial ATA (SATA) port, a small computer system interface (SCSI) port, a high-definition media interface (HDMI) port, or another serial or parallel port.
  • one or more of the one or more output devices 130, the one or more input devices 120, or the one or more interface devices 132 is coupled to or integrated within a housing with the one or more processors 110 and the one or more memory devices 142, in which case the connections to the bus 160 can be internal, such as via an expansion slot or other card-to-card connector.
  • the one or more processors 110 and the one or more memory devices 142 are integrated within a housing that includes one or more external ports, and one or more of the one or more output devices 130, the one or more input devices 120, the one or more interface devices 132 is coupled to the bus 160 via the one or more external ports.
  • Examples of the one or more output devices 130 include display devices, speakers, printers, televisions, projectors, or other devices to provide output of data in a manner that is perceptible by a user.
  • Examples of the one or more input devices 120 include buttons, switches, knobs, a keyboard 122, a pointing device 124, one or more cameras 126, a biometric device, a microphone, a motion sensor, or another device to detect user input actions.
  • the pointing device 124 includes, for example, one or more of a mouse, a stylus, a track ball, a pen, a touch pad, a touch screen, a tablet, another device that is useful for interacting with a graphical user interface, or any combination thereof.
  • in some implementations, a particular device may be both an input device 120 and an output device 130. For example, the particular device may be a touch screen.
  • the one or more interface devices 132 are configured to enable the system 100 to communicate with one or more other devices 144 directly or via one or more networks 140.
  • the one or more interface devices 132 may encode data in electrical and/or electromagnetic signals that are transmitted to the one or more other devices 144 as control signals or packet-based communication using pre-defined communication protocols.
  • the one or more interface devices 132 may receive and decode electrical and/or electromagnetic signals that are transmitted by the one or more other devices 144.
  • the electrical and/or electromagnetic signals can be transmitted wirelessly (e.g., via propagation through free space), via one or more wires, cables, optical fibers, or via a combination of wired and wireless transmission.
  • the face comparison engine 158 obtains image data 133.
  • the face comparison engine 158 receives the image data 133 from the one or more cameras 126.
  • the one or more cameras 126 capture one or more still image frames or capture a video including a sequence of image frames.
  • the one or more cameras 126 capture one or more images of a person with an identification token 138.
  • the identification token 138 includes a photo of a face 184.
  • the one or more cameras 126 capture a live image of a face 186 of the person and capture an image of the photo of the face 184. For example, capturing a “live image” of a person corresponds to taking a photograph (or video) of the person, as compared to taking a photograph (or video) of a photo (or another representation) of the person.
  • the images are captured at a security check to match the person with their photo identification (e.g., the identification token 138), such as at airports, train stations, sports stadiums, banks, automated teller machines (ATMs), hotel check-in, etc.
  • the images are captured during a financial transaction.
  • a financial transaction can be based on a type of the identification token 138, such as a bank card, a dining card, or an identification token on a mobile device that has near field communication (NFC) enabled or a Bluetooth® (a registered trademark of Bluetooth SIG, Inc., Washington) low energy (BLE) wallet, etc.
  • the financial transaction can include one or more devices.
  • the face comparison engine 158 can send a notification to another device indicating whether the face 184 matches the face 186, and the other device selectively, based on the notification, proceeds with the financial transaction.
  • the live image of the face 186 and the image of the photo of the face 184 are both captured in (e.g., are portions of) a single image (e.g., a single image frame) from a single camera of the one or more cameras 126.
  • the live image of the face 186 is captured in a first image frame and the image of the photo of the face 184 is captured in a second image frame.
  • the first image frame and the second image frame can be captured by a single camera of the one or more cameras 126.
  • the first image frame is captured by a first camera of the one or more cameras 126
  • the second image frame is captured by a second camera of the one or more cameras 126 that is distinct from the first camera.
  • the face comparison engine 158 receives the image data 133 via the one or more networks 140 from the one or more other devices 144.
  • the identification token 138 can include a physical token or a digital token.
  • the identification token 138 includes an image (e.g., the photo) of the face 184.
  • the identification token 138 also includes text indicating information, such as a date of birth, a name, an identifier, an address, a department, or a combination thereof.
  • Non-limiting examples of an identifier include a social security number, a driver’s license number, a passport number, a badge number, another identifier type, or a combination thereof.
  • Non-limiting examples of the identification token 138 include a passport, a driver’s license, another type of identification token issued by a governmental/business/educational entity, an access badge, or another type of privately issued identification token, and may be embodied in a physical form (e.g., a plastic card or badge, a paper booklet, etc.) or in a digital form, such as displayed on a screen of a user’s electronic device (e.g., via a mobile phone app).
  • the identification token 138 can include a card, a booklet, a badge, etc.
  • the image data 133 includes an image portion 134 corresponding to an image of the identification token 138, and an image portion 136 corresponding to an image of the person.
  • the image portion 134 corresponds to an image of the photo of the face 184 that is captured by the one or more cameras 126
  • the image portion 136 corresponds to a live image of the face 186 captured by the one or more cameras 126.
  • the image data 133 includes a single image frame that includes the image portion 134 and the image portion 136, and the face comparison engine 158 receives the single image frame from a single one of the one or more cameras 126.
  • the image data 133 can include a first image frame that includes the image portion 134, and a second image frame that includes the image portion 136.
  • the image data 133 includes a plurality of image frames (e.g., 180 image frames) corresponding to a capture time (e.g., 3 seconds) associated with a frame rate (e.g., 60 frames per second).
  • the face 184 is detectable in a first subset (e.g., image frames 100-130) of the plurality of image frames and the face 186 is detectable in a second subset (e.g., image frames 1-60 and 121 -180) of the plurality of image frames.
  • the person may have moved such that the face 186 is blurry or out of camera view during capture of a particular subset (e.g., image frames 61-120) of the plurality of image frames.
  • the image portion 134 can be included in a first image frame (e.g., any of the image frames 100-120) and the image portion 136 can be included in a second image frame (e.g., any of the image frames 1-60 and 121-180) that is distinct from the first image frame.
  • the image portion 134 and the image portion 136 can be included in a single image frame (e.g., any of the image frames 121-130).
  • the face comparison engine 158 receives the first image frame and the second image frame from a single one of the one or more cameras 126. To illustrate, a particular camera of the one or more cameras 126 captures an image of the identification token 138 at a first time, and captures an image of the person at a second time. In other implementations, the face comparison engine 158 receives the first image frame from a first camera of the one or more cameras 126, and receives the second image frame from a second camera of the one or more cameras 126 that is distinct from the first camera.
  • the face comparison engine 158 performs face image matching based on the image data 133.
  • the face comparison engine 158 performs face detection 170 on the image data 133 to determine that the image data 133 includes two image portions 134, 136 corresponding to the images of the faces 184, 186.
  • the face comparison engine 158 detects an identification token portion of the image data 133 (e.g., representing the identification token 138) that matches an identification token template, and designates the identification token portion as the image portion 134.
  • the face comparison engine 158 uses various liveness check techniques on the image data 133 to verify that the image portion 136 corresponds to the face 186 of a live person.
  • the face comparison engine 158 identifies multiple portions of the image data 133 corresponding to live persons, and selects the image portion 136 based on a relative size of the face 186 in the image portion 136. For example, an image size of a face of someone walking by is likely to be smaller than an image size of the face 186 that is closer to the one or more cameras 126 for face matching.
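  • A rough sketch of selecting the live-face image portion by relative size, as described above; the (x, y, width, height) bounding-box representation is an assumption about how detected face regions might be encoded.

```python
def select_live_face(live_face_boxes):
    """Select the detected live-person face region with the largest area.

    A passer-by's face typically appears smaller than the face of the person
    standing close to the camera for face matching.
    """
    if not live_face_boxes:
        return None
    return max(live_face_boxes, key=lambda box: box[2] * box[3])

# Example: the second box occupies the largest area and is selected.
boxes = [(10, 20, 40, 50), (200, 80, 160, 200)]
assert select_live_face(boxes) == (200, 80, 160, 200)
```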
  • the face comparison engine 158 performs image enhancement 172 of the image portion 134, the image portion 136, or both.
  • the image enhancement 172 includes resolution enhancement (e.g., increasing resolution), brightening, deblurring, one or more additional enhancements, or a combination thereof, as further described with reference to FIG. 2.
  • the face comparison engine 158 performs face alignment 174 of the image portion 134, the image portion 136, or both, as further described with reference to FIG. 2. For example, the face comparison engine 158 adjusts the image portion 134, the image portion 136, or both, to align the face 184 with the face 186.
  • the face comparison engine 158 performs feature extraction 176 on the image portion 134 to generate first feature values representing the face 184, as further described with reference to FIG. 4. Similarly, the face comparison engine 158 performs the feature extraction 176 on the image portion 136 to generate second feature values representing the face 186.
  • the first feature values and the second feature values match independently of elements that can be different for the same person.
  • the first feature values match the second feature values independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the face 186 and the face 184.
  • the feature extraction 176 gives greater weight to elements that do not typically change substantially over time for the same person.
  • the first feature values are based on at least one of a shape of a skull, a width of a nose, a space between eyes relative to a width of the face 184, a shape of an eye, a shape of an ear, or a shape of the face 184.
  • the face comparison engine 158 performs feature comparison 178 of the first feature values and the second feature values to determine whether the face 184 matches the face 186, as further described with reference to FIG. 2. For example, the face comparison engine 158, in response to determining that a difference between the first feature values and the second feature values is greater than a difference threshold, determines that the face 184 does not match the face 186 (e.g., the face 184 of the identification token 138 does not match the face 186 of the person).
  • the face comparison engine 158 in response to determining that the difference is less than or equal to the difference threshold, determines that the face 184 matches the face 186 (e.g., the face 184 of the identification token 138 is of the same person as the face 186).
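  • A minimal sketch of the feature comparison 178 described above: a difference between the two sets of feature values is compared to a difference threshold. Euclidean distance is used here as one plausible difference measure; the disclosure does not tie the comparison to a specific measure.

```python
import numpy as np

def faces_match(first_feature_values, second_feature_values, difference_threshold):
    """Return True when the difference between the feature values is within the threshold."""
    difference = np.linalg.norm(
        np.asarray(first_feature_values) - np.asarray(second_feature_values))
    # Greater than the threshold -> no match; less than or equal -> match.
    return difference <= difference_threshold
```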
  • the face comparison engine 158 determines whether the face 184 matches the face 186 without accessing a database of stored representations of faces.
  • the face comparison engine 158 performs the face image matching independently of (e.g., without) network access.
  • the face comparison engine 158 performs computations for the face image matching locally and does not rely on accessing remote processing resources (e.g., cloud based resources) or databases, via the network 140, to perform the face image matching.
  • the face comparison engine 158 independently of network access, performs the feature extraction 176 to generate the first feature values and the second feature values, and performs the feature comparison 178 to determine whether the face 184 matches the face 186.
  • the face comparison engine 158 performs output generation 180 to generate an output 118.
  • the face comparison engine 158 generates the output 118 indicating whether the face 184 matches the face 186.
  • the output 118 includes a graphical user interface (GUI) 166 indicating whether the face 184 matches the face 186.
  • the face comparison engine 158 in response to determining that the face 184 does not match the face 186, generates the output 118 including an alert.
  • the face comparison engine 158 in response to determining that the face 184 matches the face 186, generates the output 118 including an authentication token.
  • the face comparison engine 158 sends the output 118 to the one or more output devices 130.
  • the one or more output devices 130 display the GUI 166 indicating whether the face 184 matches the face 186.
  • the face comparison engine 158 sends the output 118 to the one or more other devices 144.
  • the one or more other devices 144 include a lock that is deactivated in response to receiving an authentication token of the output 118.
  • in an example in which the image data 133 is based on a video stream received from the one or more cameras 126, the face comparison engine 158 overlays the output 118 on the video stream to generate an output video stream, and sends the output video stream to the one or more output devices 130.
  • the face comparison engine 158 in response to determining that the face 184 matches the face 186, performs text recognition on the image portion 134 to generate text 137.
  • the face comparison engine 158 determines that the text 137 includes particular text elements at particular locations of the identification token 138 that is indicated by the image portion 134.
  • the face comparison engine 158 determines that the text 137 indicates a name at a first location, an identifier at a second location, a date of birth at a third location, one or more additional text elements at one or more locations, or a combination thereof.
  • the face comparison engine 158 determines that the image portion 134 includes the face 184 at a particular location of the identification token 138.
  • the face comparison engine 158 determines whether the image portion 134 matches an identification token template, and generates the output 118 indicating whether the image portion 134 matches the identification token template. For example, the face comparison engine 158 determines that the image portion 134 matches the identification token template based at least in part on determining that locations of the particular text elements match locations of corresponding text elements in the identification token template, that location of the face 184 matches a location of a corresponding face in the identification token template, or a combination thereof.
  • the face comparison engine 158 determines that the image portion 134 matches the identification token template based at least in part on determining whether a format of a particular text element of the text 137 matches a format of a corresponding text element of the identification token template. In some examples, the face comparison engine 158 determines that the image portion 134 matches the identification token template based at least in part on determining whether a design element (e.g., a drawing, a color, a font, a picture, a hologram, microprinting, a design element that becomes visible under ultraviolet (UV) light, etc.) indicated by the image portion 134 matches a corresponding design element of the identification token template.
  • one or more design elements can include security features (e.g., a hologram, microprinting, a design element that becomes visible under UV light, etc.).
  • the face comparison engine 158 provides the image portion 134 to another component or device to verify whether the image portion 134 matches an identification token template, receives a notification from the other component or device indicating whether the image portion 134 matches the identification token template, generates the output 118 based on the notification indicating whether the image portion 134 matches the identification token template, or a combination thereof.
  • the identification token template indicates one or more required elements, one or more optional elements, or a combination thereof.
  • the face comparison engine 158 determines that the image portion 134 matches the identification token template based at least in part on determining that the image portion 134 includes all of the required elements, and that each element of the image portion 134 corresponds to a corresponding one of the required elements or optional elements.
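  • An illustrative sketch of checking detected token elements against a template with required and optional elements; the element names and the dictionary-based template representation are hypothetical.

```python
def matches_template(detected_elements, template):
    """Return True when all required elements are present and no unexpected element appears."""
    required = set(template["required"])
    optional = set(template["optional"])
    detected = set(detected_elements)

    has_all_required = required.issubset(detected)
    no_unexpected = detected.issubset(required | optional)
    return has_all_required and no_unexpected

template = {"required": ["photo", "name", "id_number"], "optional": ["address"]}
print(matches_template(["photo", "name", "id_number", "address"], template))  # True
print(matches_template(["photo", "name"], template))                          # False
```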
  • the face comparison engine 158 in response to determining that the face 184 matches the face 186, determines whether the text 137 satisfies a criterion.
  • the text 137 indicates a date of birth and the criterion includes an age restriction.
  • the age restriction indicates that people below a particular age are not allowed to enter.
  • the face comparison engine 158 determines a first age corresponding to the date of birth, determines whether the criterion is satisfied based on a comparison of the first age and the particular age, and generates the output 118 indicating whether the first age satisfies the age restriction.
  • the face comparison engine 158 in response to determining that the first age is greater than or equal to the particular age, generates the output 118 indicating a green checkmark, indicating that the age restriction is satisfied.
  • the face comparison engine 158 in response to determining that the first age is less than the particular age, generates the output 118 indicating a stop sign, indicating that the age restriction is not satisfied.
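  • A hedged sketch of the age-restriction check described above: an age is computed from the date of birth recognized in the text 137 and compared against the minimum age; the function name and example dates are assumptions.

```python
from datetime import date

def satisfies_age_restriction(date_of_birth, minimum_age, today=None):
    """Return True when the age derived from the date of birth meets the minimum age."""
    today = today or date.today()
    age = today.year - date_of_birth.year - (
        (today.month, today.day) < (date_of_birth.month, date_of_birth.day))
    return age >= minimum_age

# Example: a person born 2001-06-15 checked against a 21-year minimum age.
print(satisfies_age_restriction(date(2001, 6, 15), 21, today=date(2023, 8, 22)))  # True
```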
  • the face comparison engine 158 can process the image portion 136 to determine an estimated age of the face 186 (i.e., the estimated age of the person whose face 186 is captured by the camera(s) 126). For example, the face comparison engine 158 uses an age estimation neural network to process the image portion 136 to generate the estimated age.
  • the age estimation neural network is trained using training images of people of various ages. During training, the age estimation neural network processes a training image to generate an estimated age, and the age estimation neural network is updated based on a difference between the estimated age and an actual age of the person in the training image.
  • the face comparison engine 158 determines whether the estimated age of the face 186 matches the first age corresponding to the date of birth indicated by the image portion 134. To illustrate, the face comparison engine 158, in response to determining that a difference between the estimated age and the first age is within an age threshold, determines that the estimated age matches the first age. The face comparison engine 158 generates the output 118 based at least in part on determining whether the estimated age matches the first age. For example, the face comparison engine 158, in response to determining that the first age does not match the estimated age, generates the output 118 indicating a caution sign recommending additional age verification.
  • the face comparison engine 158 verifies that the text 137 indicates valid data, and generates output 118 based at least in part on the verification.
  • the text 137 indicates a name and an identifier.
  • a criterion includes valid name and identifier matching using stored records of names and identifiers.
  • the face comparison engine 158 has access to identifier mapping data that includes records of valid names and identifiers.
  • the face comparison engine 158 accesses the identifier mapping data stored at a memory device, a network device, a database, or a combination thereof.
  • the face comparison engine 158 determines that the text 137 satisfies the criterion and generates the output 118 indicating that the text 137 satisfies the criterion.
  • the face comparison engine 158 thus enables performing face image matching based on feature comparison.
  • the face image matching can be performed without access to a database storing representations of faces, and independently of network access, which benefits the face matching aspects (e.g., of a security system).
  • the image enhancement 172 improves the robustness of the face image matching. For example, face image matching can work in low light conditions with a relatively low quality camera and on relatively modest computing hardware.
  • one or more operations of the image enhancements 172 can be daisy-chained to reduce resource usage (e.g., computing cycles, memory, time, or a combination thereof).
  • values generated by one enhancement neural network (e.g., output values generated by one or more node layers of the enhancement neural network) can be provided as input to another enhancement neural network, e.g., without an intermediate step of converting the network feature values to an enhanced image and generating the input values from the enhanced image, thereby reducing resource usage (e.g., computing cycles, memory, time, or a combination thereof), as sketched below.
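One way to realize the daisy-chaining described above is to keep the intermediate results in feature space and decode to an image only once at the end. The PyTorch sketch below is illustrative only; the encoder, decoder, and enhancement modules are placeholder assumptions, not the trained enhancement neural networks referenced in this disclosure.

```python
import torch
from torch import nn

# Placeholder modules standing in for enhancement neural networks; real layer
# shapes and trained weights are not specified here.
class Enhancement(nn.Module):
    def __init__(self, channels: int = 16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )

    def forward(self, feature_values: torch.Tensor) -> torch.Tensor:
        return self.body(feature_values)

encode = nn.Conv2d(3, 16, kernel_size=3, padding=1)   # image portion -> feature values
decode = nn.Conv2d(16, 3, kernel_size=3, padding=1)   # feature values -> enhanced image

de_blur, brighten = Enhancement(), Enhancement()

image_portion = torch.rand(1, 3, 112, 112)
features = encode(image_portion)
features = brighten(de_blur(features))   # chained in feature space, no intermediate image
enhanced_image_portion = decode(features)
print(enhanced_image_portion.shape)      # torch.Size([1, 3, 112, 112])
```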
  • the image enhancement 172 of the image portion 134 can be performed concurrently with the face alignment 174 of the image portion 136 to reduce an overall processing time of performing the face image matching.
  • the face alignment 174 enables face image matching to be performed independently of an angle at which the identification token 138 is held up to the one or more cameras 126 and independently of a head tilt of the person.
  • the feature extraction 176 enables matching the face 184 and the face 186 independently of elements that can change for the same person, such as hairstyle, eyewear, facial hair, etc. With more weight given to elements that typically do not change substantially over time for a person, such as skull shape, nose width, relative eye width, ear shape, etc., the feature extraction 176 enables matching the face 186 to the face 184 for the same person, even if the face 184 corresponds to an older photo of the person.
  • the specific components of the system 100 are provided as an illustrative non-limiting example. In other examples, one or more of the components shown in FIG. 1 can be omitted, one or more additional components can be included, or a combination thereof.
  • at least one of the one or more interface devices 132 that is useable to access the one or more networks 140 can be omitted from the system 100 because the face comparison engine 158 can operate without network access in at least some implementations.
  • the keyboard 122, the pointing device 124, or both can be omitted from the system 100.
  • the one or more output devices 130 can include a device (e.g., a lock, a switch, an alarm, etc.) that is activated or deactivated based on the output 118.
  • Referring to FIG. 2, a diagram 200 is shown of an example of operations that may be performed by the face comparison engine 158 of FIG. 1.
  • the face comparison engine 158 performs face detection 170 on the image data 133 to determine the image portion 134 corresponding to the face 184 and the image portion 136 corresponding to the face 186.
  • the face comparison engine 158 performs the image enhancement 172 on the image portion 134.
  • the face comparison engine 158 performs resolution enhancement 202, de-blurring 204, brightening 206, one or more additional enhancements, or a combination thereof, on the image portion 134 to generate an enhanced image portion 234.
  • the face comparison engine 158 uses a resolution enhancement neural network to perform the resolution enhancement 202, a de-blurring neural network to perform the de-blurring 204, a brightening neural network to perform the brightening 206, one or more additional neural networks to perform one or more additional enhancements, or a combination thereof.
  • the resolution enhancement neural network is trained using original images and low-resolution training images generated from the original images.
  • a neural network trainer uses the resolution enhancement neural network to process a low-resolution training image to generate an estimated image.
  • the neural network trainer determines a loss metric based on a difference between an original image (corresponding to the low-resolution training image) and the estimated image, and updates (e.g., the weights, biases, or both, of) the resolution enhancement neural network based on the loss metric.
  • the one or more applications 156 of FIG. 1 include the neural network trainer.
  • the neural network trainer is included in the one or more other devices 144 that send the resolution enhancement neural network to the one or more memory devices 142.
  • the face comparison engine 158 uses the resolution enhancement neural network to enhance (e.g., increase) resolution of the image portion 134 to generate the enhanced image portion 234, as further described with reference to FIG. 3.
  • the de-blurring neural network is trained using original images and blurred training images generated from the original images.
  • a neural network trainer uses the de-blurring neural network to process a blurred training image to generate an estimated image.
  • the neural network trainer determines a loss metric based on a difference between an original image (corresponding to the blurred training image) and the estimated image, and updates (e.g., the weights, biases, or both, of) the de-blurring neural network based on the loss metric.
  • the one or more applications 156 of FIG. 1 include the neural network trainer.
  • the neural network trainer is included in the one or more other devices 144 that send the de-blurring neural network to the one or more memory devices 142.
  • the face comparison engine 158 uses the de-blurring neural network to reduce blurring in the image portion 134 to generate the enhanced image portion 234.
  • the brightening neural network is trained using original images and darkened training images generated from the original images.
  • a neural network trainer uses the brightening neural network to process a darkened training image to generate an estimated image.
  • the neural network trainer determines a loss metric based on a difference between an original image (corresponding to the darkened training image) and the estimated image, and updates (e.g., the weights, biases, or both, of) the brightening neural network based on the loss metric.
  • the one or more applications 156 of FIG. 1 include the neural network trainer.
  • the neural network trainer is included in the one or more other devices 144 that send the brightening neural network to the one or more memory devices 142.
  • the face comparison engine 158 uses the brightening neural network to brighten the image portion 134 to generate the enhanced image portion 234.
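The three enhancement networks described above share one training pattern: degrade an original image (downsample, blur, or darken), run the network on the degraded copy, and update the weights and biases from a loss metric computed against the original. The sketch below illustrates that pattern for the resolution case; the network architecture, degradation function, optimizer, and placeholder data are assumptions, not the trained networks of this disclosure.

```python
import torch
from torch import nn
import torch.nn.functional as F

def degrade(original: torch.Tensor) -> torch.Tensor:
    """Assumed degradation for the resolution case: downsample, then upsample back."""
    small = F.interpolate(original, scale_factor=0.5, mode="bilinear", align_corners=False)
    return F.interpolate(small, size=original.shape[-2:], mode="bilinear", align_corners=False)

# Placeholder enhancement network and optimizer.
enhancement_net = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 3, 3, padding=1)
)
optimizer = torch.optim.Adam(enhancement_net.parameters(), lr=1e-3)

for step in range(5):                       # training images would come from a dataset
    original = torch.rand(4, 3, 112, 112)   # placeholder batch of original images
    estimated = enhancement_net(degrade(original))
    loss = F.mse_loss(estimated, original)  # loss metric based on the difference
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```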
  • the resolution enhancement 202, the de-blurring 204, the brightening 206, one or more additional enhancements, or a combination thereof, can be performed in various orders.
  • the face comparison engine 158 performs the resolution enhancement 202 on the image portion 134 to generate a resolution enhanced image, performs the de-blurring 204 on the resolution enhanced image to generate a de-blurred image, and performs the brightening 206 on the de-blurred image to generate the enhanced image portion 234.
  • the face comparison engine 158 performs the de-blurring 204 on the image portion 134 to generate a de-blurred image, performs the brightening 206 on the de-blurred image to generate a brightened image, and performs the resolution enhancement 202 on the brightened image to generate the enhanced image portion 234.
  • one or more of the resolution enhancement 202, the de-blurring 204, the brightening 206, one or more additional enhancements, or a combination thereof, can be performed at least partially concurrently.
  • the image enhancement 172 is performed on the image portion 134, and not performed on the image portion 136.
  • the image portion 134 corresponds to an image of the identification token 138 and the representation of the face 184 in the image portion 134 may be smaller in size than the representation of the face 186 in the image portion 136.
  • Performing the image enhancement 172 on the image portion 134 may result in a larger increase in matching accuracy, as compared to performing the image enhancement 172 on the image portion 136.
  • Performing the image enhancement 172 on the image portion 134 can improve matching accuracy, while refraining from performing image enhancement 172 on the image portion 136 conserves resources.
  • the face comparison engine 158 can also perform the image enhancement 172 on the image portion 136 to generate a second enhanced image portion, and perform the face alignment 174 on the enhanced image portion 234 and the second enhanced image portion.
  • the face comparison engine 158 performs the face alignment 174 on the image portion 136 to generate an aligned image portion 246, performs the face alignment 174 on the enhanced image portion 234 to generate an aligned image portion 244, or both.
  • the face 184 is at a first angle (e.g., an angle of deviation of a line of symmetry of the face 184 from vertical) in the enhanced image portion 234, and the face 186 is at a second angle in the image portion 136.
  • the face comparison engine 158 selectively performs the face alignment 174 in response to determining that a difference between the first angle and the second angle is greater than an angle threshold (e.g., 5 degrees).
  • the face 184 has a first particular angle in the aligned image portion 244, and the face 186 has a second particular angle in the aligned image portion 246.
  • the second particular angle is within the angle threshold (e.g., 5 degrees) of the first particular angle.
  • the face comparison engine 158 performs the face alignment 174 by applying a rotation to a single one of the enhanced image portion 234 or the image portion 136 to generate the aligned image portion 244 and the aligned image portion 246.
  • the face comparison engine 158 applies a rotation to the enhanced image portion 234 to generate the aligned image portion 244 and copies the image portion 136 to generate the aligned image portion 246.
  • the rotation is applied to the enhanced image portion 234 so that the angle of the face 184 in the aligned image portion 244 matches the angle of the face 186 in the image portion 136 (and the aligned image portion 246).
  • the face comparison engine 158 performs the face alignment 174 by applying a first rotation to the enhanced image portion 234 to generate the aligned image portion 244 and by applying a second rotation to the image portion 136 to generate the aligned image portion 246.
  • the face comparison engine 158 applies the first rotation so that the face 184 has a pre-determined angle (e.g., vertical) in the aligned image portion 244, and applies the second rotation so that the face 186 has the pre-determined angle (e.g., vertical) in the aligned image portion 246.
  • the pre-determined angle is based on default data, a configuration setting, user input, or a combination thereof.
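A hedged sketch of the face alignment 174, assuming eye landmarks are available from the face detection step (an assumption; this disclosure does not prescribe a landmark-based method) and using OpenCV for the rotation; the threshold value and helper names are illustrative.

```python
import math
import cv2
import numpy as np

ANGLE_THRESHOLD_DEG = 5.0  # e.g., the 5-degree angle threshold mentioned above

def face_angle_deg(left_eye: tuple, right_eye: tuple) -> float:
    """Angle of the line through the eyes relative to horizontal; 0 means an upright face."""
    dy = right_eye[1] - left_eye[1]
    dx = right_eye[0] - left_eye[0]
    return math.degrees(math.atan2(dy, dx))

def align_face(image_portion: np.ndarray, current_deg: float, target_deg: float = 0.0) -> np.ndarray:
    """Rotate the image portion so the face ends up at the pre-determined angle."""
    if abs(current_deg - target_deg) <= ANGLE_THRESHOLD_DEG:
        return image_portion.copy()   # within the threshold: copy instead of rotating
    h, w = image_portion.shape[:2]
    rotation = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), current_deg - target_deg, 1.0)
    return cv2.warpAffine(image_portion, rotation, (w, h))

portion = np.zeros((112, 112, 3), dtype=np.uint8)        # placeholder image portion
aligned = align_face(portion, face_angle_deg((30, 48), (80, 40)))
print(aligned.shape)
```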
  • the face comparison engine 158 performs the face alignment 174 on the image portion 136 to generate the aligned image portion 246 concurrently with the face comparison engine 158 performing the image enhancement 172 on the image portion 134 to generate the enhanced image portion 234.
  • the face comparison engine 158 performs the face alignment 174 based on the pre-determined angle (e.g., vertical). For example, the face comparison engine 158 performs the face alignment 174 on the image portion 136 to generate the aligned image portion 246 such that the face 186 has the pre-determined angle in the aligned image portion 246.
  • the face comparison engine 158 performs the image enhancement 172 on the image portion 134 to generate the enhanced image portion 234 concurrently with performing the face alignment 174 to generate the aligned image portion 246.
  • the face comparison engine 158, subsequent to generating the enhanced image portion 234, performs the face alignment 174 on the enhanced image portion 234 to generate the aligned image portion 244 such that the face 184 has the pre-determined angle in the aligned image portion 244.
  • Performing the face alignment 174 at least partially concurrently with the image enhancement 172 can reduce processing time of performing the face image matching.
  • the face comparison engine 158 performs the feature extraction 176 on the aligned image portion 244 to generate feature values 254, as further described with reference to FIG. 4. Similarly, the face comparison engine 158 performs the feature extraction 176 on the aligned image portion 246 to generate feature values 256, as further described with reference to FIG. 4. In a particular aspect, the feature values 254 and the feature values 256 are associated with (e.g., give greater weight to) facial elements that do not typically change significantly for the same person, such as skull shape, nose width, etc. In a particular aspect, the feature values 254 and the feature values 256 are independent of (e.g., give less weight to) elements that can change significantly for the same person, such as hairstyle, eyewear, facial hair, makeup, etc.
  • the face comparison engine 158 performs the feature comparison 178 of the feature values 254 and the feature values 256 to generate a result 268.
  • the face comparison engine 158 determines differences between the feature values 254 and the feature values 256.
  • the feature values 254 correspond to a first feature vector
  • the feature values 256 correspond to a second feature vector
  • the differences correspond to a vector difference.
  • the feature values 254 include fv1,1, ..., fv1,n, where fv1,n corresponds to a feature value, included in the feature values 254, of an nth feature.
  • the feature values 256 include fv2,1, ..., fv2,n, where fv2,n corresponds to a feature value, included in the feature values 256, of the nth feature.
  • the differences between the feature values 254 and the feature values 256 correspond to fvd,1, ..., fvd,n, where fvd,n corresponds to a difference between fv1,n and fv2,n.
  • the face comparison engine 158 in response to determining that the differences (e.g., fvd,1, ..., fvd,n) fail to satisfy a match criterion, generates the result 268 indicating that the face 184 does not match the face 186.
  • the face comparison engine 158 in response to determining that the differences satisfy the match criterion, generates the result 268 indicating that the face 184 matches the face 186.
  • the face comparison engine 158 in response to determining that an average (e.g., mean, median, or mode) of the differences (e.g., fvd,1, ..., fvd,n) is less than or equal to a difference threshold, determines that the differences satisfy the match criterion. In another example, the face comparison engine 158, in response to determining that each of a particular subset of feature differences is less than a respective difference threshold, determines that the differences satisfy the match criterion.
  • the face comparison engine 158 in response to determining that a first feature difference (e.g., fvd,x, where x is an integer between 1 and n) is less than a first difference threshold, a second feature difference (e.g., fvd,y, where y is an integer between 1 and n that is not equal to x) is less than a second difference threshold, one or more additional feature differences are less than corresponding difference thresholds, or a combination thereof, determines that the differences satisfy the match criterion.
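A minimal sketch of the feature comparison 178 using the difference-based match criteria described above; the threshold values, the vector length, and the use of NumPy arrays are illustrative assumptions.

```python
import numpy as np

DIFFERENCE_THRESHOLD = 0.8      # assumed threshold for the average-based criterion
PER_FEATURE_THRESHOLDS = None   # optionally, an array of per-feature difference thresholds

def faces_match(feature_values_254: np.ndarray, feature_values_256: np.ndarray) -> bool:
    """Return True (matched) when the feature differences satisfy the match criterion."""
    differences = np.abs(feature_values_254 - feature_values_256)  # fvd,1, ..., fvd,n
    if PER_FEATURE_THRESHOLDS is not None:
        return bool(np.all(differences < PER_FEATURE_THRESHOLDS))
    return bool(differences.mean() <= DIFFERENCE_THRESHOLD)

fv_254 = np.random.rand(128)                  # placeholder feature vectors
fv_256 = fv_254 + 0.01 * np.random.rand(128)  # nearly identical, so the criterion is satisfied
print(faces_match(fv_254, fv_256))
```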
  • the face comparison engine 158 performs the output generation 180 based on the result 268 to generate the output 118.
  • the output 118 can indicate the result 268.
  • the output 118 can include, based on the result 268, a remote command, an authentication token, or both.
  • the face comparison engine 158 in response to determining that the result 268 indicates that the face 184 matches the face 186, generates the output 118 to include an authentication token, a remote command to enable access, or both.
  • the face comparison engine 158 in response to determining that the result 268 indicates that the face 184 does not match the face 186, generates the output 118 to indicate that authentication is unsuccessful, a remote command to disable access, or both.
  • one or more operations of the diagram 200 can be bypassed (e.g., performed selectively).
  • the face comparison engine 158 can bypass one or more of the resolution enhancement 202, the de-blurring 204, the brightening 206, one or more additional image enhancements, the face alignment 174, or a combination thereof, based on the image portion 134, the image portion 136, a remaining battery life, an environmental context, or a combination thereof.
  • the face comparison engine 158 refrains from performing the resolution enhancement 202 in response to determining that a resolution criterion is satisfied. For example, the face comparison engine 158 determines that the resolution criterion is satisfied in response to determining that a resolution (e.g., a count of pixels) of the image portion 134 is greater than a resolution threshold, a remaining battery life is less than a battery threshold, an image resolution of a camera 126 used to capture the image portion 134 is greater than the resolution threshold, or a combination thereof. In a particular aspect, the face comparison engine 158, in response to determining that the resolution criterion is satisfied, generates the enhanced image portion 234 from the image portion 134 without performing the resolution enhancement 202.
  • the face comparison engine 158 refrains from performing the de-blurring 204 in response to determining that a de-blurring criterion is satisfied. For example, the face comparison engine 158 determines that the de-blurring criterion is satisfied in response to determining that an amount of blurring detected in the image portion 134 is less than a blurring threshold, a remaining battery life is less than a battery threshold, or a combination thereof. In a particular aspect, the face comparison engine 158, in response to determining that the de-blurring criterion is satisfied, generates the enhanced image portion 234 from the image portion 134 without performing the de-blurring 204.
  • the face comparison engine 158 refrains from performing the brightening 206 in response to determining that a brightness criterion is satisfied. For example, the face comparison engine 158 determines that the brightness criterion is satisfied in response to determining that an amount of brightness detected in the image portion 134 is greater than a brightness threshold, a remaining battery life is less than a battery threshold, detected light at a camera 126 (that is used to capture the image portion 134) is greater than a light threshold, or a combination thereof. In a particular aspect, the face comparison engine 158, in response to determining that the brightness criterion is satisfied, generates the enhanced image portion 234 from the image portion 134 without performing the brightening 206. In an example in which none of the image enhancements 172 are performed, the enhanced image portion 234 corresponds to a copy of the image portion 134.
  • the face comparison engine 158 refrains from performing the face alignment 174 in response to determining that an alignment criterion is satisfied. For example, the face comparison engine 158 determines that the alignment criterion is satisfied in response to determining that a first angle of the face 184 in the enhanced image portion 234 is within an angle threshold (e.g., 5 degrees) of a second angle of the face 186 in the image portion 136, a remaining battery life is less than a battery threshold, or a combination thereof. In an example in which the alignment criterion is satisfied, the aligned image portion 244 corresponds to a copy of the enhanced image portion 234 and the aligned image portion 246 corresponds to a copy of the image portion 136.
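A small sketch of the selective-bypass logic described above; the specific thresholds for resolution, blur, brightness, and battery are assumed values chosen for illustration, not values specified in this disclosure.

```python
def plan_processing(pixel_count: int, blur_score: float, brightness: float,
                    battery_fraction: float) -> dict:
    """Decide which enhancements to perform; all thresholds below are assumed values."""
    low_battery = battery_fraction < 0.20
    return {
        "resolution_enhancement": not (pixel_count > 160 * 160 or low_battery),
        "de_blurring": not (blur_score < 0.3 or low_battery),
        "brightening": not (brightness > 0.5 or low_battery),
    }

print(plan_processing(pixel_count=96 * 96, blur_score=0.6,
                      brightness=0.2, battery_fraction=0.8))
```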
  • Referring to FIG. 3, a diagram 300 is shown of an example of a neural network 380 of the system 100 of FIG. 1 that is configured to perform the resolution enhancement 202 of FIG. 2.
  • the neural network 380 includes node layers 340, such as a node layer 340A, a node layer 340B, a node layer 340C, a node layer 340D, a node layer 340E, one or more additional node layers, or a combination thereof.
  • Each of the node layers 340 includes a plurality of nodes.
  • the neural network 380 includes an input layer (e.g., the node layer 340A), one or more hidden layers (e.g., the node layer 340B, the node layer 340C, the node layer 340D, one or more additional node layers, or a combination thereof), and an output layer (e.g., the node layer 340E).
  • Each subsequent node layer has links with a previous node layer of the node layers 340.
  • the node layers 340 include fully connected layers.
  • Each of the links has an associated link weight.
  • the face comparison engine 158 uses the neural network 380 to perform the resolution enhancement 202 on an image portion 330 to generate a resolution enhanced image portion 350.
  • the face comparison engine 158 extracts input values representing the image portion 330, and provides the input values to the input layer (e.g., the node layer 340A) of the neural network 380.
  • the node layers 340 process the input values to generate network feature values 344 from the output layer (e.g., the node layer 340E).
  • the face comparison engine 158 generates the resolution enhanced image portion 350 corresponding to the network feature values 344.
  • the face comparison engine 158 determines pixel values of pixels of the resolution enhanced image portion 350 based on the network feature values 344.
  • the resolution enhanced image portion 350 has a higher resolution than the image portion 330.
  • the resolution enhanced image portion 350 has a second count of pixels that is greater than a first count of pixels of the image portion 330.
  • the image portion 330 corresponds to the image portion 134 and the resolution enhanced image portion 350 is used to generate the enhanced image portion 234.
  • the image portion 330 corresponds to the image portion 136 and the face comparison engine 158 performs the face alignment 174 on the resolution enhanced image portion 350 to generate the aligned image portion 246.
  • a portion 332 of the image portion 330 and a portion 352 of the resolution enhanced image portion 350 are shown.
  • the portion 352 corresponds to a resolution-enhanced version of the portion 332.
  • the portion 352 includes four pixels corresponding to each pixel of the portion 332, resulting in the resolution enhanced image portion 350 having four times as many pixels as the image portion 330.
  • the portion 332 includes four pixels having four pixel values.
  • Each pixel value (e.g., original pixel value) is represented by O(A,B), where A corresponds to a row of the portion 332 and B corresponds to a column of the portion 332.
  • the portion 352 has sixteen pixels having sixteen pixel values.
  • the neural network 380 outputs the network feature values 344, such that four of the sixteen pixels of the portion 352 have the same pixel values as the four pixels of the portion 332, and the remaining 12 pixels of the portion 352 have new pixel values.
  • Each new pixel value is represented by N(C,D), where C corresponds to a row of the portion 352 and D corresponds to a column of the portion 352.
  • each new pixel value of each portion of the resolution enhanced image portion 350 has the same relationship to neighboring pixel values.
  • N(C,D) is based on the same function applied to one or more neighboring pixel values.
  • a new pixel value of a portion of the resolution enhanced image portion 350 is based on a type of facial element that the portion represents and the original pixel values. For example, when the pixels of the portion 332 represent a facial element (e.g., a left ear) of a facial element type (e.g., an ear), the new pixel values of the portion 352 generated by the face comparison engine 158 have a first relationship with one or more neighboring pixel values. Alternatively, when the portion 332 represents an eye, the new pixel values of the portion 352 generated by the face comparison engine 158 have a second relationship with one or more neighboring pixel values that is distinct from the first relationship.
  • the portion 352 represents the same facial element in the resolution enhanced image portion 350 that the portion 332 represents in the image portion 330.
  • the neural network 380 is thus trained to generate feature values that are based on the input feature values as well as the facial element represented by the input feature values.
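One common way to lay out four output values per input pixel, as described for the portion 332 and the portion 352 above, is a sub-pixel (pixel shuffle) arrangement. The sketch below is an illustrative assumption; the trained neural network 380 would produce the values so that the original pixels are preserved and the new values reflect the facial element type being processed.

```python
import torch
from torch import nn
import torch.nn.functional as F

# The head emits four values per input pixel; pixel_shuffle arranges them into a
# 2x2 block of output pixels, so the output has four times as many pixels.
output_head = nn.Conv2d(3, 3 * 4, kernel_size=3, padding=1)  # illustrative output head
portion_332 = torch.rand(1, 3, 2, 2)                         # a 2x2 portion, values O(A,B)
network_values = output_head(portion_332)                    # shape (1, 12, 2, 2)
portion_352 = F.pixel_shuffle(network_values, upscale_factor=2)
print(portion_332.shape, portion_352.shape)                  # (1, 3, 2, 2) -> (1, 3, 4, 4)
```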
  • a neural network trainer during training, generates a low-resolution training image portion of an actual image portion.
  • the neural network trainer uses the neural network 380 to process the low-resolution training image portion as the image portion 330 to generate network feature values 344, and uses the network feature values 344 to generate a resolution enhanced image portion 350.
  • the neural network trainer determines a loss metric based on a comparison of the resolution enhanced image portion 350 and the actual image portion, and updates the link weights to reduce the loss metric.
  • the neural network 380 thus enables resolution enhancement that accounts for the facial element type of the pixel values being processed.
  • the resolution enhanced image portion 350 can thus more accurately approximate an actual higher resolution image that could correspond to the image portion 330.
  • Referring to FIG. 4, a diagram 400 is shown of an example of a neural network 480 that is configured to perform the feature extraction 176 of FIG. 1.
  • the neural network 480 includes node layers 440, such as a node layer 440A, a node layer 440B, a node layer 440C, a node layer 440D, a node layer 440E, one or more additional node layers, or a combination thereof.
  • Each of the node layers 440 includes a plurality of nodes.
  • the neural network 480 includes an input layer (e.g., the node layer 440A), one or more hidden layers (e.g., the node layer 440B, the node layer 440C, the node layer 440D, one or more additional node layers, or a combination thereof), and an output layer (e.g., the node layer 440E).
  • Each subsequent node layer has links with a previous node layer of the node layers 440.
  • the node layers 440 include fully connected layers.
  • Each of the links has an associated link weight.
  • the face comparison engine 158 uses the neural network 480 to perform the feature extraction 176 on an image portion 430 to generate normalized feature values 450.
  • the face comparison engine 158 extracts input values representing the image portion 430, and provides the input values to the input layer (e.g., the node layer 440A) of the neural network 480.
  • the node layers 440 process the input values to generate network feature values 444 from the output layer (e.g., the node layer 440E).
  • the face comparison engine 158 applies normalization (e.g., L2 normalization) to the network feature values 444 to generate the normalized feature values 450.
  • the image portion 430 corresponds to the aligned image portion 244 and the normalized feature values 450 correspond to the feature values 254. In another aspect, the image portion 430 corresponds to the aligned image portion 246 and the normalized feature values 450 correspond to the feature values 256.
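A minimal sketch of applying L2 normalization to the network feature values 444 to obtain the normalized feature values 450; the vector length is an assumption.

```python
import numpy as np

def l2_normalize(network_feature_values: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale the feature vector to unit length (L2 normalization)."""
    norm = np.linalg.norm(network_feature_values)
    return network_feature_values / max(norm, eps)

network_feature_values_444 = np.random.rand(128)       # placeholder network output
normalized_feature_values_450 = l2_normalize(network_feature_values_444)
print(np.linalg.norm(normalized_feature_values_450))   # approximately 1.0
```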
  • a neural network trainer uses training image data to train the neural network 480 to disregard (e.g., apply lower link weights to) one or more of the input values that correspond to changeable elements that can change for the same person, and to consider (e.g., apply higher link weights to) one or more of the input values that correspond to static elements that typically do not change substantially for the same person.
  • changeable elements include eyewear, hair style, hair color, facial hair, makeup usage, or a combination thereof.
  • Non-limiting examples of static elements include a shape of a skull, a width of a nose, a shape of the nose, a space between eyes relative to a width of a face, a shape of an eye, a shape of an ear, a shape of a face, or a combination thereof.
  • the training image data includes image portions corresponding to various changeable elements for the same person and image portions corresponding to different persons.
  • the neural network trainer uses the neural network 480 to perform the feature extraction 176 for the image portions to generate feature values.
  • the neural network 480 performs the feature comparison 178 between feature values of pairs of image portions to determine whether the feature values match, indicating that faces represented in the pairs of image portions match.
  • the neural network trainer determines a loss metric based on whether the feature comparison 178 correctly indicates that the feature values of a pair of image portions match.
  • the neural network trainer adjusts the link weights of the neural network 480 to reduce the loss metric.
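The exact training objective is not specified above, so the sketch below uses a contrastive margin loss over matched and unmatched image pairs as one plausible realization of adjusting the link weights to reduce the loss metric; the extractor architecture, margin, and placeholder data are assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F

MARGIN = 1.0
# Placeholder extractor standing in for the neural network 480.
extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 112 * 112, 128))
optimizer = torch.optim.Adam(extractor.parameters(), lr=1e-4)

def contrastive_loss(fv_a: torch.Tensor, fv_b: torch.Tensor, same_person: torch.Tensor) -> torch.Tensor:
    """Pull feature values together for the same person, push them apart otherwise."""
    distance = F.pairwise_distance(fv_a, fv_b)
    return torch.mean(same_person * distance.pow(2)
                      + (1.0 - same_person) * F.relu(MARGIN - distance).pow(2))

# Placeholder pairs: same person with different changeable elements, or different persons.
images_a = torch.rand(16, 3, 112, 112)
images_b = torch.rand(16, 3, 112, 112)
same_person = torch.randint(0, 2, (16,)).float()

loss = contrastive_loss(F.normalize(extractor(images_a)),
                        F.normalize(extractor(images_b)), same_person)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```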
  • the neural network 480 is thus trained to generate matching feature values for image portions independently of differences in the changeable elements for the same person in the image portions, and to generate feature values that correspond to the static elements for a person. Because the static elements do not typically change for up to a threshold count of years, the neural network 480 can generate feature values that match independently of age differentials up to the threshold count of years.
  • Using the neural network 480 thus enables the face comparison engine 158 to determine whether the face 184 matches the face 186 independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
  • the image data 133 corresponds to a plurality of image frames generated by a camera 126 of FIG. 1.
  • the face comparison engine 158 performs the face detection 170 of FIG. 1 to determine that a particular image frame of the image data 133 includes an image portion 134 representing a face 184 and an image portion 136 representing a face 186. In some examples, the face comparison engine 158 performs the image enhancement 172, the face alignment 174, or both, as described with reference to FIGS. 1-2.
  • the face comparison engine 158 performs the feature extraction 176 on a first image portion to generate the feature values 254, as described with reference to FIGS. 1-2.
  • the first image portion includes the image portion 134, the enhanced image portion 234, or the aligned image portion 244 of FIG. 2.
  • the face comparison engine 158 performs the feature extraction 176 on a second image portion to generate the feature values 256, as described with reference to FIGS. 1-2.
  • the second image portion includes the image portion 136, an enhanced image portion based on the image portion 136, or the aligned image portion 246 of FIG. 2.
  • the face comparison engine 158 determines whether the face 184 matches the face 186 based on differences between the feature values 254 and the feature values 256, as described with reference to FIG. 2.
  • the face comparison engine 158 generates the result 268 (e.g., “matched” or “not matched”) indicating whether the face 184 matches the face 186, as described with reference to FIG. 2.
  • the face comparison engine 158 determines, based on the differences between the feature values 254 and the feature values 256, a similarity 568 between the face 184 and the face 186.
  • the face comparison engine 158 includes a feature m in the similar features in response to determining that a feature difference (e.g., fvd,m) of the feature m is less than a corresponding threshold.
  • the similarity 568 thus indicates a proportion of features that match (e.g., have a feature difference less than a corresponding threshold) between the face 184 and the face 186.
  • the face comparison engine 158 generates the output 118 indicating the result 268, the similarity 568, or both.
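A minimal sketch of computing the similarity 568 as the proportion of features whose differences fall below corresponding thresholds; the vector length and threshold values are assumed for illustration.

```python
import numpy as np

FEATURE_THRESHOLDS = np.full(128, 0.8)   # assumed per-feature difference thresholds

def similarity_percent(fv_1: np.ndarray, fv_2: np.ndarray) -> float:
    """Proportion of features whose difference is below the corresponding threshold."""
    differences = np.abs(fv_1 - fv_2)
    return float(np.mean(differences < FEATURE_THRESHOLDS)) * 100.0

fv_254 = np.random.rand(128)                              # placeholder feature vectors
fv_256 = fv_254 + np.random.normal(scale=0.5, size=128)
print(f"{similarity_percent(fv_254, fv_256):.2f}%")       # percentage of matching features
```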
  • one or more images of a person holding up an identification token 138 are captured by a camera 126.
  • a particular image frame includes the image portion 134 indicating the face 184 with a first hair style (e.g., tied hair) and without eye wear, and the image portion 136 indicating the face 186 with a second hair style (e.g., untied hair) and with eye wear.
  • the face comparison engine 158 determines, based on feature values 254 of the image portion 134 and the feature values 256 of the image portion 136, the similarity 568 (e.g., 55.90%) and the result 268 (e.g., matched).
  • the face comparison engine 158 generates the output 118 that includes the similarity 568 (e.g., 55.90%) and the result 268 (e.g., matched) overlaid on the particular image frame, and provides the output 118 to an output device 130 (e.g., display device).
  • the face comparison engine 158 determines that the particular image frame includes an identification token portion that includes the image portion 134, a text element 530, a text element 532, one or more additional text elements, one or more design elements, or a combination thereof.
  • the face comparison engine 158 based on a comparison of the identification token portion and an identification token template, determines that the text element 530 corresponds to an identifier and the text element 532 corresponds to a name.
  • the face comparison engine 158 performs text recognition on the text element 530 and the text element 532 to detect the identifier (e.g., “AB 12FG345968”) and the name (e.g., “Sushmita Rai”), respectively.
  • the identification token portion corresponds to a mirror image of the identification token 138.
  • the face comparison engine 158 performs a rotation operation on the text element 530 and the text element 532 prior to performing the text recognition.
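A hedged sketch of the text recognition step, assuming pytesseract (with the Tesseract OCR binary installed) for the recognition; the mirror correction is shown here as a horizontal flip of the cropped text element, which is one way to undo a mirror image before recognition.

```python
import cv2
import numpy as np
import pytesseract  # assumes the Tesseract OCR binary is installed

def recognize_text_element(text_element: np.ndarray, mirrored: bool = False) -> str:
    """Run text recognition on a cropped text element, un-mirroring it first if needed."""
    if mirrored:
        text_element = cv2.flip(text_element, 1)  # horizontal flip undoes the mirror image
    return pytesseract.image_to_string(text_element).strip()

# Usage with a blank placeholder crop; a real crop of text element 530 or 532 would be used.
blank_crop = np.full((40, 200), 255, dtype=np.uint8)
print(repr(recognize_text_element(blank_crop, mirrored=True)))
```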
  • a particular image frame includes an image portion 134 indicating the face 184 of a person and an image portion 136 indicating the face 186 of another person.
  • the face comparison engine 158 determines, based on feature values 254 of the image portion 134 and the feature values 256 of the image portion 136, the similarity 568 (e.g., 4.63%) and the result 268 (e.g., not matched).
  • the face comparison engine 158 generates the output 118 that includes the similarity 568 (e.g., 4.63%) and the result 268 (e.g., not matched) overlaid on the particular image frame, and provides the output 118 to an output device 130 (e.g., display device).
  • FIG. 6 is a flow chart of an example of a method 600 in accordance with some examples of the present disclosure.
  • One or more operations described with reference to FIG. 6 may be performed by the face comparison engine 158, the system 100 of FIG. 1, or both, such as by the one or more processors 110 executing the instructions 146.
  • the method 600 includes, at 602, determining that image data includes a first image portion corresponding to a first face.
  • the face comparison engine 158 performs the face detection 170 to determine that the image data 133 includes the image portion 134 corresponding to the face 184, as described with reference to FIGS. 1 and 2.
  • the method 600 also includes, at 604, determining that the image data includes a second image portion corresponding to a second face.
  • the face comparison engine 158 performs the face detection 170 to determine that the image data 133 includes the image portion 136 corresponding to the face 186, as described with reference to FIGS. 1 and 2.
  • the method 600 further includes, at 606, generating, based on the first image portion, first feature values representing the first face.
  • the face comparison engine 158 performs the feature extraction 176 to generate, based on the image portion 134, the feature values 254 representing the face 184, as described with reference to FIGS. 1, 2, and 4.
  • the method 600 also includes, at 608, generating, based on the second image portion, second feature values representing the second face.
  • the face comparison engine 158 performs the feature extraction 176 to generate, based on the image portion 136, the feature values 256 representing the face 186, as described with reference to FIGS. 1, 2, and 4.
  • the method 600 further includes, at 610, determining, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face.
  • the face comparison engine 158 performs the feature comparison 178 of the feature values 254 and the feature values 256 to determine whether the face 184 matches the face 186, as described with reference to FIGS. 1 and 2.
  • the method 600 thus enables performing face image matching based on feature comparison.
  • the face image matching can be performed without access to a database storing representations of faces, and independently of network access.
  • the feature extraction 176 enables matching the face 184 and the face 186 independently of elements that can change for the same person, such as hairstyle, eyewear, facial hair, etc. With more importance given to elements that typically do not change substantially over time for a person, such as skull shape, nose width, relative eye width, ear shape, etc., the feature extraction 176 enables matching the face 186 to the face 184 even if the face 184 corresponds to an older photo of the same person.
  • the software elements of the system may be implemented with any programming or scripting language such as C, C++, C#, Java, JavaScript, VBScript, Macromedia Cold Fusion, COBOL, Microsoft Active Server Pages, assembly, PERL, PHP, AWK, Python, Visual Basic, SQL Stored Procedures, PL/SQL, any UNIX shell script, and extensible markup language (XML) with the various algorithms being implemented with any combination of data structures, objects, processes, routines or other programming elements.
  • the system may employ any number of techniques for data transmission, signaling, data processing, network control, and the like.
  • the systems and methods of the present disclosure may be embodied as a customization of an existing system, an add-on product, a processing apparatus executing upgraded software, a standalone system, a distributed system, a method, a data processing system, a device for data processing, and/or a computer program product.
  • any portion of the system or a module or a decision model may take the form of a processing apparatus executing code, an internet based (e.g., cloud computing) embodiment, an entirely hardware embodiment, or an embodiment combining aspects of the internet, software and hardware.
  • the system may take the form of a computer program product on a computer-readable storage medium or device having computer-readable program code (e.g., instructions) embodied or stored in the storage medium or device.
  • Any suitable computer-readable storage medium or device may be utilized, including hard disks, CD-ROM, optical storage devices, magnetic storage devices, and/or other storage media.
  • a “computer-readable storage medium” or “computer-readable storage device” is not a signal.
  • Computer program instructions may be loaded onto a computer or other programmable data processing apparatus to produce a machine, such that the instructions that execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks.
  • These computer program instructions may also be stored in a computer-readable memory or device that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer- implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
  • According to Example 1, a device includes: one or more processors configured to: determine that image data includes a first image portion corresponding to a first face; determine that the image data includes a second image portion corresponding to a second face; generate, based on the first image portion, first feature values representing the first face; generate, based on the second image portion, second feature values representing the second face; and determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face.
  • Example 2 includes the device of Example 1, wherein the first image portion and the second image portion are included in a single image frame.
  • Example 3 includes the device of Example 2, wherein the one or more processors are configured to receive the single image frame from a camera.
  • Example 4 includes the device of Example 1, wherein the first image portion is included in a first image frame that is distinct from a second image frame that includes the second image portion.
  • Example 5 includes the device of Example 4, wherein the first image frame is received from a first camera, and wherein the second image frame is received from a second camera that is distinct from the first camera.
  • Example 6 includes the device of Example 4, wherein the one or more processors are configured to receive the first image frame and the second image frame from a single camera.
  • Example 7 includes the device of any of Example 1 to Example 6, wherein the one or more processors are configured to receive the image data from one or more cameras.
  • Example 8 includes the device of any of Example 1 to Example 7, wherein the one or more processors are configured to perform one or more enhancements of the first image portion to generate a first enhanced image portion, wherein the first feature values are generated based on the first enhanced image portion.
  • Example 9 includes the device of Example 8, wherein the one or more processors are configured to use one or more neural networks to perform the one or more enhancements.
  • Example 10 includes the device of Example 8 or Example 9, wherein the one or more enhancements include de-blurring, brightening, increasing resolution, or a combination thereof.
  • Example 11 includes the device of any of Example 8 to Example 10, wherein the first image portion includes first pixels representing a facial element of the first face, wherein the facial element is of a facial element type, and wherein the one or more processors are configured to: apply a resolution enhancement neural network to first pixel values of the first pixels to generate second pixel values, the second pixel values based on the facial element type; and generate the first enhanced image portion including first particular pixels having the first pixel values and second particular pixels having the second pixel values, wherein the first enhanced image portion has a higher resolution than the first image portion, and wherein the first particular pixels and the second particular pixels represent the facial element in the first enhanced image portion.
  • Example 12 includes the device of Example 11, wherein the facial element type indicates that the facial element corresponds to at least one of a forehead, an eyebrow, an eye, a nose, a cheek, a lip, or an ear.
  • Example 13 includes the device of any of Example 1 to Example 12, wherein the first feature values indicate features of the first face including at least one of a shape of a skull, a width of a nose, a shape of the nose, a space between eyes relative to a width of the first face, a shape of an eye, a shape of an ear, or a shape of the first face.
  • Example 14 includes the device of any of Example 1 to Example 13, wherein the one or more processors are configured to use a neural network to process the first image portion to generate the first feature values.
  • Example 15 includes the device of any of Example 1 to Example 14, wherein the first image portion corresponds to a photo of the first face captured by one or more cameras.
  • Example 16 includes the device of any of Example 1 to Example 15, wherein the second image portion corresponds to a live image of the second face captured by one or more cameras.
  • Example 17 includes the device of any of Example 1 to Example 16, wherein the first image portion corresponds to an image captured of an identification token, and wherein the one or more processors are configured to, in response to determining that the first face matches the second face: perform text recognition on the first image portion to generate text; and determine whether the text satisfies a criterion.
  • Example 18 includes the device of Example 17, wherein the identification token includes a physical token or a digital token.
  • Example 19 includes the device of Example 17 or Example 18, wherein the text indicates a date of birth, and wherein the criterion includes an age restriction.
  • Example 20 includes the device of any of Example 17 to Example 19, wherein the text indicates a name and an identifier, wherein the criterion includes valid name and identifier matching, and wherein the one or more processors are configured to, based on determining that identifier mapping data indicates that the identifier is valid for the name, determine that the text satisfies the criterion.
  • Example 21 includes the device of Example 20, wherein the identifier includes a social security number, a driver’s license number, a passport number, a badge number, another identifier type, or a combination thereof.
  • Example 22 includes the device of any of Example 17 to Example 21, wherein the identification token includes a passport, a driver’s license, another type of government issued identification token, an access badge, or another type of private issued identification token.
  • Example 23 includes the device of any of Example 1 to Example 22, wherein the first image portion corresponds to an image captured of an identification token, and wherein the one or more processors are configured to, in response to determining that the first face matches the second face, determine whether at least the first image portion matches an identification token template.
  • Example 24 includes the device of any of Example 1 to Example 23, wherein the one or more processors are configured to determine whether the first face matches the second face without accessing a database of stored representations of faces.
  • Example 25 includes the device of any of Example 1 to Example 24, wherein the one or more processors are configured to, independently of network access, generate the first feature values, generate the second feature values, and determine whether the first face matches the second face.
  • Example 26 includes the device of any of Example 1 to Example 25, wherein the one or more processors are configured to use a neural network to generate the first feature values and the second feature values, and wherein the neural network is trained to generate the first feature values that match the second feature values independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
  • Example 27 includes the device of any of Example 1 to Example 26, wherein the one or more processors are configured to determine whether the first face matches the second face independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
  • According to Example 28, a method includes: determining, at a device, that image data includes a first image portion corresponding to a first face; determining, at the device, that the image data includes a second image portion corresponding to a second face; generating, based on the first image portion, first feature values representing the first face; generating, based on the second image portion, second feature values representing the second face; determining, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face; and generating, at the device, an output indicating whether the first face matches the second face.
  • Example 29 includes the method of Example 28, wherein the first image portion and the second image portion are included in a single image frame.
  • Example 30 includes the method of Example 29, further including receiving the single image frame from a camera.
  • Example 31 includes the method of Example 28, wherein the first image portion is included in a first image frame that is distinct from a second image frame that includes the second image portion.
  • Example 32 includes the method of Example 31, wherein the first image frame is received from a first camera, and wherein the second image frame is received from a second camera that is distinct from the first camera.
  • Example 33 includes the method of Example 31, further including receiving the first image frame and the second image frame from a single camera.
  • Example 34 includes the method of any of Example 28 to Example 33, further including receiving the image data from one or more cameras.
  • Example 35 includes the method of any of Example 28 to Example 34, further including performing one or more enhancements of the first image portion to generate a first enhanced image portion, wherein the first feature values are generated based on the first enhanced image portion.
  • Example 36 includes the method of Example 35, further including using one or more neural networks to perform the one or more enhancements.
  • Example 37 includes the method of Example 35 or Example 36, wherein the one or more enhancements include de-blurring, brightening, increasing resolution, or a combination thereof.
  • Example 38 includes the method of any of Example 35 to Example 37, further including: applying a resolution enhancement neural network to first pixel values of first pixels to generate second pixel values, the first image portion including the first pixels that represent a facial element of the first face, wherein the facial element is of a facial element type, and wherein the second pixel values are based on the facial element type; and generating the first enhanced image portion including first particular pixels having the first pixel values and second particular pixels having the second pixel values, wherein the first enhanced image portion has a higher resolution than the first image portion, and wherein the first particular pixels and the second particular pixels represent the facial element in the first enhanced image portion.
  • Example 39 includes the method of Example 38, wherein the facial element type indicates that the facial element corresponds to at least one of a forehead, an eyebrow, an eye, a nose, a cheek, a lip, or an ear.
  • Example 40 includes the method of any of Example 28 to Example 39, wherein the first feature values indicate features of the first face including at least one of a shape of a skull, a width of a nose, a shape of the nose, a space between eyes relative to a width of the first face, a shape of an eye, a shape of an ear, or a shape of the first face.
  • Example 41 includes the method of any of Example 28 to Example 40, further including using a neural network to process the first image portion to generate the first feature values.
  • Example 42 includes the method of any of Example 28 to Example 41, wherein the first image portion corresponds to a photo of the first face captured by one or more cameras.
  • Example 43 includes the method of any of Example 28 to Example 42, wherein the second image portion corresponds to a live image of the second face captured by one or more cameras.
  • Example 44 includes the method of any of Example 28 to Example 43, further including, in response to determining that the first face matches the second face: performing text recognition on the first image portion to generate text, wherein the first image portion corresponds to an image captured of an identification token; and determining whether the text satisfies a criterion.
  • Example 45 includes the method of Example 44, wherein the identification token includes a physical token or a digital token.
  • Example 46 includes the method of Example 44 or Example 45, wherein the text indicates a date of birth, and wherein the criterion includes an age restriction.
  • Example 47 includes the method of any of Example 44 to Example 46, further including, based on determining that identifier mapping data indicates that an identifier is valid for a name, determining that the text satisfies the criterion, wherein the text indicates a name and an identifier, and wherein the criterion includes valid name and identifier matching.
  • Example 48 includes the method of Example 47, wherein the identifier includes a social security number, a driver’s license number, a passport number, a badge number, another identifier type, or a combination thereof.
  • Example 49 includes the method of any of Example 44 to Example 48, wherein the identification token includes a passport, a driver’s license, another type of government issued identification token, an access badge, or another type of private issued identification token.
  • Example 50 includes the method of any of Example 28 to Example 49, further including, in response to determining that the first face matches the second face, determining whether at least the first image portion matches an identification token template, wherein the first image portion corresponds to an image captured of an identification token.
  • Example 51 includes the method of any of Example 28 to Example 50, further including determining whether the first face matches the second face without accessing a database of stored representations of faces.
  • Example 52 includes the method of any of Example 28 to Example 51, further including, independently of network access, generating the first feature values, generating the second feature values, and determining whether the first face matches the second face.
  • Example 53 includes the method of any of Example 28 to Example 52, further including using a neural network to generate the first feature values and the second feature values, wherein the neural network is trained to generate the first feature values that match the second feature values independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
  • Example 54 includes the method of any of Example 28 to Example 53, further including determining whether the first face matches the second face independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
  • According to Example 55, a device includes one or more processors configured to execute instructions to perform the method of any of Examples 28-54.
  • According to Example 56, a non-transitory computer readable medium stores instructions that are executable by one or more processors to perform the method of any of Examples 28-54.
  • According to Example 57, a non-transitory computer-readable storage device stores instructions that, when executed by one or more processors, cause the one or more processors to: determine that image data includes a first image portion corresponding to a first face; determine that the image data includes a second image portion corresponding to a second face; generate, based on the first image portion, first feature values representing the first face; generate, based on the second image portion, second feature values representing the second face; determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face; and generate an output indicating whether the first face matches the second face.
  • Example 58 includes the non-transitory computer-readable storage device of Example 57, wherein the first image portion is included in a first image frame that is distinct from a second image frame that includes the second image portion.
  • Although the disclosure may include one or more methods, it is contemplated that it may be embodied as computer program instructions on a tangible computer-readable medium, such as a magnetic or optical memory or a magnetic or optical disk/disc. All structural, chemical, and functional equivalents to the elements of the above-described exemplary embodiments that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the present disclosure, for it to be encompassed by the present claims.

Abstract

A device includes one or more processors configured to determine that image data includes a first image portion corresponding to a first face and determine that the image data includes a second image portion corresponding to a second face. The one or more processors are further configured to generate enhanced image portions based on the first and second image portions. The one or more processors are further configured to use one or more neural networks to process the enhanced image portions to generate first and second feature values representing the first and second faces, respectively. The one or more processors are further configured to determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face.

Description

FACE IMAGE MATCHING BASED ON FEATURE COMPARISON
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] The present application claims priority from India Provisional Patent Application No. 202241047984 filed August 23, 2022, the content of which is incorporated by reference herein in its entirety.
FIELD
[0002] The present disclosure is generally related to determining feature values from image portions that correspond to faces and determining whether the faces match based on a comparison of the feature values.
BACKGROUND
[0003] In various situations, a picture on an identification card, a passport, etc., is compared to a person to verify the identity of the person. These comparisons are typically performed by personnel with associated costs, such as training costs and salaries. There can also be challenges such as human error in performing the comparisons and delays arising from limited staffing.
SUMMARY
[0004] The present disclosure describes systems and methods that enable face image matching based on feature comparison. For example, a camera captures an image of a user and an image of an identification card. In some examples, the image of the user and the image of the identification card are included in a single image frame. In other examples, the image of the user and the image of the identification card are included in distinct image frames.
[0005] A face comparison engine performs face detection on image data received from the camera. For example, the face comparison engine determines that the image data includes a first image portion corresponding to a first face detected in the image of the identification card and a second image portion corresponding to a second face detected in the image of the user. An identification card is provided as an illustrative example. In other examples, a face can be detected in an image of any type of identification token.
[0006] In some examples, the face comparison engine pre-processes the first image portion, the second image portion, or both. For example, the pre-processing can include image enhancement, face alignment, or both. To illustrate, the image enhancement can include increasing resolution, brightening, de-blurring, one or more additional image enhancements, or a combination thereof. The face alignment can include adjusting the first image portion, the second image portion, or both, such that the first face in the first image portion is aligned with the second face in the second image portion.
[0007] The face comparison engine uses a neural network to process the first image portion to generate first feature values. The face comparison engine uses the neural network to process the second image portion to generate second feature values. The face comparison engine performs a comparison of the first feature values and the second feature values to generate a result indicating whether the first face matches the second face. In some examples, the neural network is trained to generate matching feature values independently of elements that can vary for the same person, such as hairstyle, eyewear, makeup, facial hair, etc. In some examples, the neural network is trained to generate feature values that are based on (e.g., give more weight to) facial elements that do not typically change significantly over time for the same person, such as shape of skull, shape of ears, nose width, etc.
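For illustration only, the following Python sketch outlines one way such feature extraction and comparison could be realized. The network architecture, the embedding size, and the similarity threshold are assumptions made for this example and are not taken from the disclosure; any face-embedding network that produces fixed-length feature values could be substituted.

```python
# Illustrative sketch of neural-network feature extraction and comparison.
# The embedding network is a hypothetical placeholder, not the disclosed model.
import torch
import torch.nn as nn


class FaceEmbeddingNet(nn.Module):
    """Hypothetical network mapping a 112x112 RGB face crop to a 128-d feature vector."""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embedding_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so that cosine similarity reduces to a dot product.
        return nn.functional.normalize(self.backbone(x), dim=1)


def faces_match(model: FaceEmbeddingNet,
                crop_a: torch.Tensor,
                crop_b: torch.Tensor,
                threshold: float = 0.6) -> bool:
    """Return True if the two face crops are judged to show the same person."""
    with torch.no_grad():
        emb_a = model(crop_a.unsqueeze(0))
        emb_b = model(crop_b.unsqueeze(0))
    similarity = float((emb_a * emb_b).sum())  # cosine similarity of unit vectors
    return similarity >= threshold
```

Because the feature values in this sketch are L2-normalized, the comparison reduces to a cosine-similarity check against an illustrative threshold.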
[0008] In some aspects, a device includes one or more processors configured to determine that image data includes a first image portion corresponding to a first face. The one or more processors are also configured to determine that the image data includes a second image portion corresponding to a second face. The one or more processors are further configured to generate, based on the first image portion, first feature values representing the first face. The one or more processors are also configured to generate, based on the second image portion, second feature values representing the second face. The one or more processors are further configured to determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face.
[0009] In some aspects, a method includes determining, at a device, that image data includes a first image portion corresponding to a first face. The method also includes determining, at the device, that the image data includes a second image portion corresponding to a second face. The method further includes generating, based on the first image portion, first feature values representing the first face. The method also includes generating, based on the second image portion, second feature values representing the second face. The method further includes determining, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face. The method also includes generating, at the device, an output indicating whether the first face matches the second face.
[0010] In some aspects, a computer-readable medium stores instructions that, when executed by one or more processors, cause the one or more processors to determine that image data includes a first image portion corresponding to a first face. The instructions, when executed by the one or more processors, also cause the one or more processors to determine that the image data includes a second image portion corresponding to a second face. The instructions, when executed by the one or more processors, further cause the one or more processors to generate, based on the first image portion, first feature values representing the first face. The instructions, when executed by the one or more processors, also cause the one or more processors to generate, based on the second image portion, second feature values representing the second face. The instructions, when executed by the one or more processors, further cause the one or more processors to determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face. The instructions, when executed by the one or more processors, also cause the one or more processors to generate an output indicating whether the first face matches the second face.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] FIG. 1 is a block diagram illustrating a particular implementation of a system that is operable to perform face image matching based on feature comparison.
[0012] FIG. 2 is a diagram illustrating a non-limiting example of operations associated with face image matching that can be performed by the system of FIG. 1 in accordance with some examples of the present disclosure.
[0013] FIG. 3 is a diagram illustrating a non-limiting example of a neural network of the system of FIG. 1 that is configured to perform resolution enhancement in accordance with some examples of the present disclosure.
[0014] FIG. 4 is a diagram illustrating a non-limiting example of a neural network of the system of FIG. 1 that is configured to perform feature extraction in accordance with some examples of the present disclosure.
[0015] FIG. 5 is a diagram illustrating an example of a face image match and an example of a face image mismatch in accordance with some examples of the present disclosure.
[0016] FIG. 6 is a flow chart of an example of a method of face image matching in accordance with some examples of the present disclosure.
DETAILED DESCRIPTION
[0017] Particular aspects of the present disclosure are described below with reference to the drawings. In the description, common features are designated by common reference numbers throughout the drawings. In some drawings, multiple instances of a particular type of feature are used. Although these features are physically and/or logically distinct, the same reference number is used for each, and the different instances are distinguished by addition of a letter to the reference number. When the features as a group or a type are referred to herein (e.g., when no particular one of the features is being referenced), the reference number is used without a distinguishing letter. However, when one particular feature of multiple features of the same type is referred to herein, the reference number is used with the distinguishing letter. For example, referring to FIG. 4, multiple layers are illustrated and associated with reference numbers 440A, 440B, 440C, and 440D. When referring to a particular one of these layers, such as the layer 440A, the distinguishing letter “A” is used. However, when referring to any arbitrary one of these layers or to these layers as a group, the reference number 440 is used without a distinguishing letter.
[0018] As used herein, various terminology is used for the purpose of describing particular implementations only and is not intended to be limiting. For example, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Further the terms “comprise,” “comprises,” and “comprising” may be used interchangeably with “include,” “includes,” or “including.” Additionally, the term “wherein” may be used interchangeably with “where.” As used herein, “exemplary” may indicate an example, an implementation, and/or an aspect, and should not be construed as limiting or as indicating a preference or a preferred implementation. As used herein, an ordinal term (e.g., “first,” “second,” “third,” etc.) used to modify an element, such as a structure, a component, an operation, etc., does not by itself indicate any priority or order of the element with respect to another element, but rather merely distinguishes the element from another element having a same name (but for use of the ordinal term). As used herein, the term “set” refers to a grouping of one or more elements, and the term “plurality” refers to multiple elements.
[0019] In the present disclosure, terms such as “determining,” “calculating,” “estimating,” “shifting,” “adjusting,” etc. may be used to describe how one or more operations are performed. Such terms are not to be construed as limiting and other techniques may be utilized to perform similar operations. Additionally, as referred to herein, “generating,” “calculating,” “estimating,” “using,” “selecting,” “accessing,” and “determining” may be used interchangeably. For example, “generating,” “calculating,” “estimating,” or “determining” a parameter (or a signal) may refer to actively generating, estimating, calculating, or determining the parameter (or the signal) or may refer to using, selecting, or accessing the parameter (or signal) that is already generated, such as by another component or device.
[0020] As used herein, “coupled” may include “communicatively coupled,” “electrically coupled,” or “physically coupled,” and may also (or alternatively) include any combinations thereof. Two devices (or components) may be coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) directly or indirectly via one or more other devices, components, wires, buses, networks (e.g., a wired network, a wireless network, or a combination thereof), etc. Two devices (or components) that are electrically coupled may be included in the same device or in different devices and may be connected via electronics, one or more connectors, or inductive coupling, as illustrative, non-limiting examples. In some implementations, two devices (or components) that are communicatively coupled, such as in electrical communication, may send and receive electrical signals (digital signals or analog signals) directly or indirectly, such as via one or more wires, buses, networks, etc. As used herein, “directly coupled” may include two devices that are coupled (e.g., communicatively coupled, electrically coupled, or physically coupled) without intervening components.
[0021] As used herein, the term “machine learning” should be understood to have any of its usual and customary meanings within the fields of computer science and data science, such meanings including, for example, processes or techniques by which one or more computers can learn to perform some operation or function without being explicitly programmed to do so. As a typical example, machine learning can be used to enable one or more computers to analyze data to identify patterns in data and generate a result based on the analysis. For certain types of machine learning, the results that are generated include data that indicates an underlying structure or pattern of the data itself. Such techniques, for example, include so called “clustering” techniques, which identify clusters (e.g., groupings of data elements of the data).
[0022] For certain types of machine learning, the results that are generated include a data model (also referred to as a “machine-learning model” or simply a “model”). Typically, a model is generated using a first data set to facilitate analysis of a second data set. For example, a first portion of a large body of data may be used to generate a model that can be used to analyze the remaining portion of the large body of data. As another example, a set of historical data can be used to generate a model that can be used to analyze future data.
[0023] Since a model can be used to evaluate a set of data that is distinct from the data used to generate the model, the model can be viewed as a type of software (e.g., instructions, parameters, or both) that is automatically generated by the computer(s) during the machine learning process. As such, the model can be portable (e.g., can be generated at a first computer, and subsequently moved to a second computer for further training, for use, or both). Additionally, a model can be used in combination with one or more other models to perform a desired analysis. To illustrate, first data can be provided as input to a first model to generate first model output data, which can be provided (alone, with the first data, or with other data) as input to a second model to generate second model output data indicating a result of a desired analysis. Depending on the analysis and data involved, different combinations of models may be used to generate such results. In some examples, multiple models may provide model output that is input to a single model. In some examples, a single model provides model output to multiple models as input.
[0024] Examples of machine-learning models include, without limitation, perceptrons, neural networks, support vector machines, regression models, decision trees, Bayesian models, Boltzmann machines, adaptive neuro-fuzzy inference systems, as well as combinations, ensembles and variants of these and other types of models. Variants of neural networks include, for example and without limitation, prototypical networks, autoencoders, transformers, self-attention networks, convolutional neural networks, deep neural networks, deep belief networks, etc. Variants of decision trees include, for example and without limitation, random forests, boosted decision trees, etc.
[0025] Since machine-learning models are generated by computer(s) based on input data, machine-learning models can be discussed in terms of at least two distinct time windows - a creation/training phase and a runtime phase. During the creation/training phase, a model is created, trained, adapted, validated, or otherwise configured by the computer based on the input data (which in the creation/training phase, is generally referred to as “training data”). Note that the trained model corresponds to software that has been generated and/or refined during the creation/training phase to perform particular operations, such as classification, prediction, encoding, or other data analysis or data synthesis operations. During the runtime phase (or “inference” phase), the model is used to analyze input data to generate model output. The content of the model output depends on the type of model. For example, a model can be trained to perform classification tasks or regression tasks, as non-limiting examples. In some implementations, a model may be continuously, periodically, or occasionally updated, in which case training time and runtime may be interleaved or one version of the model can be used for inference while a copy is updated, after which the updated copy may be deployed for inference.
[0026] In some implementations, a previously generated model is trained (or re-trained) using a machine-learning technique. In this context, “training” refers to adapting the model or parameters of the model to a particular data set. Unless otherwise clear from the specific context, the term “training” as used herein includes “re-training” or refining a model for a specific data set. For example, training may include so-called “transfer learning.” As described further below, in transfer learning a base model may be trained using a generic or typical data set, and the base model may be subsequently refined (e.g., re-trained or further trained) using a more specific data set.
[0027] A data set used during training is referred to as a “training data set” or simply “training data.” The data set may be labeled or unlabeled. “Labeled data” refers to data that has been assigned a categorical label indicating a group or category with which the data is associated, and “unlabeled data” refers to data that is not labeled. Typically, “supervised machine-learning processes” use labeled data to train a machine-learning model, and “unsupervised machine-learning processes” use unlabeled data to train a machine-learning model; however, it should be understood that a label associated with data is itself merely another data element that can be used in any appropriate machine-learning process. To illustrate, many clustering operations can operate using unlabeled data; however, such a clustering operation can use labeled data by ignoring labels assigned to data or by treating the labels the same as other data elements.
[0028] Machine-learning models can be initialized from scratch (e.g., by a user, such as a data scientist) or using a guided process (e.g., using a template or previously built model). Initializing the model includes specifying parameters and hyperparameters of the model. “Hyperparameters” are characteristics of a model that are not modified during training, and “parameters” of the model are characteristics of the model that are modified during training. The term “hyperparameters” may also be used to refer to parameters of the training process itself, such as a learning rate of the training process. In some examples, the hyperparameters of the model are specified based on the task the model is being created for, such as the type of data the model is to use, the goal of the model (e.g., classification, regression, anomaly detection), etc. The hyperparameters may also be specified based on other design goals associated with the model, such as a memory footprint limit, where and when the model is to be used, etc.
[0029] Model type and model architecture of a model illustrate a distinction between model generation and model training. The model type of a model, the model architecture of the model, or both, can be specified by a user or can be automatically determined by a computing device. However, neither the model type nor the model architecture of a particular model is changed during training of the particular model. Thus, the model type and model architecture are hyperparameters of the model and specifying the model type and model architecture is an aspect of model generation (rather than an aspect of model training). In this context, a “model type” refers to the specific type or sub-type of the machine-learning model. As noted above, examples of machine-learning model types include, without limitation, perceptrons, neural networks, support vector machines, regression models, decision trees, Bayesian models, Boltzmann machines, adaptive neuro-fuzzy inference systems, as well as combinations, ensembles and variants of these and other types of models. In this context, “model architecture” (or simply “architecture”) refers to the number and arrangement of model components, such as nodes or layers, of a model, and which model components provide data to or receive data from other model components. As a non-limiting example, the architecture of a neural network may be specified in terms of nodes and links. To illustrate, a neural network architecture may specify the number of nodes in an input layer of the neural network, the number of hidden layers of the neural network, the number of nodes in each hidden layer, the number of nodes of an output layer, and which nodes are connected to other nodes (e.g., to provide input or receive output). As another non-limiting example, the architecture of a neural network may be specified in terms of layers. To illustrate, the neural network architecture may specify the number and arrangement of specific types of functional layers, such as long short-term memory (LSTM) layers, fully connected (FC) layers, convolution layers, etc. While the architecture of a neural network implicitly or explicitly describes links between nodes or layers, the architecture does not specify link weights. Rather, link weights are parameters of a model (rather than hyperparameters of the model) and are modified during training of the model.
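As a purely illustrative sketch of this distinction (using PyTorch, which is not required by the disclosure), the layer counts and layer sizes below are architecture choices fixed before training, while the link weights and biases inside the layers are the parameters that training modifies:

```python
# Illustrative only: architecture (hyperparameters) vs. parameters (link weights).
import torch.nn as nn

# Architecture: number and arrangement of layers, chosen before training begins.
model = nn.Sequential(
    nn.Linear(16, 32),  # input layer -> hidden layer with 32 nodes
    nn.ReLU(),
    nn.Linear(32, 8),   # hidden layer -> hidden layer with 8 nodes
    nn.ReLU(),
    nn.Linear(8, 2),    # hidden layer -> output layer with 2 nodes
)

# Parameters: the link weights and biases, which are modified during training.
num_parameters = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {num_parameters}")
```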
[0030] In many implementations, a data scientist selects the model type before training begins. However, in some implementations, a user may specify one or more goals (e.g., classification or regression), and automated tools may select one or more model types that are compatible with the specified goal(s). In such implementations, more than one model type may be selected, and one or more models of each selected model type can be generated and trained. A best performing model (based on specified criteria) can be selected from among the models representing the various model types. Note that in this process, no particular model type is specified in advance by the user, yet the models are trained according to their respective model types. Thus, the model type of any particular model does not change during training.
[0031] Similarly, in some implementations, the model architecture is specified in advance (e.g., by a data scientist); whereas in other implementations, a process that both generates and trains a model is used. Generating (or generating and training) the model using one or more machine-learning techniques is referred to herein as “automated model building”. In one example of automated model building, an initial set of candidate models is selected or generated, and then one or more of the candidate models are trained and evaluated. In some implementations, after one or more rounds of changing hyperparameters and/or parameters of the candidate model(s), one or more of the candidate models may be selected for deployment (e.g., for use in a runtime phase).
[0032] Certain aspects of an automated model building process may be defined in advance (e.g., based on user settings, default values, or heuristic analysis of a training data set) and other aspects of the automated model building process may be determined using a randomized process. For example, the architectures of one or more models of the initial set of models can be determined randomly within predefined limits. As another example, a termination condition may be specified by the user or based on configuration settings. The termination condition indicates when the automated model building process should stop. To illustrate, a termination condition may indicate a maximum number of iterations of the automated model building process, in which case the automated model building process stops when an iteration counter reaches a specified value. As another illustrative example, a termination condition may indicate that the automated model building process should stop when a reliability metric associated with a particular model satisfies a threshold. As yet another illustrative example, a termination condition may indicate that the automated model building process should stop if a metric that indicates improvement of one or more models over time (e.g., between iterations) satisfies a threshold. In some implementations, multiple termination conditions, such as an iteration count condition, a time limit condition, and a rate of improvement condition can be specified, and the automated model building process can stop when one or more of these conditions is satisfied.
[0033] Another example of training a previously generated model is transfer learning. “Transfer learning” refers to initializing a model for a particular data set using a model that was trained using a different data set. For example, a “general purpose” model can be trained to detect anomalies in vibration data associated with a variety of types of rotary equipment, and the general purpose model can be used as the starting point to train a model for one or more specific types of rotary equipment, such as a first model for generators and a second model for pumps. As another example, a general-purpose natural-language processing model can be trained using a large selection of natural-language text in one or more target languages. In this example, the general-purpose natural-language processing model can be used as a starting point to train one or more models for specific natural-language processing tasks, such as translation between two languages, question answering, or classifying the subject matter of documents. Often, transfer learning can converge to a useful model more quickly than building and training the model from scratch.
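A minimal transfer-learning sketch is shown below for illustration; the checkpoint file name, the layer sizes, and the four-class head are hypothetical, and the example assumes a previously trained base model has been saved to disk.

```python
# Illustrative transfer learning: load a pretrained base, freeze it, train a new head.
import torch
import torch.nn as nn

base = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
)
base.load_state_dict(torch.load("general_purpose_base.pt"))  # hypothetical checkpoint

for param in base.parameters():   # freeze the pretrained layers
    param.requires_grad_(False)

head = nn.Linear(128, 4)          # new head for the specific task (4 classes assumed)
model = nn.Sequential(base, head)

# Only the head's parameters are given to the optimizer, so training refines the
# model for the new data set without disturbing the base model.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
```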
[0034] Training a model based on a training data set generally involves changing parameters of the model with a goal of causing the output of the model to have particular characteristics based on data input to the model. To distinguish from model generation operations, model training may be referred to herein as optimization or optimization training. In this context, “optimization” refers to improving a metric, and does not mean finding an ideal (e.g., global maximum or global minimum) value of the metric. Examples of optimization trainers include, without limitation, backpropagation trainers, derivative free optimizers (DFOs), and extreme learning machines (ELMs). As one example of training a model, during supervised training of a neural network, an input data sample is associated with a label. When the input data sample is provided to the model, the model generates output data, which is compared to the label associated with the input data sample to generate an error value. Parameters of the model are modified in an attempt to reduce (e.g., optimize) the error value. As another example of training a model, during unsupervised training of an autoencoder, a data sample is provided as input to the autoencoder, and the autoencoder reduces the dimensionality of the data sample (which is a lossy operation) and attempts to reconstruct the data sample as output data. In this example, the output data is compared to the input data sample to generate a reconstruction loss, and parameters of the autoencoder are modified in an attempt to reduce (e.g., optimize) the reconstruction loss.
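For illustration, the following sketch shows a supervised training loop of the kind described above, using randomly generated stand-in data; the model, loss function, learning rate, and epoch count are arbitrary choices made for the example rather than values from the disclosure.

```python
# Illustrative supervised training loop: compare model output to labels, compute an
# error value, and modify parameters via backpropagation to reduce that error.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(32, 10)          # 32 stand-in training samples
labels = torch.randint(0, 2, (32,))   # category label for each sample

for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(inputs)           # model output (class scores)
    loss = loss_fn(outputs, labels)   # error value relative to the labels
    loss.backward()                   # backpropagation
    optimizer.step()                  # modify parameters to reduce the error
```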
[0035] As another example, to use supervised training to train a model to perform a classification task, each data element of a training data set may be labeled to indicate a category or categories to which the data element belongs. In this example, during the creation/training phase, data elements are input to the model being trained, and the model generates output indicating categories to which the model assigns the data elements. The category labels associated with the data elements are compared to the categories assigned by the model. The computer modifies the model until the model accurately and reliably (e.g., within some specified criteria) assigns the correct labels to the data elements. In this example, the model can subsequently be used (in a runtime phase) to receive unknown (e.g., unlabeled) data elements, and assign labels to the unknown data elements. In an unsupervised training scenario, the labels may be omitted. During the creation/training phase, model parameters may be tuned by the training algorithm in use such that during the runtime phase, the model is configured to determine which of multiple unlabeled “clusters” an input data sample is most likely to belong to.
[0036] As another example, to train a model to perform a regression task, during the creation/training phase, one or more data elements of the training data are input to the model being trained, and the model generates output indicating a predicted value of one or more other data elements of the training data. The predicted values of the training data are compared to corresponding actual values of the training data, and the computer modifies the model until the model accurately and reliably (e.g., within some specified criteria) predicts values of the training data. In this example, the model can subsequently be used (in a runtime phase) to receive data elements and predict values that have not been received. To illustrate, the model can analyze time series data, in which case, the model can predict one or more future values of the time series based on one or more prior values of the time series.
[0037] In some aspects, the output of a model can be subjected to further analysis operations to generate a desired result. To illustrate, in response to particular input data, a classification model (e.g., a model trained to perform classification tasks) may generate output including an array of classification scores, such as one score per classification category that the model is trained to assign. Each score is indicative of a likelihood (based on the model’s analysis) that the particular input data should be assigned to the respective category. In this illustrative example, the output of the model may be subjected to a softmax operation to convert the output to a probability distribution indicating, for each category label, a probability that the input data should be assigned the corresponding label. In some implementations, the probability distribution may be further processed to generate a one-hot encoded array. In other examples, other operations that retain one or more category labels and a likelihood value associated with each of the one or more category labels can be used.
[0038] FIG. 1 illustrates an example of a system 100 that is configured to perform face image matching based on feature comparison. The system 100 can be implemented as or incorporated into one or more of various other devices, such as a personal computer (PC), a tablet PC, a server computer, a cloud-based computing system, a control system, an internet of things device, a personal digital assistant (PDA), a laptop computer, a desktop computer, a communications device, a wireless telephone, or any other machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. Further, while a single system 100 is illustrated, the term “system” includes any collection of systems or sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions.
[0039] While FIG. 1 illustrates one example of the system 100, other computer systems or computing architectures and configurations may be used for carrying out the face image matching operations disclosed herein. The system 100 includes one or more processors 110. Each processor of the one or more processors 110 can include a single processing core or multiple processing cores that operate sequentially, in parallel, or sequentially at times and in parallel at other times. Each processor of the one or more processors 110 includes circuitry defining a plurality of logic circuits 112, working memory 114 (e.g., registers and cache memory), communication circuits, etc., which together enable the one or more processors 110 to control the operations performed by the system 100 and enable the one or more processors 110 to generate a useful result based on analysis of particular data and execution of specific instructions.
[0040] The one or more processors 110 are configured to interact with other components or subsystems of the system 100 via a bus 160. The bus 160 is illustrative of any interconnection scheme serving to link the subsystems of the system 100, external subsystems or devices, or any combination thereof. The bus 160 includes a plurality of conductors to facilitate communication of electrical and/or electromagnetic signals between the components or subsystems of the system 100. Additionally, the bus 160 includes one or more bus controllers or other circuits (e.g., transmitters and receivers) that manage signaling via the plurality of conductors and that cause signals sent via the plurality of conductors to conform to particular communication protocols.
[0041] In a particular aspect, dedicated hardware implementations, such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement one or more of the operations described herein. Accordingly, the present disclosure encompasses software, firmware, and hardware implementations.
[0042] The system 100 also includes the one or more memory devices 142. The one or more memory devices 142 include any suitable computer-readable storage device depending on, for example, whether data access needs to be bi-directional or unidirectional, speed of data access required, memory capacity required, other factors related to data access, or any combination thereof. Generally, the one or more memory devices 142 include some combination of volatile memory devices and non-volatile memory devices, though in some implementations, only one or the other may be present. Examples of volatile memory devices and circuits include registers, caches, latches, many types of random-access memory (RAM), such as dynamic random-access memory (DRAM), etc. Examples of non-volatile memory devices and circuits include hard disks, optical disks, flash memory, and certain types of RAM, such as resistive random-access memory (ReRAM). Other examples of both volatile and non-volatile memory devices can be used as well, or in the alternative, so long as such memory devices store information in a physical, tangible medium. Thus, the one or more memory devices 142 include circuits and structures and are not merely signals or other transitory phenomena (i.e., are non-transitory media).
[0043] In the example illustrated in FIG. 1, the one or more memory devices 142 store instructions 146 that are executable by the one or more processors 110 to perform various operations and functions. The instructions 146 include instructions to enable the various components and subsystems of the system 100 to operate, interact with one another, and interact with a user, such as a basic input/output system (BIOS) 152 and an operating system (OS) 154. Additionally, the instructions 146 include one or more applications 156, scripts, or other program code to enable the one or more processors 110 to perform the operations described herein. For example, in FIG. 1, the instructions 146 include a face comparison engine 158 that is configured to perform face image matching based on feature comparison, as further described with reference to FIG. 2.
[0044] In FIG. 1, the system 100 also includes one or more output devices 130, one or more input devices 120, and one or more interface devices 132. Each of the one or more output devices 130, the one or more input devices 120, and the one or more interface devices 132 can be coupled to the bus 160 via a port or connector, such as a Universal Serial Bus port, a digital visual interface (DVI) port, a serial ATA (SATA) port, a small computer system interface (SCSI) port, a high-definition media interface (HDMI) port, or another serial or parallel port. In some implementations, one or more of the one or more output devices 130, the one or more input devices 120, or the one or more interface devices 132 is coupled to or integrated within a housing with the one or more processors 110 and the one or more memory devices 142, in which case the connections to the bus 160 can be internal, such as via an expansion slot or other card-to-card connector. In other implementations, the one or more processors 110 and the one or more memory devices 142 are integrated within a housing that includes one or more external ports, and one or more of the one or more output devices 130, the one or more input devices 120, or the one or more interface devices 132 is coupled to the bus 160 via the one or more external ports.
[0045] Examples of the one or more output devices 130 include display devices, speakers, printers, televisions, projectors, or other devices to provide output of data in a manner that is perceptible by a user. Examples of the one or more input devices 120 include buttons, switches, knobs, a keyboard 122, a pointing device 124, one or more cameras 126, a biometric device, a microphone, a motion sensor, or another device to detect user input actions. The pointing device 124 includes, for example, one or more of a mouse, a stylus, a track ball, a pen, a touch pad, a touch screen, a tablet, another device that is useful for interacting with a graphical user interface, or any combination thereof. A particular device may be an input device 120 and an output device 130. For example, the particular device may be a touch screen.
[0046] The one or more interface devices 132 are configured to enable the system 100 to communicate with one or more other devices 144 directly or via one or more networks 140. For example, the one or more interface devices 132 may encode data in electrical and/or electromagnetic signals that are transmitted to the one or more other devices 144 as control signals or packet-based communication using pre-defined communication protocols. As another example, the one or more interface devices 132 may receive and decode electrical and/or electromagnetic signals that are transmitted by the one or more other devices 144. The electrical and/or electromagnetic signals can be transmitted wirelessly (e.g., via propagation through free space), via one or more wires, cables, optical fibers, or via a combination of wired and wireless transmission.
[0047] During operation, the face comparison engine 158 obtains image data 133. In a particular implementation, the face comparison engine 158 receives the image data 133 from the one or more cameras 126. In some implementations, the one or more cameras 126 capture one or more still image frames or capture a video including a sequence of image frames. In an example, the one or more cameras 126 capture one or more images of a person with an identification token 138. The identification token 138 includes a photo of a face 184. The one or more cameras 126 capture a live image of a face 186 of the person and capture an image of the photo of the face 184. For example, capturing a “live image” of a person corresponds to taking a photograph (or video) of the person, as compared to taking a photograph (or video) of a photo (or another representation) of the person.
[0048] In an example, the images are captured at a security check to match the person with their photo identification (e.g., the identification token 138), such as at airports, train stations, sports stadiums, banks, automated teller machines (ATMs), hotel check-in, etc. In another example, the images are captured during a financial transaction. To illustrate, if the face 184 matches the face 186, a financial transaction is initiated, a purchase is made, etc. The financial transaction can be based on a type of the identification token 138, such as a bank card, a dining card, an identification token on a mobile device that has near field communication (NFC) enabled or a Bluetooth® (a registered trademark of Bluetooth SIG, Inc., Washington) low energy (BLE) wallet, etc. In some aspects, the financial transaction can include one or more devices. For example, the face comparison engine 158 can send a notification to another device indicating whether the face 184 matches the face 186, and the other device selectively, based on the notification, proceeds with the financial transaction.
[0049] According to some implementations, the live image of the face 186 and the image of the photo of the face 184 are both captured in (e.g., are portions of) a single image (e.g., a single image frame) from a single camera of the one or more cameras 126. In other implementations, the live image of the face 186 is captured in a first image frame and the image of the photo of the face 184 is captured in a second image frame. In some aspects, the first image frame and the second image frame can be captured by a single camera of the one or more cameras 126. In other aspects, the first image frame is captured by a first camera of the one or more cameras 126, and the second image frame is captured by a second camera of the one or more cameras 126 that is distinct from the first camera. In some implementations, the face comparison engine 158 receives the image data 133 via the one or more networks 140 from the one or more other devices 144.
[0050] The identification token 138 can include a physical token or a digital token. The identification token 138 includes an image (e.g., the photo) of the face 184. In some aspects, the identification token 138 also includes text indicating information, such as a date of birth, a name, an identifier, an address, a department, or a combination thereof. Non-limiting examples of an identifier include a social security number, a driver’s license number, a passport number, a badge number, another identifier type, or a combination thereof. Non-limiting examples of the identification token 138 include a passport, a driver’s license, another type of identification token issued by a governmental/business/educational entity, an access badge, or another type of private issued identification token, and may be embodied in a physical form (e.g., a plastic card or badge, a paper booklet, etc.) or in a digital form, such as displayed on a screen of a user’s electronic device (e.g., via a mobile phone app). In some examples, the identification token 138 can include a card, a booklet, a badge, etc.
[0051] The image data 133 includes an image portion 134 corresponding to an image of the identification token 138, and an image portion 136 corresponding to an image of the person. For example, the image portion 134 corresponds to an image of the photo of the face 184 that is captured by the one or more cameras 126, and the image portion 136 corresponds to a live image of the face 186 captured by the one or more cameras 126.
[0052] In an example 150, the image data 133 includes a single image frame that includes the image portion 134 and the image portion 136, and the face comparison engine 158 receives the single image frame from a single one of the one or more cameras 126. In other examples, the image data 133 can include a first image frame that includes the image portion 134, and a second image frame that includes the image portion 136. In a particular aspect, the image data 133 includes a plurality of image frames (e.g., 180 image frames) corresponding to a capture time (e.g., 3 seconds) associated with a frame rate (e.g., 60 frames per second). In some examples, the face 184 is detectable in a first subset (e.g., image frames 100-130) of the plurality of image frames and the face 186 is detectable in a second subset (e.g., image frames 1-60 and 121-180) of the plurality of image frames. To illustrate, the person may have moved such that the face 186 is blurry or out of camera view during capture of a particular subset (e.g., image frames 61-120) of the plurality of image frames. In a particular aspect, the image portion 134 can be included in a first image frame (e.g., any of the image frames 100-120) and the image portion 136 can be included in a second image frame (e.g., any of the image frames 1-60 and 121-180) that is distinct from the first image frame. In a particular aspect, the image portion 134 and the image portion 136 can be included in a single image frame (e.g., any of the image frames 121-130).
[0053] In some implementations, the face comparison engine 158 receives the first image frame and the second image frame from a single one of the one or more cameras 126. To illustrate, a particular camera of the one or more cameras 126 captures an image of the identification token 138 at a first time, and captures an image of the person at a second time. In other implementations, the face comparison engine 158 receives the first image frame from a first camera of the one or more cameras 126, and receives the second image frame from a second camera of the one or more cameras 126 that is distinct from the first camera.
[0054] The face comparison engine 158 performs face image matching based on the image data 133, as further described with reference to FIG. 2. For example, the face comparison engine 158 performs face detection 170 on the image data 133 to determine that the image data 133 includes two image portions 134, 136 corresponding to the images of the faces 184, 186.
[0055] In some implementations, the face comparison engine 158 detects an identification token portion of the image data 133 (e.g., representing the identification token 138) that matches an identification token template, designates the identification token portion as the image portion 134, and performs the face detection 170 on the image portion 134 to determine that the image portion 134 includes an image of the face 184 (e.g., an image of the photo of the face 184). In a particular aspect, matching the identification token template includes matching a shape of the identification token template. In some implementations, the face comparison engine 158 uses various liveness check techniques on the image data 133 to verify that the image portion 136 corresponds to the face 186 of a live person. In a particular aspect, the face comparison engine 158 identifies multiple portions of the image data 133 corresponding to live persons, and selects the image portion 136 based on a relative size of the face 186 in the image portion 136. For example, an image size of a face of someone walking by is likely to be smaller than an image size of the face 186 that is closer to the one or more cameras 126 for face matching.
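The selection logic in the preceding paragraph can be illustrated with the following sketch; `detect_faces` and `passes_liveness_check` are hypothetical helper functions standing in for whichever face detector and liveness check technique is used, and are not part of the disclosure.

```python
# Illustrative sketch: among detected faces that pass a liveness check, keep the
# one with the largest bounding box (the face closest to the camera).
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of a detected face


def select_live_face(image,
                     detect_faces: Callable[[object], List[Box]],
                     passes_liveness_check: Callable[[object, Box], bool]) -> Optional[Box]:
    """Return the largest face that passes the liveness check, or None if none does."""
    candidates = [box for box in detect_faces(image)
                  if passes_liveness_check(image, box)]
    if not candidates:
        return None
    # Relative size: a passer-by's face is usually smaller in the image than the
    # face of the person standing close to the camera for face matching.
    return max(candidates, key=lambda box: box[2] * box[3])
```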
[0056] Optionally, in some implementations, the face comparison engine 158 performs image enhancement 172 of the image portion 134, the image portion 136, or both. The image enhancement 172 includes resolution enhancement (e.g., increasing resolution), brightening, deblurring, one or more additional enhancements, or a combination thereof, as further described with reference to FIG. 2.
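As an illustrative sketch of the image enhancement 172 using OpenCV, the scale factor, brightness offset, and sharpening kernel below are example values chosen for the sketch, not values specified by the disclosure; a neural upscaler or de-blurring network could be substituted for any of these steps.

```python
# Illustrative enhancement of a face crop: upscale, brighten, and sharpen.
import cv2
import numpy as np


def enhance_face_crop(crop: np.ndarray) -> np.ndarray:
    # Resolution enhancement: simple bicubic upscaling (a learned upscaler could be used instead).
    upscaled = cv2.resize(crop, None, fx=2.0, fy=2.0, interpolation=cv2.INTER_CUBIC)
    # Brightening: scale and offset pixel intensities.
    brightened = cv2.convertScaleAbs(upscaled, alpha=1.1, beta=20)
    # Approximate de-blurring: unsharp-mask style sharpening kernel.
    kernel = np.array([[0, -1, 0],
                       [-1, 5, -1],
                       [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(brightened, -1, kernel)
```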
[0057] In some implementations, the face comparison engine 158 performs face alignment 174 of the image portion 134, the image portion 136, or both, as further described with reference to FIG. 2. For example, the face comparison engine 158 adjusts the image portion 134, the image portion 136, or both, to align the face 184 with the face 186.
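One common alignment step, rotating a crop so that the line between the eye centers is horizontal, is sketched below for illustration; the eye coordinates are assumed to come from a facial-landmark detector that is not shown, and other alignment techniques could be used.

```python
# Illustrative face alignment: rotate the crop so that the eyes are level.
import math

import cv2
import numpy as np


def align_face(crop: np.ndarray, left_eye: tuple, right_eye: tuple) -> np.ndarray:
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.degrees(math.atan2(dy, dx))          # tilt of the eye line
    center = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)     # rotate about the eye mid-point
    rotation = cv2.getRotationMatrix2D(center, angle, scale=1.0)
    h, w = crop.shape[:2]
    return cv2.warpAffine(crop, rotation, (w, h), flags=cv2.INTER_LINEAR)
```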
[0058] The face comparison engine 158 performs feature extraction 176 on the image portion 134 to generate first feature values representing the face 184, as further described with reference to FIG. 4. Similarly, the face comparison engine 158 performs the feature extraction 176 on the image portion 136 to generate second feature values representing the face 186. In some aspects, the first feature values and the second feature values match independently of elements that can be different for the same person. For example, the first feature values match the second feature values independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the face 186 and the face 184. In some aspects, the feature extraction 176 gives greater weight to elements that do not typically change substantially over time for the same person. For example, the first feature values are based on at least one of a shape of a skull, a width of a nose, a space between eyes relative to a width of the face 184, a shape of an eye, a shape of an ear, or a shape of the face 184.
[0059] The face comparison engine 158 performs feature comparison 178 of the first feature values and the second feature values to determine whether the face 184 matches the face 186, as further described with reference to FIG. 2. For example, the face comparison engine 158, in response to determining that a difference between the first feature values and the second feature values is greater than a difference threshold, determines that the face 184 does not match the face 186 (e.g., the face 184 of the identification token 138 does not match the face 186 of the person). Alternatively, the face comparison engine 158, in response to determining that the difference is less than or equal to the difference threshold, determines that the face 184 matches the face 186 (e.g., the face 184 of the identification token 138 is of the same person as the face 186).
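The threshold comparison described above can be summarized in a few lines; the Euclidean distance and the default threshold value are illustrative choices, and other difference measures could be used.

```python
# Illustrative comparison: the faces match when the difference between their
# feature values is less than or equal to a difference threshold.
import numpy as np


def is_same_face(first_features: np.ndarray,
                 second_features: np.ndarray,
                 difference_threshold: float = 1.0) -> bool:
    difference = float(np.linalg.norm(first_features - second_features))
    return difference <= difference_threshold
```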
[0060] The face comparison engine 158 thus determines whether the face 184 matches the face 186 without accessing a database of stored representations of faces. The face comparison engine 158 performs the face image matching independently of (e.g., without) network access. For example, the face comparison engine 158 performs computations for the face image matching locally and does not rely on accessing remote processing resources (e.g., cloud based resources) or databases, via the network 140, to perform the face image matching. To illustrate, the face comparison engine 158, independently of network access, performs the feature extraction 176 to generate the first feature values and the second feature values, and performs the feature comparison 178 to determine whether the face 184 matches the face 186.
[0061] The face comparison engine 158 performs output generation 180 to generate an output 118. For example, the face comparison engine 158 generates the output 118 indicating whether the face 184 matches the face 186. In some implementations, the output 118 includes a graphical user interface (GUI) 166 indicating whether the face 184 matches the face 186. In a particular aspect, the face comparison engine 158, in response to determining that the face 184 does not match the face 186, generates the output 118 including an alert. In another aspect, the face comparison engine 158, in response to determining that the face 184 matches the face 186, generates the output 118 including an authentication token.
[0062] In some examples, the face comparison engine 158 sends the output 118 to the one or more output devices 130. For example, the one or more output devices 130 display the GUI 166 indicating whether the face 184 matches the face 186. In some examples, the face comparison engine 158 sends the output 118 to the one or more other devices 144. For example, the one or more other devices 144 include a lock that is deactivated in response to receiving an authentication token of the output 118. In some examples, for a video stream received from the one or more cameras 126, the face comparison engine 158 overlays the output 118 on the video stream to generate an output video stream, and sends the output video stream to the one or more output devices 130. In a particular aspect, the image data 133 is based on the video stream received from the one or more cameras 126.
[0063] In a particular aspect, the face comparison engine 158, in response to determining that the face 184 matches the face 186, performs text recognition on the image portion 134 to generate text 137. In an example, the face comparison engine 158 determines that the text 137 includes particular text elements at particular locations of the identification token 138 that is indicated by the image portion 134. To illustrate, the face comparison engine 158 determines that the text 137 indicates a name at a first location, an identifier at a second location, a date of birth at a third location, one or more additional text elements at one or more locations, or a combination thereof. In a particular example, the face comparison engine 158 determines that the image portion 134 includes the face 184 at a particular location of the identification token 138.
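For illustration, the text-recognition step could be sketched as follows; Tesseract (via the pytesseract package) is used here as a stand-in OCR engine, and the MM/DD/YYYY date-of-birth pattern is an assumption made for the example rather than a format required by the disclosure.

```python
# Illustrative text recognition on the identification-token image portion.
# Requires a local Tesseract installation in addition to the pytesseract package.
import re

import pytesseract


def extract_token_text(token_image) -> dict:
    text = pytesseract.image_to_string(token_image)
    dob_match = re.search(r"\b(\d{2}/\d{2}/\d{4})\b", text)  # assumed MM/DD/YYYY layout
    return {
        "raw_text": text,
        "date_of_birth": dob_match.group(1) if dob_match else None,
    }
```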
[0064] In some implementations, the face comparison engine 158 determines whether the image portion 134 matches an identification token template, and generates the output 118 indicating whether the image portion 134 matches the identification token template. For example, the face comparison engine 158 determines that the image portion 134 matches the identification token template based at least in part on determining that locations of the particular text elements match locations of corresponding text elements in the identification token template, that a location of the face 184 matches a location of a corresponding face in the identification token template, or a combination thereof. In some examples, the face comparison engine 158 determines that the image portion 134 matches the identification token template based at least in part on determining whether a format of a particular text element of the text 137 matches a format of a corresponding text element of the identification token template. In some examples, the face comparison engine 158 determines that the image portion 134 matches the identification token template based at least in part on determining whether a design element (e.g., a drawing, a color, a font, a picture, a hologram, microprinting, a design element that becomes visible under ultraviolet (UV) light, etc.) indicated by the image portion 134 matches a corresponding design element of the identification token template. In some aspects, one or more design elements can include security features (e.g., a hologram, microprinting, a design element that becomes visible under UV light, etc.). In some implementations, the face comparison engine 158 provides the image portion 134 to another component or device to verify whether the image portion 134 matches an identification token template, receives a notification from the other component or device indicating whether the image portion 134 matches the identification token template, generates the output 118 based on the notification indicating whether the image portion 134 matches the identification token template, or a combination thereof.
[0065] In some implementations, the identification token template indicates one or more required elements, one or more optional elements, or a combination thereof. The face comparison engine 158 determines that the image portion 134 matches the identification token template based at least in part on determining that the image portion 134 includes all of the required elements, and that each element of the image portion 134 corresponds to a corresponding one of the required elements or optional elements.
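As an illustrative, non-limiting sketch, the template check described in paragraphs [0064]-[0065] could take the following form in Python. The TemplateElement structure, the overlap test, and the 0.5 overlap threshold are assumptions introduced for illustration only.

```python
from dataclasses import dataclass

@dataclass
class TemplateElement:
    name: str      # e.g., "name", "identifier", "face"
    region: tuple  # expected (x, y, w, h) location on the identification token
    required: bool # required element vs. optional element

def regions_overlap(a, b, min_iou=0.5):
    """True if two (x, y, w, h) regions overlap by at least min_iou (illustrative test)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return union > 0 and inter / union >= min_iou

def matches_template(detected, template):
    """detected maps an element name to the (x, y, w, h) region found in the image portion.
    Mirrors the two conditions above: every required element is present near its expected
    location, and every detected element corresponds to a required or optional element."""
    for elem in template:
        if elem.required:
            location = detected.get(elem.name)
            if location is None or not regions_overlap(location, elem.region):
                return False
    allowed = {elem.name for elem in template}
    return all(name in allowed for name in detected)
```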
[0066] In some implementations, the face comparison engine 158, in response to determining that the face 184 matches the face 186, determines whether the text 137 satisfies a criterion. In an example, the text 137 indicates a date of birth and the criterion includes an age restriction. To illustrate, the age restriction indicates that people below a particular age are not allowed to enter. The face comparison engine 158 determines a first age corresponding to the date of birth, determines whether the criterion is satisfied based on a comparison of the first age and the particular age, and generates the output 118 indicating whether the first age satisfies the age restriction. For example, the face comparison engine 158, in response to determining that the first age is greater than or equal to the particular age, generates the output 118 indicating a green checkmark, indicating that the age restriction is satisfied. Alternatively, the face comparison engine 158, in response to determining that the first age is less than the particular age, generates the output 118 indicating a stop sign, indicating that the age restriction is not satisfied.
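A minimal Python sketch of this age-restriction check; the helper names and the use of the current system date are illustrative assumptions.

```python
from datetime import date

def age_on(birth_date: date, today: date) -> int:
    """Whole years elapsed between the recognized date of birth and today."""
    years = today.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred this year.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def age_restriction_satisfied(birth_date: date, minimum_age: int) -> bool:
    """True when the first age meets or exceeds the particular age (green-checkmark case);
    False corresponds to the stop-sign case."""
    return age_on(birth_date, date.today()) >= minimum_age
```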
[0067] In some examples, the face comparison engine 158 can process the image portion 136 to determine an estimated age of the face 186 (i.e., the estimated age of the person whose face 186 is captured by the camera(s) 126). For example, the face comparison engine 158 uses an age estimation neural network to process the image portion 136 to generate the estimated age. The age estimation neural network is trained using training images of people of various ages. During training, the age estimation neural network processes a training image to generate an estimated age, and the age estimation neural network is updated based on a difference between the estimated age and an actual age of the person in the training image.
[0068] The face comparison engine 158 determines whether the estimated age of the face 186 matches the first age corresponding to the date of birth indicated by the image portion 134. To illustrate, the face comparison engine 158, in response to determining that a difference between the estimated age and the first age is within an age threshold, determines that the estimated age matches the first age. The face comparison engine 158 generates the output 118 based at least in part on determining whether the estimated age matches the first age. For example, the face comparison engine 158, in response to determining that the first age does not match the estimated age, generates the output 118 indicating a caution sign recommending additional age verification.
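One possible form of this consistency check, assuming an illustrative age threshold of five years (the description does not fix a particular value):

```python
def ages_consistent(estimated_age: float, document_age: int, age_threshold: float = 5.0) -> bool:
    """True when the age estimated from the live face is within age_threshold years of the
    age derived from the date of birth on the identification token; a False result
    corresponds to the caution-sign output recommending additional age verification."""
    return abs(estimated_age - document_age) <= age_threshold
```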
[0069] In some implementations, the face comparison engine 158 verifies that the text 137 indicates valid data, and generates output 118 based at least in part on the verification. For example, the text 137 indicates a name and an identifier. A criterion includes valid name and identifier matching using stored records of names and identifiers. For example, the face comparison engine 158 has access to identifier mapping data that includes records of valid names and identifiers. In a particular aspect, the face comparison engine 158 accesses the identifier mapping data stored at a memory device, a network device, a database, or a combination thereof. The face comparison engine 158, based on determining that the identifier mapping data indicates that the identifier is valid for the name, determines that the text 137 satisfies the criterion and generates the output 118 indicating that the text 137 satisfies the criterion.
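The identifier-mapping lookup could be sketched as follows; the dictionary-of-sets representation and the example record are hypothetical.

```python
def identifier_valid(name: str, identifier: str, identifier_mapping: dict) -> bool:
    """identifier_mapping maps a normalized name to the set of identifiers recorded as
    valid for that name (loaded from memory, a network device, or a database).
    Returns True when the recognized identifier is on record for the recognized name."""
    return identifier in identifier_mapping.get(name.strip().lower(), set())

# Hypothetical record store:
# identifier_mapping = {"sushmita rai": {"AB12FG345968"}}
```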
[0070] The face comparison engine 158 thus enables performing face image matching based on feature comparison. The face image matching can be performed without access to a database storing representations of faces, and independently of network access. As a result, the face matching aspects (e.g., of a security system) remain operational even if the network is down, which makes the disclosed technique preferable to existing systems that need constant network uptime. The image enhancement 172 improves the robustness of the face image matching. For example, face image matching can work in low light conditions with a relatively low quality camera and on relatively modest computing hardware. In some examples, one or more operations of the image enhancements 172 (e.g., resolution enhancement, brightening, and/or deblurring) can be daisy-chained to reduce resource usage (e.g., computing cycles, memory, time, or a combination thereof). To illustrate, in some implementations, values generated by one enhancement neural network (e.g., output values generated by one or more node layers of the enhancement neural network) are provided as input values to be processed by another enhancement neural network, e.g., without an intermediate step of converting the network feature values to an enhanced image and generating the input values from the enhanced image, thereby reducing resource usage. In some examples, the image enhancement 172 of the image portion 134 can be performed concurrently with the face alignment 174 of the image portion 136 to reduce an overall processing time of performing the face image matching.
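The daisy-chaining of enhancement networks can be sketched as below, assuming PyTorch-style modules (the description does not specify a framework): the feature values produced by one enhancement network feed the next directly, and a single conversion to an enhanced image happens at the end.

```python
import torch
import torch.nn as nn

class EnhancementChain(nn.Module):
    """Chains enhancement networks so the values produced by one network are passed
    straight to the next, without decoding an intermediate enhanced image and
    re-extracting input values between stages."""

    def __init__(self, resolution_net: nn.Module, deblur_net: nn.Module, brighten_net: nn.Module):
        super().__init__()
        self.stages = nn.ModuleList([resolution_net, deblur_net, brighten_net])

    def forward(self, input_values: torch.Tensor) -> torch.Tensor:
        values = input_values       # input values extracted from the image portion
        for stage in self.stages:
            values = stage(values)  # feature values flow directly into the next stage
        return values               # converted once, by the caller, into the enhanced image portion
```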
[0071] The face alignment 174 enables face image matching to be performed independently of an angle at which the identification token 138 is held up to the one or more cameras 126 and independently of a head tilt of the person. The feature extraction 176 enables matching the face 184 and the face 186 independently of elements that can change for the same person, such as hairstyle, eyewear, facial hair, etc. With more weight given to elements that typically do not change substantially over time for a person, such as skull shape, nose width, relative eye width, ear shape, etc., the feature extraction 176 enables matching the face 186 to the face 184 for the same person, even if the face 184 corresponds to an older photo of the person.
[0072] The specific components of the system 100 are provided as an illustrative non-limiting example. In other examples, one or more of the components shown in FIG. 1 can be omitted, one or more additional components can be included, or a combination thereof. In an example, at least one of the one or more interface devices 132 that is useable to access the one or more networks 140 can be omitted from the system 100 because the face comparison engine 158 can operate without network access in at least some implementations. In another example, the keyboard 122, the pointing device 124, or both, can be omitted from the system 100. In some examples, the one or more output devices 130 can include a device (e.g., a lock, a switch, an alarm, etc.) that is activated or deactivated based on the output 118.
[0073] Referring to FIG. 2, a diagram 200 is shown of an example of operations that may be performed by the face comparison engine 158 of FIG. 1. For example, the face comparison engine 158 performs face detection 170 on the image data 133 to determine the image portion 134 corresponding to the face 184 and the image portion 136 corresponding to the face 186.
[0074] The face comparison engine 158 performs the image enhancement 172 on the image portion 134. For example, the face comparison engine 158 performs resolution enhancement 202, de-blurring 204, brightening 206, one or more additional enhancements, or a combination thereof, on the image portion 134 to generate an enhanced image portion 234. In some implementations, the face comparison engine 158 uses a resolution enhancement neural network to perform the resolution enhancement 202, a de-blurring neural network to perform the de-blurring 204, a brightening neural network to perform the brightening 206, one or more additional neural networks to perform one or more additional enhancements, or a combination thereof.
[0075] In an example, the resolution enhancement neural network is trained using original images and low-resolution training images generated from the original images. To illustrate, a neural network trainer uses the resolution enhancement neural network to process a low-resolution training image to generate an estimated image. The neural network trainer determines a loss metric based on a difference between an original image (corresponding to the low-resolution training image) and the estimated image, and updates (e.g., the weights, biases, or both, of) the resolution enhancement neural network based on the loss metric. In some aspects, the one or more applications 156 of FIG. 1 include the neural network trainer. In other aspects, the neural network trainer is included in the one or more other devices 144 that send the resolution enhancement neural network to the one or more memory devices 142. The face comparison engine 158 uses the resolution enhancement neural network to enhance (e.g., increase) resolution of the image portion 134 to generate the enhanced image portion 234, as further described with reference to FIG. 3.
[0076] In an example, the de-blurring neural network is trained using original images and blurred training images generated from the original images. To illustrate, a neural network trainer uses the de-blurring neural network to process a blurred training image to generate an estimated image. The neural network trainer determines a loss metric based on a difference between an original image (corresponding to the blurred training image) and the estimated image, and updates (e.g., the weights, biases, or both, of) the de-blurring neural network based on the loss metric. In some aspects, the one or more applications 156 of FIG. 1 include the neural network trainer. In other aspects, the neural network trainer is included in the one or more other devices 144 that send the de-blurring neural network to the one or more memory devices 142. The face comparison engine 158 uses the de-blurring neural network to reduce blurring in the image portion 134 to generate the enhanced image portion 234.
[0077] In an example, the brightening neural network is trained using original images and darkened training images generated from the original images. To illustrate, a neural network trainer uses the brightening neural network to process a darkened training image to generate an estimated image. The neural network trainer determines a loss metric based on a difference between an original image (corresponding to the darkened training image) and the estimated image, and updates (e.g., the weights, biases, or both, of) the brightening neural network based on the loss metric. In some aspects, the one or more applications 156 of FIG. 1 include the neural network trainer. In other aspects, the neural network trainer is included in the one or more other devices 144 that send the brightening neural network to the one or more memory devices 142. The face comparison engine 158 uses the brightening neural network to brighten the image portion 134 to generate the enhanced image portion 234.
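Paragraphs [0075]-[0077] describe one degrade-and-restore training pattern applied to three different networks. The following PyTorch-style sketch (the framework and the mean-squared-error loss are assumptions) captures that pattern; degrade would downsample, blur, or darken the original image depending on which network is being trained.

```python
import torch.nn.functional as F

def train_enhancement_network(net, optimizer, original_images, degrade, epochs=10):
    """Generic trainer for the degrade-and-restore pattern: degrade an original image,
    let the network estimate a restored image, compute a loss metric from the
    difference, and update the network's weights and biases to reduce that loss."""
    net.train()
    for _ in range(epochs):
        for original in original_images:    # original: tensor of shape (1, C, H, W)
            degraded = degrade(original)    # low-resolution, blurred, or darkened training image
            estimated = net(degraded)       # estimated (restored) image
            loss = F.mse_loss(estimated, original)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```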
[0078] The resolution enhancement 202, the de-blurring 204, the brightening 206, one or more additional enhancements, or a combination thereof, can be performed in various orders. For example, the face comparison engine 158 performs the resolution enhancement 202 on the image portion 134 to generate a resolution enhanced image, performs the de-blurring 204 on the resolution enhanced image to generate a de-blurred image, and performs the brightening 206 on the de-blurred image to generate the enhanced image portion 234. As another example, the face comparison engine 158 performs the de-blurring 204 on the image portion 134 to generate a de-blurred image, performs the brightening 206 on the de-blurred image to generate a brightened image, and performs the resolution enhancement 202 on the brightened image to generate the enhanced image portion 234. In some aspects, one or more of the resolution enhancement 202, the de-blurring 204, the brightening 206, one or more additional enhancements, or a combination thereof, can be performed at least partially concurrently. [0079] In the example illustrated in the diagram 200, the image enhancement 172 is performed on the image portion 134, and not performed on the image portion 136. For example, the image portion 134 corresponds to an image of the identification token 138 and the representation of the face 184 in the image portion 134 may be smaller in size than the representation of the face 186 in the image portion 136. Performing the image enhancement 172 on the image portion 134 may result in a larger increase in matching accuracy, as compared to performing the image enhancement 172 on the image portion 136. Performing the image enhancement 172 on the image portion 134 can improve matching accuracy, while refraining from performing the image enhancement 172 on the image portion 136 conserves resources. In other examples, the face comparison engine 158 can also perform the image enhancement 172 on the image portion 136 to generate a second enhanced image portion, and perform the face alignment 174 on the enhanced image portion 234 and the second enhanced image portion.
[0080] In some implementations, the face comparison engine 158 performs the face alignment 174 on the image portion 136 to generate an aligned image portion 246, performs the face alignment 174 on the enhanced image portion 234 to generate an aligned image portion 244, or both. To illustrate, the face 184 is at a first angle (e.g., an angle of deviation of a line of symmetry of the face 184 from vertical) in the enhanced image portion 234, and the face 186 is at a second angle in the image portion 136. In some examples, the face comparison engine 158 selectively performs the face alignment 174 in response to determining that a difference between the first angle and the second angle is greater than an angle threshold (e.g., 5 degrees). The face 184 has a first particular angle in the aligned image portion 244, and the face 186 has a second particular angle in the aligned image portion 246. The second particular angle is within the angle threshold (e.g., 5 degrees) of the first particular angle.
[0081] In some implementations, the face comparison engine 158 performs the face alignment 174 by applying a rotation to a single one of the enhanced image portion 234 or the image portion 136 to generate the aligned image portion 244 and the aligned image portion 246. For example, the face comparison engine 158 applies a rotation to the enhanced image portion 234 to generate the aligned image portion 244 and copies the image portion 136 to generate the aligned image portion 246. The rotation is applied to the enhanced image portion 234 so that the angle of the face 184 in the aligned image portion 244 matches the angle of the face 186 in the image portion 136 (and the aligned image portion 246).
[0082] In some implementations, the face comparison engine 158 performs the face alignment 174 by applying a first rotation to the enhanced image portion 234 to generate the aligned image portion 244 and by applying a second rotation to the image portion 136 to generate the aligned image portion 246. For example, the face comparison engine 158 applies the first rotation so that the face 184 has a pre-determined angle (e.g., vertical) in the aligned image portion 244, and applies the second rotation so that the face 186 has the pre-determined angle (e.g., vertical) in the aligned image portion 246. In a particular aspect, the pre-determined angle is based on default data, a configuration setting, user input, or a combination thereof.
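An illustrative sketch of rotation-based alignment using eye landmarks and Pillow; the landmark-based angle estimate, the rotation sign convention, and the threshold handling are assumptions for illustration.

```python
import math
from PIL import Image

def face_angle(left_eye, right_eye):
    """Angle (degrees) of the line through the eye centers, used as a proxy for the
    deviation of the face's line of symmetry from vertical. The sign depends on the
    image coordinate convention (y typically increases downward)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def align_to_angle(image_portion: Image.Image, current_angle: float, target_angle: float = 0.0) -> Image.Image:
    """Rotate the image portion so the face ends up at the pre-determined target angle."""
    return image_portion.rotate(current_angle - target_angle, expand=True)

def align_pair(portion_a, angle_a, portion_b, angle_b, angle_threshold=5.0):
    """Selective alignment: rotate only when the face angles differ by more than the
    angle threshold; otherwise return the portions unchanged (copies in the description)."""
    if abs(angle_a - angle_b) <= angle_threshold:
        return portion_a, portion_b
    return align_to_angle(portion_a, angle_a), align_to_angle(portion_b, angle_b)
```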
[0083] In some implementations, the face comparison engine 158 performs the face alignment 174 on the image portion 136 to generate the aligned image portion 246 concurrently with the face comparison engine 158 performing the image enhancement 172 on the image portion 134 to generate the enhanced image portion 234. In these implementations, the face comparison engine 158 performs the face alignment 174 based on the pre-determined angle (e.g., vertical). For example, the face comparison engine 158 performs the face alignment 174 on the image portion 136 to generate the aligned image portion 246 such that the face 186 has the pre-determined angle in the aligned image portion 246. The face comparison engine 158 performs the image enhancement 172 on the image portion 134 to generate the enhanced image portion 234 concurrently with performing the face alignment 174 to generate the aligned image portion 246. The face comparison engine 158, subsequent to generating the enhanced image portion 234, performs the face alignment 174 on the enhanced image portion 234 to generate the aligned image portion 244 such that the face 184 has the pre-determined angle in the aligned image portion 244. Performing the face alignment 174 at least partially concurrently with the image enhancement 172 can reduce processing time of performing the face image matching.
[0084] The face comparison engine 158 performs the feature extraction 176 on the aligned image portion 244 to generate feature values 254, as further described with reference to FIG. 4. Similarly, the face comparison engine 158 performs the feature extraction 176 on the aligned image portion 246 to generate feature values 256, as further described with reference to FIG. 4. In a particular aspect, the feature values 254 and the feature values 256 are associated with (e.g., give greater weight to) facial elements that do not typically change significantly for the same person, such as skull shape, nose width, etc. In a particular aspect, the feature values 254 and the feature values 256 are independent of (e.g., give less weight to) elements that can change significantly for the same person, such as hairstyle, eyewear, facial hair, makeup, etc.
[0085] The face comparison engine 158 performs the feature comparison 178 of the feature values 254 and the feature values 256 to generate a result 268. For example, the face comparison engine 158 determines differences between the feature values 254 and the feature values 256. In a particular aspect, the feature values 254 correspond to a first feature vector, the feature values 256 correspond to a second feature vector, and the differences correspond to a vector difference. To illustrate, the feature values 254 include fv1,1, ..., fv1,n, where fv1,n corresponds to a feature value, included in the feature values 254, of an nth feature. The feature values 256 include fv2,1, ..., fv2,n, where fv2,n corresponds to a feature value, included in the feature values 256, of the nth feature. The differences between the feature values 254 and the feature values 256 correspond to fvd,1, ..., fvd,n, where fvd,n corresponds to a difference between fv1,n and fv2,n.
[0086] The face comparison engine 158, in response to determining that the differences (e.g., fvd,1, ..., fvd,n) fail to satisfy a match criterion, generates the result 268 indicating that the face 184 does not match the face 186. Alternatively, the face comparison engine 158, in response to determining that the differences satisfy the match criterion, generates the result 268 indicating that the face 184 matches the face 186. In an example, the face comparison engine 158, in response to determining that an average (e.g., mean, median, or mode) of the differences (e.g., fvd,1, ..., fvd,n) is less than or equal to a difference threshold, determines that the differences satisfy the match criterion. In another example, the face comparison engine 158, in response to determining that each of a particular subset of feature differences is less than a respective difference threshold, determines that the differences satisfy the match criterion. To illustrate, the face comparison engine 158, in response to determining that a first feature difference (e.g., fvd,x, where x is an integer between 1 and n) is less than a first difference threshold, a second feature difference (e.g., fvd,y, where y is an integer between 1 and n that is not equal to x) is less than a second difference threshold, one or more additional feature differences are less than corresponding difference thresholds, or a combination thereof, determines that the differences satisfy the match criterion.
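A minimal sketch of the feature comparison 178 and the two match criteria described above; the threshold values are placeholders rather than values given in the description.

```python
import numpy as np
from typing import Optional

def feature_differences(fv1: np.ndarray, fv2: np.ndarray) -> np.ndarray:
    """Element-wise differences fvd,1, ..., fvd,n between the two feature vectors."""
    return np.abs(fv1 - fv2)

def satisfies_match_criterion(diffs: np.ndarray,
                              difference_threshold: float = 0.1,
                              per_feature_thresholds: Optional[dict] = None) -> bool:
    """Either criterion from the text: every selected feature difference is below its own
    threshold (per_feature_thresholds maps feature index -> threshold), or the average
    difference is at or below a single difference threshold."""
    if per_feature_thresholds is not None:
        return all(diffs[i] < t for i, t in per_feature_thresholds.items())
    return float(np.mean(diffs)) <= difference_threshold
```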
[0087] The face comparison engine 158 performs the output generation 180 based on the result 268 to generate the output 118. For example, the output 118 can indicate the result 268. In some implementations, the output 118 can include, based on the result 268, a remote command, an authentication token, or both. For example, the face comparison engine 158, in response to determining that the result 268 indicates that the face 184 matches the face 186, generates the output 118 to include an authentication token, a remote command to enable access, or both. Alternatively, the face comparison engine 158, in response to determining that the result 268 indicates that the face 184 does not match the face 186, generates the output 118 to indicate that authentication is unsuccessful, to include a remote command to disable access, or both.
[0088] In some implementations, one or more operations of the diagram 200 can be bypassed (e.g., performed selectively). For example, the face comparison engine 158 can bypass one or more of the resolution enhancement 202, the de-blurring 204, the brightening 206, one or more additional image enhancements, the face alignment 174, or a combination thereof, based on the image portion 134, the image portion 136, a remaining battery life, an environmental context, or a combination thereof.
[0089] In a particular aspect, the face comparison engine 158 refrains from performing the resolution enhancement 202 in response to determining that a resolution criterion is satisfied. For example, the face comparison engine 158 determines that the resolution criterion is satisfied in response to determining that a resolution (e.g., a count of pixels) of the image portion 134 is greater than a resolution threshold, a remaining battery life is less than a battery threshold, an image resolution of a camera 126 used to capture the image portion 134 is greater than the resolution threshold, or a combination thereof. In a particular aspect, the face comparison engine 158, in response to determining that the resolution criterion is satisfied, generates the enhanced image portion 234 from the image portion 134 without performing the resolution enhancement 202.
[0090] In a particular aspect, the face comparison engine 158 refrains from performing the de-blurring 204 in response to determining that a de-blurring criterion is satisfied. For example, the face comparison engine 158 determines that the de-blurring criterion is satisfied in response to determining that an amount of blurring detected in the image portion 134 is less than a blurring threshold, a remaining battery life is less than a battery threshold, or a combination thereof. In a particular aspect, the face comparison engine 158, in response to determining that the de-blurring criterion is satisfied, generates the enhanced image portion 234 from the image portion 134 without performing the de-blurring 204.
[0091] In a particular aspect, the face comparison engine 158 refrains from performing the brightening 206 in response to determining that a brightness criterion is satisfied. For example, the face comparison engine 158 determines that the brightness criterion is satisfied in response to determining that an amount of brightness detected in the image portion 134 is greater than a brightness threshold, a remaining battery life is less than a battery threshold, detected light at a camera 126 (that is used to capture the image portion 134) is greater than a light threshold, or a combination thereof. In a particular aspect, the face comparison engine 158, in response to determining that the brightness criterion is satisfied, generates the enhanced image portion 234 from the image portion 134 without performing the brightening 206. In an example in which none of the image enhancements 172 are performed, the enhanced image portion 234 corresponds to a copy of the image portion 134.
[0092] In a particular aspect, the face comparison engine 158 refrains from performing the face alignment 174 in response to determining that an alignment criterion is satisfied. For example, the face comparison engine 158 determines that the alignment criterion is satisfied in response to determining that a first angle of the face 184 in the enhanced image portion 234 is within an angle threshold (e.g., 5 degrees) of a second angle of the face 186 in the image portion 136, a remaining battery life is less than a battery threshold, or a combination thereof. In an example in which the alignment criterion is satisfied, the aligned image portion 244 corresponds to a copy of the enhanced image portion 234 and the aligned image portion 246 corresponds to a copy of the image portion 136.
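The bypass criteria of paragraphs [0089]-[0091] amount to a small decision step before the enhancement pipeline runs. The sketch below illustrates one way to express it; the statistics dictionary and every threshold value are assumptions.

```python
def build_enhancement_plan(stats, battery_fraction,
                           resolution_threshold=112 * 112,
                           blur_threshold=0.2,
                           brightness_threshold=0.4,
                           battery_threshold=0.15):
    """Decide which enhancements to run. stats holds 'pixel_count', 'blur_amount', and
    'brightness' measured from the first image portion; an enhancement is bypassed when
    its criterion is satisfied (good enough already, or battery too low)."""
    low_battery = battery_fraction < battery_threshold
    return {
        "resolution_enhancement": not (stats["pixel_count"] > resolution_threshold or low_battery),
        "de_blurring": not (stats["blur_amount"] < blur_threshold or low_battery),
        "brightening": not (stats["brightness"] > brightness_threshold or low_battery),
    }
```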
[0093] Referring to FIG. 3, a diagram 300 is shown of an example of a neural network 380 of the system 100 of FIG. 1 that is configured to perform the resolution enhancement 202 of FIG. 2. The neural network 380 includes node layers 340, such as a node layer 340A, a node layer 340B, a node layer 340C, a node layer 340D, a node layer 340E, one or more additional node layers, or a combination thereof. Each of the node layers 340 includes a plurality of nodes. [0094] In an example, the neural network 380 includes an input layer (e.g., the node layer 340A), one or more hidden layers (e.g., the node layer 340B, the node layer 340C, the node layer 340D, one or more additional node layers, or a combination thereof), and an output layer (e.g., the node layer 340E). Each subsequent node layer has links with a previous node layer of the node layers 340. In some examples, the node layers 340 include fully connected layers. Each of the links has an associated link weight.
[0095] The face comparison engine 158 uses the neural network 380 to perform the resolution enhancement 202 on an image portion 330 to generate a resolution enhanced image portion 350. For example, the face comparison engine 158 extracts input values representing the image portion 330, and provides the input values to the input layer (e.g., the node layer 340A) of the neural network 380. The node layers 340 process the input values to generate network feature values 344 from the output layer (e.g., the node layer 340E). The face comparison engine 158 generates the resolution enhanced image portion 350 corresponding to the network feature values 344. For example, the face comparison engine 158 determines pixel values of pixels of the resolution enhanced image portion 350 based on the network feature values 344. The resolution enhanced image portion 350 has a higher resolution than the image portion 330. For example, the resolution enhanced image portion 350 has a second count of pixels that is greater than a first count of pixels of the image portion 330.
[0096] In a particular aspect, the image portion 330 corresponds to the image portion 134 and the resolution enhanced image portion 350 is used to generate the enhanced image portion 234. In some implementations, the image portion 330 corresponds to the image portion 136 and the face comparison engine 158 performs the face alignment 174 on the resolution enhanced image portion 350 to generate the aligned image portion 246.
[0097] In an example 360, a portion 332 of the image portion 330 and a portion 352 of the resolution enhanced image portion 350 are shown. The portion 352 corresponds to a resolution enhanced version of the portion 332. For example, the portion 352 includes four pixels corresponding to each pixel of the portion 332, resulting in the resolution enhanced image portion 350 having four times as many pixels as the image portion 330.
[0098] The portion 332 includes four pixels having four pixel values. Each pixel value (e.g., original pixel value) is represented by OA,B, where A corresponds to a row of the portion 332 and B corresponds to a column of the portion 332. The portion 352 has sixteen pixels having sixteen pixel values. In the example 360, the neural network 380 outputs the network feature values 344, such that four of the sixteen pixels of the portion 352 have the same pixel values as the four pixels of the portion 332, and the remaining 12 pixels of the portion 352 have new pixel values. Each new pixel value is represented by NC,D, where C corresponds to a row of the portion 352 and D corresponds to a column of the portion 352.
[0099] In some implementations, each new pixel value of each portion of the resolution enhanced image portion 350 has the same relationship to neighboring pixel values. For example, for any portion of the resolution enhanced image portion 350, NC,D is based on the same function applied to one or more neighboring pixel values.
[0100] In some implementations, a new pixel value of a portion of the resolution enhanced image portion 350 is based on a type of facial element that the portion represents and the original pixel values. For example, when the pixels of the portion 332 represent a facial element (e.g., a left ear) of a facial element type (e.g., an ear), the new pixel values of the portion 352 generated by the face comparison engine 158 have a first relationship with one or more neighboring pixel values. Alternatively, when the portion 332 represents an eye, the new pixel values of the portion 352 generated by the face comparison engine 158 have a second relationship with one or more neighboring pixel values that is distinct from the first relationship. The portion 352 represents the same facial element in the resolution enhanced image portion 350 that the portion 332 represents in the image portion 330. The neural network 380 is thus trained to generate feature values that are based on the input feature values as well as the facial element represented by the input feature values.
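A small numpy sketch of the pixel layout in the example 360: the four original pixel values are preserved in the 4x4 output and the twelve remaining positions receive new values computed from neighboring pixels. The nearest-original placeholder merely stands in for the learned, facial-element-dependent function and is an assumption.

```python
import numpy as np

def upsample_2x_layout(portion: np.ndarray, new_value) -> np.ndarray:
    """The original pixel values O[A, B] are kept (here at the even row/column positions
    of the output) and each remaining position N[C, D] is filled by new_value, a
    placeholder for the function the network learns from neighboring pixel values."""
    rows, cols = portion.shape                      # 2 x 2 in the example 360
    out = np.full((rows * 2, cols * 2), np.nan)
    out[::2, ::2] = portion                         # original pixel values preserved
    for c in range(rows * 2):
        for d in range(cols * 2):
            if np.isnan(out[c, d]):
                out[c, d] = new_value(portion, c, d)
    return out

def nearest_original(portion, c, d):
    """Placeholder new-value function: copy the nearest original pixel value."""
    return portion[min(c // 2, portion.shape[0] - 1), min(d // 2, portion.shape[1] - 1)]
```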
[0101] In a particular aspect, during training, a neural network trainer generates a low-resolution training image portion of an actual image portion. The neural network trainer uses the neural network 380 to process the low-resolution training image portion as the image portion 330 to generate network feature values 344, and uses the network feature values 344 to generate a resolution enhanced image portion 350. The neural network trainer determines a loss metric based on a comparison of the resolution enhanced image portion 350 and the actual image portion, and updates the link weights to reduce the loss metric. [0102] The neural network 380 thus enables resolution enhancement that accounts for the facial element type of the pixel values being processed. The resolution enhanced image portion 350 can thus more accurately approximate an actual higher resolution image that could correspond to the image portion 330.
[0103] Referring to FIG. 4, a diagram 400 is shown of an example of a neural network 480 that is configured to perform the feature extraction 176 of FIG. 1. The neural network 480 includes node layers 440, such as a node layer 440A, a node layer 440B, a node layer 440C, a node layer 440D, a node layer 440E, one or more additional node layers, or a combination thereof. Each of the node layers 440 includes a plurality of nodes.
[0104] In an example, the neural network 480 includes an input layer (e.g., the node layer 440A), one or more hidden layers (e.g., the node layer 440B, the node layer 440C, the node layer 440D, one or more additional node layers, or a combination thereof), and an output layer (e.g., the node layer 440E). Each subsequent node layer has links with a previous node layer of the node layers 440. In some examples, the node layers 440 include fully connected layers. Each of the links has an associated link weight.
[0105] The face comparison engine 158 uses the neural network 480 to perform the feature extraction 176 on an image portion 430 to generate normalized feature values 450. For example, the face comparison engine 158 extracts input values representing the image portion 430, and provides the input values to the input layer (e.g., the node layer 440A) of the neural network 480. The node layers 440 process the input values to generate network feature values 444 from the output layer (e.g., the node layer 440E). The face comparison engine 158 applies normalization (e.g., L2 normalization) to the network feature values 444 to generate the normalized feature values 450.
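The normalization step can be summarized in a few lines (numpy assumed):

```python
import numpy as np

def l2_normalize(network_feature_values: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Apply L2 normalization to the output-layer feature values so the normalized
    feature vector has unit Euclidean length."""
    norm = np.linalg.norm(network_feature_values)
    return network_feature_values / max(norm, eps)
```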
[0106] In a particular aspect, the image portion 430 corresponds to the aligned image portion 244 and the normalized feature values 450 correspond to the feature values 254. In another aspect, the image portion 430 corresponds to the aligned image portion 246 and the normalized feature values 450 correspond to the feature values 256.
[0107] In a particular aspect, during training, a neural network trainer uses training image data to train the neural network 480 to disregard (e.g., apply lower link weights to) one or more of the input values that correspond to changeable elements that can change for the same person, and to consider (e.g., apply higher link weights to) one or more of the input values that correspond to static elements that typically do not change substantially for the same person. Non-limiting examples of changeable elements include eyewear, hair style, hair color, facial hair, makeup usage, or a combination thereof. Non-limiting examples of static elements include a shape of a skull, a width of a nose, a shape of the nose, a space between eyes relative to a width of a face, a shape of an eye, a shape of an ear, a shape of a face, or a combination thereof.
[0108] To illustrate, the training image data includes image portions corresponding to various changeable elements for the same person and image portions corresponding to different persons. The neural network trainer uses the neural network 480 to perform the feature extraction 176 for the image portions to generate feature values. The neural network 480 performs the feature comparison 178 between feature values of pairs of image portions to determine whether the feature values match indicating that faces represented in the pairs of image portions match. The neural network trainer determines a loss metric based on whether the feature comparison 178 correctly indicates that the feature values of a pair of image portions match. The neural network trainer adjusts the link weights of the neural network 480 to reduce the loss metric. The neural network 480 is thus trained to generate matching feature values for image portions independently of differences in the changeable elements for the same person in the image portions, and to generate feature values that correspond to the static elements for a person. Because the static elements do not typically change for up to a threshold count of years, the neural network 480 can generate feature values that match independently of age differentials up to the threshold count of years.
[0109] Using the neural network 480 thus enables the face comparison engine 158 to determine whether the face 184 matches the face 186 independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
[0110] Referring to FIG. 5, an example 552 of a face image match and an example 554 of a face image mismatch are shown. In the examples 552 and 554, the image data 133 corresponds to a plurality of image frames generated by a camera 126 of FIG. 1.
[0111] The face comparison engine 158 performs the face detection 170 of FIG. 1 to determine that a particular image frame of the image data 133 includes an image portion 134 representing a face 184 and an image portion 136 representing a face 186. In some examples, the face comparison engine 158 performs the image enhancement 172, the face alignment 174, or both, as described with reference to FIGS. 1-2.
[0112] The face comparison engine 158 performs the feature extraction 176 on a first image portion to generate the feature values 254, as described with reference to FIGS. 1-2. In a particular example, the first image portion includes the image portion 134, the enhanced image portion 234, or the aligned image portion 244 of FIG. 2.
[0113] The face comparison engine 158 performs the feature extraction 176 on a second image portion to generate the feature values 256, as described with reference to FIGS. 1-2. In a particular example, the second image portion includes the image portion 136, an enhanced image portion based on the image portion 136, or the aligned image portion 246 of FIG. 2.
[0114] The face comparison engine 158 determines whether the face 184 matches the face 186 based on differences between the feature values 254 and the feature values 256, as described with reference to FIG. 2. The face comparison engine 158 generates the result 268 (e.g., “matched” or “not matched”) indicating whether the face 184 matches the face 186, as described with reference to FIG. 2.
[0115] In a particular aspect, the face comparison engine 158 determines, based on the differences between the feature values 254 and the feature values 256, a similarity 568 between the face 184 and the face 186. For example, the similarity 568 is based on a ratio of a count of similar features and a count of total features (e.g., similarity = (count of similar features) / (count of total features)). To illustrate, the face comparison engine 158 includes a feature m in the similar features in response to determining that a feature difference (e.g., fvd,m) of the feature m is less than a corresponding threshold. The similarity 568 thus indicates a proportion of features that match (e.g., have a feature difference less than a corresponding threshold) between the face 184 and the face 186. The face comparison engine 158 generates the output 118 indicating the result 268, the similarity 568, or both.
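A sketch of the similarity computation and the matched/not-matched result described above; the per-feature thresholds and the 0.5 match cut-off are illustrative assumptions.

```python
import numpy as np

def similarity_and_result(fv1: np.ndarray, fv2: np.ndarray, feature_thresholds: np.ndarray,
                          match_ratio: float = 0.5):
    """Similarity 568 as the proportion of features whose difference falls under the
    corresponding threshold, plus a result 268 of 'matched' or 'not matched'."""
    diffs = np.abs(fv1 - fv2)
    similar = diffs < feature_thresholds   # feature m is similar if fvd,m < its threshold
    similarity = float(np.count_nonzero(similar)) / similar.size
    result = "matched" if similarity >= match_ratio else "not matched"
    return similarity, result
```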
[0116] In the example 552, one or more images of a person holding up an identification token 138 are captured by a camera 126. A particular image frame includes the image portion 134 indicating the face 184 with a first hair style (e.g., tied hair) and without eyewear, and the image portion 136 indicating the face 186 with a second hair style (e.g., untied hair) and with eyewear. The face comparison engine 158 determines, based on feature values 254 of the image portion 134 and the feature values 256 of the image portion 136, the similarity 568 (e.g., 55.90%) and the result 268 (e.g., matched). In a particular aspect, the face comparison engine 158 generates the output 118 that includes the similarity 568 (e.g., 55.90%) and the result 268 (e.g., matched) overlaid on the particular image frame, and provides the output 118 to an output device 130 (e.g., display device).
[0117] In a particular aspect, the face comparison engine 158 determines that the particular image frame includes an identification token portion that includes the image portion 134, a text element 530, a text element 532, one or more additional text elements, one or more design elements, or a combination thereof. In a particular aspect, the face comparison engine 158, based on a comparison of the identification token portion and an identification token template, determines that the text element 530 corresponds to an identifier and the text element 532 corresponds to a name. The face comparison engine 158 performs text recognition on the text element 530 and the text element 532 to detect the identifier (e.g., “AB 12FG345968”) and the name (e.g., “Sushmita Rai”), respectively. In a particular aspect, the identification token portion corresponds to a mirror image of the identification token 138. In this aspect, the face comparison engine 158 performs a rotation operation on the text element 530 and the text element 532 prior to performing the text recognition.
[0118] In the example 554, one or more images of a person holding up an identification token 138 of another person are captured by a camera 126. A particular image frame includes an image portion 134 indicating the face 184 of a person and an image portion 136 indicating the face 186 of another person. The face comparison engine 158 determines, based on feature values 254 of the image portion 134 and the feature values 256 of the image portion 136, the similarity 568 (e.g., 4.63%) and the result 268 (e.g., not matched). In a particular aspect, the face comparison engine 158 generates the output 118 that includes the similarity 568 (e.g., 4.63%) and the result 268 (e.g., not matched) overlaid on the particular image frame, and provides the output 118 to an output device 130 (e.g., display device).
[0119] FIG. 6 is a flow chart of an example of a method 600 in accordance with some examples of the present disclosure. One or more operations described with reference to FIG. 6 may be performed by the face comparison engine 158, the system 100 of FIG. 1, or both, such as by the one or more processors 110 executing the instructions 146.
[0120] The method 600 includes, at 602, determining that image data includes a first image portion corresponding to a first face. For example, the face comparison engine 158 performs the face detection 170 to determine that the image data 133 includes the image portion 134 corresponding to the face 184, as described with reference to FIGS. 1 and 2.
[0121] The method 600 also includes, at 604, determining that the image data includes a second image portion corresponding to a second face. For example, the face comparison engine 158 performs the face detection 170 to determine that the image data 133 includes the image portion 136 corresponding to the face 186, as described with reference to FIGS. 1 and 2.
[0122] The method 600 further includes, at 606, generating, based on the first image portion, first feature values representing the first face. For example, the face comparison engine 158 performs the feature extraction 176 to generate, based on the image portion 134, the feature values 254 representing the face 184, as described with reference to FIGS. 1, 2, and 4.
[0123] The method 600 also includes, at 608, generating, based on the second image portion, second feature values representing the second face. For example, the face comparison engine 158 performs the feature extraction 176 to generate, based on the image portion 136, the feature values 256 representing the face 186, as described with reference to FIGS. 1, 2, and 4.
[0124] The method 600 further includes, at 610, determining, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face. For example, the face comparison engine 158 performs the feature comparison 178 of the feature values 254 and the feature values 256 to determine whether the face 184 matches the face 186, as described with reference to FIGS. 1 and 2.
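As a high-level, non-limiting sketch, the method 600 reduces to a short composition of the stages described above; the three callables are placeholders for the face detection, feature extraction, and feature comparison components.

```python
def face_image_matching(image_data, detect_faces, extract_features, feature_match):
    """Detect the two image portions (602, 604), generate feature values for each
    face (606, 608), and compare the feature values to decide a match (610)."""
    first_portion, second_portion = detect_faces(image_data)   # 602, 604
    first_features = extract_features(first_portion)           # 606
    second_features = extract_features(second_portion)         # 608
    return feature_match(first_features, second_features)      # 610
```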
[0125] The method 600 thus enables performing face image matching based on feature comparison. The face image matching can be performed without access to a database storing representations of faces, and independently of network access. The feature extraction 176 enables matching the face 184 and the face 186 independently of elements that can change for the same person, such as hairstyle, eyewear, facial hair, etc. With more importance given to elements that typically do not change substantially over time for a person, such as skull shape, nose width, relative eye width, ear shape, etc., the feature extraction 176 enables matching the face 186 to the face 184 even if the face 184 corresponds to an older photo of the same person.
[0126] The systems and methods illustrated herein may be described in terms of functional block components, screen shots, optional selections and various processing steps. It should be appreciated that such functional blocks may be realized by any number of hardware and/or software components configured to perform the specified functions. For example, the system may employ various integrated circuit components, e.g., memory elements, processing elements, logic elements, look-up tables, and the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices. Similarly, the software elements of the system may be implemented with any programming or scripting language such as C, C++, C#, Java, JavaScript, VBScript, Macromedia Cold Fusion, COBOL, Microsoft Active Server Pages, assembly, PERL, PHP, AWK, Python, Visual Basic, SQL Stored Procedures, PL/SQL, any UNIX shell script, and extensible markup language (XML) with the various algorithms being implemented with any combination of data structures, objects, processes, routines or other programming elements. Further, it should be noted that the system may employ any number of techniques for data transmission, signaling, data processing, network control, and the like.
[0127] The systems and methods of the present disclosure may be embodied as a customization of an existing system, an add-on product, a processing apparatus executing upgraded software, a standalone system, a distributed system, a method, a data processing system, a device for data processing, and/or a computer program product. Accordingly, any portion of the system or a module or a decision model may take the form of a processing apparatus executing code, an internet based (e.g., cloud computing) embodiment, an entirely hardware embodiment, or an embodiment combining aspects of the internet, software and hardware. Furthermore, the system may take the form of a computer program product on a computer-readable storage medium or device having computer-readable program code (e.g., instructions) embodied or stored in the storage medium or device. Any suitable computer-readable storage medium or device may be utilized, including hard disks, CD-ROM, optical storage devices, magnetic storage devices, and/or other storage media. As used herein, a “computer-readable storage medium” or “computer-readable storage device” is not a signal. [0128] Systems and methods may be described herein with reference to screen shots, block diagrams and flowchart illustrations of methods, apparatuses (e.g., systems), and computer media according to various aspects. It will be understood that each functional block of the block diagrams and flowchart illustrations, and combinations of functional blocks in block diagrams and flowchart illustrations, respectively, can be implemented by computer program instructions.
[0129] Computer program instructions may be loaded onto a computer or other programmable data processing apparatus to produce a machine, such that the instructions that execute on the computer or other programmable data processing apparatus create means for implementing the functions specified in the flowchart block or blocks. These computer program instructions may also be stored in a computer-readable memory or device that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
[0130] Accordingly, functional blocks of the block diagrams and flowchart illustrations support combinations of means for performing the specified functions, combinations of steps for performing the specified functions, and program instruction means for performing the specified functions. It will also be understood that each functional block of the block diagrams and flowchart illustrations, and combinations of functional blocks in the block diagrams and flowchart illustrations, can be implemented by either special purpose hardware-based computer systems which perform the specified functions or steps, or suitable combinations of special purpose hardware and computer instructions.
[0131] Particular aspects of the disclosure are described below in the following Examples:
[0132] According to Example 1, a device includes: one or more processors configured to: determine that image data includes a first image portion corresponding to a first face; determine that the image data includes a second image portion corresponding to a second face; generate, based on the first image portion, first feature values representing the first face; generate, based on the second image portion, second feature values representing the second face; and determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face.
[0133] Example 2 includes the device of Example 1, wherein the first image portion and the second image portion are included in a single image frame.
[0134] Example 3 includes the device of Example 2, wherein the one or more processors are configured to receive the single image frame from a camera.
[0135] Example 4 includes the device of Example 1, wherein the first image portion is included in a first image frame that is distinct from a second image frame that includes the second image portion.
[0136] Example 5 includes the device of Example 4, wherein the first image frame is received from a first camera, and wherein the second image frame is received from a second camera that is distinct from the first camera.
[0137] Example 6 includes the device of Example 4, wherein the one or more processors are configured to receive the first image frame and the second image frame from a single camera.
[0138] Example 7 includes the device of any of Example 1 to Example 6, wherein the one or more processors are configured to receive the image data from one or more cameras.
[0139] Example 8 includes the device of any of Example 1 to Example 7, wherein the one or more processors are configured to perform one or more enhancements of the first image portion to generate a first enhanced image portion, wherein the first feature values are generated based on the first enhanced image portion.
[0140] Example 9 includes the device of Example 8, wherein the one or more processors are configured to use one or more neural networks to perform the one or more enhancements. [0141] Example 10 includes the device of Example 8 or Example 9, wherein the one or more enhancements include de-blurring, brightening, increasing resolution, or a combination thereof.
[0142] Example 11 includes the device of any of Example 8 to Example 10, wherein the first image portion includes first pixels representing a facial element of the first face, wherein the facial element is of a facial element type, and wherein the one or more processors are configured to: apply a resolution enhancement neural network to first pixel values of the first pixels to generate second pixel values, the second pixel values based on the facial element type; and generate the first enhanced image portion including first particular pixels having the first pixel values and second particular pixels having the second pixel values, wherein the first enhanced image portion has a higher resolution than the first image portion, and wherein the first particular pixels and the second particular pixels represent the facial element in the first enhanced image portion.
[0143] Example 12 includes the device of Example 11, wherein the facial element type indicates that the facial element corresponds to at least one of a forehead, an eyebrow, an eye, a nose, a cheek, a lip, or an ear.
[0144] Example 13 includes the device of any of Example 1 to Example 12, wherein the first feature values indicate features of the first face including at least one of a shape of a skull, a width of a nose, a shape of the nose, a space between eyes relative to a width of the first face, a shape of an eye, a shape of an ear, or a shape of the first face.
[0145] Example 14 includes the device of any of Example 1 to Example 13, wherein the one or more processors are configured to use a neural network to process the first image portion to generate the first feature values.
[0146] Example 15 includes the device of any of Example 1 to Example 14, wherein the first image portion corresponds to a photo of the first face captured by one or more cameras.
[0147] Example 16 includes the device of any of Example 1 to Example 15, wherein the second image portion corresponds to a live image of the second face captured by one or more cameras. [0148] Example 17 includes the device of any of Example 1 to Example 16, wherein the first image portion corresponds to an image captured of an identification token, and wherein the one or more processors are configured to, in response to determining that the first face matches the second face: perform text recognition on the first image portion to generate text; and determine whether the text satisfies a criterion.
[0149] Example 18 includes the device of Example 17, wherein the identification token includes a physical token or a digital token.
[0150] Example 19 includes the device of Example 17 or Example 18, wherein the text indicates a date of birth, and wherein the criterion includes an age restriction.
[0151] Example 20 includes the device of any of Example 17 to Example 19, wherein the text indicates a name and an identifier, wherein the criterion includes valid name and identifier matching, and wherein the one or more processors are configured to, based on determining that identifier mapping data indicates that the identifier is valid for the name, determine that the text satisfies the criterion.
[0152] Example 21 includes the device of Example 20, wherein the identifier includes a social security number, a driver’s license number, a passport number, a badge number, another identifier type, or a combination thereof.
[0153] Example 22 includes the device of any of Example 17 to Example 21, wherein the identification token includes a passport, a driver’s license, another type of government issued identification token, an access badge, or another type of private issued identification token.
[0154] Example 23 includes the device of any of Example 1 to Example 22, wherein the first image portion corresponds to an image captured of an identification token, and wherein the one or more processors are configured to, in response to determining that the first face matches the second face, determine whether at least the first image portion matches an identification token template. [0155] Example 24 includes the device of any of Example 1 to Example 23, wherein the one or more processors are configured to determine whether the first face matches the second face without accessing a database of stored representations of faces.
[0156] Example 25 includes the device of any of Example 1 to Example 24, wherein the one or more processors are configured to, independently of network access, generate the first feature values, generate the second feature values, and determine whether the first face matches the second face.
[0157] Example 26 includes the device of any of Example 1 to Example 25, wherein the one or more processors are configured to use a neural network to generate the first feature values and the second feature values, and wherein the neural network is trained to generate the first feature values that match the second feature values independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
[0158] Example 27 includes the device of any of Example 1 to Example 26, wherein the one or more processors are configured to determine whether the first face matches the second face independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
[0159] According to Example 28, a method includes: determining, at a device, that image data includes a first image portion corresponding to a first face; determining, at the device, that the image data includes a second image portion corresponding to a second face; generating, based on the first image portion, first feature values representing the first face; generating, based on the second image portion, second feature values representing the second face; determining, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face; and generating, at the device, an output indicating whether the first face matches the second face.
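The overall flow of the method of Example 28 can be pictured with the Python sketch below. Here detect_faces() and embed_face() are placeholders for whatever face detector and feature-extraction network a given implementation uses, and cosine similarity against a fixed threshold is only one possible way to compare the feature values; none of these particular choices is required by the example.

```python
import numpy as np

def match_faces_in_image(image_data, detect_faces, embed_face,
                         similarity_threshold: float = 0.6) -> dict:
    """Detect two face image portions, generate feature values for each,
    compare them, and produce an output indicating whether they match."""
    face_portions = detect_faces(image_data)
    if len(face_portions) < 2:
        return {"match": False, "reason": "fewer than two faces detected"}

    first_features = np.asarray(embed_face(face_portions[0]), dtype=float)
    second_features = np.asarray(embed_face(face_portions[1]), dtype=float)

    # Cosine similarity of the two feature vectors as the comparison.
    similarity = float(np.dot(first_features, second_features) /
                       (np.linalg.norm(first_features) * np.linalg.norm(second_features)))

    return {"match": similarity >= similarity_threshold, "similarity": similarity}
```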
[0160] Example 29 includes the method of Example 28, wherein the first image portion and the second image portion are included in a single image frame.
[0161] Example 30 includes the method of Example 29, further including receiving the single image frame from a camera.
[0162] Example 31 includes the method of Example 28, wherein the first image portion is included in a first image frame that is distinct from a second image frame that includes the second image portion.
[0163] Example 32 includes the method of Example 31, wherein the first image frame is received from a first camera, and wherein the second image frame is received from a second camera that is distinct from the first camera.
[0164] Example 33 includes the method of Example 31, further including receiving the first image frame and the second image frame from a single camera.
[0165] Example 34 includes the method of any of Example 28 to Example 33, further including receiving the image data from one or more cameras.
[0166] Example 35 includes the method of any of Example 28 to Example 34, further including performing one or more enhancements of the first image portion to generate a first enhanced image portion, wherein the first feature values are generated based on the first enhanced image portion.
[0167] Example 36 includes the method of Example 35, further including using one or more neural networks to perform the one or more enhancements.
[0168] Example 37 includes the method of Example 35 or Example 36, wherein the one or more enhancements include de-blurring, brightening, increasing resolution, or a combination thereof.
[0169] Example 38 includes the method of any of Example 35 to Example 37, further including: applying a resolution enhancement neural network to first pixel values of first pixels to generate second pixel values, the first image portion including the first pixels that represent a facial element of the first face, wherein the facial element is of a facial element type, and wherein the second pixel values are based on the facial element type; and generating the first enhanced image portion including first particular pixels having the first pixel values and second particular pixels having the second pixel values, wherein the first enhanced image portion has a higher resolution than the first image portion, and wherein the first particular pixels and the second particular pixels represent the facial element in the first enhanced image portion.
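One way to visualize the resolution enhancement of Example 38 is the sketch below, in which sr_model is assumed to be a resolution-enhancement network that returns generated (second) pixel values of the same shape as the input crop, conditioned on the facial element type; the simple 2x interleaving of original and generated pixels is illustrative only.

```python
import numpy as np

def enhance_facial_element(image_portion: np.ndarray,
                           element_box: tuple[int, int, int, int],
                           element_type: str,
                           sr_model) -> np.ndarray:
    """Combine original (first) pixel values with network-generated (second)
    pixel values into a higher-resolution view of one facial element."""
    top, left, bottom, right = element_box
    first_pixels = image_portion[top:bottom, left:right]

    # Second pixel values generated by the assumed network, conditioned on the
    # facial element type (e.g., "eye", "nose", "lip"); assumed to have the
    # same shape as first_pixels.
    second_pixels = sr_model(first_pixels, element_type)

    # Place the original values on a 2x grid and fill alternating positions
    # with the generated values, yielding a higher-resolution element crop.
    enhanced = np.repeat(np.repeat(first_pixels, 2, axis=0), 2, axis=1)
    enhanced[1::2, 1::2] = second_pixels
    return enhanced
```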
[0170] Example 39 includes the method of Example 38, wherein the facial element type indicates that the facial element corresponds to at least one of a forehead, an eyebrow, an eye, a nose, a cheek, a lip, or an ear.
[0171] Example 40 includes the method of any of Example 28 to Example 39, wherein the first feature values indicate features of the first face including at least one of a shape of a skull, a width of a nose, a shape of the nose, a space between eyes relative to a width of the first face, a shape of an eye, a shape of an ear, or a shape of the first face.
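Purely for illustration, the facial features enumerated in Example 40 could be carried in a structured record like the one below; in practice the feature values are more commonly entries of a learned embedding vector, and these field names are hypothetical.

```python
from dataclasses import dataclass, astuple

@dataclass
class FaceFeatures:
    skull_shape: float          # e.g., ratio describing skull geometry
    nose_width: float           # normalized by face width
    nose_shape: float
    inter_eye_distance: float   # space between eyes relative to face width
    eye_shape: float
    ear_shape: float
    face_shape: float

    def as_vector(self) -> tuple[float, ...]:
        """Flatten the named features into a comparable feature vector."""
        return astuple(self)
```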
[0172] Example 41 includes the method of any of Example 28 to Example 40, further including using a neural network to process the first image portion to generate the first feature values.
[0173] Example 42 includes the method of any of Example 28 to Example 41, wherein the first image portion corresponds to a photo of the first face captured by one or more cameras.
[0174] Example 43 includes the method of any of Example 28 to Example 42, wherein the second image portion corresponds to a live image of the second face captured by one or more cameras.
[0175] Example 44 includes the method of any of Example 28 to Example 43, further including, in response to determining that the first face matches the second face: performing text recognition on the first image portion to generate text, wherein the first image portion corresponds to an image captured of an identification token; and determining whether the text satisfies a criterion.
[0176] Example 45 includes the method of Example 44, wherein the identification token includes a physical token or a digital token.
[0177] Example 46 includes the method of Example 44 or Example 45, wherein the text indicates a date of birth, and wherein the criterion includes an age restriction.
[0178] Example 47 includes the method of any of Example 44 to Example 46, further including, based on determining that identifier mapping data indicates that an identifier is valid for a name, determining that the text satisfies the criterion, wherein the text indicates a name and an identifier, and wherein the criterion includes valid name and identifier matching.
[0179] Example 48 includes the method of Example 47, wherein the identifier includes a social security number, a driver’s license number, a passport number, a badge number, another identifier type, or a combination thereof.
[0180] Example 49 includes the method of any of Example 44 to Example 48, wherein the identification token includes a passport, a driver’s license, another type of government issued identification token, an access badge, or another type of private issued identification token.
[0181] Example 50 includes the method of any of Example 28 to Example 49, further including, in response to determining that the first face matches the second face, determining whether at least the first image portion matches an identification token template, wherein the first image portion corresponds to an image captured of an identification token.
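A minimal sketch of the identification-token template check of Example 50, under the assumption that a template is described by expected normalized bounding boxes for fields such as the photo, name, and document number; the field names and tolerance are illustrative, not part of the disclosure.

```python
Box = tuple[float, float, float, float]  # (left, top, right, bottom), normalized to [0, 1]

def matches_token_template(detected_fields: dict[str, Box],
                           template_fields: dict[str, Box],
                           tolerance: float = 0.05) -> bool:
    """Return True if every field expected by the template is detected close
    to its expected position on the captured identification-token image."""
    for field_name, expected_box in template_fields.items():
        detected_box = detected_fields.get(field_name)
        if detected_box is None:
            return False  # a required field (e.g., "photo") is missing
        if any(abs(d - e) > tolerance for d, e in zip(detected_box, expected_box)):
            return False  # field present but not where the template expects it
    return True

# Example template with hypothetical field positions:
# PASSPORT_TEMPLATE = {"photo": (0.05, 0.20, 0.35, 0.80),
#                      "name": (0.40, 0.25, 0.95, 0.35)}
```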
[0182] Example 51 includes the method of any of Example 28 to Example 50, further including determining whether the first face matches the second face without accessing a database of stored representations of faces.
[0183] Example 52 includes the method of any of Example 28 to Example 51, further including, independently of network access, generating the first feature values, generating the second feature values, and determining whether the first face matches the second face.
[0184] Example 53 includes the method of any of Example 28 to Example 52, further including using a neural network to generate the first feature values and the second feature values, wherein the neural network is trained to generate the first feature values that match the second feature values independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
[0185] Example 54 includes the method of any of Example 28 to Example 53, further including determining whether the first face matches the second face independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
[0186] According to Example 55, a device includes one or more processors configured to execute instructions to perform the method of any of Examples 28-54.
[0187] According to Example 56, a non-transitory computer-readable medium stores instructions that are executable by one or more processors to perform the method of any of Examples 28-54.
[0188] According to Example 57, a non-transitory computer-readable storage device stores instructions that, when executed by one or more processors, cause the one or more processors to: determine that image data includes a first image portion corresponding to a first face; determine that the image data includes a second image portion corresponding to a second face; generate, based on the first image portion, first feature values representing the first face; generate, based on the second image portion, second feature values representing the second face; determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face; and generate an output indicating whether the first face matches the second face.
[0189] Example 58 includes the non-transitory computer-readable storage device of Example 57, wherein the first image portion is included in a first image frame that is distinct from a second image frame that includes the second image portion.
[0190] Although the disclosure may include one or more methods, it is contemplated that it may be embodied as computer program instructions on a tangible computer-readable medium, such as a magnetic or optical memory or a magnetic or optical disk/disc. All structural, chemical, and functional equivalents to the elements of the above-described exemplary embodiments that are known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the present claims. Moreover, it is not necessary for a device or method to address each and every problem sought to be solved by the present disclosure, for it to be encompassed by the present claims. Furthermore, no element, component, or method step in the present disclosure is intended to be dedicated to the public regardless of whether the element, component, or method step is explicitly recited in the claims. As used herein, the terms “comprises,” “comprising,” or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
[0191] Changes and modifications may be made to the disclosed embodiments without departing from the scope of the present disclosure. These and other changes or modifications are intended to be included within the scope of the present disclosure, as expressed in the following claims.

Claims

WHAT IS CLAIMED IS:
1. A device comprising: one or more processors configured to: determine that image data includes a first image portion corresponding to a first face; determine that the image data includes a second image portion corresponding to a second face; generate, based on the first image portion, first feature values representing the first face; generate, based on the second image portion, second feature values representing the second face; and determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face.
2. The device of claim 1, wherein the first image portion and the second image portion are included in a single image frame, and wherein the one or more processors are configured to receive the single image frame from a camera.
3. The device of claim 1, wherein the first image portion is included in a first image frame that is distinct from a second image frame that includes the second image portion.
4. The device of claim 3, wherein the first image frame is received from a first camera, and wherein the second image frame is received from a second camera that is distinct from the first camera.
5. The device of claim 3, wherein the one or more processors are configured to receive the first image frame and the second image frame from a single camera.
6. The device of claim 1, wherein the one or more processors are configured to perform one or more enhancements of the first image portion to generate a first enhanced image portion, wherein the first feature values are generated based on the first enhanced image portion, and wherein the one or more enhancements include de-blurring, brightening, increasing resolution, or a combination thereof.
7. The device of claim 6, wherein the first image portion includes first pixels representing a facial element of the first face, wherein the facial element is of a facial element type, and wherein the one or more processors are configured to: apply a resolution enhancement neural network to first pixel values of the first pixels to generate second pixel values, the second pixel values based on the facial element type; and generate the first enhanced image portion including first particular pixels having the first pixel values and second particular pixels having the second pixel values, wherein the first enhanced image portion has a higher resolution than the first image portion, and wherein the first particular pixels and the second particular pixels represent the facial element in the first enhanced image portion.
8. The device of claim 1, wherein the first image portion corresponds to a photo of the first face captured by one or more cameras, and wherein the second image portion corresponds to a live image of the second face captured by the one or more cameras.
9. The device of claim 1, wherein the first image portion corresponds to an image captured of an identification token, and wherein the one or more processors are configured to, in response to determining that the first face matches the second face: perform text recognition on the first image portion to generate text; and determine whether the text satisfies a criterion.
10. The device of claim 9, wherein the identification token includes a physical token or a digital token.
11. The device of claim 9, wherein the text indicates a date of birth, and wherein the criterion includes an age restriction.
12. The device of claim 9, wherein the text indicates a name and an identifier, wherein the criterion includes valid name and identifier matching, and wherein the one or more processors are configured to, based on determining that identifier mapping data indicates that the identifier is valid for the name, determine that the text satisfies the criterion, and wherein the identifier includes a social security number, a driver’s license number, a passport number, a badge number, another identifier type, or a combination thereof.
13. The device of claim 9, wherein the identification token includes a passport, a driver’s license, another type of government issued identification token, an access badge, or another type of private issued identification token.
14. The device of claim 1, wherein the first image portion corresponds to an image captured of an identification token, and wherein the one or more processors are configured to, in response to determining that the first face matches the second face, determine whether at least the first image portion matches an identification token template.
15. The device of claim 1, wherein the one or more processors are configured to determine whether the first face matches the second face without accessing a database of stored representations of faces.
16. The device of claim 1, wherein the one or more processors are configured to, independently of network access, generate the first feature values, generate the second feature values, and determine whether the first face matches the second face.
17. The device of claim 1, wherein the one or more processors are configured to use a neural network to generate the first feature values and the second feature values, and wherein the neural network is trained to generate the first feature values that match the second feature values independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
18. The device of claim 1, wherein the one or more processors are configured to determine whether the first face matches the second face independently of differences in eyewear, hair style, hair color, facial hair, makeup usage, age differentials up to a threshold count of years, or a combination thereof, between the first face and the second face.
19. A method comprising: determining, at a device, that image data includes a first image portion corresponding to a first face; determining, at the device, that the image data includes a second image portion corresponding to a second face; generating, based on the first image portion, first feature values representing the first face; generating, based on the second image portion, second feature values representing the second face; determining, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face; and generating, at the device, an output indicating whether the first face matches the second face.
20. A non-transitory computer-readable storage device storing instructions that, when executed by one or more processors, cause the one or more processors to: determine that image data includes a first image portion corresponding to a first face; determine that the image data includes a second image portion corresponding to a second face; generate, based on the first image portion, first feature values representing the first face; generate, based on the second image portion, second feature values representing the second face; determine, based on a comparison of the first feature values and the second feature values, whether the first face matches the second face; and generate an output indicating whether the first face matches the second face.
PCT/US2023/030821 2022-08-23 2023-08-22 Face image matching based on feature comparison WO2024044185A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN202241047984 2022-08-23
IN202241047984 2022-08-23

Publications (1)

Publication Number Publication Date
WO2024044185A1 true WO2024044185A1 (en) 2024-02-29

Family

ID=90013889

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/030821 WO2024044185A1 (en) 2022-08-23 2023-08-22 Face image matching based on feature comparison

Country Status (1)

Country Link
WO (1) WO2024044185A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7634662B2 (en) * 2002-11-21 2009-12-15 Monroe David A Method for incorporating facial recognition technology in a multimedia surveillance system
US20110311112A1 (en) * 2010-06-21 2011-12-22 Canon Kabushiki Kaisha Identification device, identification method, and storage medium
US20150358535A1 (en) * 2008-07-30 2015-12-10 Fotonation Limited Automatic face and skin beautification using face detection
US20160125270A1 (en) * 2005-05-09 2016-05-05 Google Inc. System And Method For Providing Objectified Image Renderings Using Recognition Information From Images
US20170351909A1 (en) * 2016-06-03 2017-12-07 Magic Leap, Inc. Augmented reality identity verification
US20200410074A1 (en) * 2018-08-13 2020-12-31 Beijing Sensetime Technology Development Co., Ltd. Identity authentication method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23857989

Country of ref document: EP

Kind code of ref document: A1