CA3191888A1 - Systems and methods for private authentication with helper networks - Google Patents

Systems and methods for private authentication with helper networks

Info

Publication number
CA3191888A1
CA3191888A1 CA3191888A CA3191888A CA3191888A1 CA 3191888 A1 CA3191888 A1 CA 3191888A1 CA 3191888 A CA3191888 A CA 3191888A CA 3191888 A CA3191888 A CA 3191888A CA 3191888 A1 CA3191888 A1 CA 3191888A1
Authority
CA
Canada
Prior art keywords
helper
identification
network
authentication
validation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3191888A
Other languages
French (fr)
Inventor
Scott Edward Streit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Private Identity LLC
Original Assignee
Private Identity LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/993,596 (US10938852B1)
Priority claimed from US17/155,890 (US11789699B2)
Priority claimed from US17/398,555 (US11489866B2)
Application filed by Private Identity LLC
Publication of CA3191888A1
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

Helper neural networks can play a role in augmenting authentication services that are based on neural network architectures. For example, helper networks are configured to operate as a gateway on identification information used to identify users, enroll users, and/or construct authentication models (e.g., embedding and/or prediction networks). Assuming that both good and bad identification information samples are taken as part of identification information capture, the helper networks operate to filter out bad identification information prior to training, which prevents, for example, identification information that is valid but poorly captured from impacting identification, training, and/or prediction using various neural networks. Additionally, helper networks can also identify and prevent presentation attacks or submission of spoofed identification information as part of processing and/or validation.

Description

SYSTEMS AND METHODS FOR PRIVATE AUTHENTICATION WITH HELPER NETWORKS
BACKGROUND
Various conventional approaches exist that attempt to implement authentication and/or identification in the context of machine learning. Some conventional approaches have developed optimizations to improve the training and predictive accuracy of the machine learning models. For example, a number of solutions use procedural programming to prepare data for processing by machine learning models. In one example, procedural programming can be used to process user images (e.g., face images) to crop or align images around user faces, to improve the image data used to train machine learning models to recognize the users. A number of approaches exist to filter training data sets to improve the training of respective machine learning models based on procedural programming or rules.
SUMMARY
The inventors have realized that there is still a need to utilize the power of machine learning models as gateways or filters on data being used for subsequent machine learning based recognition whether in authentication settings or identification settings. A similar need exists in the context of procedural recognition and other processing tasks, and machine learning models can be used as gateways or filters on data being used for any subsequent operation, including for example, procedural based or other recognition tasks whether in authentication or identification settings. According to some aspects, using machine learning to filter data or remove bad data instances enables any subsequent operation to be performed more effectively and/or with reduced error over many conventional approaches.
For example, recognition operations (e.g., identity, authentication, and/or enrollment, etc.) can be improved by validating the data used, and/or identifying invalid data before further processing occurs. It is further realized that approaches to filter data based on procedural programming fail to achieve the level of filtering required, and further fail to provide a good balance between processing requirements and accuracy.
According to various aspects, provided are authentication systems that are configured to leverage machine learning approaches in the context of pre-processing data for use in subsequent tasks, for example, recognition tasks (including e.g., recognition by machine learning models that support identification and/or authentication). The inventors have further realized that, unlike prior solutions, it is possible to create lightweight models (e.g., small file size models) that provide sufficient accuracy (e.g., >90%) in identifying features or states of input identification/authentication data to serve as a gateway for further processing. For example, the system can implement a plurality of helper networks configured to process incoming identification data (e.g., biometrics, behavioral, passive, active, etc.) and exclude data instances that would not improve identification/authentication. For example, a helper network can be trained on identification data to ensure that "good" data improves the ability to distinguish between targets to be identified or expands the circumstances (e.g., poor lighting conditions, noisy environment, bad image capture, etc.) in which subsequent operations can identify or authenticate a target. Stated broadly, various embodiments validate the data used for subsequent processing, eliminating, for example, poor data instances, malicious data instances, etc.
In further example, the helper network can be trained to identify "bad" data which if used would result in a reduction in the ability to recognize a target. To illustrate, an image of a first target that is too blurry may make the blurry image of the first target resemble an image of another target. If used in a recognition data set, the result could be a reduction in the ability to distinguish between the first target and another target because of an image of the first target that, inappropriately, bears a closer resemblance to another target than the first.
Various instances of the helper networks are configured to identify and validate good data for use in recognition tasks, and identify and, for example, discard bad data that would reduce the ability to perform a recognition task.
According to some embodiments, the helper networks validate submitted identification information as good or bad data and filter the bad data from use in subsequent operations, for example, identification, authentication, enrollment, training, and in some examples, prediction.
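The gateway behavior described above can be illustrated with a minimal sketch. It is not the implementation of any embodiment: the `Sample` container, the `blur` quality signal, the `stub_validator` function, and the 0.5 threshold are all assumptions chosen for illustration, and the stub stands in for a trained helper network.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Sample:
    """One captured identification instance (hypothetical container)."""
    subject_id: str
    blur: float  # stand-in quality signal: 0.0 is sharp, 1.0 is unusable


# A validator returns the probability that a sample is "good" data.
Validator = Callable[[Sample], float]


def stub_validator(sample: Sample) -> float:
    """Placeholder for a trained helper network (an assumption, not a real model)."""
    return max(0.0, min(1.0, 1.0 - sample.blur))


def gateway_filter(samples: Iterable[Sample], validator: Validator,
                   threshold: float = 0.5) -> List[Sample]:
    """Pass only samples the validator scores at or above the threshold."""
    return [s for s in samples if validator(s) >= threshold]


captured = [Sample("alice", 0.1), Sample("alice", 0.9), Sample("bob", 0.3)]
kept = gateway_filter(captured, stub_validator)
print([(s.subject_id, s.blur) for s in kept])
```

In this sketch the heavily blurred capture is dropped before any downstream enrollment, training, or prediction step would see it, while the remaining captures pass through unchanged.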
In further embodiments, helper networks can be implemented in an authentication system and operate as a gateway for embedding neural networks, where the embedding neural networks are configured to extract encrypted features from authentication information.
The helper network can also operate as a gateway for prediction models that predict matches between input and enrolled authentication information. In other examples, the helper networks can be configured to filter identification data for any recognition task (e.g., identification, authentication, enrollment, etc.), which can be based in machine learning approaches, procedural programming approaches, etc.
According to various aspects, embedding machine learning models are used to generate encrypted embeddings from input plaintext identification information.
The embedding machine learning models can be tailored to respective authentication modalities, and similarly, helper networks can be configured to process specific authentication inputs or authentication modalities and validate the same before they are used in subsequent models.
An authentication modality can be associated with the sensor/system used to capture the authentication information (e.g., image capture for face, iris, or fingerprint, audio capture for voice, etc.), and may be further limited based on the type of information being analyzed within a data capture (e.g., face, iris, fingerprint, voice, behavior, etc.).
Broadly stated, authentication modality refers to the capability in the first instance to identify a subject to confirm an assertion of identity and/or to authenticate the subject to adjudicate identity and/or authorization based on a common set of identity information. In one example, an authentication modality can collect facial images to train a neural network on a common authentication data input. In another example, speech inputs or more generally audio inputs can be processed by a first network, where another physical biometric input (e.g., face, iris, etc.) can be processed by another network trained on the different authentication modality. In further example, image captures for user faces can be processed as a different modality from image capture for iris identification, and/or fingerprint identification.
Other authentication modalities can include behavioral identification information (e.g., speech pattern, movement patterns (e.g., angle of carrying mobile device, etc.), timing of activity, location of activity, etc.), passive identification information capture, active identification information capture, among other options.
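One way to picture modality-specific helper networks is a registry that routes each input to the helper associated with its modality. The following is a sketch under stated assumptions: the modality names, the placeholder helper functions, their fixed scores, and the choice to reject unregistered modalities are all hypothetical and are not drawn from the embodiments.

```python
from typing import Callable, Dict

# Each helper returns a probability that the input is valid for its modality.
Helper = Callable[[bytes], float]


def face_helper(data: bytes) -> float:
    """Placeholder for a face validation helper network."""
    return 0.95 if data else 0.0


def voice_helper(data: bytes) -> float:
    """Placeholder for a voice validation helper network."""
    return 0.80 if data else 0.0


# Hypothetical registry: one helper per authentication modality.
HELPERS: Dict[str, Helper] = {"face": face_helper, "voice": voice_helper}


def validate(modality: str, data: bytes, threshold: float = 0.5) -> bool:
    """Route the input to its modality's helper; unknown modalities are rejected."""
    helper = HELPERS.get(modality)
    if helper is None:
        return False  # assumption: no helper registered means no validation
    return helper(data) >= threshold


print(validate("face", b"pixels"), validate("iris", b"pixels"))
```

Keeping one helper per modality mirrors the text's point that face, iris, fingerprint, and voice captures can each be handled by a network trained on that specific input.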
According to another aspect, helper networks, also referred to as pre-processing neural networks and/or validation networks, are configured to operate as a gateway on identification information used to identify and/or authenticate entities.
Assuming that both good and bad identification information samples are taken as part of information capture, the helper networks operate to filter out bad information, for example, prior to training, which prevents, for example, information that is valid but poorly captured from impacting training or prediction using various neural networks. Additionally, helper networks can also identify and prevent presentation attacks or submission of spoofed authentication information. In various embodiments, filtering bad identification information samples can be used to improve machine learning identification, enrollment, and/or authentication operations as well as procedural based identification, enrollment, and/or authentication operations.
According to various aspects, training of machine learning models typically involves expansion and generation of variants of training data. These operations increase the size of the training data pool and improve the accuracy of the trained model. However, the inventors have realized that including bad data in such expanded training data sets compromises accuracy. Worse, capturing and expanding bad instances of data can multiply the detrimental effect. According to various embodiments, data validation by helper networks identifies and eliminates data that would reduce identification or authentication accuracy (i.e., bad data).
Unexpectedly, the helper networks are also able to identify bad data in this context that is undetected by human perception. This allows various embodiments to yield capability that cannot naturally be produced in a procedural programming context, where a programmer is attempting to code human based analysis (limited by human perception) of identification data.
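The multiplying effect of expanding bad data can be shown with simple counting. This sketch assumes a hypothetical `expand` step that produces a fixed number of variants per sample, with each variant inheriting the quality of its source; the `is_good` flag stands in for a helper network verdict.

```python
from typing import List, Tuple

# (label, is_good); is_good stands in for a helper network's verdict.
Sample = Tuple[str, bool]


def expand(samples: List[Sample], variants_per_sample: int) -> List[Sample]:
    """Generate variants of each sample; a variant inherits its source's quality."""
    return [(f"{label}#{i}", good)
            for label, good in samples
            for i in range(variants_per_sample)]


def count_bad(samples: List[Sample]) -> int:
    """Count the samples flagged as bad."""
    return sum(1 for _, good in samples if not good)


captured = [("a", True), ("b", True), ("c", False)]

# Expanding first turns one bad capture into ten bad training instances.
unfiltered = expand(captured, variants_per_sample=10)
# Filtering first keeps the expanded pool free of bad instances.
filtered = expand([s for s in captured if s[1]], variants_per_sample=10)

print(count_bad(unfiltered), len(unfiltered))  # 10 30
print(count_bad(filtered), len(filtered))      # 0 20
```

Under these assumptions, a single bad capture contributes ten bad instances when expansion precedes validation, and none when validation comes first.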
In further aspects, the authentication system can be configured to leverage a plurality of helper neural networks (e.g., a plurality of neural networks (e.g., deep neural networks (e.g., DNNs))), where sets of helper networks can be trained to acquire and transform biometric values or types of biometrics to improve biometric capture, increase accuracy, reduce training time for embedding and/or classification networks, eliminate vulnerabilities (e.g., liveness checking and validation), and further sets of helper networks can be used to validate any type or modality of identification input. In further example, data is validated if it improves the accuracy or capability of recognition operations (e.g., improves feature embedding models, prediction models, distance evaluations, etc.). In some embodiments, by only using validated data, downstream recognition tasks can be improved over conventional approaches.
According to one aspect, an authentication system for privacy-enabled authentication is provided. The system comprises at least one processor operatively connected to a memory;
an authentication data gateway, executed by the at least one processor, configured to filter invalid identification information, the authentication data gateway comprising at least a first pre-trained geometry helper network configured to process identification information of a first type, accept as input unencrypted identification information of the first type, and output processed identification information of the first type; and a first pre-trained validation helper network associated with the geometry helper network configured to process identification information of the first type, accept the output of the geometry helper neural network, and validate the input identification information of the first type or reject the identification information of the first type.
According to one embodiment, the authentication data gateway is configured to filter bad authentication data from training data sets used to build embedding network models.
According to one embodiment, the first pre-trained validation helper network is trained on evaluation criteria independent of the subject seeking to be enrolled or authenticated.
According to one embodiment, the authentication data gateway further comprises at least a second geometry helper network and a second validation helper network pair configured to process and validate identification information of a second type. According to one embodiment, the authentication data gateway further comprises a plurality of validation helper networks each associated with a respective type of identification information, wherein each of the plurality of validation helper networks generates a binary evaluation of respective authentication inputs to establish validity. According to one embodiment, the first pre-trained validation helper network is configured to process an image input as identification information, and output a probability that the image input is invalid. According to one embodiment, the first pre-trained validation helper network is configured to process an image input as identification information, and output a probability that the image input is a presentation attack. According to one embodiment, the first pre-trained validation helper network is configured to process a video input as identification information and output a probability that the video input is invalid. According to one embodiment, the first pre-trained validation helper network is configured to process a video input as identification information and output a probability that the video input is a presentation attack.
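The pairing of a geometry helper network with a validation helper network can be sketched as a two-stage pipeline. Everything below is illustrative: the `Image` type, the fixed crop coordinates, the contrast-based invalidity score, and the 0.5 cutoff are assumptions standing in for trained networks, which would locate the region and score it themselves.

```python
from typing import List, Optional

# Grayscale pixels in [0, 1]; a stand-in for unencrypted identification input.
Image = List[List[float]]


def geometry_helper(image: Image, top: int, left: int, size: int) -> Image:
    """Placeholder geometry step: crop a square region of the input."""
    return [row[left:left + size] for row in image[top:top + size]]


def validation_helper(region: Image) -> float:
    """Placeholder validation step: return a probability that the region is invalid.

    Assumption for illustration: a nearly uniform region carries little signal.
    """
    pixels = [p for row in region for p in row]
    if not pixels:
        return 1.0
    spread = max(pixels) - min(pixels)
    return 1.0 - min(1.0, spread)


def gateway(image: Image, top: int, left: int, size: int,
            max_invalid: float = 0.5) -> Optional[Image]:
    """Return the processed region if validated, or None if rejected."""
    region = geometry_helper(image, top, left, size)
    return region if validation_helper(region) <= max_invalid else None


image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 0.1, 0.9, 0.0],
         [0.0, 0.8, 0.2, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
flat = [[0.5] * 4 for _ in range(4)]
print(gateway(image, 1, 1, 2), gateway(flat, 0, 0, 4))
```

The validation stage consumes the output of the geometry stage rather than the raw input, which matches the ordering recited for the helper network pair.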
According to one aspect, an authentication system for privacy-enabled authentication is provided. The system comprises at least one processor operatively connected to a memory;
an authentication data gateway, executed by the at least one processor, configured to filter invalid identification information, the authentication data gateway comprising at least a merged validation network associated with a first type of identification information, the merged validation network configured to process identification information of the first type and output a probability that the identification information of the first type is valid for use in enrolling a user for subsequent identification or a probability that the identification information is invalid.
According to one embodiment, the merged validation network is configured to test a plurality of binary characteristics of the identification information input.
According to one embodiment, the output probability is based at least in part on a state determined for the plurality of binary characteristics. According to one embodiment, the merged validation network is configured to determine if an identification information input is based on a presentation attack. According to one embodiment, the merged validation network is configured to determine if an identification information input improves training set entropy.
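A merged validation network that tests several binary characteristics and reports a single probability might be approximated as follows. This is only a sketch: the characteristic names are invented for illustration, and multiplying the per-characteristic probabilities assumes the characteristics are independent, whereas a trained merged network would learn its own mapping.

```python
from typing import Dict


def merged_validity(checks: Dict[str, float]) -> float:
    """Combine per-characteristic pass probabilities into one validity probability.

    Assumption: characteristics are treated as independent, so the
    probabilities multiply. The characteristic names are hypothetical.
    """
    probability = 1.0
    for name, p in checks.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: probability out of range: {p}")
        probability *= p
    return probability


checks = {"in_focus": 0.9, "well_lit": 0.8, "not_presentation_attack": 0.5}
score = merged_validity(checks)
print(round(score, 3), score >= 0.5)
```

A single weak characteristic, here the presentation attack check, pulls the overall probability down even when the other characteristics score well.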
According to one aspect, a computer implemented method for privacy-enabled authentication is provided. The method comprises filtering, by at least one processor, invalid identification information; executing by the at least one processor, a first pre-trained geometry helper network; accepting, by the first pre-trained geometry helper network, unencrypted identification information of the first type as input; generating processed identification information of the first type; executing by the at least one processor, a first pre-trained validation helper network; accepting the output of the geometry helper neural network; and validating the input identification information of the first type or rejecting the identification information of the first type.
According to one embodiment, the method further comprises filtering bad authentication data from training data sets used to build embedding network models. According to one embodiment, the method further comprises training the first pre-trained validation helper network on evaluation criteria independent of the subject seeking to be enrolled or authenticated. According to one embodiment, the method further comprises executing at least a second geometry helper network and a second validation helper network pair configured to process and validate identification information of a second type. According to one embodiment, the method further comprises executing a plurality of validation helper networks each associated with a respective type of identification information, and generating a binary evaluation of respective authentication inputs by respective ones of the plurality of validation helper networks to establish validity. According to one embodiment, the method further comprises processing, by the first pre-trained validation helper network, an image input as identification information, and outputting a probability that the image input is invalid.
According to one embodiment, the method further comprises processing an image input as identification information, and generating a probability that the image input is a presentation attack, by the first pre-trained validation helper network. According to one embodiment, the method further comprises processing, by the first pre-trained validation helper network, a video input as identification information; and generating, by the first pre-trained validation helper network, a probability that the video input is invalid. According to one embodiment, the method further comprises processing, by the first pre-trained validation helper network, a video input as identification information, and generating, by the first pre-trained validation helper network, a probability that the video input is a presentation attack.
According to one aspect, a computer implemented method for privacy-enabled authentication is provided. The method comprises executing, by at least one processor, a merged validation network associated with a first type of identification information;
processing, by the merged validation network, identification information of the first type, generating, by the merged validation network, a probability that the identification information of the first type is valid for use in enrolling a user for subsequent identification or a probability that the identification information is invalid. According to one embodiment, the method further comprises testing, by the merged validation network, a plurality of binary characteristics of the identification information input. According to one embodiment, generating the probability is based at least in part on a state determined for the plurality of binary characteristics.
According to one embodiment, the method further comprises determining, by the merged validation network if an identification information input is based on a presentation attack.
According to one embodiment, the method further comprises determining if an identification information input improves training set entropy.
According to one aspect, a system for managing privacy-enabled identification or authentication is provided. The system comprises at least one processor operatively connected to a memory; an identification data gateway, executed by the at least one processor, configured to filter invalid identification information from subsequent verification, enrollment, identification, or authentication functions, the identification data gateway comprising at least a first pre-trained validation helper network associated with identification information of a first type, wherein the first pre-trained validation helper network is configured to evaluate an identification instance of the first type, responsive to input of the identification instance of the first type to the first pre-trained validation helper network, wherein the first pre-trained validation helper network is pre-trained on evaluation criteria that is independent of a subject of the identification instance seeking to be enrolled, identified, or authenticated, responsive to a determination that the identification instance meets the evaluation criteria, validate the identification instance for use in subsequent verification, enrollment, identification, or authentication, responsive to a determination that the identification instance fails the evaluation criteria, reject the unknown information instance for use in subsequent verification, enrollment, identification, or authentication, and generate at least a binary evaluation of the identification information instance based on the determination of the evaluation criteria, wherein the at least the binary evaluation includes generation of an output probability by the first pre-trained validation helper network that the identification instance is valid or invalid.
According to one embodiment, the identification data gateway is configured to filter bad audio data from use in subsequent processing. According to one embodiment, the identification data gateway is configured to accept audio data input and validate the audio input for use in transcription. According to one embodiment, the first pre-trained validation helper network is trained on presence data, and configured to determine the presence of a target to be evaluated. According to one embodiment, the first pre-trained validation helper network is configured to validate the presence data independent of the subject seeking to be enrolled, identified, or authenticated. According to one embodiment, the authentication data gateway further comprises a plurality of validation helper networks each associated with a respective type of identification information, wherein each of the plurality of validation helper networks generates a binary evaluation of respective identification inputs to establish validity, wherein at least a plurality of the validation helper networks are configured to validate respective identification information independent of the subject seeking to be enrolled, identified, or authenticated. According to one embodiment, the first pre-trained validation helper network is configured to process an image as identification information, and output a probability that the subject is wearing a mask. According to one embodiment, the first pre-trained validation helper network is configured to determine the mask is being worn properly by the subject. According to one embodiment, the first pre-trained validation helper network is configured to determine the mask is being worn properly by the subject irrespective of the subject to be identified.
According to one embodiment, the first pre-trained validation helper network is configured to process location associated input as identification information, and output a probability that the location associated input is invalid.
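The combination of a binary evaluation with an output probability, evaluated on criteria independent of the subject, can be sketched using the mask embodiment as an example. The `Evaluation` type, the two coverage scores, and the use of their minimum as the probability are assumptions for illustration; a trained validation helper network would produce the probability directly from the image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evaluation:
    valid: bool         # the binary evaluation
    probability: float  # probability that the instance is valid


def evaluate_mask(nose_covered: float, mouth_covered: float,
                  threshold: float = 0.5) -> Evaluation:
    """Placeholder for a mask validation helper network.

    The inputs describe the capture only. No subject identifier is passed in,
    mirroring evaluation criteria that are independent of the subject.
    """
    # Assumption: the mask counts as properly worn only if both regions are covered.
    probability = min(nose_covered, mouth_covered)
    return Evaluation(valid=probability >= threshold, probability=probability)


print(evaluate_mask(0.9, 0.8))
print(evaluate_mask(0.2, 0.95))
```

Because the function never receives an identity, the same check applies to any subject, and the caller gets both the accept or reject decision and the probability behind it.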
According to one aspect, a computer implemented method for managing privacy-enabled identification or authentication is provided. The method comprises filtering, by at least one processor, invalid identification information from subsequent verification, enrollment, identification, or authentication functions, wherein the act of filtering includes executing, by the at least one processor, a first pre-trained validation helper network associated with identification information of a first type; evaluating, by the first pre-trained validation helper network, an identification instance of the first type, responsive to input of the identification instance of the first type to the first pre-trained validation helper network, wherein the first pre-trained validation helper network is pre-trained on evaluation criteria that is independent of a subject of the identification instance seeking to be verified, enrolled, identified, or authenticated; validating, by the at least one processor, the identification instance for use in subsequent verification, enrollment, identification, or authentication, in response to determining that the identification instance meets the evaluation criteria;
rejecting, by the at least one processor, the unknown information instance for use in subsequent verification, enrollment, identification, or authentication responsive to determining that the identification instance fails the evaluation criteria; and generating, by the at least one processor, at least a binary evaluation of the identification instance based on the determination of the evaluation criteria, wherein the at least the binary evaluation includes generation of an output probability by the first pre-trained validation helper network that the identification instance is valid or invalid.
According to one embodiment, the act of filtering includes an act of filtering bad audio data from use in subsequent processing. According to one embodiment, the method further comprises accepting audio data input and validating the audio input for use in transcription.
According to one embodiment, the first pre-trained validation helper network is trained on presence data, and the method further comprises determining the presence of a valid target to be evaluated. According to one embodiment, the method further comprises validating the presence data independent of the subject seeking to be verified, enrolled, identified, or authenticated. According to one embodiment, the method further comprises executing a plurality of validation helper networks each associated with a respective type of identification information, wherein each of the plurality of validation helper networks generates at least a binary evaluation of respective identification inputs to establish validity;
and validating respective identification information independent of the subject seeking to be verified, enrolled, identified, or authenticated.
According to one embodiment, the first pre-trained validation helper network is configured to process an image as identification information, and the method further comprises an act of outputting a probability that the subject is wearing a mask.
According to one embodiment, the method further comprises determining by the first pre-trained validation helper network that the mask is being worn properly by the subject. According to one embodiment, the method further comprises determining by the first pre-trained validation helper network that the mask is being worn properly by the subject irrespective of the subject to be identified. According to one embodiment, the method further comprises processing a location associated input as identification information by the first pre-trained validation helper network and generating by the first pre-trained validation helper network a probability that the location associated input is invalid.
Still other aspects, examples, and advantages of these exemplary aspects and examples, are discussed in detail below. Moreover, it is to be understood that both the foregoing information and the following detailed description are merely illustrative examples of various aspects and examples, and are intended to provide an overview or framework for understanding the nature and character of the claimed aspects and examples. Any example disclosed herein may be combined with any other example in any manner consistent with at least one of the objects, aims, and needs disclosed herein, and references to "an example,"
"some examples," "an alternate example," "various examples," "one example," "at least one example," "this and other examples" or the like are not necessarily mutually exclusive and are intended to indicate that a particular feature, structure, or characteristic described in connection with the example may be included in at least one example. The appearances of such terms herein are not necessarily all referring to the same example.
BRIEF DESCRIPTION OF DRAWINGS
Various aspects of at least one embodiment are discussed below with reference to the accompanying figures, which are not intended to be drawn to scale. The figures are included to provide an illustration and a further understanding of the various aspects and embodiments and are incorporated in and constitute a part of this specification but are not intended as a definition of the limits of any particular embodiment. The drawings, together with the remainder of the specification, serve to explain principles and operations of the described and claimed aspects and embodiments. In the figures, each identical or nearly identical component that is illustrated in various figures is represented by a like numeral. For purposes of clarity, not every component may be labeled in every figure. In the figures:
FIG. 1 is a block diagram of a helper network implementation, according to one embodiment;
FIG. 2 is a block diagram of example helper networks for processing respective authentication inputs, according to one embodiment;
FIG. 3 illustrates example multiclass and binary helper network models, according to some embodiments;
FIG. 4 illustrates example processing for detecting presentation attacks, according to some embodiments;
FIG. 5 illustrates example process flow for voice processing, according to some embodiments;
FIG. 6 illustrates example process flow for facial image processing, according to some embodiments;
FIG. 7 illustrates example process flow for fingerprint processing, according to some embodiments;
FIG. 8 is a block diagram of an example authentication system, according to one embodiment;
FIG. 9 is an example process flow for processing authentication information, according to one embodiment;
FIG. 10 is an example process flow for processing authentication information, according to one embodiment;
FIG. 11 is an example process flow for processing authentication information, according to one embodiment;
FIG. 12 is a block diagram of a special purpose computer system on which the disclosed functions can be implemented;
FIG. 13 is an example process flow for classifying biometric information, according to one embodiment;
FIG. 14 is an example process flow for authentication with secured biometric data, according to one embodiment;
FIG. 15 is an example process flow for one to many matching execution, according to one embodiment;
FIG. 16 is a block diagram of an embodiment of a privacy-enabled biometric system, according to one embodiment;
FIGs. 17-20 are diagrams of embodiments of a fully connected neural network for classification;
FIGs. 21-24 illustrate example processing steps and example outputs during identification, according to one embodiment;
FIG. 25 is a block diagram of an embodiment of a privacy-enabled biometric system with liveness validation, according to one embodiment;
FIGs. 26A-B are a table showing comparative considerations of example implementations, according to various embodiments;
FIG. 27 is an example process for determining identity and liveness, according to one embodiment;
FIG. 28 is an example process for determining identity and liveness, according to one embodiment;
FIG. 29 is an example process flow for validating an output of a classification network, according to one embodiment; and
FIGs. 30-31 illustrate execution timing during operation with accuracy percentages for the respective examples.
DETAILED DESCRIPTION
According to some embodiments, validation and generation of identification information can be supported by execution of various helper networks. According to one embodiment, these specially configured helper networks can be architected based on the type of identification information/credential to be processed or more generally based on an authentication modality being processed. Various embodiments describe example functions with respect to authentication and authentication systems. The nomenclature "authentication system" is used for illustration, and in various embodiments describes systems that perform identification operations that employ helper networks in the context of identifying an entity or subject, and the disclosed operations should be understood to encompass data validation in the context of identification. The described examples and embodiments can also be used for authentication where identification is a first step, and adjudication of the identity and/or permissions for the entity is required or desired.
In various embodiments, the system can execute a plurality of helper networks that are configured to filter inputs (including, for example, inputs to training models) that are later used in authentication or identification. For example, geometry helper networks can be executed to facilitate analysis of features within authentication information, by identifying salient features and, for example, providing location information. In various embodiments, examples are described to process authentication information, and are not intended to limit the operations on the input to authentication assertions, but rather include operations that include identification, and identification with authentication.
According to one embodiment, validation helper networks are configured to determine that an identification sample is a good identification and/or authentication sample. For example, only identification samples that improve accuracy or expand recognition can be validated. The validation network can, for example, identify that a face image is too blurry for use, the image of the user has been taken in poor lighting conditions, the imaged face is too far away from the capture device, the imaged face is obscured, the imaged face is too near to the capture device, the imaged face is out of focus, or the imaged face is looking away from the camera, among other options. In various examples, the helper networks are pre-trained using bad identification samples. For example, the bad identification samples are identified as samples that reduce the entropy of the resulting data set. To illustrate, if a blurry image of a first user is used to create encrypted features, the resulting encrypted features will then match on more encrypted features, which may include matches reflecting source identification information not of the first user; this is an example of reduced identification entropy. In another example, the helper networks are pre-trained on bad identification samples that reduce or hamper the execution or efficiency of subsequent processing.
In a further example, various state determinations can be used to identify data instances that reduce the effectiveness of recognition operations and then exclude such bad identification information (e.g., a face image) from an identification data set. Stated more generally, the validation helper networks are configured to weed out bad identification data and prevent bad data from impacting subsequent operations, including, for example, training of machine learning models for various identification and/or authentication scenarios or other subsequent processing scenarios. In further embodiments, the validation helper networks can be configured to validate data instances whose use and/or incorporation into a body of identification data will result in improvement in recognition circumstances and/or processing accuracy. In some examples, the validation helper networks are trained to identify identification data instances that improve identification entropy.
In further examples, some helper networks include a face plus mask helper network tailored to operate on identification instances of facial images, where the identification target is wearing a mask, mask on/off detection helper network, eyeglasses on/off detection helper network, fingerprint validation network, eye geometry helper network, eyes open/closed detection helper network, training data helper networks, eye validation helper network, etc.
In various embodiments, the helper networks are configured to: improve processing of identification credentials, for example, to eliminate noise in processed credentials; ensure valid credentials are captured, including for example, quality processing to ensure proper credentials are captured. In further embodiments, various helper networks can be configured to establish liveness of a data capture, for example, based on liveness validation (e.g., submitted identification credential is not a spoofed credential submission), among other options.
Fig. 1 is a block diagram of an authentication system 100. According to various embodiments the authentication system 100 can accept a variety of identification inputs (e.g., 101) and produce filtered identification data (e.g., at 120) for use in identification/enrollment/authentication functions (e.g., 130). For example, the authentication system 100 can be configured to accept various biometric inputs 101A including images of a user's face, 101B including images of a user's fingerprint, 101C including captures of the user's voice, among other options (e.g., as shown by the three dots appearing under the various inputs). Various embodiments can be configured to operate on the various inputs shown, or subsets of those instances. According to some embodiments, the authentication system can be configured with an authentication gateway 102. The authentication gateway may include a plurality of helper networks each tailored to process a respective identification input. For example, a helper network can be tailored specifically to deal with facial recognition images and/or video for identifying a user face. Different types of helper networks can be tailored to specific functions, including, for example, geometry helper networks (e.g., 104) that are configured to identify characteristics within an identification/authentication input and/or positional information within the input that can be used for validation and/or creation of embeddings (e.g., encrypted feature vectors produced by an embedding network, discussed below).
In various embodiments, geometry helper networks can be configured to support analysis by validation helper networks (e.g., 106). In other embodiments, however, validation helper networks are configured to operate on input data without requiring the output or analysis of geometry helper networks. In yet other embodiments, some validation networks can receive information from geometry helper networks while other helper networks operate independently and ultimately deliver an assessment of the validity of an identification/authentication instance. In the context of image inputs, the validation helper network can determine that the submitted image is too blurry, off-center, skewed, or taken in poor lighting conditions, among other options, any of which leads to a determination of a bad instance.
In some embodiments, the various helper networks can include processing helper networks configured to manage inputs that are not readily adaptable to geometric analysis. In some examples, the processing helper networks (e.g., 108) can also be loosely described as geometry helper networks; the two classifications are not mutually exclusive, and are described herein to facilitate understanding and to illustrate potential applications without limitation. According to one example, processing helper networks can take input audio information and isolate singular voices within the audio sample. In one example, a processing helper network can be configured for voice input segmentation and configured to acquire voice samples of various time windows across an audio input (e.g., multiple samples of 10ms may be captured from one second of input). The processing helper networks can take audio input and apply a pulse code modulation (PCM) transformation that downsamples the audio time segments to a multiple of the frequency range (e.g., two times the frequency range). In a further example, PCM can be coupled with fast Fourier transforms to convert the audio signal from the time domain to the frequency domain.
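The segmentation and frequency-domain conversion described above can be illustrated with a minimal sketch. It assumes the audio is already available as PCM samples in a NumPy array; the 16 kHz sample rate, the function names, and the non-overlapping windowing are illustrative assumptions, not details taken from the disclosure.

```python
import numpy as np

def segment_audio(samples, sample_rate, window_ms=10):
    """Split a 1-D array of PCM samples into non-overlapping windows."""
    window_len = int(sample_rate * window_ms / 1000)
    n_windows = len(samples) // window_len
    # Drop any trailing partial window so every row has the same length.
    return samples[: n_windows * window_len].reshape(n_windows, window_len)

def to_frequency_domain(windows):
    """Return the magnitude spectrum of each window using a real-input FFT."""
    return np.abs(np.fft.rfft(windows, axis=1))

# Illustrative input: one second of a 1 kHz tone at an assumed 16 kHz rate.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 1000 * t)

windows = segment_audio(signal, sample_rate)   # one row per 10 ms window
spectra = to_frequency_domain(windows)         # one spectrum per window
```

With these assumed values, each 10 ms window holds 160 samples, so each spectrum bin spans 100 Hz and the 1 kHz tone falls in bin 10.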
In some embodiments, a series of helper networks can be merged into a singular neural network (e.g., 110) that performs the operations of all the neural networks that have been merged. For example, geometry helper networks can be merged with validation helper networks and the merged network can be configured to provide an output associated with validity of the identification/authentication data input.
Regardless of whether a plurality of helper networks is used or a merged network is used or even combinations thereof, the authentication data gateway 102 produces a set of filtered authentication data (e.g., 120) that has pruned bad authentication instances from the data set. Shown in Fig. 1 is communication of the filtered authentication data 120 for use in identification, enrollment, and/or authentication services at 130. In some embodiments, an authentication system can include components for performing identification of entities, enrollment of users, and components for authenticating enrolled users.
Filtered data can be used for any of the example preceding operations. In some examples, filtering of training data can be prioritized, and an authentication system does not need to filter authentication inputs when performing a specific request for authentication against enrolled data.
In some other embodiments, an authentication system can provide data gateway operations and pass the filtered data onto other systems that may be used to identify, enroll, and/or authenticate users.
Other implementations can provide data gateway operations, identification operations, enrollment operations and/or authentication operations as part of a single system or as part of a distributed system with multiple participants. Some embodiments can use helper network validation or invalidation determinations to request that an identification target re-submit identification information, among other options.
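The gateway behavior described above, in which bad instances are pruned before identification, enrollment, or authentication, can be sketched as follows. This is a simplified illustration only: the validators here are plain predicate functions standing in for trained validation helper networks, and the sample fields (`modality`, `sharpness`, `rms`), the 0.5 cut-off, and the choice to drop modalities with no validator are all assumptions made for the example.

```python
from typing import Callable, Dict, List

Sample = dict                       # hypothetical record, e.g. {"modality": "face", ...}
Validator = Callable[[Sample], bool]

def filter_samples(samples: List[Sample],
                   validators: Dict[str, List[Validator]]) -> List[Sample]:
    """Keep only samples that pass every validator registered for their modality."""
    kept = []
    for sample in samples:
        checks = validators.get(sample["modality"], [])
        # Assumed policy: a modality with no validators is not passed through.
        if checks and all(check(sample) for check in checks):
            kept.append(sample)
    return kept

# Stand-in predicates; a real system would call validation helper networks here.
def sharp_enough(sample: Sample) -> bool:
    return sample.get("sharpness", 0.0) >= 0.5

def has_sound(sample: Sample) -> bool:
    return sample.get("rms", 0.0) > 0.0

VALIDATORS = {"face": [sharp_enough], "voice": [has_sound]}

inputs = [
    {"modality": "face", "sharpness": 0.9},
    {"modality": "face", "sharpness": 0.2},   # too blurry, pruned
    {"modality": "voice", "rms": 0.0},        # no sound, pruned
    {"modality": "voice", "rms": 0.3},
]
filtered = filter_samples(inputs, VALIDATORS)
```

A rejected sample could equally trigger a re-submission request rather than being silently dropped; that choice is left open here.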
In other embodiments, the operation of the helper networks shown can be used in the context of identification. The helper networks are used to ensure valid data capture that can then be used in identifying an individual or entity based on acquired information. Broadly stated, the geometry and/or processing helper networks operate to find identification data in an input, which is communicated to respective validation helper networks to ensure a valid submission has been presented. One example of an identification setting, as opposed to an authentication setting, is airport security and identification of passengers. According to various embodiments, identification is the goal in such an example and authentication (e.g., additional functions for role gathering and adjudication) is not necessary once a passenger has been identified. Conversely, the system may be tasked with authenticating a pilot (e.g., identification of the pilot, determining role information for the pilot, and adjudication) when the pilot seeks to access a plane or plane flight control systems.
Fig. 2 is a block diagram of authentication system 200 executing a variety of example helper networks. The respective helper networks are configured to process (e.g., at 220) respective identification credential input (e.g., biometric input such as 251 face image, 252 face image with mask, 253 fingerprint capture, 254 voice capture, among other input options and corresponding helper networks, shown by three dots) and filter bad credentials (e.g., at 230) from being used in subsequent recognition tasks, for example, incorporation into embedding generation networks (e.g., at 240). Various functions, operations, embedding network architecture, and uses of generated embeddings for identification, authentication and/or for training classification networks, among other examples, are described in co-pending U.S. Application 16/832,014, filed on March 27, 2020, titled "SYSTEMS AND METHODS FOR PRIVACY-ENABLE BIOMETRIC PROCESSING," (the '014 Application) incorporated herein in its entirety.
Various embodiments of an authentication system can be configured to process and filter authentication data using helper networks, where the filtered data is made available for subsequent use by, for example, the embedding networks described in the '014 application.
Stated broadly, embedding networks can be executed to accept authentication inputs in a plain-text or unencrypted form and transform the input into an encoded representation. In one example, embedding networks are configured to transform an authentication input into a geometrically measurable one-way encoding of an authentication input (e.g., a one-way homomorphic encryption). Use of such encodings preserves the secrecy of underlying authentication data, while providing embeddings that can be evaluated/classified in an encoded space. The inventors have realized that improvements in data enrollment using helper networks result in improved accuracy for embedding networks and resulting authentication operations.
Returning to Fig. 2, the respective biometric inputs (e.g., 251-254) are captured and used as input in a processing stage (e.g., 220) configured to confirm or identify relevant or interesting characteristics within the respective biometric input. For example, respective helper networks (e.g., 202-208) are configured to process input biometric information and establish characteristics for analysis based on the input data. In one example, the geometric helper network 202 can be configured to process an input face image and return coordinates for characteristic features within the image (e.g., eyes, nose, mouth, ears, etc.). Another geometric helper network (e.g., 204) can be configured to analyze facial images where the user is wearing a mask. The output of these geometric helper networks can be processed by similar validation helper networks configured to validate (e.g., at 230).
Other geometric helper networks include a fingerprint geometric helper network 206 and a voice helper network 208.
According to one embodiment, the fingerprint helper network 206 can be configured to align, crop, and/or identify fingerprint characteristics within an image.
For example, the helper network 206 can identify position information for ridges and whorls and other characteristics that would be analyzed in a fingerprint image. The outputs of helper network 206 can then be processed by a validation network (e.g., 212) to filter any bad inputs.
Likewise, the voice geometric helper network 208 is configured to capture characteristics from an audio sample and communicate processed samples to a validation network (e.g., 214). Processing by the voice geometric helper network can include PCM and fast Fourier transformation of audio samples, which are then validated as good or bad samples by, for example, validation network 214.
According to various embodiments, the validation networks are configured to protect the embedding neural networks shown in phase 240. For example, if a poor image is allowed into the embedding network 215, the poor image will disturb the distance measurements on the output of the embedding network and the embedding model 215 itself.
Incorporation of bad data can compromise the entire network, which results in false positives and false negatives for subsequent authentications.
Returning to the validation phase (e.g., 230), a plurality of validation networks is configured to determine if an authentication input is valid for use or not.
For example, a face validation helper network can be configured to determine if an input image was taken with the camera too far away from the subject or too close to the subject, where either condition is used to identify the bad credential and exclude it from use. In other examples, face validation helper networks can also determine if an image is too blurry, if an image is spoofed (e.g., a photo of a user is presented rather than a capture of the user directly), if video input used for submitting facial information is spoofed rather than presented by the actual user, if the user or subject is wearing a mask or not, among other options.
In various embodiments the validation networks are architected based on a deep neural network model and each can return the probability, score, or value that determines if an input is valid or bad. In further embodiments, the helper network can return state information, including whether a user is wearing a mask or not. In some examples, a determination that a user is wearing a mask may cause an authentication system to exclude the identification information from use, and in other examples, the authentication system can use the state determination, wearing mask, to select a respective embedding DNN (e.g., 216, an embedding network trained on images with users wearing masks).
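The two uses of the mask state described above, excluding the input or routing it to a mask-specific embedding network, can be sketched as a small selection function. The function name, the `accept_masked` flag, and the returned string labels are hypothetical; the labels simply echo reference numerals 215 and 216 from Fig. 2.

```python
from typing import Optional

def select_embedding_network(wearing_mask: bool,
                             accept_masked: bool = True) -> Optional[str]:
    """Pick an embedding network label from the mask state, or None to exclude."""
    if wearing_mask:
        # Either route to the mask-trained network or exclude the input entirely.
        return "face_with_mask_embedding_216" if accept_masked else None
    return "face_embedding_215"
```

Returning `None` models the case where a system is configured to exclude masked inputs from use.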
In a further example, an authentication system can include a fingerprint validation helper network (e.g., 212) that is configured to determine if a fingerprint capture includes enough ridges or characteristics to provide good analysis. In addition, fingerprint helper networks can also determine liveness, that is, confirm that spoofed video is not the source of a submission or an image spoof is not the source of submission.
Additional embodiments can include voice validation helper networks configured to determine if too many voices are present in an input, if no sound is present in an input, or if too much external noise is present in an input, among other options.
Once an input is validated the inputs can undergo further processing, including identification, authentication, enrollment, etc. For example, the input can be processed by a respective embedding network in stage 240. For example, a face embedding DNN 215 can process user face images. In a further example, a face with mask embedding network 216 can process images of users wearing masks. Other examples include a fingerprint embedding DNN 217 for processing fingerprint images and voice embedding DNN 218 for processing audio inputs.
In various embodiments, the output of stage 240 is an embedding or feature vector representative of the input but in an encoded form. For example, the embedding networks can generate encrypted feature vectors or other one-way encoded representations that are geometrically measurable for comparison. In one example, an embedding network can accept an unencrypted input and produce encrypted feature vectors that are a homomorphic one-way encryption of the input.
Fig. 3 is a block diagram illustrating various example helper networks, according to various embodiments. According to one embodiment, an authentication system can execute a variety of different helper networks architected on a variety of models. For example, a group of helper networks can be configured to establish one of a pair of states.
Stated broadly, the helper networks configured to establish one of a pair of states responsive to input can be referred to as binary models. For example, a respective binary helper network is configured to determine if an input is associated with the first or second state. In an identification or authentication setting, a variety of helper networks can be configured to process images for facial recognition (e.g., 360) using a plurality of binary or other models.
According to some embodiments, face processing helper networks can include evaluations of whether, or not, an image is too blurry to use in the context of identification, authentication, and/or training. In another example, a face helper network can be configured to determine if there are not enough landmarks in an input image for facial recognition or in the alternative if there are enough landmarks (e.g., 362). Further embodiments include any combination of the prior helper networks and may also include helper networks configured to determine if the user is wearing a mask or not, if the user is wearing glasses or not, if the user's eyes are closed or not, if an image of the user was taken too far from or too close to the camera or image source (e.g., see 361-368), among other options.
Other helper networks may be used in conjunction with different embodiments to determine a state of an authentication input which may involve more than binary state conditions. In further embodiments, other authentication modalities can be processed by different helper networks. According to one embodiment, a fingerprint helper network can be configured to accept an image input of a user's fingerprint and process that image to determine if a valid authentication instance has been presented (e.g., 370).
For example, the fingerprint validation network can be configured to accept an image input and determine a state output specifying if not enough fingerprint landmarks (e.g., ridges) are present for authentication, or alternatively that enough fingerprint ridges are present (e.g., 371). In another example, a fingerprint validation network can be configured to determine if a fingerprint image is too blurry to use (e.g., 372). In a further example, the fingerprint validation network can also be configured to determine if a fingerprint image is too close to the image source that captured it or too far from the image source that captured it (e.g., 373). Similar to face validation, a fingerprint validation network can also be configured to identify submissions that are spoofed video (e.g., 374), or spoofed images (e.g., 375).
According to some embodiments, validation models can be configured to score an authentication input and based on evaluation of the score a respective state can be determined. For example, a validation helper network can produce a probability score as an output. Scores above a threshold can be classified as being one state with scores below the threshold being another. In some examples, intermediate values or probability scores can be excluded or assigned an inconclusive state.
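The mapping from a probability score to a state, including an inconclusive band for intermediate values, can be sketched as below. The two cut-offs (0.3 and 0.7) and the state names are illustrative assumptions; the disclosure does not specify threshold values.

```python
def classify_score(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a validation probability score to a state.

    Scores at or above `high` are treated as valid, scores at or below `low`
    as invalid, and anything in between as inconclusive. The cut-offs are
    assumed values chosen only for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= high:
        return "valid"
    if score <= low:
        return "invalid"
    return "inconclusive"
```

Setting `low` equal to `high` collapses this to the plain single-threshold binary case.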
Further embodiments are configured to execute helper networks to process additional authentication modalities. According to one embodiment, an authentication system can include voice validation helper networks (e.g., 380) configured to accept an audio input and output a probability of validity. In one example, a voice helper network is configured to determine if too many voices are present in a sample (e.g., 381). In another example, a voice validation network can be configured to determine if no sound is present in an audio sample (e.g., 382). Further examples include voice validation networks configured to determine if too much external noise is present in an audio sample for proper validation (e.g., 383).
According to some embodiments, audio spoof detection can use an induced audio signal. Such an induced audio signal can be an audible tone or frequency and may also include a signal outside human hearing. Various patterns and/or randomized sounds can be triggered to aid in presentation attack detection. Various validation networks can be configured to identify the induced audio signal as part of authentication input collection to confirm live authentication input.
Shown at 310 are examples of multiclass models that can be based on combinations and/or collections of various binary or other state models. For example, a face validation model can incorporate a variety of operations to output a collective determination on validity based on the underlying state determinations. In one example, the face validation network (e.g., 320) can analyze an image of a user face to determine if any of the following characteristics make the image a bad authentication input: image is too far or too close, image is too blurry, image is spoofed, video spoof produced the input, the user is wearing a mask, the user's eyes are open or closed, the user is or is not wearing eyeglasses, etc. (e.g., 321). In other embodiments, any combination of the foregoing conditions can be tested and as few as two of the foregoing options can be tested to determine the validity. In still other embodiments, different numbers of conditions can be used to determine if an authentication input is valid.
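One way to picture a collective determination built from underlying binary states is the sketch below. The field names mirror the conditions listed above, but the data structure, the default set of rejecting conditions, and the rule that any single flagged condition invalidates the input are assumptions made for illustration, not the disclosed model architecture.

```python
from dataclasses import dataclass

@dataclass
class FaceStates:
    """Binary state determinations for one face image (True = condition present)."""
    too_far: bool = False
    too_close: bool = False
    too_blurry: bool = False
    image_spoof: bool = False
    video_spoof: bool = False
    wearing_mask: bool = False
    eyes_closed: bool = False

# Assumed default: these conditions make an input bad; others are informational.
DEFAULT_REJECT = ("too_far", "too_close", "too_blurry", "image_spoof", "video_spoof")

def failed_conditions(states: FaceStates, reject=DEFAULT_REJECT) -> list:
    """List the rejecting conditions that are present, in the order tested."""
    return [name for name in reject if getattr(states, name)]

def is_valid(states: FaceStates, reject=DEFAULT_REJECT) -> bool:
    """An input is valid only if none of the tested conditions is present."""
    return not failed_conditions(states, reject)
```

Passing a different `reject` tuple models embodiments that test other combinations of conditions, for example adding `"wearing_mask"` when masked images are to be excluded.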
According to other embodiments, different multiclass models can be applied to different authentication inputs. For example, shown at 330 is a fingerprint validation model that can test a number of conditions to determine validity. In one example, a fingerprint validation network (e.g., 331) is configured to test if enough ridges are present, if the input is a video spoof, if the input is an image spoof, if the image is too blurry, and if the image was captured too far from or too close to an image source, among other options.
According to one embodiment, a voice validation network (e.g., 340) is configured to validate an audio input as a good authentication instance. In another example, the voice validation network can be configured to determine if there are too many voices present, no sound present, if too much external noise is present in an audio input, among other options (e.g., 341). In addition, the voice validation network can also include operations to determine liveness. In one example, an authentication system can induce an audio tone, sound, or frequency that should be detected by a validation network in order to determine that an authentication input is live and not spoofed. Certain time sequences or patterns may be induced, as well as random audio sequences and/or patterns.
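The induced-tone liveness idea can be illustrated with a simple spectral check. Note the substitution: the disclosure describes a validation network detecting the induced signal, whereas this sketch swaps in a plain FFT energy test. The function name, the 6 kHz tone, the 16 kHz sample rate, and the 0.2 energy-ratio cut-off are all assumptions for the example.

```python
import numpy as np

def contains_tone(samples, sample_rate, tone_hz, min_ratio=0.2):
    """Return True if energy near `tone_hz` is at least `min_ratio` of total energy."""
    if len(samples) == 0:
        return False
    power = np.abs(np.fft.rfft(samples)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    center = int(np.argmin(np.abs(freqs - tone_hz)))
    # Sum the target bin and its immediate neighbours to tolerate slight leakage.
    tone_energy = power[max(center - 1, 0): center + 2].sum()
    total_energy = power.sum()
    return bool(total_energy > 0 and tone_energy / total_energy >= min_ratio)

# Illustrative capture: background noise with and without an induced 6 kHz tone.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
rng = np.random.default_rng(0)
background = rng.normal(0.0, 0.1, sample_rate)
with_tone = background + 0.5 * np.sin(2 * np.pi * 6000 * t)

live = contains_tone(with_tone, sample_rate, tone_hz=6000)
replayed = contains_tone(background, sample_rate, tone_hz=6000)
```

A time sequence or random pattern of tones could be checked by applying the same test to successive windows of the capture.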
Fig. 4 is a block diagram illustrating operations performed by validation helper networks configured to determine liveness. Fig. 4 illustrates various considerations for implementing validation networks to detect input spoofing according to some embodiments.
The illustrated examples of helper networks (e.g., 408, 458) are trained by creating a multitude of spoofed input images in a variety of lighting conditions and backgrounds. The spoofed images are received at 454, and the spoofed images are transformed into an augmented image format that limits lighting effects and limits the effects of subject skin color and facial contour. The augmented image format can include, for example, an HSL image format. Various considerations for color harmonization are discussed in "Color Harmonization," by D. Cohen-Or et al., published 2006 by Association for Computing Machinery, Inc. Other augmentation/homogenization formats could be used including, for example, LAB color space or the contrast limited adaptive histogram equalization ("CLAHE") method for light normalization.
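As a minimal illustration of the HSL transformation mentioned above, the sketch below converts 8-bit RGB pixels to hue, saturation, and lightness using Python's standard `colorsys` module. Representing an image as nested lists of RGB tuples is an assumption made to keep the example dependency-free; how the disclosed system performs the conversion is not specified.

```python
import colorsys

def rgb_to_hsl(pixel):
    """Convert one 8-bit (r, g, b) pixel to (hue, saturation, lightness) in [0, 1]."""
    r, g, b = (channel / 255.0 for channel in pixel)
    # colorsys returns the components in H, L, S order; reorder to H, S, L.
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return (h, s, l)

def image_to_hsl(image):
    """Convert an image given as rows of (r, g, b) tuples to rows of HSL tuples."""
    return [[rgb_to_hsl(pixel) for pixel in row] for row in image]

tiny_image = [[(255, 0, 0), (128, 128, 128)]]
hsl_image = image_to_hsl(tiny_image)
```

Because lightness is carried in its own channel, downstream processing can treat it separately from hue and saturation.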
Once a variety of spoofed images are produced and the lighting conditions normalized, various additional spoofed instances can be created with multiple alignments, croppings, and zooms (e.g., in and out) to have a body of approximately two million approved images. The validation network is trained on the images and its determinations tested. After each training round, the false positives and false negatives are kept in the training set.
In some example executions, the initial two million images are reduced to about 100,000. The validation network is retrained on the remaining samples. In further embodiments, retraining can be executed repeatedly until no false positives or false negatives remain. A similar training process can be used in the context of spoofed video inputs. A video liveness validation network can be trained similarly on false positives and false negatives until the network identifies all valid inputs without false positives or false negatives.
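The retraining cycle described above (train, test, keep only the misclassified samples, retrain) can be sketched as a loop. The toy threshold model, its fixed-step `fit` rule, the integer scores, and the `max_rounds` safeguard are all hypothetical stand-ins for a deep validation network and its training procedure; only the control flow is meant to reflect the text.

```python
def predict(model, x):
    """Toy classifier: label 1 (valid) when the score reaches the threshold."""
    return 1 if x >= model["threshold"] else 0

def fit(model, samples):
    """Toy training step: nudge the threshold against the dominant error type."""
    false_pos = sum(1 for x, y in samples if predict(model, x) == 1 and y == 0)
    false_neg = sum(1 for x, y in samples if predict(model, x) == 0 and y == 1)
    if false_pos > false_neg:
        model["threshold"] += 5
    elif false_neg > false_pos:
        model["threshold"] -= 5

def retrain_on_errors(model, samples, max_rounds=20):
    """Train, then repeatedly retrain on only the samples still misclassified."""
    remaining = list(samples)
    rounds = 0
    while remaining and rounds < max_rounds:
        fit(model, remaining)
        # Keep only false positives and false negatives for the next round.
        remaining = [(x, y) for x, y in remaining if predict(model, x) != y]
        rounds += 1
    return rounds, remaining

model = {"threshold": 50}
data = [(10, 0), (20, 0), (60, 0), (70, 1), (90, 1)]  # (score, label) pairs
rounds, leftover = retrain_on_errors(model, data)
```

In this toy run the single hard sample `(60, 0)` drives three rounds before no errors remain.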
Once trained, processing follows a similar approach with any authentication input.
Shown are two pathways, one for video spoof inputs and one for image spoof inputs (e.g., 402 and 452 respectively). The spoofed data is received at 404/454 and the data is transformed into the HSL format at 406/456, which is processed by respective validation networks (e.g., 408/458, which can be, for example, pre-trained helper validation deep neural networks). In response to the input of potentially spoofed authentication data, the validation networks 408/458 output respective scores 410/460, and based on the respective scores an authentication system can determine if an authentication input is valid or simply a replay or spoof of a valid authentication input.
Unlike some conventional systems that can use machine learning approaches to cluster images before processing, the validation networks are trained on universal characteristics that apply to all authentication inputs, and each determination of validity establishes that a singular authentication instance is valid or not. In various embodiments, the validation network is trained on characteristics within the data set that are independent of the subject to be identified, authenticated, and/or enrolled. With the training as described above, various embodiments provide helper networks that are capable of presentation attack detection (e.g., spoofed submission of a valid image). Clustering of similar images, as done in some conventional approaches, is not expected to solve this issue, and the likely result of such an approach would include introduction of spoofed images into such clusters, which ultimately will result in incorporation into and successful attacks on resulting authentication models.
Shown in Fig. 5 are various embodiments of helper networks configured to analyze voice input and determine if a valid authentication input has been submitted.
According to some embodiments, voice helper networks can be configured to determine if too many voices are present in an authentication instance, if no sound is present, and/or if external noise is too loud, among other options to validate that a good authentication instance has been provided.
Various sets of training data can be used to train respective voice helper networks (e.g., voice training data with multiple voices, training data with no voice data, training data with external noise, etc.).
According to one embodiment, voice validation helper networks are trained to identify various states to determine if an authentication instance is valid for use in authentication. The helper networks can be trained on various audio inputs. In one example, a body of audio inputs are captured that are clean and valid (e.g., capture of known valid users' voices). The initial audio data is mixed and/or modified with external noises that impact how good they are in terms of authentication sources. For example, to determine impact of the noise, an output of a voice embedding network can be used to evaluate a cosine distance between various audio inputs. Where the introduction of external noise impacts the cosine distance evaluation, those instances are useful in establishing a training data set for identifying valid/invalid audio instances.
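The cosine distance evaluation described above can be sketched as follows. This is a minimal illustration, assuming embeddings are plain lists of floats produced by a voice embedding network that is not shown; the function names and the `max_distance` value are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def noise_is_impactful(clean_embedding, noisy_embedding, max_distance=0.2):
    """Flag a noisy sample whose embedding drifts from the clean one.

    max_distance is an assumed cut-off, not a value from the source.
    """
    return cosine_distance(clean_embedding, noisy_embedding) > max_distance
```

Samples flagged this way are the ones the text describes as useful for building the valid/invalid training set.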
According to one embodiment, a set of 500 clean samples are captured and used to mix with external noises (e.g., 500 external noises evaluated for impact on cosine distance).
The 500 initial samples are expanded and mixed with external voices until a large number of audio samples are available for training. In one example, helper networks can be trained on over eight million audio samples. Once trained, the results produced by the helper networks are tested to determine how well the helper networks identified valid data.
False-positive results and false-negative results are then used for subsequent training operations. According to one embodiment, millions of samples can be reduced to hundreds of thousands of false positives and false negatives. In various example executions, human perception is incapable of determining a difference between the spoofed audio and a valid instance once the training data has been reduced to the level of ~100K instances; however, the trained model is able to distinguish between such audio samples.
In some implementations, false positives and false negatives are used repeatedly to train the model until the model is able to execute with no false positives or false negatives.
Once that result is achieved, or is substantially close (e.g., fewer than 1-5% false positives/false negatives remain), the voice validation model is trained and ready for use.
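The retraining loop described above can be sketched generically. This is a minimal outline under stated assumptions: `train_step` and `predict` are hypothetical callables standing in for a real model, and `max_error_rate` and `max_rounds` are illustrative parameters rather than values from the source.

```python
def retrain_on_errors(train_step, predict, samples,
                      max_error_rate=0.05, max_rounds=10):
    """Retrain on misclassified samples until the error rate is acceptable.

    samples: list of (features, is_valid) pairs.
    train_step(batch): updates the model in place (hypothetical callable).
    predict(features): returns True/False for valid/invalid (hypothetical).
    Returns the error rate measured after the last round.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    training_set = list(samples)
    error_rate = 1.0
    for _ in range(max_rounds):
        train_step(training_set)
        # False positives and false negatives are the samples the model gets wrong.
        errors = [(x, y) for x, y in samples if predict(x) != y]
        error_rate = len(errors) / len(samples)
        if error_rate <= max_error_rate:
            break
        # The next round trains only on the remaining false results.
        training_set = errors
    return error_rate
```

Setting `max_error_rate=0.0` corresponds to iterating until no false results remain, while a small positive value corresponds to the "substantially close" stopping condition.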
According to one example, an authentication system can use any number of voice validation helper networks that are pre-trained to detect spoofed audio instances.
Returning to Fig. 5, three example pre-trained voice helper networks (e.g., DNNs) are illustrated. In the first block illustrated, each helper network is configured to detect a state: at 502 too many voices, at 522 no sound is present, and/or at 542 too much external noise. The respective helper networks receive audio for processing (e.g., 504, 524, 544).
According to various embodiments, PCM is executed on received audio (e.g., 506, 526, 546).
The result is transformed into the frequency domain (e.g., 508, 528, 548, Fourier transform). The respective outputs are evaluated by pre-trained helper DNNs at 510, 530, and 550. The respective helper networks are configured to output scores associated with their state evaluation. For example, the respective networks output scores at 512, 532, and 552. The scores can be used to determine if the audio input is valid for use in authentication. For example, the output value can reflect a probability an instance is valid or invalid. In one implementation, values above a threshold are deemed invalid and vice versa. In a further example, some ranges for probable matching can be determined to be inconclusive.
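The frequency-domain step at 508/528/548 can be illustrated with a small sketch. It assumes the PCM audio is a list of float samples, and it uses a direct O(n^2) discrete Fourier transform as a stand-in for an optimized transform; the function name is hypothetical and the helper DNNs are not shown.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return magnitudes of the non-negative frequency bins of a real signal.

    A direct discrete Fourier transform, used here only for illustration;
    a production system would normally use a fast Fourier transform.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("at least one sample is required")
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes


if __name__ == "__main__":
    # One cycle of a sine wave over 8 samples concentrates energy in bin 1.
    tone = [math.sin(2 * math.pi * t / 8) for t in range(8)]
    print([round(m, 3) for m in dft_magnitudes(tone)])
```

The magnitude spectrum is the kind of input a pre-trained helper DNN at 510, 530, or 550 would score.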
According to some embodiments, the various states described above (e.g., too many voices, no sound, external noise issues, among other options) can be tested via a merged network that incorporates the illustrated pre-trained helper networks into a single neural network, and the output represents a collective evaluation of validity of an audio input.
Fig. 6 illustrates a variety of helper networks configured to evaluate facial images and output a scoring for determining validity. In the first column shown in Fig. 6, the state being tested is specified. For example, at 604 some of the states that respective helper networks can test are illustrated. Various embodiments include tests for whether an image is too blurry, does not contain enough landmarks, images a user with a mask on or off, images a user with glasses on or off, images the user with eyes closed or open, an imaged face is too far or too close to an image source or camera, etc. According to some embodiments, processing by the helper networks proceeds at column 608 where the respective helper networks receive image data that is processed into normalized image data at 612 (e.g., processed into an HSL image).
At column 616, the respective helper networks evaluate respective HSL images and at column 620 output a score used to determine validity based on the evaluated state specified in column 604.
According to various embodiments face validation helper networks are trained based on an initial set of valid input images which are taken in a variety of lighting conditions and background so that each lighting condition has multiple backgrounds and each background has multiple lighting conditions. A large training set is beneficial according to some embodiments. In some examples 500,000 images can be used to establish the variety of lighting conditions and backgrounds. The initial set of images can then be normalized to produce HSL images. Other processes can be used to normalize the training set of images.
The resulting images are manipulated to generate an expanded set of training images. For example, a variety of alignments and/or cropping of the images can be executed. In other examples, and in addition or in the alternative, a variety of zoom operations (e.g., in and out) can be applied to the images. As part of expanding the training set, the images can be integrated with defects, including adding bad lighting, adding occlusions, simulating light beams over a facial image, eliminating landmarks on faces present, having images that are too far from or too close to an image source, and/or introducing blurring into the training images, among other options. The initial body of training images can be expanded significantly; for example, a set of 500,000 images can be expanded into 2 million images for a training set.
Once the training set is prepared, the helper network is trained against the data to recognize valid authentication inputs. The results produced by the helper network are evaluated. Based on the results evaluation, any false positives and any false negatives are used for further training of the model. According to one example execution, about one hundred thousand images remain that are false positives or false negatives after the first attempt. Training can be repeated until no new false positives or false negatives remain, using the remaining false results to retrain. In other examples, once a sufficient level of accuracy is achieved (e.g., greater than 95%), training can be considered complete. According to some embodiments, facial validation helper networks are architected on a deep neural network model that can identify any of a number of states associated with a facial image, and further can be used to determine if the image is valid for use in authentication.
Shown in Fig. 7 is a similar approach for executing helper networks on fingerprint images, according to some embodiments. In the first column at 702, the state being tested by a respective helper network is specified. For example, a validation helper network can determine if not enough fingerprint ridges are available, if an image is too blurry, or if a fingerprint image is too far from or too close to an image source, among other options. At column 708, image data is received, and at column 714, the received image data is transformed into HSL image format. The HSL image is reduced to a grayscale image at column 720.
The result is analyzed by respective helper networks (e.g., input to pre-trained helper DNNs) at 726. Once analyzed, the respective networks output a score used to determine validity of the authentication instance (e.g., at column 732).
Similar to the approach discussed with respect to Fig. 6, fingerprint image data can be captured in multiple lighting conditions and with multiple backgrounds to produce training data sets used to define the helper network models. Once a body of images is produced, the images are transformed into HSL images and then into grayscale. A variety of alignments, crops, zooms (e.g. in and out), are applied to the body of images. In addition, operations are executed to various ones of the body of training images to introduce defects.
For example, bad lighting conditions can be added, as well as occlusions, introduction of light beams into images, removal of landmarks from the image, as well as using images where the fingerprint image is too far and/or too close to an image source. Other example images can include blurry fingerprint captures or introduction of blur into training data images.
According to some embodiments, an initial body of 500,000 images can be expanded into a body of 2 million images to train the model.
According to one embodiment, once the expanded set of images is created a helper network model can be trained on the body of images to identify valid authentication inputs.
Initially the output determination of the helper network yields false positives and false negatives. Any resulting false positives and false negatives are used to continue training of the helper network. In one example execution, an initial set of two million images yields approximately 100,000 false positives and/or false negatives when the helper network's results are evaluated. The helper network model is retrained based on the remaining images and tested to identify any further false positives and/or false negatives. The approach can be repeated to refine the model until no false positives or false negatives are identified. In other embodiments, an authentication system can use a threshold level of accuracy to determine a model is fully trained for use (e.g., greater than 90% accuracy, greater than 95% accuracy, among other options).
Once respective helper networks are trained on their expanded data sets and iterated until no false positives or false negatives are output, an authentication system can execute the pre-trained helper network to determine the validity of any authentication input and filter bad inputs from use in training authentication models (e.g., embedding generation networks).
Further helper network embodiments include a transcription helper network. For example, some embodiments include one or more helper networks configured to accept an audio input and evaluate whether the audio sample is of suitable quality to use in subsequent processing. In some examples, subsequent processing includes identification and/or authentication settings. In other examples, the transcription helper network (and any helper network described) can be used in other subsequent processing. In one example, the transcription helper network is configured to evaluate input audio and generate a determination that the audio sample is of suitable quality to forward for a voice transcription.
In some embodiments, the transcription network can be trained as described with respect to the audio and/or voice networks herein. In a further example, the transcription helper network can be trained to identify transcribable audio by defining a training set of good audio and bad audio. Training can be iterative as described herein. For example, bad data and false positives can be used to iteratively train a transcription helper network until no such results are left. The resulting network can then be used on any new audio input to evaluate whether the input is transcribable. In some settings, an indication that the audio input is not transcribable can end the analysis.
Further embodiments can include a helper network trained to verify presence of a target. For example, similar in effect to a CAPTCHA check, the helper network can work on its own to identify the presence of a human being or other entity. In some embodiments, the presence verification can be configured to operate without a requirement for determining identity, and can provide a determination on whether a face is a human face.
Further examples of the presence network can also determine if the information submitter is "live" (i.e., not an image or video spoof). In still other examples, the helper networks can be configured to determine liveness in the context of a submitter who is wearing a face mask (e.g., face+mask network), a submitter who is wearing a human facsimile mask, and in the context of fingerprint submission. For example, a fingerprint validation network can be trained on a variety of valid fingerprint submission inputs and a variety of invalid input submissions. Various approaches for generating invalid face submission instances are described herein and can be extended to the fingerprint instance.
According to various embodiments, helper networks can be configured to provide a CAPTCHA-type service. For example, one or more helper networks, alone or in combination, can be used to verify a human subject is seeking identification, authentication, verification, etc. In further embodiments, one or more helper networks can be executed for detecting and differentiating input provided by a human or machine. In an example environment, the system and associated helper networks can be used primarily in Internet applications for verifying that data originating from a source is from a human, and not from an unauthorized computer program/software agent/robot. The following helper networks can be used alone and/or in any combination to identify human versus computer actors:
1. Camera input analysis networks: determines valid identification input (e.g., biometric of user's face (therefore is not a robot))
   a. Video spoofing DNN - protects against video presentation attack (PAD)
   b. Image spoofing DNN - protects against image presentation attack (PAD)
   c. Geometry DNN (finds valid face input (e.g., face biometric) in image)
   d. Blurry image DNN (makes sure face input in image is not too blurry)
2. Microphone input analysis networks: determines valid biometric of user's voice (therefore is not a robot)
   a. Voice spoofing DNN - protects against deepfake or recorded audio attack
   b. Validation DNN - finds valid human voice
   c. Random sentence (optional) - displays a random sentence, then uses automatic speech recognition (ASR) DNN to convert speech to text to ensure the human said the requested words.
Various embodiments for captcha operation relate to electronic systems for detecting and differentiating input provided by humans and machines. These systems are used primarily in Internet applications for verifying that data originating from a source is from a human, and not from an unauthorized computer program/software agent/robot.
According to one embodiment, a method of validating a source of image data input to a computing system is provided. The method comprises: receiving one or more images, processing the images using helper networks to ascertain the validity, and generating a determination of whether the face images originated from a machine or a human. A second embodiment concerns a method of validating a source of audio data input to a computing system comprising: receiving a speech utterance from a microphone in which a user (optionally) reads out loud a randomly selected challenge text; processing the speech audio with helper networks to ascertain the validity; and generating a determination of whether the audio originated from a machine or a human.
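The optional challenge-text check can be sketched as a comparison between the requested sentence and a transcript. This is a minimal illustration, assuming the transcript has already been produced by a speech recognition step that is not shown; the function names, the position-by-position word comparison, and the 0.8 threshold are all assumptions.

```python
import re


def _words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def challenge_match_ratio(challenge, transcript):
    """Fraction of challenge words spoken in the same position."""
    expected = _words(challenge)
    spoken = _words(transcript)
    if not expected:
        return 0.0
    matched = sum(1 for e, s in zip(expected, spoken) if e == s)
    return matched / len(expected)


def passes_challenge(challenge, transcript, threshold=0.8):
    """Hypothetical decision rule: accept when enough words match."""
    return challenge_match_ratio(challenge, transcript) >= threshold
```

A position-by-position comparison is deliberately crude; an edit-distance measure would tolerate inserted or dropped words better.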
Further embodiments can include a step of granting or denying access to data and/or a data processing device based on the results of the CAPTCHA-like function, including a signup for an email account or a blog posting. In others, a step of granting or denying access to an advertisement based on the determination is performed. Other embodiments perform a separate automated visual challenge test so that both visual processing and articulation processing are considered in one or more of the determinations.
The access is preferably used for one or more of the following processing contexts: a) establishing an online account; and/or b) accessing an online account; and/or c) establishing a universal online ID; and/or d) accessing a universal online ID; and/or e) sending email; and/or f) accessing email; and/or g) posting on a message board; and/or h) posting on a web log; and/or i) posting on a social network site page; and/or j) buying or selling on an auction site; and/or k) posting a recommendation for an item/service; and/or l) selecting an electronic ad.
In some embodiments, the various helper networks described are intended to operate independently of other processing and/or functions. For example, the helper networks can be configured to determine if face information or fingerprint information is suitable for continued processing. In an identification/authentication context, the attempt to identify and/or authenticate may terminate upon identification of an unsuitable input (e.g., bad collection, spoof, etc.). In other processing contexts, the helper network can also stop subsequent processing or require resubmission.
Other embodiments can include one or more stand-alone helper network functionality and/or integrate the one or more helper networks into a processing flow.
In other embodiments, helper network embodiments can be configured to determine if a person (e.g., a doctor entering a hospital) is wearing a mask or wearing a mask in the correct way. In some settings, the helper network and its determination can be used to prevent or allow entry (which can also be coupled with identity and/or authentication protocols). For example, the system can be connected to a physical controller that is configured to only allow entry if a mask is on and/or being worn properly. In various embodiments, the mask helper network is configured to validate a state of mask on/off, and can also be configured to validate a state of mask worn properly or not, irrespective of a subject to be identified.
In further embodiments, a helper network can be trained on location information and validate that a current geolocation of a requesting device is not blacklisted.
In some examples, the location helper networks are trained on location information inputs that are known to be valid as well as location information inputs that are known to be invalid (e.g., as described herein with respect to various helper networks). The trained network can then validate location information captured at the time of an identification function request.
Still other embodiments can include helper networks that validate accelerometer information captured from a device (e.g., a device requesting an identification function, a device associated with an identification function request, etc.). Helper networks can be trained on accelerometer information that reflects valid position information (e.g., normal or range of angles for known valid requests) and/or invalid position information (e.g., angles or ranges of angles for invalid requests). In one example, a helper network is configured to access and process accelerometer information to determine the user's angle (holding the phone), which can be used by the system to assert/validate liveness and/or identity. Further embodiments can include helper networks trained on and configured to validate temperature information to ensure the user/device is where the user/device asserts they are. Implicit in such location assertions, for example, is that it will not be 0 degrees in California during the summer. Various embodiments are configured to employ weather for helping with the determination of validity. As discussed with respect to various examples, validity determinations can be made independent of a subject to be identified and various helper networks are configured to validate submitted data before it is used for identification functions.
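The idea of deriving a holding angle from accelerometer information can be illustrated with a short sketch. It is a plain range check rather than a trained helper network, and it assumes the device reports gravity as an (x, y, z) vector with z perpendicular to the screen; the function names and the 20 to 90 degree range are hypothetical.

```python
import math


def tilt_degrees(ax, ay, az):
    """Angle between the device's z axis and the measured gravity vector.

    0 degrees means the device lies flat, 90 degrees means it is upright.
    """
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude == 0:
        raise ValueError("accelerometer vector must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_tilt = max(-1.0, min(1.0, az / magnitude))
    return math.degrees(math.acos(cos_tilt))


def angle_in_valid_range(ax, ay, az, min_deg=20.0, max_deg=90.0):
    """Assumed rule: requests from a hand-held angle fall inside the range."""
    return min_deg <= tilt_degrees(ax, ay, az) <= max_deg
```

In the approach the text describes, a helper network trained on valid and invalid angle ranges would replace the fixed bounds used here.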
According to one embodiment, liveness helper networks can be trained on and configured to test if a person is live (not a spoof) using a microphone. The system can employ a spoken random liveness sentence to make sure the person making the request is active (alive). If the user's spoken words match the requested words (above a predetermined threshold), the system can then establish a liveness dimension.
Fig. 8 is a block diagram of an example embodiment of an authentication system 1400 employing private biometrics with supporting helper networks. As shown in Fig. 8, the system can be configured to accept various authentication credentials in plain text or unencrypted form (e.g., 1401) and to process the unencrypted authentication credentials (e.g., via an authentication credential processing component 1402) to ensure the input is valid and good for authentication.
For example, a plurality of helper networks can process authentication inputs to determine validity before they are processed by embedding neural networks (e.g., 1425) into one-way homomorphic representations of the same, wherein the one-way homomorphic representations can be analyzed by a classification component (e.g., 1418) to determine if submitted credentials matched enrolled credentials (e.g., return known for match or unknown at 1450), for example, with a neural network trained on encrypted feature vectors produced by the embedding networks. Evaluations of matches can be validated, for example, with a validation component 1420 that is configured to provide a validation function once matches or unknown results are determined. In further embodiments, the classification component can operate by itself and in others as a part of a classification subsystem 1416 that can also include various validation functions to confirm matches or unknown results.
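The ordering of steps in Fig. 8 (helper validation, then embedding, then classification) can be outlined as follows. This is a structural sketch only: `validators`, `embed`, and `classify` are hypothetical callables standing in for the helper networks, the embedding network, and the classification component, and the "invalid" label is an assumption.

```python
def process_credential(credential, validators, embed, classify):
    """Run helper validation before embedding and classification.

    validators: iterable of callables returning True when the input is valid.
    embed: callable producing an embedding from a validated credential.
    classify: callable returning "known" or "unknown" for an embedding.
    """
    for validate in validators:
        if not validate(credential):
            # Bad inputs are filtered out before any embedding is produced.
            return "invalid"
    embedding = embed(credential)
    return classify(embedding)


if __name__ == "__main__":
    # Toy stand-ins so the control flow can be exercised end to end.
    not_empty = lambda c: len(c) > 0
    toy_embed = lambda c: [float(len(c))]
    toy_classify = lambda e: "known" if e[0] == 3.0 else "unknown"
    print(process_credential("abc", [not_empty], toy_embed, toy_classify))
```

Keeping validation as a separate stage mirrors the separation between credential processing and the classification subsystem.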
Various embodiments include architectures that separate authentication credential processing (e.g., 1402) from operations of the classification subsystem (e.g., 1416), and other embodiments can provide either or both operations as a service-based architecture for authentication on private encryptions of authentication credentials.
The various functions, processes, and/or algorithms that can be executed by the authentication credential processing component 1402 are discussed throughout, and the various functions, processes, and/or algorithms that can be executed by the classification subsystem 1416 are also described with respect to the '014 Application. Fig. 8 is included to provide some examples of helper networks and support functionality and/or algorithms that can be incorporated in the various examples, embodiments, and aspects disclosed herein. The following descriptions focus on the helper network functions to provide illustration, but are not limited to the examples discussed with Fig. 8.
For example, credential processing can include various helper networks (e.g., face 1404, face and mask 1406, fingerprint 1408, eyeglasses 1410, eye geometry 1412, and the "..." at 1414), and the preceding networks can each be associated with a validation network configured to determine the validity of the submitted/processed authentication instance. In some examples, geometry or processing networks (e.g., 1404 & 1408) are configured to identify relevant characteristics in respective authentication input (e.g., position of eyes in a face image, position of ridges in a fingerprint image respectively, etc.). The output of such networks is then validated by a validation network trained on that type of authentication input. The "..." at 1414 illustrates the option of including additional helper networks, and/or processing functions, where any number or combination of helper networks can be used in any combination with various embodiments disclosed herein.
According to some embodiments, the helper networks can be based on similar neural network architectures, including, for example, TensorFlow models that are lightweight in size and processing requirements. In further examples, the helper networks can be configured to execute as part of a web-based client that incorporates pre-trained neural networks to acquire, validate, align, reduce noise, transform, test, and once validated to communicate validated data to embedding networks to produce, for example, one-way encrypted input authentication credentials. Unlike many conventional approaches, the lightweight helper networks can be universally employed by conventional browsers without expensive hardware or on-device training. In a further example, the helper networks are configured to operate with millisecond response time on commercially available processing power. This is in contrast to many conventional approaches that require specialized hardware and/or on-device training, and that still fail to provide millisecond response time.
According to some embodiments, various helper networks can be based on deep neural network architectures, and in further examples, can employ you only look once ("YOLO") architectures. In further embodiments, the helper networks are configured to be sized in the range of 10kB to 100kB, and are configured to process authentication credentials in < 10 ms with accuracies > 99%. The data footprint of these helper networks demonstrates improved capability over a variety of systems that provide authentication based on complex, bulky, and size-intensive neural network architectures.
According to one aspect, each authentication credential modality requires an associated helper DNN. For example, for each biometric type one or more tailored helper networks can be instantiated to handle that biometric type. In one example, a face helper network and a fingerprint helper network (e.g., 1404 and 1408) can be configured to identify specific landmarks, boundaries, and/or other features appearing in input authentication credentials (e.g., face and fingerprint images respectively). Additional helper networks can include face and fingerprint validation models configured to determine that the submitted authentication credential is valid. Testing for validity can include determining that a submitted authentication credential is a good training data instance. In various embodiments, trained validation models are tailored during training so that validated outputs improve the entropy of the training data set, either expanding the circumstances in which trained models will authenticate correctly or refining the trained model to better distinguish between authentication classes and/or unknown results. In one example, distance metrics can be used to evaluate outputs of an embedding model. For example, valid instances improve the distance measure between dissimilar instances and help identify similar instances, and the validity networks can be trained to achieve this property.
In the context of image data, a validation helper network can identify if appropriate lighting and clarity is present. Other helper networks can provide processing of image data prior to validation, for example, to support crop and align functions performed on the authentication credentials prior to communication to an embedding network for transforming them into one-way encryptions.
Other options include: helper networks configured to determine if an input credential includes an eyes open/eyes closed state, which can be used for passive liveness in face recognition settings, among other options; helper networks configured to determine an eyeglasses on or eyeglasses off state within an input credential. The difference in eyeglass state can be used by the system to prevent false negatives in face recognition. Further options include data augmentation helper networks for various authentication credential modalities that are configured to increase the entropy of the enrollment set, for example, based on increasing the volume and robustness of the training data set.
In the voice biometric acquisition space, helper networks (e.g., helper DNNs) can be configured to isolate singular voices, and voice geometry helper networks can be trained to isolate single voices in audio data. In another example, helper network processing can include voice input segmentation to acquire voice samples using a sliding time window (e.g., 10 ms) across, for example, one second of input. In some embodiments, processing of voice data includes a pulse code modulation transformation that downsamples each time segment to 2x the frequency range, which may be coupled with voice fast Fourier transforms to convert the signal from the time domain to the frequency domain.
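The sliding-window segmentation can be sketched as follows. This is a minimal illustration, assuming the audio is a list of samples at a known sample rate; the source gives the 10 ms window and one second of input as examples, while the hop size and the 16 kHz rate used below are assumptions.

```python
def sliding_windows(samples, sample_rate_hz, window_ms=10, hop_ms=10):
    """Split audio samples into fixed-length windows.

    hop_ms equal to window_ms gives non-overlapping windows; a smaller
    hop gives overlapping ones. Trailing samples that do not fill a
    whole window are dropped.
    """
    window = int(sample_rate_hz * window_ms / 1000)
    hop = int(sample_rate_hz * hop_ms / 1000)
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must cover at least one sample")
    return [samples[start:start + window]
            for start in range(0, len(samples) - window + 1, hop)]


if __name__ == "__main__":
    one_second = [0.0] * 16000  # one second of silence at an assumed 16 kHz
    segments = sliding_windows(one_second, 16000)
    print(len(segments), len(segments[0]))
```

Each window would then go through the frequency-domain transform before being scored.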
Various embodiments can use any one or more and/or any combination of the following helper networks and/or associated functions. In one embodiment, the system can include a helper network that includes a face geometry detection DNN. The face geometry DNN can be configured to support locating face(s) and associated characteristics in an image by transforming each image into geometric primitives and measuring the relative position, width, and other parameters of eyes, mouth(s), nose(s), and chin(s).
Facial recognition functions can be similar to fingerprint recognition functions executed by fingerprint helper networks as both networks process similar modalities (e.g., image data and identification of structures within the image data to build an authentication representation). According to one embodiment, a helper network can include a fingerprint geometry detection DNN configured to accurately locate finger(s) in an image, and analysis can include transforming each image into geometric primitives to measure each finger's relative position, width, and other parameters. In one example, helper networks that process image data can be configured to identify relevant structures in the image and return positional information in the image (e.g., X and Y coordinates), video frame, and/or video stream submitted for processing of the relevant structures. In one example, geometry networks process image credentials and their output can be used in validating the authentication instance or rejecting the instance as invalid.
In another embodiment, a helper network can include a face validation DNN configured to validate face input images (e.g., front looking face images). In various embodiments, the validation DNN is configured to validate any one or more or any combination of the following: a valid input image was received; the submitted image data has forward facing face images; the image includes features consistent with a facial image (e.g., facial characteristics are present, and/or present in sufficient volume, etc.); lighting is sufficient; boundaries within image are consistent with facial images, etc.
Similarly, a helper network can include a fingerprint validation DNN configured to validate fingerprint input images. Such validation networks can be configured to return a validation score used to determine if an image is valid for further processing. In one example, the validation networks can return a score in the range from 0 to 100, where 100 is a perfect image, although other scoring systems and/or ranges can be used.
In further embodiments, a helper network can include one or more image state detection neural networks. The image state neural networks can be configured to detect various states (e.g., binary image conditions (e.g., face mask on/face mask off, eye blink yes/eye blink no, etc.)) or other more complex state values. The state values can be used in authentication credential processing. In one example, the system can employ an image state value to select an embedding generation neural network or to select a neural network to process an input authentication credential, among other options. In one example, a detection helper network can include a face mask detection DNN configured to determine if image data includes an entity wearing a face mask.
In a further example, the system can also execute face mask detection algorithms to determine if a subject is wearing a mask. Stated broadly, masks used during enrollment lower subsequent prediction performance. In some embodiments, the face + mask on/off detection DNN accepts a face input image (e.g., a forward-looking facial image) and returns a value from 0 to 100, where 0 is mask off and 100 is mask on. Various thresholds can be applied to a range of values to establish an on/off state.
In one example, a web client can include a URL parameter for enrollment and prediction (e.g., "maskCheck=true"), and based on the output (e.g., state = Mask On) can communicate real-time instructions to the user to remove the mask. In other examples, the system can be set to automatically select a face + mask embedding DNN tailored to process images with face and masks. In various embodiments, the face + mask embedding DNN is a specialized pre-trained neural network configured to process user image data where the user to be authenticated is wearing a mask. A corresponding classification network can be trained on such data (e.g., one-way encryptions of image data where users are in masks) and, once trained, used to predict matches on users wearing masks.
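The state-based selection described above can be illustrated with a minimal sketch. The names `select_embedding_network`, `networks`, and the cut-off of 50 are hypothetical and chosen only for illustration; the placeholder lambdas stand in for pre-trained embedding networks.

```python
# Hypothetical cut-off on the 0-100 mask score (0 = mask off, 100 = mask on).
MASK_ON_THRESHOLD = 50

def select_embedding_network(mask_score, networks, threshold=MASK_ON_THRESHOLD):
    """Pick the embedding network that matches the detected mask state."""
    key = "face_mask" if mask_score > threshold else "face"
    return key, networks[key]

# Placeholder callables standing in for pre-trained embedding networks.
networks = {
    "face": lambda image: "face-embedding",
    "face_mask": lambda image: "face+mask-embedding",
}

name, network = select_embedding_network(87, networks)
print(name, network(None))
```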
In another embodiment, a helper network can be configured to determine a state of image data where a user is or is not wearing glasses. In one example, a detection helper network can include an eyeglasses detection DNN configured to determine if image data includes an entity wearing eyeglasses. In a further example, the system can also execute an eyeglasses helper network to determine if a subject is wearing eyeglasses. In one example, the system can also execute an eyeglass detection algorithm to determine if a subject is wearing eyeglasses before allowing enrollment. Stated broadly, eyeglasses used during enrollment can lower subsequent prediction performance. In some embodiments, the eyeglasses on/off detection DNN accepts a front view face input image and returns a value from 0 to 100, where 0 is eyeglasses off and 100 is eyeglasses on. In some embodiments, various thresholds can be applied to a range of values to establish an on/off state. For example, values above 60 can be assigned to an on state with values below 40 assigned to an off state (or, for example, above 50/below 50). Intermediate values can be deemed inconclusive, or in other embodiments the complete range from 0 to 100 can be assigned to either state.
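As a minimal sketch of the thresholding just described, the function below maps a 0 to 100 detector score to an on, off, or inconclusive state using the example cut-offs of 60 and 40. The function name and parameter names are illustrative assumptions.

```python
def score_to_state(score, on_above=60, off_below=40):
    """Map a 0-100 detector score to 'on', 'off', or 'inconclusive'."""
    if not 0 <= score <= 100:
        raise ValueError("score must be in the range 0 to 100")
    if score > on_above:
        return "on"
    if score < off_below:
        return "off"
    return "inconclusive"  # intermediate values between the two cut-offs

print(score_to_state(85))  # on
print(score_to_state(12))  # off
print(score_to_state(50))  # inconclusive
```

Passing `on_above=50, off_below=50` gives the alternative above 50/below 50 split, in which only a score of exactly 50 is inconclusive.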
Various authentication systems can test if a user is wearing glasses. For example, a web client can include a URL parameter for enrollment and prediction (e.g., "eyeGlassCheck=true"), and based on the output (e.g., state = Glasses On) can communicate real-time instructions to the user to remove the glasses. In other embodiments, generation/classification networks can be trained on image data of a user with glasses and the associated networks can be selected based on processing images of users with glasses and predicting on encrypted representations of the same.
In another embodiment, a helper network can include an eye geometry detection DNN. The detection DNN is configured to locate eye(s) in an image by transforming a front facing facial image into geometric primitives and measuring relative position of the geometric primitives. In one example, the DNN is configured to return positional information (e.g., x, y coordinates) of eyes in an image, video frame or video stream.
In one embodiment, a helper network can include an eyes open/closed detection DNN. For example, a real-time determination that an entity seeking authentication is blinking provides real-time passive facial liveness confirmation. Determining that a user is actually submitting their authentication information at the time of the authentication request prevents spoofing attacks (e.g., holding up an image of an authentic user). In various examples, the system can include algorithms to test liveness and mitigate the risk of a photo or video spoofing attack during unattended operation. In one example, the eye open detection DNN receives an input image of an eye and outputs a validation score between 0 and 100, where 0 is eyes closed and 100 is eyes open. Various thresholds can be applied to a range of values to establish an eye open/closed state as discussed herein.
According to one embodiment, the authentication system prevents a user/entity from proceeding until the detection of a pair of eye-open/eye-closed events. In one example, the web client can be configured with a URL parameter "faceLiveness=true" that allows the system to require an eye-blink check. The parameter can be used to change the operation of blink testing and/or default settings. In further examples, rates of blinking can be established and linked to users as behavioral characteristics to validate.
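The eye-open/eye-closed pairing can be sketched as follows, assuming a stream of per-frame scores from an eyes open/closed detector (0 is closed, 100 is open). The cut-offs of 60 and 40 and the function names are assumptions made for illustration only.

```python
def eye_state(score, open_above=60, closed_below=40):
    """Classify one frame score; None marks an inconclusive frame."""
    if score > open_above:
        return "open"
    if score < closed_below:
        return "closed"
    return None

def blink_detected(frame_scores):
    """Return True once both an eye-open and an eye-closed event are seen."""
    seen = set()
    for score in frame_scores:
        state = eye_state(score)
        if state is not None:
            seen.add(state)
        if seen == {"open", "closed"}:
            return True
    return False

print(blink_detected([95, 90, 10, 92]))  # True
print(blink_detected([95, 90, 88]))      # False
```

A static photo held up to the camera would typically produce only one of the two states, so the check above would not pass.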
In some embodiments, helper networks can be configured to augment authentication credential data. For example, a helper network can include facial and fingerprint augmentation DNNs that are used as part of training validation networks. In various embodiments, data augmentation via helper networks is configured to generalize the enrollment of authentication information, improve accuracy and performance during subsequent prediction, and allow the classification component and/or subsystem to handle real-world conditions. Stated generally, enrollment can be defined on the system to require a certain number of instances to achieve a level of accuracy while balancing performance. For example, the system can require >50 instances of an authentication credential (e.g., >50 biometric input images) to maintain accuracy and performance. The system can be configured to execute algorithms to augment valid credential inputs to reach or exceed 50 instances. For example, a set of images can be expanded to 50 or more instances that can also be broadened to add boundary conditions to generalize the enrollment. The broadening can include any one or more and/or any combination of: enhanced image rotations, flips, and color and lighting homogenizations, among other options. Each instance of an augmentation can be tested to require improvement in evaluation of the distance metric comparison (Euclidean distance or cosine similarity), and can also be required not to surpass class boundaries. For example, the system can be configured to execute algorithms to remove any authentication credentials (e.g., images) that exceed class boundaries. Once filtered, the remaining images challenge the distance metric boundaries without surpassing them.
In the example of image data used to authenticate, if only one image is available for enrollment, the system is configured to augment the facial input image >50 (e.g., 60, 70, 80, etc.) times, remove any outliers, and then enroll the user. According to one embodiment, the web client is configured to capture 8 images, morph each image, for example, 9 times, remove any outliers and then enroll the user. As discussed, the system can be configured to require a baseline number of instances for enrollment. For example, enrollment can require >50 augmented biometric input images to maintain the health, accuracy, and performance of the recognition operations. In various embodiments, the system accepts biometric input image(s), morphs and homogenizes the lighting and contrast once, and discards the original images once encrypted representations are produced.
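The arithmetic and the outlier step can be sketched in a few lines. Capturing 8 images and morphing each 9 times yields 72 instances, which exceeds the baseline of 50. The outlier rule below (dropping instances farther than a fixed Euclidean distance from the centroid) is an assumption for illustration; the text above does not specify how outliers are identified.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def remove_outliers(vectors, max_distance):
    """Keep only vectors within max_distance of the centroid (assumed rule)."""
    c = centroid(vectors)
    return [v for v in vectors if math.dist(v, c) <= max_distance]

captured, morphs_per_image, required = 8, 9, 50
total = captured * morphs_per_image
print(total, total > required)  # 72 True

# Toy 2-D stand-ins for augmented instances; the last one is far from the rest.
instances = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
kept = remove_outliers(instances, max_distance=2.0)
print(len(kept))  # 3
```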
It is realized that there is no intrinsic requirement to morph images for prediction. Thus, some embodiments are configured to morph/augment images only during enrollment. In other embodiments, the system can also be configured to homogenize images submitted for prediction (e.g., via HSL transforms, etc.). In some examples, homogenized images used during prediction can increase system performance when compared to non-homogenized images. According to some examples, image homogenization can be executed based on convenience libraries (e.g., in Python and JavaScript). According to some embodiments, during prediction the web client is configured to capture three images, morph and homogenize the lighting and contrast once, and then discard the original images once encrypted representations are generated.
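One possible form of HSL-based lighting homogenization is sketched below using Python's standard `colorsys` module (which orders the channels as hue, lightness, saturation). The approach shown, shifting every pixel's lightness so the image mean reaches a target value, is an illustrative assumption and not necessarily the transform used by the described system.

```python
import colorsys

def homogenize_lightness(pixels, target=0.5):
    """Shift pixel lightness so the image mean moves to `target`.

    `pixels` is a list of (r, g, b) tuples with channels in 0.0-1.0.
    The mean equals `target` exactly unless a pixel is clamped at 0 or 1.
    """
    hls = [colorsys.rgb_to_hls(r, g, b) for r, g, b in pixels]
    mean_l = sum(l for _, l, _ in hls) / len(hls)
    shift = target - mean_l
    out = []
    for h, l, s in hls:
        new_l = min(1.0, max(0.0, l + shift))  # clamp to the valid range
        out.append(colorsys.hls_to_rgb(h, new_l, s))
    return out

dark = [(0.2, 0.2, 0.2), (0.4, 0.4, 0.4)]
print(homogenize_lightness(dark))
```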
In various embodiments, helper networks can be configured to support transformation of authentication credentials into encrypted representations by pre-trained neural networks (e.g., referred to as embedding networks or generation networks). The embedding networks can be tailored to specific authentication credential input. According to one embodiment, the system includes face, face + mask, and fingerprint embedding neural networks, among others, where respective embedding networks are configured to transform the input image to distance measurable one-way homomorphic encryptions (e.g., embedding, or vector encryption), which can be a two-dimensional positional array of 128 floating-point numbers.
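To make "distance measurable" concrete, the sketch below computes the Euclidean distance and the cosine similarity between two 128-value vectors. The vectors are toy values, not real embeddings, and the function names are illustrative.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 128-value vectors standing in for two embeddings.
a = [1.0] * 128
b = [1.0] * 64 + [0.0] * 64

print(euclidean_distance(a, b))  # 8.0
print(round(cosine_similarity(a, b), 4))  # 0.7071
```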
In various implementations, face, face + mask, and fingerprint embedding neural networks maintain full accuracy through real-world boundary conditions. Real world conditions have been tested to include poor lighting; inconsistent camera positioning; expression; image rotation of up to 22.5°; variable distance; focus impacted by blur and movement; occlusions of 20-30% including facial hair, glasses, scars, makeup, colored lenses and filters, and abrasions; and B/W and grayscale images. In various embodiments, the embedding neural networks are architected on the MobileNetV2 architecture and are configured to output a one-way encrypted payload in <100ms.
In various embodiments, voice input can include additional processing. For example, the system can be configured to execute voice input segmentation that generalizes the enrollment data, improves accuracy and performance during prediction, and allows the system to handle real-world conditions. In various embodiments, the system is configured to require >50 10ms voice samples, to establish a desired level of accuracy and performance. In one example, the system is configured to capture voice instances based on a sliding 10ms window that can be captured across one second of voice input, which enables the system to reach or exceed 50 samples.
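A minimal sketch of the sliding window follows. The 16 kHz sample rate and the 10ms hop between windows are assumptions made for illustration; with those values, one second of audio yields 100 windows, which exceeds the 50-sample baseline.

```python
def sliding_windows(samples, sample_rate, window_ms=10, hop_ms=10):
    """Split audio samples into fixed-length windows taken every hop_ms."""
    window = sample_rate * window_ms // 1000  # samples per window
    hop = sample_rate * hop_ms // 1000        # samples between window starts
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, hop)]

one_second = [0.0] * 16000  # placeholder audio at an assumed 16 kHz rate
windows = sliding_windows(one_second, 16000)
print(len(windows))  # 100
```

A smaller hop (for example 5ms) makes consecutive windows overlap and produces more instances from the same second of audio.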
In some embodiments, the system is configured to execute pulse code modulation (PCM) to reduce the input to two times the frequency range, and PCM enables the system to use the smallest possible Fourier transform without computational loss. In other embodiments, the system is configured to execute a voice fast Fourier transform (FFT), which transforms the pulse code modulated audio signal from the time domain to a representation in the frequency domain. According to some examples, the transform output is a 2-dimensional array of frequencies that can be input to a voice embedding DNN. For example, the system can include a voice embedding network that is configured to accept input of one 2-dimensional array of frequencies and transform the input to a 4kB, 2-dimensional positional array of 128 floating-point numbers (e.g., cosine-measurable embedding and/or 1-way vector encryption), and then delete the original biometric.
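The time-to-frequency step can be illustrated with a small, pure-Python radix-2 FFT applied to each frame, producing one row of magnitudes per frame (a 2-dimensional array of frequencies). This is a sketch under stated assumptions: frame lengths must be a power of two (a real frame would typically be zero-padded), and the function names are illustrative.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectrogram(frames):
    """Return one row of magnitudes (bins 0..n/2) per time-domain frame."""
    return [[abs(c) for c in fft(frame)[: len(frame) // 2 + 1]]
            for frame in frames]

# One 8-sample frame holding a single cosine cycle: energy lands in bin 1.
frame = [math.cos(2 * math.pi * k / 8) for k in range(8)]
spec = spectrogram([frame])
print([round(m, 6) for m in spec[0]])
```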
According to various embodiments, the web client can be configured to acquire authentication credentials (e.g., biometrics) at the edge with or without a network. For example, the web client can be configured to automatically switch to a local mode after detection of loss of network. According to some embodiments, the web client can support offline operation ("local mode") using edge computing. In one example, the device in local mode authenticates a user using face and fingerprint recognition, and can do so in 10ms with intermittent or no Internet connection as long as the user authenticates at least once to the device while online. In some embodiments, the device is configured to store the user's embeddings and/or encrypted feature vectors locally using a web storage API during the prediction.
Fig. 9 illustrates an example process flow 1500 for facial recognition according to one embodiment. At 1502 facial image data is processed by a face geometry neural network using a probe. As part of execution of 1502, the neural network operates to transform the input data into geometric primitives and uses the geometric primitives to locate facial structures including, for example, eyes, mouth, nose, chin, and other relevant facial structures. Based on the analysis of the geometric primitives, positional information can be output as part of 1502, and the positional information can be used in subsequent processing steps. For example, process 1500 can continue at 1504 with processing via a face validation neural network. The processing of 1504 can include validating that the image data includes facial structures and information, and may employ the position information developed in 1502.
In a further example, processing and validation in 1502-1504 can include operations to align an input image on facial features and can include additional operations to crop an input image around relevant facial features (e.g., using position information). Process 1500 continues at 1506 with processing by an eyes open/closed neural network. The neural network is configured to detect whether facial input data includes transitions between eyes open and closed states, which is indicative of a live person or more specifically a blinking person during use of the authentication functions. According to some embodiments, detection of blinking can be used to validate "liveness" of authentication information submission (e.g., not spoofed submission).
According to some embodiments, the process flow 1500 can also include operations to detect whether the user is wearing glasses. For example, at 1508, submitted user data can be processed to determine if a submitted image includes the user wearing eyeglasses or not. In one example, an image capture is processed through a neural network (e.g., eyeglasses on/off neural network) to determine if the image data includes the user wearing eyeglasses or not. The system can be configured to respond to the determination in a variety of ways. In one example, if eyeglasses are detected, a user may be requested to re-image their face for authentication. In other examples, the system can be configured to use different neural networks to process the image data. For example, a first neural network can be configured to process image data in which users are wearing glasses and a second different neural network to process image data of users (e.g., even the same user) when not wearing glasses. The state determination glasses on/off can be used to select between such networks.
In some embodiments, process 1500 can include data augmentation operations. For example, at 1510, data augmentation can be executed to flip and rotate acquired images, and/or morph acquired images to achieve a system defined requisite number of image samples. Various embodiments are configured to confirm and validate input authentication information prior to performing data expansion operations (e.g., 1510). Ensuring valid data and filtering bad data ensures the accuracy of any resulting enrollment. In another example at 1510, data augmentation neural networks can be employed to homogenize lighting conditions for submitted image data. According to various embodiments, multiple techniques can be used to augment and/or homogenize the lighting for a subject image. In one example, two homogenization techniques are used to update the image data.
As shown in process flow 1500, a number of steps can be executed prior to creation of encrypted feature vectors/embeddings that are one-way encrypted representations of submitted authentication inputs. In other embodiments, the processing can be omitted and/or executed in fewer steps and such process flows can be reduced to functions for creation of one-way encryptions of authentication credentials by an embedding network (e.g., at 1512).
In still other embodiments, processing to validate authentication inputs can be executed to improve enrollment and subsequent authentication can be handled by other processes and/or systems.
According to various embodiments, the process 1500 includes steps 1502 through 1510 which can be performed by various helper networks that improve the data provided for enrollment and creation of one-way encryptions of submitted authentication information that are derived to be measurable in their encrypted form. For example, the operations performed at 1502 through 1510 can improve the data input to an embedding network that is configured to take a plain text input and produce a one-way encrypted output of the authentication information. As shown in the process flow 1500, once an encrypted representation of an authentication input is produced, the original authentication credential (e.g., original biometric) can be deleted at 1514.
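The overall ordering of process flow 1500 (helper steps, then embedding, then deletion of the original) can be sketched as a simple pipeline. Everything named here is hypothetical: the helper callables, the `valid_score` key, the threshold of 50, and the toy `embed` function merely stand in for the helper and embedding networks.

```python
def run_enrollment_flow(image, helpers, embed, min_valid_score=50):
    """Run helper steps in order, embed the result, drop the working copy."""
    data = {"image": image}
    for _name, helper in helpers:          # stand-ins for steps 1502-1510
        data = helper(data)
        if data.get("valid_score", 100) < min_valid_score:
            return None                    # reject a bad input before embedding
    embedding = embed(data["image"])       # stand-in for step 1512
    data.clear()                           # stand-in for step 1514: discard
                                           # this function's plaintext copy
    return embedding

helpers = [
    ("geometry", lambda d: {**d, "box": (0, 0, 2, 2)}),
    ("validation", lambda d: {**d, "valid_score": 90}),
]
embed = lambda img: [float(sum(row)) for row in img]  # toy "embedding"

result = run_enrollment_flow([[1, 2], [3, 4]], helpers, embed)
print(result)  # [3.0, 7.0]
```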
Fig. 10 is an example process flow 1600 for biometric acquisition of a fingerprint. At 1602, image data captured by a probe is transformed into geometric primitives based on input to a fingerprint geometry neural network (e.g., a fingerprint geometry DNN).
The neural network can be configured to transform image data into geometric primitives and locate fingerprints within the image data based on analysis of the geometric primitives, relative spacing, boundaries, structures, etc. In some embodiments, output of the fingerprint geometry DNN can include positional information for fingerprints and/or characteristics within the image data.
In step 1604, submitted data can be processed to determine validity. For example, the image data can be input into a fingerprint validation neural network at 1604. In one example, the fingerprint validation neural network can be architected as a DNN. The neural network can be configured to validate a proper fingerprint capture exists in the image data (e.g., based on analysis of the image data by the neural network and/or geometric primitives produced by the fingerprint geometry neural network). In further embodiments the fingerprint validation neural network can also be configured to determine the validity of the submitted fingerprint data. For example, the validity helper network can be configured to determine that a live sample (and not spoofed) is being presented, as well as validating the input as a good authentication data source.
Similar to process 1500, process 1600 includes operations to augment data submission. Data augmentation (e.g., 1606) can be executed as part of enrollment to ensure a threshold number of data instances are provided during enrollment. In various embodiments, process flow 1600 is configured to validate authentication inputs to ensure good inputs are augmented for training further models. In further examples, data augmentation can also be used during prediction operations. In one example, data augmentation during prediction can be limited to homogenizing light conditions for submitted image data (e.g., face image, fingerprint image, other image, etc.). According to one embodiment, fingerprint image data is manipulated to improve the image data and/or create additional instances as part of data augmentation steps. Manipulation can include image flips, rotations, skews, offsets, and cropping, among other options. Operations executed during data augmentation can also include homogenization of the lighting conditions for an input image (e.g., transform into HSL). Various lighting homogenization functions can be executed on the image data. In one example, the system is configured to execute at least two homogenization techniques to standardize lighting conditions. According to some embodiments, the operations of 1606 can also include conversion of the image to a grayscale image.
Steps 1602 through 1606 can be executed to improve and/or prepare fingerprint image data for enrollment by a fingerprint embedding neural network (e.g., at 1608). The fingerprint embedding neural network is configured to generate one-way distance measurable encrypted representations of input authentication credentials. For example, the fingerprint embedding neural network can be architected as a deep neural network. The fingerprint embedding DNN can be configured to create one-way homomorphic encryptions of input fingerprint data. Once the encrypted representations are produced, the encrypted representations can be used in subsequent operations (e.g., classification and/or prediction), and the process flow 1600 can include a step (e.g., 1610) to delete any original authentication credential information, including any original biometric.
Fig. 11 is an example process flow 1700 for acquisition of vocal authentication credentials. According to one embodiment, process 1700 can begin based on transformation of voice data captured by a probe at 1702. According to one example, input voice data is transformed based on voice pulse code modulation (PCM). Processing of the audio data can include capturing samples of time segments from the audio information. In one example, silence is removed from the audio information and PCM is executed against one second samples from the remaining audio data. In other embodiments, different sample sizes can be used to achieve a minimum number of authentication instances for enrollment and/or prediction. According to some embodiments, the PCM operation is configured to downsample the audio information to two times the frequency range. In other embodiments, different downsampling frequencies can be used. Once PCM is complete at 1702, process 1700 continues at 1704 with a Fourier transformation of the PCM signal from the time domain to the frequency domain. According to some embodiments, a voice fast Fourier transformation operation is executed at 1704 to produce the frequency domain output.
Process 1700 continues at 1706, where the frequency domain output of 1704 can be input into a voice embedding neural network. According to some embodiments, the voice embedding neural network can include or be based on a deep neural network architecture. As discussed herein, the embedding neural network is configured to produce a one-way encryption of input authentication information. In this example, the voice embedding DNN is configured to generate an encrypted representation of audio/voice data that is geometrically measurable (e.g., cosine measurable). Once the encrypted representation is generated, any original authentication information can be deleted at 1708. For example, once the voice embedding DNN produces its encryption, the original audio input can be deleted to preserve privacy.
Modifications and variations of the discussed embodiments will be apparent to those of ordinary skill in the art and all such modifications and variations are included within the scope of the appended claims. For example, while many examples and embodiments are discussed above with respect to a user or person, and identification/authentication of same, it is realized that the system can identify and/or authenticate any item or thing or entity for which image capture is possible (e.g., family pet, heirloom, necklace, ring, landscape, etc.) or other type of digital capture is possible (e.g., ambient noise in a location, song, signing, specific gestures by an individual, sign language movements, words in sign language, etc.). Once digitally captured, the object of identification/authentication can be processed by a first generation/embedding network, whose output is used to train a second classification network, enabling identification of the object in both distance measure and classification settings on fully encrypted identifying information. In further aspects, the authentication systems (e.g., embedding and classification networks) are protected by various helper networks that process and validate authentication data as good or bad sources of data. Filtering of bad data sources protects subsequent embedding models and yields authentication systems that are more accurate and flexible than conventional approaches.
An illustrative computer system on which the discussed functions, algorithms, and/or neural networks can be implemented is shown by way of computer system 1200, FIG. 12, which may be used in connection with any of the embodiments of the disclosure provided herein. The computer system 1200 may include one or more processors 1210 and one or more articles of manufacture that comprise non-transitory computer-readable storage media (e.g., memory 1220 and one or more non-volatile storage media 1230). The processor 1210 may control writing data to and reading data from the memory 1220 and the non-volatile storage device 1230 in any suitable manner. To perform any of the functionality described herein, the processor 1210 may execute one or more processor-executable instructions stored in one or more non-transitory computer-readable storage media (e.g., the memory 1220), which may serve as non-transitory computer-readable storage media storing processor-executable instructions for execution by the processor 1210.
Private Biometric Implementation (Figs. 1-5d)
Various embodiments are discussed below for enrolling users with private biometrics and prediction on the same. Various embodiments describe some considerations broadly and provide illustrative examples for implementation of private biometrics. These examples and embodiments can be used with liveness verification of the respective private biometrics as discussed above. Further embodiments can include and/or be coupled with various helper networks to facilitate authentication information acquisition, validation, and/or enrollment of the same, and establish a fully private implementation for identification and authentication.
Fig. 13 is an example process flow 2100 for enrolling in a privacy-enabled biometric system (e.g., Fig. 3, 304 described in greater detail below or Fig. 7, 704 below). Process 2100 begins with acquisition of unencrypted biometric data at 2102. The unencrypted biometric data (e.g., plaintext, reference biometric, etc.) can be directly captured on a user device, received from an acquisition device, or communicated from stored biometric information. In one example, a user takes a photo of themselves on their mobile device for enrollment. Pre-processing steps can be executed on the biometric information at 2104. For example, given a photo of a user, pre-processing can include cropping the image to significant portions (e.g., around the face or facial features). Various examples exist of photo processing options that can take a reference image and identify facial areas automatically. In another example, the end user can be provided a user interface that displays a reference area, and the user is instructed to position their face from an existing image into the designated area. Alternatively, when the user takes a photo, the identified area can direct the user to focus on their face so that it appears within the highlighted area. In other options, the system can analyze other types of images to identify areas of interest (e.g., iris scans, hand images, fingerprint, etc.) and crop images accordingly. In yet other options, samples of voice recordings can be used to select data of the highest quality (e.g., lowest background noise), or can be processed to eliminate interference from the acquired biometric (e.g., filter out background noise).
Having a given biometric, the process 2100 continues with generation of additional training biometrics at 2106. For example, a number of additional images can be generated from an acquired facial image. In one example, an additional twenty-five images are created to form a training set of images. In some examples, as few as three images or even one image can be used, but with the tradeoff of reduced accuracy. In other examples, as many as forty training images may be created or acquired. The training set is used to provide for variation of the initial biometric information, and the specific number of additional training points can be tailored to a desired accuracy (see, e.g., Tables I-VIII below, which provide example implementation and test results). Other embodiments can omit generation of additional training biometrics.
Various ranges of training set production can be used in different embodiments (e.g., any set of images from two to one thousand). For an image set, the training group can include images of different lighting, capture angle, positioning, etc. For audio based biometrics, different background noises can be introduced, different words can be used, and different samples from the same vocal biometric can be used in the training set, among other options. Various embodiments of the system are configured to handle multiple different biometric inputs including even health profiles that are based at least in part on health readings from health sensors (e.g., heart rate, blood pressure, EEG signals, body mass scans, genome, etc.), and can, in some examples, include behavioral biometric capture/processing. According to various embodiments, biometric information includes Initial Biometric Values (IBV), a set of plaintext values (pictures, voice, SSNO, driver's license number, etc.) that together define a person.
At 2108, feature vectors are generated from the initial biometric information (e.g., one or more plain text values that identify an individual). Feature vectors are generated based on all available biometric information, which can include a set of training biometrics generated from the initial unencrypted biometric information received on an individual or individuals. According to one embodiment, the IBV is used in enrollment and, for example, in process 2100. The set of IBVs are processed into a set of initial biometric vectors (e.g., encrypted feature vectors) which are used downstream in a subsequent neural network.
In one implementation, users are directed to a website to input multiple data points for biometric information (e.g., multiple pictures including facial images), which can occur in conjunction with personally identifiable information ("PII"). The system and/or execution of process 2100 can include tying the PII to encryptions of the biometric as discussed below.
In one embodiment, a convolutional deep neural network is executed to process the unencrypted biometric information and transform it into feature vector(s) which have a property of being one-way encrypted cipher text. The neural network is applied (2108) to compute a one-way homomorphic encryption of the biometric, resulting in feature vectors (e.g., at 2110). These outputs can be computed from an original biometric using the neural network but the values are one-way in that the neural network cannot then be used to regenerate the original biometrics from the outputs.
Various embodiments employ networks that take as input a plaintext input and return Euclidean measurable output. One such implementation is FaceNet which takes in any image of a face and returns 128 floating point numbers, as the feature vector. The neural network is fairly open ended, where various implementations are configured to return a distance or Euclidean measurable feature vector that maps to the input. This feature vector is nearly impossible to use to recreate the original input biometric and is therefore considered a one-way encryption.
Various embodiments are configured to accept the feature vector(s) produced by a first neural network and use it as input to a new neural network (e.g., a second classifying neural network). According to one example, the new neural network has additional properties. This neural network is specially configured to enable incremental training (e.g., on new users and/or new feature vectors) and configured to distinguish between a known person and an unknown person. In one example, a fully connected neural network with 2 hidden layers and a "hinge" loss function is used to process input feature vectors and return a known person identifier (e.g., person label or class) or indicate that the processed biometric feature vectors are not mapped to a known person. For example, the hinge loss function outputs one or more negative values if the feature vector is unknown. In other examples, the output of the second neural network is an array of values, wherein the values and their positions in the array determine a match to a person or identification label.
Various embodiments use different machine learning models for capturing feature vectors in the first network. According to various embodiments, the feature vector capture is accomplished via a pre-trained neural network (including, for example, a convolutional neural network) where the output is distance measurable (e.g., Euclidean measurable).
In some examples, this can include models having a softmax layer as part of the model, and capture of feature vectors can occur preceding such layers. Feature vectors can be extracted from the pre-trained neural network by capturing results from the layers that are Euclidean measurable. In some examples, the softmax layer or categorical distribution layer is the final layer of the model, and feature vectors can be extracted from the n-1 layer (e.g., the immediately preceding layer). In other examples, the feature vectors can be extracted from the model in layers preceding the last layer. Some implementations may offer the feature vector as the last layer.
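The idea of capturing feature vectors from the layer immediately preceding a softmax layer can be sketched as follows. This is a toy illustration under stated assumptions: the two-layer structure, the weights, and the names `dense`, `softmax`, and `forward` are all hypothetical, and a real pre-trained network would be far larger.

```python
import math

def dense(x, weights, bias):
    """Apply one fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, embed_layer, class_layer):
    """Return (feature_vector, class_probabilities) for input x."""
    # Output of the layer preceding the softmax layer (the "n-1" layer).
    feature_vector = dense(x, *embed_layer)
    # Final layer: categorical distribution over classes.
    probabilities = softmax(dense(feature_vector, *class_layer))
    return feature_vector, probabilities

# Hypothetical toy weights: 3 inputs -> 2-value feature vector -> 3 classes.
embed_layer = ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1])
class_layer = ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0])

feature_vector, probabilities = forward([1.0, 2.0, 3.0], embed_layer, class_layer)
```

In this sketch the softmax output is simply ignored for matching purposes; only `feature_vector` would be passed on as the distance measurable representation.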
In some embodiments, an optional step can be executed as part of process 2100 (not shown). The optional step can be executed as a branch or fork in process 2100 so that authentication of a user can immediately follow enrollment of a new user or authentication information. In one example, a first phase of enrollment can be executed to generate encrypted feature vectors. The system can use the generated encrypted feature vectors directly for subsequent authentication. For example, distance measures can be applied to determine a distance between enrolled encrypted feature vectors and a newly generated encrypted feature vector. Where the distance is within a threshold, the user can be authenticated or an authentication signal returned. In various embodiments, this optional authentication approach can be used while a classification network is being trained on encrypted feature vectors in the following steps.
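The optional distance-based authentication can be sketched as below. The threshold value, the three-value vectors, and the rule that a match against any one enrolled vector suffices are assumptions made for illustration; the text above does not fix these details.

```python
import math

def authenticate_by_distance(enrolled_vectors, candidate, threshold):
    """Return True when the candidate lies within `threshold` of any enrolled vector."""
    return any(math.dist(v, candidate) <= threshold for v in enrolled_vectors)

# Hypothetical enrolled feature vectors for one user.
enrolled_vectors = [[0.10, 0.20, 0.30], [0.12, 0.18, 0.33]]

genuine = authenticate_by_distance(enrolled_vectors, [0.11, 0.21, 0.29], threshold=0.1)
impostor = authenticate_by_distance(enrolled_vectors, [0.90, 0.90, 0.90], threshold=0.1)
```

A stricter variant could require the candidate to be within the threshold of all enrolled vectors, or of their mean; which rule is appropriate would depend on the deployment.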
The resulting feature vectors are bound to a specific user classification at 2112. For example, deep learning is executed at 2112 on the feature vectors based on a fully connected neural network (e.g., a second neural network, an example classifier network).
The execution is run against all the biometric data (i.e., feature vectors from the initial biometric and training biometric data) to create the classification information. According to one example, a fully connected neural network having two hidden layers is employed for classification of the biometric data. In another example, a fully connected network with no hidden layers can be used for the classification. However, the use of the fully connected network with two hidden layers generated better accuracy in classification in some example executions (see e.g., Tables I-VIII described in greater detail below). According to one embodiment, process 2100 can be executed to receive an original biometric (e.g., at 2102), generate feature vectors (e.g., 2110), and apply a FCNN classifier to return a label for identification at 2112 (e.g., output #people).
In further embodiments, step 2112 can also include filtering operations executed on the encrypted feature vectors before binding the vectors to a label via training the second network.
For example, encrypted feature vectors can be analyzed to determine if they are within a certain distance of each other. Where the generated feature vectors are too far apart, they can be rejected for enrollment (i.e., not used to train the classifier network). In other examples, the system is configured to request additional biometric samples, and re-evaluate the distance threshold until satisfied. In still other examples, the system rejects the encrypted biometrics and requests new submissions to enroll.
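One way to read the filtering step is as a check on the largest pairwise distance among the candidate enrollment vectors. The sketch below assumes that reading; the `max_spread` parameter and the function names are hypothetical.

```python
import math
from itertools import combinations

def max_pairwise_distance(vectors):
    """Largest Euclidean distance between any two vectors (0.0 if fewer than two)."""
    return max(
        (math.dist(a, b) for a, b in combinations(vectors, 2)),
        default=0.0,
    )

def accept_for_enrollment(vectors, max_spread):
    """Accept the set only when every pair of vectors is within `max_spread`."""
    return max_pairwise_distance(vectors) <= max_spread

# Hypothetical feature vectors from three captures of the same person.
tight = [[0.10, 0.20], [0.11, 0.21], [0.09, 0.19]]
# The same set with one outlying capture added.
loose = tight + [[0.90, 0.90]]

tight_ok = accept_for_enrollment(tight, max_spread=0.05)
loose_ok = accept_for_enrollment(loose, max_spread=0.05)
```

When a set is rejected, the caller could request further samples and run the check again, in line with the re-evaluation described above.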
Process 2100 continues with discarding any unencrypted biometric data at 2114.
In one example, an application on the user's phone is configured to enable enrollment of captured biometric information and configured to delete the original biometric information once processed (e.g., at 2114). In other embodiments, a server system can process received biometric information and delete the original biometric information once processed.
According to some aspects, only requiring that original biometric information exists for a short period during processing or enrollment significantly improves the security of the system over conventional approaches. For example, systems that persistently store or employ original biometric data become a source of vulnerability. Unlike a password that can be reset, a compromised biometric remains compromised, virtually forever.
Returning to process 2100, at 2116 the resulting cipher text biometric (e.g., feature vectors) is stored. In one example, the encrypted biometric can be stored locally on a user device. In other examples, the generated encrypted biometric can be stored on a server, in the cloud, a dedicated data store, or any combination thereof. In one example, the encrypted biometrics and classification are stored for use in subsequent matching or searching. For instance, new biometric information can be processed to determine if the new biometric information matches any classifications. The match (depending on a probability threshold) can then be used for authentication or validation.
In cases where a single match is executed, the neural network model employed at 2112 can be optimized for one to one matching. For example, the neural network can be trained on the individual expected to use a mobile phone (assuming no other authorized individuals for the device). In some examples, the neural network model can include training allocation to accommodate incremental training of the model on acquired feature vectors over time. Various embodiments, discussed in greater detail below, incorporate incremental training operations for the neural network to permit additional people and to incorporate newly acquired feature vectors.
In other embodiments, an optimized neural network model (e.g., FCNN) can be used for a primary user of a device, for example, stored locally, and remote authentication can use a data store and one to many models (e.g., if the first model returns unknown).
Other embodiments may provide the one to many models locally as well. In some instances, the authentication scenario (e.g., primary user or not) can be used by the system to dynamically select a neural network model for matching, and thereby provide additional options for processing efficiency.
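The fallback from a local one to one model to a remote one to many model can be sketched as follows. The two models are represented by plain callables that return a label or `None` for unknown; these stand-ins, and the name `identify`, are assumptions for illustration only.

```python
def identify(feature_vector, local_model, remote_model):
    """Try the local primary-user model first; fall back to the remote model.

    Each model is a callable returning a label, or None for an unknown person.
    Returns (label_or_None, source) where source names the model that answered.
    """
    label = local_model(feature_vector)
    if label is not None:
        return label, "local"
    return remote_model(feature_vector), "remote"

# Hypothetical stand-in models keyed on the first feature value.
def local_model(vector):
    return "primary-user" if vector[0] > 0.5 else None

def remote_model(vector):
    return "person-7" if vector[0] > 0.2 else None

primary = identify([0.9, 0.1], local_model, remote_model)
other = identify([0.3, 0.1], local_model, remote_model)
nobody = identify([0.1, 0.1], local_model, remote_model)
```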
Fig. 14 illustrates an example process 2200 for authentication with secured biometric data. Process 2200 begins with acquisition of multiple unencrypted biometrics for analysis at 2202. In one example, the privacy-enabled biometric system is configured to require at least three biometric identifiers (e.g., as plaintext data, reference biometric, or similar identifiers).
If, for example, an authentication session is initiated, the process can be executed so that it only continues to the subsequent steps if a sufficient number of biometric samples are taken, given, and/or acquired. The number of required biometric samples can vary, and authentication can take place with as few as one.
Similar to process 2100, the acquired biometrics can be pre-processed at 2204 (e.g., images cropped to facial features, voice sampled, iris scans cropped to relevant portions, etc.).
Once pre-processing is executed, the biometric information is transformed into a one-way homomorphic encryption of the biometric information to acquire the feature vectors for the biometrics under analysis (e.g., at 2206). Similar to process 2100, the feature vectors can be acquired using any pre-trained neural network that outputs distance measurable encrypted feature vectors (e.g., Euclidean measurable feature vectors, homomorphic encrypted feature vectors, among other options). In one example, this includes a pre-trained neural network that incorporates a softmax layer. However, other examples do not require the pre-trained neural network to include a softmax layer, only that they output Euclidean measurable feature vectors.
In one example, the feature vectors can be obtained in the layer preceding the softmax layer as part of step 2206.
In various embodiments, authentication can be executed based on comparing distances between enrolled encrypted biometrics and subsequently created encrypted biometrics. In further embodiments, this is executed as a first phase of authentication. Once a classifying network is trained on the encrypted biometrics a second phase of authentication can be used, and authentication determinations made via 2208.
According to some embodiments, the phases of authentication can be executed together and even simultaneously. In one example, an enrolled user will be authenticated using the classifier network (e.g., second phase), and a new user will be authenticated by comparing distances between encrypted biometrics (e.g., first phase). As discussed, the new user will eventually be authenticated using a classifier network trained on the new user's encrypted biometric information, once the classifier network is ready.
At 2208, a prediction (e.g., via a deep learning neural network) is executed to determine if there is a match for the person associated with the analyzed biometrics. As discussed above with respect to process 2100, the prediction can be executed as a fully connected neural network having two hidden layers (during enrollment the neural network is configured to identify input feature vectors as (previously enrolled) individuals or unknown, and an unknown individual (not previously enrolled) can be added via incremental training or full retraining of the model).
In other examples, a fully connected neural network having no hidden layers can be used.
Examples of neural networks are described in greater detail below (e.g., Figs. 17-20 illustrate example neural networks). Other embodiments of the neural network can be used in process 2200. According to some embodiments, the neural network operates as a classifier during enrollment to map feature vectors to identifications, and operates as a predictor to identify a known person or an unknown. In some embodiments, different neural networks can be tailored to different types of biometrics, and facial images processed by one, while voice biometrics are processed by another.
According to some embodiments, process 2200 is described as agnostic to submitter security. In other words, process 2200 relies on front end application configuration to ensure submitted biometrics are captured from the person trying to authenticate. As process 2200 is agnostic to submitter security, the process can be executed in local and remote settings in the same manner. However, according to some implementations the execution relies on the native application or additional functionality in an application to ensure an acquired biometric represents the user to be authenticated or matched.
Fig. 15 illustrates an example process flow 2250 showing additional details for a one to many matching execution (also referred to as prediction). According to one embodiment, process 2250 begins with acquisition of feature vectors (e.g., step 2206 of Fig. 22A or 2110 of Fig. 13). At 2254, the acquired feature vectors are matched against existing classifications via a deep learning neural network. In one example, the deep learning neural network has been trained during enrollment on a set of individuals. The acquired feature vectors will be processed by the trained deep learning network to predict if the input is a match to a known individual or does not match and returns unknown. In one example, the deep learning network is a fully connected neural network ("FCNN"). In other embodiments, different network models are used for the second neural network.
According to one embodiment, the FCNN outputs an array of values. These values, based on their position and the value itself, determine the label or unknown.
According to one embodiment, a one to many case returns a series of probabilities associated with the match. Assuming five people in the trained data, the output layer showing probability of match by person: [0.1, 0.9, 0.3, 0.2, 0.1] yields a match on Person 2 based on a threshold set for the classifier (e.g., > 0.5). In another run, the output layer: [0.1, 0.6, 0.3, 0.8, 0.1] yields a match on Person 2 & Person 4 (e.g., using the same threshold).
However, where two results exceed the match threshold, the process and/or system is configured to select the maximum value and yield a (probabilistic) match on Person 4. In another example, the output layer: [0.1, 0.2, 0.3, 0.2, 0.1] shows no match to a known person - hence an UNKNOWN person - as no values exceed the threshold. Interestingly, this may result in adding the person into the list of authorized people (e.g., via enrollment discussed above), or this may result in the person being denied access or privileges on an application. According to various embodiments, process 2250 is executed to determine if the person is known or not. The functions that result can be dictated by the application that requests identification of an analyzed biometric.
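The threshold-and-maximum rule in the three example output layers above can be sketched as a short function. The function name and the use of `None` to represent UNKNOWN are illustrative choices, not taken from the described system.

```python
def predict_label(scores, threshold=0.5):
    """Return the 1-based person index of the best match, or None for UNKNOWN.

    The highest score wins; it only counts as a match if it exceeds the threshold.
    """
    if not scores:
        return None
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best_index] > threshold:
        return best_index + 1  # persons are numbered from 1 in the examples
    return None

single_match = predict_label([0.1, 0.9, 0.3, 0.2, 0.1])  # Person 2
two_over = predict_label([0.1, 0.6, 0.3, 0.8, 0.1])      # Person 4 (maximum wins)
no_match = predict_label([0.1, 0.2, 0.3, 0.2, 0.1])      # None, i.e. UNKNOWN
```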
For an UNKNOWN person, i.e. a person never trained to the deep learning enrollment and prediction neural network, an output layer of an UNKNOWN person looks like [-0.7, -1.7, -6.0, -4.3]. In this case, the hinge loss function has guaranteed that the vector output is all negative. This is the case of an UNKNOWN person. In various embodiments, the deep learning neural network must have the capability to determine if a person is UNKNOWN.
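A minimal sketch of the all-negative test follows. The `one_vs_all_hinge_loss` function is an assumption added for context: it shows one common hinge formulation with targets of +1 and -1, under which training on a sample belonging to no class pushes every output toward -1 or below. The text above does not specify the exact loss formulation.

```python
def is_unknown(output):
    """Treat a non-empty output vector whose values are all negative as UNKNOWN."""
    return bool(output) and all(value < 0 for value in output)

def one_vs_all_hinge_loss(output, targets):
    """Sum of max(0, 1 - t * y) over classes, with targets t in {+1, -1}.

    Assumed formulation for illustration only.
    """
    return sum(max(0.0, 1.0 - t * y) for y, t in zip(output, targets))

unknown_output = [-0.7, -1.7, -6.0, -4.3]
known_output = [-1.2, 2.1, -0.9, -3.0]

unknown_flag = is_unknown(unknown_output)
known_flag = is_unknown(known_output)

# For a sample that belongs to no class, every target is -1.
loss = one_vs_all_hinge_loss(unknown_output, [-1, -1, -1, -1])
```

Under this assumed loss, only the first output (-0.7) still contributes to the loss, because it has not yet reached the -1 margin.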
Other solutions that appear viable, for example, support vector machine ("SVM") solutions, break when considering the UNKNOWN case. In one example, the issue is scalability. An SVM implementation cannot scale in the many-to-many matching space, becoming increasingly unworkable until the model simply cannot be used to return a match in any time deemed functional (e.g., 100 person matching cannot return a result in less than 20 minutes). According to various embodiments, the deep learning neural network (e.g., an enrollment & prediction neural network) is configured to train and predict in polynomial time.
Step 2256 can be executed to vote on matching. According to one embodiment, multiple images or biometrics are processed to identify a match. In an example where three images are processed the FCNN is configured to generate an identification on each and use each match as a vote for an individual's identification. Once a majority is reached (e.g., at least two votes for person A) the system returns as output identification of person A. In other instances, for example, where there is a possibility that an unknown person may result, voting can be used to facilitate determination of the match or no match. In one example, each result that exceeds the threshold probability can count as one vote, and the final tally of votes (e.g., often 4 out of 5) is used to establish the match. In some implementations, an unknown class may be trained in the model - in the examples above a sixth number would appear with a probability of matching the unknown model. In other embodiments, the unknown class is not used, and matching is made or not against known persons. Where a sufficient match does not result, the submitted biometric information is unknown.
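The voting step can be sketched as a tally over per-sample predictions. In this sketch each prediction is a label or `None` (no match above threshold), and `min_votes` is a hypothetical parameter standing in for the majority or tally requirement.

```python
from collections import Counter

def vote(predictions, min_votes):
    """Return the label with the most votes if it reaches `min_votes`, else None.

    `predictions` holds one entry per processed biometric sample; None entries
    (no match above threshold) cast no vote.
    """
    counts = Counter(p for p in predictions if p is not None)
    if not counts:
        return None
    label, votes = counts.most_common(1)[0]
    return label if votes >= min_votes else None

majority = vote(["A", "A", "B"], min_votes=2)            # two of three for A
split = vote(["A", "B", None], min_votes=2)              # no label reaches two votes
four_of_five = vote(["A", "A", None, "A", "A"], min_votes=4)
```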
Responsive to matching on newly acquired biometric information, process 2250 can include an optional step 2258 for retraining of the classification model. In one example, a threshold is set such that step 2258 tests if a threshold match has been exceeded, and if yes, the deep learning neural network (e.g., classifier & prediction network) is retrained to include the new feature vectors being analyzed. According to some embodiments, retraining to include newer feature vectors permits biometrics that change over time (e.g., weight loss, weight gain, aging or other events that alter biometric information, haircuts, among other options).
Fig. 16 is a block diagram of an example privacy-enabled biometric system 2304.
According to some embodiments, the system can be installed on a mobile device or called from a mobile device (e.g., on a remote server or cloud based resource) to return an authenticated or not signal. In various embodiments system 2304 can execute any of the preceding processes.
For example, system 2304 can enroll users (e.g., via process 2100), identify enrolled users (e.g., process 2200), and search for matches to users (e.g., process 2250).
According to various embodiments, system 2304 can accept, create or receive original biometric information (e.g., input 2302). The input 2302 can include images of people, images of faces, thumbprint scans, voice recordings, sensor data, etc. A biometric processing component (e.g., 2308) can be configured to crop received images, sample voice biometrics, etc., to focus the biometric information on distinguishable features (e.g., automatically crop image around face). Various forms of pre-processing can be executed on the received biometrics, designed to limit the biometric information to important features.
In some embodiments, the pre-processing (e.g., via 2308) is not executed or available.
In other embodiments, only biometrics that meet quality standards are passed on for further processing.
Processed biometrics can be used to generate additional training data, for example, to enroll a new user. A training generation component 2310 can be configured to generate new biometrics for a user. For example, the training generation component can be configured to create new images of the user's face having different lighting, different capture angles, etc., in order to build a training set of biometrics. In one example, the system includes a training threshold specifying how many training samples to generate from a given or received biometric. In another example, the system and/or training generation component 2310 is configured to build twenty five additional images from a picture of a user's face. Other numbers of training images, or voice samples, etc., can be used.
The system is configured to generate feature vectors from the biometrics (e.g., process images from input and generated training images). In some examples, the system 2304 can include a feature vector component 2312 configured to generate the feature vectors. According to one embodiment, component 2312 executes a convolutional neural network ("CNN"), where the CNN includes a layer which generates Euclidean measurable output. The feature vector component 2312 is configured to extract the feature vectors from the layers preceding the softmax layer (including for example, the n-1 layer). As discussed above, various neural networks can be used to define feature vectors tailored to an analyzed biometric (e.g., voice, image, health data, etc.), where an output of or with the model is Euclidean measurable. Some examples of these neural networks include models having a softmax layer. Other embodiments use a model that does not include a softmax layer to generate Euclidean measurable vectors.
Various embodiments of the system and/or feature vector component are configured to generate and capture feature vectors for the processed biometrics in the layer or layers preceding the softmax layer.
According to another embodiment, the feature vectors from the feature vector component 2312 or system 2304 are used by the classifier component 2314 to bind a user to a classification (i.e., mapping biometrics to a matchable/searchable identity). According to one embodiment, the deep learning neural network (e.g., enrollment and prediction network) is executed as a FCNN trained on enrollment data. In one example, the FCNN generates an output identifying a person or indicating an UNKNOWN individual (e.g., at 2306). Other examples use neural networks that are not fully connected.
According to various embodiments, the deep learning neural network (e.g., which can be an FCNN) must differentiate between known persons and the UNKNOWN. In some examples, this can be implemented as a sigmoid function in the last layer that outputs probability of class matching based on newly input biometrics or showing failure to match.
Other examples achieve matching based on a hinge loss function.
In further embodiments, the system 2304 and/or classifier component 2314 are configured to generate a probability to establish when a sufficiently close match is found. In some implementations, an unknown person is determined based on negative return values. In other embodiments, multiple matches can be developed and voting can also be used to increase accuracy in matching.
Various implementations of the system have the capacity to use this approach for more than one set of input. The approach itself is biometric agnostic. Various embodiments employ feature vectors that are distance measurable and/or Euclidean measurable, which are generated using the first neural network. In some instances, different neural networks are configured to process different types of biometrics. Using that approach, the encrypted feature vector generating neural network may be swapped for or use a different neural network in conjunction with others where each is capable of creating a distance and/or Euclidean measurable feature vector based on the respective biometric. Similarly, the system may enroll in two or more biometric types (e.g., use two or more vector generating networks) and predict on the feature vectors generated for both (or more) types of biometrics using both neural networks for processing respective biometric type simultaneously. In one embodiment, feature vectors from each type of biometric can likewise be processed in respective deep learning networks configured to predict matches based on feature vector inputs or return unknown. The simultaneous results (e.g., one from each biometric type) may be used to identify using a voting scheme or may better perform by firing both predictions simultaneously.
According to further embodiments, the system can be configured to incorporate new identification classes responsive to receiving new biometric information. In one embodiment, the system 2304 includes a retraining component configured to monitor a number of new biometrics (e.g., per user/identification class or by total number of new biometrics) and automatically trigger a re-enrollment with the new feature vectors derived from the new biometric information (e.g., produced by 2312). In other embodiments, the system can be configured to trigger re-enrollment on new feature vectors based on time or time period elapsing.
The system 2304 and/or retraining component 2316 can be configured to store feature vectors as they are processed, and retain those feature vectors for retraining (including for example feature vectors that are unknown to retrain an unknown class in some examples).
Various embodiments of the system are configured to incrementally retrain the model on system assigned numbers of newly received biometrics. Further, once a system set number of incremental retrainings have occurred the system is further configured to complete a full retrain of the model. The variables for incremental retraining and full retraining can be set on the system via an administrative function. Some defaults include incremental retrain every 3, 4, 5, 6 identifications, and full retrain every 3, 4, 5, 6, 7, 8, 9, 10 incremental retrains. Additionally, this requirement may be met by using calendar time, such as retraining once a year. These operations can be performed on offline (e.g., locked) copies of the model, and once complete the offline copy can be made live.
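The two retraining counters can be sketched as a small scheduler. The class name, the parameter names, and the choice that the retrain which reaches the full-retrain count is itself performed as a full retrain are assumptions; the text above only states that a full retrain follows a set number of incremental retrains.

```python
class RetrainScheduler:
    """Decide when to retrain, based on two administrator-set counts."""

    def __init__(self, incremental_every=5, full_every=10):
        self.incremental_every = incremental_every  # identifications per retrain
        self.full_every = full_every                # retrains per full retrain
        self.pending_identifications = 0
        self.retrains_since_full = 0

    def record_identification(self):
        """Count one identification; return 'incremental', 'full', or None."""
        self.pending_identifications += 1
        if self.pending_identifications < self.incremental_every:
            return None
        self.pending_identifications = 0
        self.retrains_since_full += 1
        if self.retrains_since_full >= self.full_every:
            self.retrains_since_full = 0
            return "full"
        return "incremental"

# Small hypothetical settings so the pattern is visible in six identifications.
scheduler = RetrainScheduler(incremental_every=3, full_every=2)
events = [scheduler.record_identification() for _ in range(6)]
```

With these settings the third identification triggers an incremental retrain and the sixth triggers a full retrain. A calendar-based trigger could be added alongside the counters.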
Additionally, the system 2304 and/or retraining component 2316 is configured to update the existing classification model with new users/identification classes. According to various embodiments, the system builds a classification model for an initial number of users, which can be based on an expected initial enrollment. The model is generated with empty or unallocated spaces to accommodate new users. For example, a fifty user base is generated as a one hundred user model. This over allocation in the model enables incremental training to be executed on the classification model. When a new user is added, the system and/or retraining component 2316 is configured to incrementally retrain the classification model, ultimately saving significant computation time over conventional retraining executions. Once the over allocation is exhausted (e.g., 100 total identification classes) a full retrain with an additional over allocation can be made (e.g., fully retrain the 100 classes to a model with 150 classes). In other embodiments, an incremental retrain process can be executed to add additional unallocated slots.
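The over allocation bookkeeping can be sketched as follows, using the 100-class and 150-class figures from the example above. The class and method names are hypothetical, and the sketch tracks only slot usage; the actual retraining of network weights is outside its scope.

```python
class ClassSlots:
    """Track identification-class slots in an over-allocated classifier."""

    def __init__(self, capacity=100, growth=50):
        self.capacity = capacity  # total output classes in the current model
        self.growth = growth      # extra slots added at each full retrain
        self.labels = []

    def add_user(self, label):
        """Assign a slot to a new user and report which retrain is needed."""
        if len(self.labels) < self.capacity:
            self.labels.append(label)
            return "incremental"  # an unallocated slot was available
        # Over allocation exhausted: grow the model, then add the user.
        self.capacity += self.growth
        self.labels.append(label)
        return "full"

slots = ClassSlots(capacity=100, growth=50)
results = [slots.add_user(f"user-{i}") for i in range(101)]
```

In this sketch the first 100 users each need only an incremental retrain; the 101st forces a full retrain into a 150-class model.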
Even with the reduced time retraining, the system can be configured to operate with multiple copies of the classification model. One copy may be live that is used for authentication or identification. A second copy may be an updated version, that is taken offline (e.g., locked from access) to accomplish retraining while permitting identification operations to continue with a live model. Once retraining is accomplished, the updated model can be made live and the other model locked and updated as well. Multiple instances of both live and locked models can be used to increase concurrency.
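The live and locked copies can be sketched as a registry that retrains a deep copy and then swaps it in. The model is represented here by a plain dictionary, and `ModelRegistry` and `add_user` are hypothetical names; a real system would hold trained network weights instead.

```python
import copy
import threading

class ModelRegistry:
    """Serve a live model while retraining happens on an offline copy."""

    def __init__(self, model):
        self._live = model
        self._swap_lock = threading.Lock()

    def live(self):
        """Return the model currently used for identification."""
        with self._swap_lock:
            return self._live

    def retrain(self, update):
        """Apply `update` to an offline copy, then make that copy live."""
        offline = copy.deepcopy(self.live())  # locked working copy
        update(offline)                       # the live model is never mutated
        with self._swap_lock:
            self._live = offline

def add_user(model):
    """Hypothetical stand-in for a retraining step."""
    model["labels"].append("person-3")
    model["version"] += 1

registry = ModelRegistry({"version": 1, "labels": ["person-1", "person-2"]})
before = registry.live()
registry.retrain(add_user)
after = registry.live()
```

Callers that obtained the model before the swap keep using the old copy until they ask for the live model again, so identification can continue during retraining.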
According to some embodiments, the system 2300 can receive encrypted feature vectors instead of original biometrics and processing original biometrics can occur on different systems. In these cases system 2300 may not include, for example, 2308, 2310, 2312, and instead receive feature vectors from other systems, components or processes.
Figs. 17-20 illustrate example embodiments of a classifier network. The embodiments show a fully connected neural network for classifying feature vectors for training and for prediction. Other embodiments implement different neural networks, including for example, neural networks that are not fully connected. Each of the networks accepts distance and/or Euclidean measurable feature vectors and returns a label or unknown result for prediction or binds the feature vectors to a label during training.
Figs. 21-24 illustrate examples of processing that can be performed on input biometrics (e.g., facial image) using a neural network. Encrypted feature vectors can be extracted from such neural networks and used by a classifier (e.g., Figs. 21-24) during training or prediction operations. According to various embodiments, the system implements a first pre-trained neural network for generating distance and/or Euclidean measurable feature vectors that are used as inputs for a second classification neural network. In other embodiments, other neural networks are used to process biometrics in the first instance. In still other examples, multiple neural networks can be used to generate Euclidean measurable feature vectors from unencrypted biometric inputs, and each may feed the feature vectors to a respective classifier. In some examples, each generator neural network can be tailored to a respective classifier neural network, where each pair (or multiples of each) is configured to process a biometric data type (e.g., facial image, iris images, voice, health data, etc.).
User Interface Examples
According to some embodiments, the user interface screens can include visual representations showing operation of helper network functions or operations to support helper network functions. For example, an eye blink status can be displayed in the user interface showing a lockout condition that prevents further operation until a threshold number of eye blinks are detected. In other examples, the user interface can display a detected mask status, a detected glasses status, among other options. Depending on system configuration, the detected status can prevent advancement or authentication until remedial action is taken (e.g., remove mask, remove glasses, etc.). In other embodiments, the system can use detected statuses to select further authentication steps (e.g., tailor selection of embedding networks and associated classification networks, among other options).
Implementation Examples
The following example instantiations are provided to illustrate various aspects of privacy-enabled biometric systems and processes. The examples are provided to illustrate various implementation details and provide illustration of execution options as well as efficiency metrics. Any of the details discussed in the examples can be used in conjunction with various embodiments.
It is realized that conventional biometric solutions have security vulnerability and efficiency/scalability issues. Apple, Samsung, Google and MasterCard have each launched biometric security solutions that share at least three technical limitations.
These solutions (1) are unable to search biometrics in polynomial time; (2) do not one-way encrypt the reference biometric; and (3) require significant computing resources for confidentiality and matching.
Modern biometric security solutions are unable to scale (e.g., Apple Face ID™ authenticates only one user) as they are unable to search biometrics in polynomial time. In fact, the current "exhaustive search" technique requires significant computing resources to perform a linear scan of an entire biometric datastore to successfully one-to-one record match each reference biometric and each new input record. This is a result of inherent variations in the biometric instances of a single individual.
Similarly, conventional solutions are unable to one-way encrypt the reference biometric because exhaustive search (as described above) requires a decryption key and a decryption to plaintext in the application layer for every attempted match. This limitation results in an unacceptable risk in privacy (anyone can view a biometric) and authentication (anyone can use the stolen biometric). And, once compromised, a biometric -- unlike a password -- cannot be reset.
Finally, modern solutions require the biometric to return to plaintext in order to match since the encrypted form is not Euclidean measurable. It is possible to choose to make a biometric two-way encrypted and return to plaintext -- but this requires extensive key management and, since a two-way encrypted biometric is not Euclidean measurable, it also returns the solution to linear scan limitations.
Various embodiments of the privacy-enabled biometric system and/or methods provide enhancement over conventional implementations (e.g., in security, scalability, and/or management functions). Various embodiments enable scalability (e.g., via "encrypted search") and fully encrypt the reference biometric (e.g., "encrypted match"). The system is configured to provide an "identity" that is no longer tied independently to each application and further enables a single, global "Identity Trust Store" that can service any identity request for any application.
Various operations are enabled by various embodiments. For example:
- Encrypted Match: using the techniques described herein, a deep neural network ("DNN") is used to process a reference biometric to compute a one-way, homomorphic encryption of the biometric before transmitting or storing any data. This allows for computations and comparisons on cipher texts without decryption, and ensures that only the distance and/or Euclidean measurable, homomorphic encrypted biometric is available to execute subsequent matches in the encrypted space. The plaintext data can then be discarded and the resultant homomorphic encryption is then transmitted and stored in a datastore.
- Encrypted Search: using the techniques described herein, encrypted search is done in polynomial time according to various embodiments. This allows for comparisons of biometrics that yield values indicating the "closeness" of two biometrics to one another in the encrypted space (e.g., a biometric to a reference biometric), while at the same time providing the highest level of privacy.
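The "closeness" comparison described in the two items above can be sketched as a Euclidean distance test between feature vectors. This is a minimal illustration only: the function names, the threshold of 1.0, and the three-element vectors are assumptions made for the example and are not taken from the specification.

```python
import math

def euclidean_distance(a, b):
    """Return the Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_match(candidate, reference, threshold=1.0):
    """Hypothetical rule: vectors closer than `threshold` count as a match."""
    return euclidean_distance(candidate, reference) < threshold

# Made-up vectors standing in for one-way encrypted feature vectors.
reference = [0.10, 0.20, 0.30]
same_person = [0.12, 0.18, 0.33]
other_person = [0.90, -0.70, 1.40]

print(is_match(same_person, reference))
print(is_match(other_person, reference))
```

Only the vectors are compared here; no plaintext biometric is needed at match time.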
Various examples detail implementation of one-to-many identification using, for example, the N-1 layer of a deep neural network. The various techniques are biometric agnostic, allowing the same approach irrespective of the biometric or the biometric type. Each biometric (face, voice, IRIS, etc.) can be processed with a different, fully trained, neural network to create the biometric feature vector.
According to some aspects, an issue with current biometric schemes is that they require a mechanism for: (1) acquiring the biometric, (2) plaintext biometric match, (3) encrypting the biometric, (4) performing a Euclidean measurable match, and (5) searching using the second neural network prediction call. To execute steps 1 through 5 for every biometric is time consuming, error prone, and frequently nearly impossible to do before the biometric becomes deprecated. One goal of various embodiments is to develop schemes, techniques, and technologies that allow the system to work with biometrics in a privacy-protected and polynomial-time way that is also biometric agnostic. Various embodiments employ machine learning to solve issues with (2)-(5).
According to various embodiments, it is assumed that there is little or no control over devices such as cameras or sensors that acquire the biometrics to be analyzed (which thus arrive as plaintext).
According to various embodiments, if that data is encrypted immediately and the system only processes the biometric information as ciphertext, the system provides the maximum practical level of privacy. According to another aspect, a one-way encryption of the biometric, meaning that given ciphertext there is no mechanism to get to the original plaintext, reduces or eliminates the key management complexity of various conventional approaches. Many one-way encryption algorithms exist, such as MD5 and SHA-512; however, these algorithms are not homomorphic because they are not Euclidean measurable. Various embodiments discussed herein enable a general purpose solution that produces biometric ciphertext that is Euclidean measurable using a neural network. Applying a classifying algorithm to the resulting feature vectors enables one-to-many identification. In various examples, this maximizes privacy and runs in between O(1) and O(log n) time.
As discussed above, some capture devices can encrypt the biometric via a one-way encryption and provide feature vectors directly to the system. This enables some embodiments to forgo biometric processing components, training generation components, and feature vector generation components, or alternatively to not use these elements for already encrypted feature vectors.
Example Execution and Accuracy
In some executions, the system is evaluated on different numbers of images per person to establish ranges of operating parameters and thresholds. For example, in the experimental execution, num-epochs establishes the number of iterations, which can be varied on the system (e.g., between embodiments, between examples, and between executions, among other options). The LFW dataset is taken from the known Labeled Faces in the Wild data set. "11 people" is a custom set of images, and faces94 is taken from the known faces94 source.
For our examples, the epochs are the number of new images that are morphed from the original images.
So if the epochs are 25, and we have 10 enrollment images, then we train with 250 images.
The morphing of the images changed the lighting, angles, and the like to increase the accuracy in training.
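The epoch arithmetic above (25 epochs times 10 enrollment images gives 250 training images) can be sketched as follows. The morphing operations shown (a random brightness gain and a horizontal flip) are simplified stand-ins chosen for illustration; the specification does not state which image transformations were used, and `morph` and `build_training_set` are hypothetical names.

```python
import random

def morph(image, rng):
    """Return a variant of a grayscale image (list of rows of 0-255 ints)."""
    gain = rng.uniform(0.7, 1.3)  # simulated lighting change
    rows = [[min(255, max(0, round(p * gain))) for p in row] for row in image]
    if rng.random() < 0.5:        # crude stand-in for a change of angle
        rows = [list(reversed(row)) for row in rows]
    return rows

def build_training_set(enrollment_images, num_epochs, seed=0):
    """Create num_epochs morphed images from each enrollment image."""
    rng = random.Random(seed)
    return [morph(img, rng) for img in enrollment_images for _ in range(num_epochs)]

# Ten tiny 2x2 "images" standing in for enrollment photos.
enrollment = [[[10 * i, 128], [64, 255]] for i in range(10)]
training = build_training_set(enrollment, num_epochs=25)
print(len(training))  # 250
```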
TABLE I
(fully connected neural network model with 2 hidden layers + output sigmoid layer):
Input => [100, 50] => num people (train for 100 people given 50 individuals to identify). Other embodiments improve over these accuracies for the UNKNOWN.
Dataset | Training Set | Test Set | UNKNOWN PERSON Set | #images in Test Set | #images in UNKNOWN PERSON Set | Parameters | Accuracy in Test Set | Accuracy in UNKNOWN PERSON Set
LFW dataset | 70% | 30% | 11 people | 1304 | 257 | min_images_per_person = 10, num-epochs = 25 | 98.90% | 86.40%
LFW dataset | 70% | 30% | 11 people | 2226 | 257 | min_images_per_person = 3, num-epochs = 25 | 93.90% | 87.20%
11 people from LFW | 70% | 30% | Copy 2 people | 77 | 4 | min_images_per_person = 2, num-epochs = 25 | 100.00% | 50.00%
faces94 dataset | 70% | 30% | 11 people | 916 | 257 | min_images_per_person = 2, num-epochs = 25 | 9?.00% | 79.40%
TABLE II
(0 hidden layers & output linear with decision f(x); decision at .5 value) Improves accuracy for the UNKNOWN case, but other implementations achieve higher accuracy.
Dataset | Training Set | Test Set | UNKNOWN PERSON Set | #images in Test Set | #images in UNKNOWN PERSON Set | Parameters | Accuracy in Test Set | Accuracy in UNKNOWN PERSON Set
LFW dataset | 70% | 30% | 11 people | 1304 | 257 | min_images_per_person = 10, num-epochs = 25 | 88.80% | 91.10%
LFW dataset | 70% | 30% | 11 people | 2225 | 257 | min_images_per_person = 3, num-epochs = 25 | 96.60% | 97.70%
11 people from LFW | 70% | 30% | Copy 2 people | 7 | 4 | min_images_per_person = 2, num-epochs = 25 | 98.70% | 50.00%
faces94 dataset | 70% | 30% | 11 people | 915 | 257 | min_images_per_person = 2, num-epochs = 25, Cut-off = 0.5 | 99.10% | 82.10%
faces94 dataset | 70% | 30% | 11 people | 918 | 257 | min_images_per_person = 2, num-epochs = 25, Cut-off = 1.0 | 98.30% | 95.70%
TABLE III - FCNN with 1 hidden layer (500 nodes) + output linear with decision
Dataset | Training Set | Test Set | UNKNOWN PERSON Set | #images in Test Set | #images in UNKNOWN PERSON Set | Parameters | Accuracy in Test Set | Accuracy in UNKNOWN PERSON Set
LFW dataset | 70% | 30% | 11 people | 1304 | 257 | min_images_per_person = 10, num-epochs = 25 | 99.30% | 92.20%
LFW dataset | 70% | 30% | 11 people | 2226 | 257 | min_images_per_person = 3, num-epochs = 25 | 97.50% | 97.70%
11 people from LFW | 70% | 30% | Copy 2 people | 77 | 4 | min_images_per_person = 2, num-epochs = 25 | |
faces94 dataset | 70% | 30% | 11 people | 918 | 257 | min_images_per_person = 2, num-epochs = 25, Cut-off = 0.5 | 99.20% | 92.60%
faces94 dataset | 70% | 30% | 11 people | 918 | 257 | min_images_per_person = 2, num-epochs = 25, Cut-off = 1.0 | |
TABLE IV
- FCNN 2 Hidden Layers (500, 2*num people) + output linear, decisions f(x)
Dataset | Training Set | Test Set | UNKNOWN PERSON Set | #images in Test Set | #images in UNKNOWN PERSON Set | Parameters | Accuracy in Test Set | Accuracy in UNKNOWN PERSON Set
LFW data set | 70% | 30% | 11 people | 1304 | 257 | min_images_per_person = 10, num-epochs = 25 | 98.30% | 97.70%
LFW data set | 70% | 30% | 11 people | 2226 | 257 | min_images_per_person = 3, num-epochs = 25, Cut-off = 0 | 98.50% | 98.10%
11 people from LFW | 70% | 30% | Copy 2 people | 77 | 4 | min_images_per_person = 2, num-epochs = 25 | |
faces94 data set | 70% | 30% | 11 people | 918 | 257 | min_images_per_person = 2, num-epochs = 25, Cut-off = 0 | 98.60% | 93.80%
In various embodiments, the neural network model is generated initially to accommodate incremental additions of new individuals to identify (e.g., 2*num people is an example of a model initially trained for 100 people given an initial 50 individuals of biometric information). The multiple, or the training room it provides, can be tailored to the specific implementation. For example, where additions to the identifiable users are anticipated to be small, additional incremental training options can include any number within ranges of 1% to 200%. In other embodiments, larger percentages can be implemented as well.
TABLE V
- FCNN: 2 Hidden Layers (500, 2*num people) + output linear, decisions f(x), and voting - where the model is trained on 2* the number of class identifiers for incremental training.
Dataset | Training Set | Test Set | UNKNOWN PERSON Set | #images in Test Set | #images in UNKNOWN PERSON Set | Parameters | Accuracy in Test Set | Accuracy in UNKNOWN PERSON Set = 11 people | Accuracy in UNKNOWN PERSON Set = faces94
LFW dataset | 70% | 30% | 11 people | 1304 | 257 | min_images_per_person = 10, num-epochs = 25 | 98.20%; (vote) 100.00% | 98.80%; (vote) 100.00% | 88.40%; (vote) 90.80%
LFW dataset | 70% | 30% | 11 people | 2226 | 257 | min_images_per_person = 3, num-epochs = 25, Cut-off = 0 | 98.10%; (vote) 98.60% | 98.40%; (vote) 100.00% | 93.60%; (vote) 95.40%
11 people from LFW | 70% | 30% | Copy 2 people | 77 | 4 | min_images_per_person = 2, num-epochs = 25 | | |
faces94 dataset | 70% | 30% | 11 people | 918 | 257 | min_images_per_person = 2, num-epochs = 25, Cut-off = 0 | | |
According to one embodiment, the system can be implemented as a REST compliant API that can be integrated and/or called by various programs, applications, systems, system components, etc., and can be requested locally or remotely.
In one example, the privacy-enabled biometric API includes the following specifications:
- Preparing data: this function takes the images and labels and saves them into the local directory.

def add_training_data(list_of_images, list_of_label):
    """
    @params list_of_images: the list of images
    @params list_of_label: the list of corresponding labels
    """

- Training model: each label (person/individual) can include at least 2 images. In some examples, if the person does not have the minimum, that person will be ignored.

def train():
    """Train the model on the training data added so far."""

- Prediction:

def predict(list_of_images):
    """
    @params list_of_images: the list of images of the same person
    @return label: a person name or "UNKNOWN PERSON"
    """

Further embodiments can be configured to handle new people (e.g., labels or classes in the model) in multiple ways. In one example, the current model can be retrained every time a certain number (e.g., a threshold number) of new people are introduced.
In this example, the benefit is improved accuracy: the system can guarantee a level of accuracy even with new people. There is a trade-off in that full retraining is a slow, time-consuming, and computationally heavy process. This can be mitigated with live and offline copies of the model, so that the retraining occurs offline and the newly retrained model is swapped for the live version. In one example, training took over 20 minutes. With more data, the training time increases.
According to another example, the model is initialized with slots for new people. The expanded model is configured to support incremental training (e.g., the network structure is not changed when adding new people). In this example, the time to add new people is significantly reduced (even over other embodiments of the privacy-enabled biometric system).
It is realized that there may be some reduction in accuracy with incremental training, and as more and more people are added the model can trend towards overfitting on the new people, i.e., become less accurate with old people. However, various implementations have been tested to operate at the same accuracy even under incremental retraining.
Yet another embodiment implements both incremental retraining and full retraining at a threshold level (e.g., build the initial model with a multiple of the people as needed, such as 2 times: 100 labels for an initial 50 people, 50 labels for an initial 25 people, etc.). Once the number of people reaches the upper bound (or approaches the upper bound), the system can be configured to execute a full retrain on the model, while building in the additional slots for new users. In one example, when a model with 100 labels and 50 initial people (50 unallocated) reaches 50 new people, the system will execute a full retrain for 150 labels and now 100 actual people. This provides for 50 additional users and incremental retraining before a full retrain is executed.
Stated generally, the system in various embodiments is configured to retrain the whole network from the beginning for every N people. For example, given training data for 100 people:
Step 1: train the network with N = 1000 people; assign 100 people and reserve 900 for incremental training; train incrementally with new people until 1000 people are reached; on reaching 1000 people, execute a full retrain.
Full retrain: train the network with 2N = 2000 people; 1000 slots are now reserved for incremental training; train incrementally with new people until 2000 people are reached; and repeat the full retrain with open allocations when the limit is reached.
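The N and 2N steps above can be simulated with simple bookkeeping. This sketch only counts slots and does no training; the function name `enroll_people` is hypothetical, and doubling the capacity again at every later limit is an assumption extrapolated from the N to 2N step.

```python
def enroll_people(initial_people, initial_capacity, new_people):
    """Simulate enrollments; return (full_retrains, enrolled, capacity).

    Each new person uses one reserved slot (incremental training). When
    the slots run out, a full retrain doubles the capacity (assumption).
    """
    enrolled, capacity = initial_people, initial_capacity
    full_retrains = []
    for _ in range(new_people):
        enrolled += 1                     # incremental training step
        if enrolled == capacity:          # reserved slots exhausted
            capacity *= 2                 # full retrain with 2N labels
            full_retrains.append((enrolled, capacity))
    return full_retrains, enrolled, capacity

# 100 people in a network built for N = 1000, then 1900 more arrive.
retrains, enrolled, capacity = enroll_people(100, 1000, 1900)
print(retrains)  # [(1000, 2000), (2000, 4000)]
```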
An example implementation of the API includes the following code:
drop database if exists trueid;
create database trueid;
grant all on trueid.* to trueid@'localhost' identified by 'trueid';
drop table if exists feature;
drop table if exists image;
drop table if exists PII;
drop table if exists subject;
CREATE TABLE subject
(
  id INT PRIMARY KEY AUTO_INCREMENT,
  when_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE PII
(
  id INT PRIMARY KEY AUTO_INCREMENT,
  subject_id INT,
  tag VARCHAR(254),
  value VARCHAR(254)
);
CREATE TABLE image
(
  id INT PRIMARY KEY AUTO_INCREMENT,
  subject_id INT,
  image_name VARCHAR(254),
  is_train boolean,
  when_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE feature
(
  id INT PRIMARY KEY AUTO_INCREMENT,
  image_id INT NOT NULL,
  feature_order INT NOT NULL,
  feature_value DECIMAL(32,24) NOT NULL
);
ALTER TABLE image ADD CONSTRAINT fk_subject_id FOREIGN KEY (subject_id) REFERENCES subject(id);
ALTER TABLE PII ADD CONSTRAINT fk_subject_id_pii FOREIGN KEY (subject_id) REFERENCES subject(id);
ALTER TABLE feature ADD CONSTRAINT fk_image_id FOREIGN KEY (image_id) REFERENCES image(id);
CREATE INDEX piisubjectid ON PII(subject_id);
CREATE INDEX imagesubjectid ON image(subject_id);
CREATE INDEX imagesubjectidimage ON image(subject_id, image_name);
CREATE INDEX featureimage_id ON feature(image_id);
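As an illustration of how a feature vector might be written to and read from tables shaped like the ones above, the sketch below uses Python's built-in sqlite3 module as a stand-in for the database server. The SQLite dialect differs slightly (for example, AUTOINCREMENT), the underscore column names are assumed, and `store_embedding` and `load_embedding` are hypothetical helpers that create a new subject per image purely to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE subject (id INTEGER PRIMARY KEY AUTOINCREMENT,
                      when_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
CREATE TABLE image (id INTEGER PRIMARY KEY AUTOINCREMENT,
                    subject_id INT REFERENCES subject(id),
                    image_name VARCHAR(254), is_train BOOLEAN,
                    when_created TIMESTAMP DEFAULT CURRENT_TIMESTAMP);
CREATE TABLE feature (id INTEGER PRIMARY KEY AUTOINCREMENT,
                      image_id INT NOT NULL REFERENCES image(id),
                      feature_order INT NOT NULL,
                      feature_value DECIMAL(32,24) NOT NULL);
CREATE INDEX featureimage_id ON feature(image_id);
""")

def store_embedding(conn, image_name, vector, is_train=True):
    """Insert a subject, an image row, and one feature row per element."""
    subject_id = conn.execute("INSERT INTO subject DEFAULT VALUES").lastrowid
    image_id = conn.execute(
        "INSERT INTO image (subject_id, image_name, is_train) VALUES (?, ?, ?)",
        (subject_id, image_name, is_train),
    ).lastrowid
    conn.executemany(
        "INSERT INTO feature (image_id, feature_order, feature_value) "
        "VALUES (?, ?, ?)",
        [(image_id, i, v) for i, v in enumerate(vector)],
    )
    return image_id

def load_embedding(conn, image_id):
    """Rebuild the vector for an image, ordered by feature_order."""
    rows = conn.execute(
        "SELECT feature_value FROM feature WHERE image_id = ? "
        "ORDER BY feature_order",
        (image_id,),
    )
    return [row[0] for row in rows]

image_id = store_embedding(conn, "face_001.png", [0.25, -0.5, 0.125])
print(load_embedding(conn, image_id))
```

Storing one row per vector element, keyed by feature_order, mirrors the feature table layout and keeps the vector order recoverable.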
API Execution Example:
- Push the known LFW feature embeddings to the biometric feature database.
- Simulate the incremental training process:
num_seed = 50 # build the model network; the first num_seed people are trained fully
num_window = 50 # for every num_window people: build the model network, and people are trained fully
num_step = 1 # train incrementally for every num_step new people
num_eval = 10 # evaluate the model every num_eval people
- Build the model network with #class = 100. Train from beginning (#epochs = 100) with the first 50 people. The remaining 50 classes are reserved for incremental training.
i) Incremental training for the 51st person. Train the previous model with all 51 people (#epochs = 20)
ii) Incremental training for the 52nd person. Train the previous model with all 52 people (#epochs = 20)
iii) continue ...
- (Self or automatic monitoring can be executed by various embodiments to ensure accuracy over time: alert flags can be produced if deviation or excessive inaccuracy is detected; alternatively or in conjunction, full retraining can be executed responsive to excess inaccuracy and the fully retrained model evaluated to determine if accuracy issues are resolved; if so, the full retrain threshold can be automatically adjusted). Evaluate the accuracy of the previous model (e.g., at every 10 steps), optionally record the training time for every step.
- Achieve incremental training for maximum allocation (e.g., the 100th person). Full train of the previous model with all 100 people (e.g., #epochs = 20)
- Build the model network with #class = 150. Train from beginning (e.g., #epochs = 100) with the first 100 people. The remaining 50 classes are reserved for incremental training.
i) Incremental training for the 101st person. Train the previous model with all 101 people (#epochs = 20)
ii) continue ...
- Build the model network with #class = 200. Train from beginning (e.g., #epochs = 100) with the first 150 people. The remaining 50 classes are reserved for incremental training.
i) Incremental training for the 151st person. Train the previous model with all 151 people (#epochs = 20)
ii) continue ...
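The sequence of builds, incremental steps, and evaluations listed above can be generated programmatically. The sketch below only produces the schedule of events and performs no training; the parameter names follow the example (with underscores assumed), while the function name and the tuple layout are hypothetical.

```python
def training_schedule(total_people, num_seed=50, num_window=50,
                      num_step=1, num_eval=10):
    """Return a list of (action, people, classes) events.

    "full"        : build the network and train from the beginning
    "incremental" : train the previous model with all people so far
    "evaluate"    : measure accuracy of the current model
    """
    classes = num_seed + num_window
    events = [("full", num_seed, classes)]
    for people in range(num_seed + num_step, total_people + 1, num_step):
        events.append(("incremental", people, classes))
        if (people - num_seed) % num_eval == 0:
            events.append(("evaluate", people, classes))
        if people >= classes:             # reserved classes are used up
            classes += num_window         # rebuild with more reserved classes
            events.append(("full", people, classes))
    return events

events = training_schedule(151)
full_builds = [e for e in events if e[0] == "full"]
print(full_builds)  # [('full', 50, 100), ('full', 100, 150), ('full', 150, 200)]
```

The three full builds correspond to the #class = 100, 150, and 200 networks in the example, each leaving 50 classes reserved for incremental training.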
Refactor problem:
According to various embodiments, it is realized that incremental training can trigger concurrency problems (e.g., a multi-thread problem with the same model); thus the system can be configured to avoid retraining incrementally at the same time for two different people (data can be lost if retraining occurs concurrently). In one example, the system implements a lock or a semaphore to resolve this. In another example, multiple models can be running simultaneously, and reconciliation can be executed between the models in stages. In further examples, the system can include monitoring models to ensure only one retrain is executed on multiple live models, and in yet others use locks on the models to ensure singular updates via incremental retrain. Reconciliation can be executed after an update between models. In further examples, the system can cache feature vectors for subsequent access in the reconciliation.
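The lock-based option mentioned above can be sketched with Python's standard threading module. The class below is a hypothetical stand-in: the "model" is just a list of enrolled people, and the "retrain" is a read, modify, and write-back that would lose updates if two threads interleaved without the lock.

```python
import threading

class IncrementalTrainer:
    """Toy model wrapper that serializes incremental retraining."""

    def __init__(self):
        self._lock = threading.Lock()
        self.enrolled = []                    # stand-in for model state

    def retrain_incrementally(self, person):
        with self._lock:                      # one retrain at a time
            snapshot = list(self.enrolled)    # read current model state
            snapshot.append(person)           # "train" with the new person
            self.enrolled = snapshot          # publish the updated model

trainer = IncrementalTrainer()
threads = [
    threading.Thread(target=trainer.retrain_incrementally, args=(f"person-{i}",))
    for i in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(trainer.enrolled))  # 20
```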
According to some embodiments, the system design resolves a data pipeline problem: in some examples, the data pipeline supports running only one time due to queue and thread characteristics. Other embodiments avoid this issue by extracting the embeddings. In examples that do not include that functionality, the system can still run multiple times without issue by saving the embedding to file and loading the embedding from file. This approach can be used, for example, where the extracted embedding is unavailable via other approaches. Various embodiments can employ different options for operating with embeddings. When feeding a value to a TensorFlow model, there are several options: feed dict (trading speed for easier access); and queue (faster via multi-threading, but can only run one time, since the queue is ended after it is looped).
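The save-to-file and load-from-file workaround described above can be sketched as follows. JSON is used here only because it is in the standard library; the specification does not name a file format, and the helper names and file name are hypothetical.

```python
import json
import os
import tempfile

def save_embedding(path, embedding):
    """Write an embedding (a list of floats) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(list(embedding), fh)

def load_embedding(path):
    """Read an embedding back; the file can be read any number of times."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

path = os.path.join(tempfile.mkdtemp(), "embedding.json")
save_embedding(path, [0.12, -0.5, 0.33])
first = load_embedding(path)
second = load_embedding(path)   # unlike a one-shot queue, re-reading works
print(first == second)
```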
Table VI (Fig. 30) & TABLE VII (Fig. 31) show execution timing during operation and accuracy percentages for the respective example.
TABLE VIII shows summary information for additional executions.
Dataset | Training Set | Test Set | UNKNOWN PERSON Set | #people in Training Set | #images in Test Set | #images in UNKNOWN PERSON Set | Parameters | Accuracy in Test Set
LFW dataset | 70% | 30% | 11 people | 158 | 1304 | 257 | min_images_per_person = 10, num-epochs = 25, Cut-off = 0 | 98.70%; (vote) 100.00%
LFW dataset | 70% | 30% | 11 people | 901 | 2226 | 257 | min_images_per_person = 3, num-epochs = 25, Cut-off = 0 | 93.80%; (vote) 95.42%
According to one embodiment, the system can be described broadly to include any one or more or any combination of the following elements and associated functions:
- Preprocessing: where the system takes in an unprocessed biometric, which can include cropping and aligning, and either continues processing or returns that the biometric cannot be processed.
- Neural network 1: Pre-trained. Takes in unencrypted biometrics. Returns biometric feature vectors that are one-way encrypted and distance and/or Euclidean measurable. Regardless of the biometric type being processed, NN1 generates Euclidean measurable encrypted feature vectors. In various embodiments, the system can instantiate multiple NN1(s) for individual credentials, and also where each NN1 or group of NN1s is tailored to a different authentication credential.
- Distance evaluation of NN1 output for a phase of authentication and/or to filter output of NN1: As discussed above, a first phase of authentication can use encrypted feature vectors to determine a distance and authenticate or not based on being within a threshold distance. Similarly, during enrollment the generated feature vectors can be evaluated to ensure they are within a threshold distance, and otherwise new biometric samples are required.
- Neural network 2: Not pre-trained. It is a deep learning neural network that does classification. Includes incremental training: takes a set of (label, feature vector) pairs as input and returns nothing during training; the trained network is used for matching or prediction on newly input biometric information. Does prediction, which takes a feature vector as input and returns an array of values. These values, based on their position and the value itself, determine the label or unknown.
- Voting functions can be executed with neural network 2, e.g., during prediction.
- System may have more than one neural network 1 for different biometrics. Each would generate Euclidean measurable encrypted feature vectors based on unencrypted input.
- System may have multiple neural network 2(s), one for each biometric type.
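The prediction and voting items above can be illustrated with a short sketch. It assumes the prediction array holds one score per label position and that a cut-off of 0.5 separates a known label from unknown (the tables above mention cut-off values); the strict-majority voting rule and the function names are assumptions, not details given in the specification.

```python
from collections import Counter

UNKNOWN = "UNKNOWN PERSON"

def decide(outputs, labels, cutoff=0.5):
    """Map one prediction array to a label.

    The position of the highest value selects the label; the value itself
    must clear the cut-off, otherwise the result is unknown.
    """
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best] if outputs[best] >= cutoff else UNKNOWN

def vote(predictions, labels, cutoff=0.5):
    """Strict-majority vote over several prediction arrays (assumption)."""
    ballots = Counter(decide(p, labels, cutoff) for p in predictions)
    winner, count = ballots.most_common(1)[0]
    return winner if count > len(predictions) / 2 else UNKNOWN

labels = ["alice", "bob"]
print(decide([0.9, 0.1], labels))
print(vote([[0.9, 0.1], [0.8, 0.2], [0.1, 0.2]], labels))
```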
According to further aspects, the system achieves significant improvements in accuracy of identification over conventional approaches, based at least in part on bounded enrollment of encrypted feature vectors. For example, at any point when encrypted feature vectors are created for enrollment (e.g., captured by a device and processed by a generation network, or built from captures to expand the enrollment pool and processed by a generation network), those encrypted feature vectors are analyzed to determine that they are similar enough to each other to use for a valid enrollment. In some embodiments, the system evaluates the produced encryptions and tests whether any encrypted feature vectors have a Euclidean distance of greater than 1 from each other (other thresholds can be used). If so, those values are discarded. If a minimum number of values is not met, the entire enrollment can be deemed a failure, and new inputs requested, processed, and validated prior to training a respective classification network. Stated broadly, the bounded enrollment thresholds can be established based, at least in part, on what threshold is being used to determine that a measurement (e.g., of two encrypted feature vectors) is the same as another. Constraining training inputs to the classification network so that all the inputs are within a boundary close to the identification threshold ensures that the resulting classification network is stable and accurate. In some examples, even singular outliers can destabilize an entire network and significantly reduce accuracy.
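A bounded-enrollment check of the kind described above might look like the following sketch. The distance bound of 1 comes from the text; the minimum of three vectors and the rule for choosing which vector to discard (repeatedly drop the one involved in the most violations) are assumptions, since the specification does not say how outliers are selected.

```python
import math
from itertools import combinations

def bounded_enrollment(vectors, max_distance=1.0, min_vectors=3):
    """Discard outliers until all pairwise distances are within the bound.

    Raises ValueError if too few vectors survive, signalling that the
    enrollment failed and new inputs should be requested.
    """
    kept = list(vectors)
    while True:
        violations = [0] * len(kept)
        for i, j in combinations(range(len(kept)), 2):
            if math.dist(kept[i], kept[j]) > max_distance:
                violations[i] += 1
                violations[j] += 1
        if not any(violations):
            break
        kept.pop(violations.index(max(violations)))   # drop worst outlier
    if len(kept) < min_vectors:
        raise ValueError("enrollment failed: request new inputs")
    return kept

samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [3.0, 3.0]]
print(bounded_enrollment(samples))
```

Here the last sample is far from the other three and is discarded, leaving a set in which every pair is within the bound.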
Fig. 25 is a block diagram of an example privacy-enabled biometric system 2504 with liveness validation. According to some embodiments, the system can be installed on a mobile device or called from a mobile device (e.g., on a remote server or cloud based resource) to return an authenticated or not signal. In further embodiments, the system can include a web based client or application that provides fully private authentication services. In various embodiments, system 2504 can execute any of the following processes. For example, system 2504 can enroll users (e.g., via process 2100), identify enrolled users (e.g., process 2200) and/or include multiple enrollment phases (e.g., distance metric evaluation and fully encrypted input/evaluation), and search for matches to users (e.g., process 2250). In various embodiments, system 2504 includes multiple pairs of neural networks, and any associated number of helper networks to provide improved data sets used in later identification/authentication, including, for example, with the paired neural networks. In some embodiments, each pair includes a processing/generating neural network for accepting an unencrypted authentication credential (e.g., biometric input (e.g., images or voice, etc.), behavioral input (e.g., health data, gesture tracking, eye movement, etc.)) and processing it to generate an encrypted embedding or encrypted feature vector. Each pair of networks can also include a classification neural network that can be trained on the generated encrypted feature vectors to classify the encrypted information with labels, and that is further used to predict a match to the trained labels or an unknown class based on subsequent input of encrypted feature vectors to the trained network. According to some embodiments, the predicted match(es) can be validated by comparing the input to the classification network (e.g., encrypted embedding/feature vector) against encrypted embedding/feature vectors of the identified match(es). Various distance metrics can be used to compare the encrypted embeddings, including least squares analysis, L2 analysis, distance matrix analysis, sum-of-squared-errors, cosine measure, etc.
In various embodiments, authentication capture and/or validation can be augmented by a plurality of helper networks configured to improve identification of information to capture from provided authentication information, improve validation, and improve authentication entropy, among other options. The authentication architecture can be separated in various embodiments. For example, the system can be configured with a trained classification neural network and receive from another processing component, system, or entity encrypted feature vectors to use for prediction with the trained classification network.
In a further example, the system configured to generate encrypted authentication information can be coupled with various helper networks configured to facilitate capture and processing of the unencrypted authentication information into filtered data that can be used in generating one-way encryptions. According to various embodiments, system 2504 can accept, create, or receive original biometric information (e.g., input 2502). The input 2502 can include images of people, images of faces, thumbprint scans, voice recordings, sensor data, etc.
Further, the voice inputs can be requested by the system, and correspond to a set of randomly selected biometric instances (including, for example, randomly selected words) as part of liveness validation. According to various embodiments, the inputs can be processed for identity matching and in conjunction the inputs can be analyzed to determine matching to the randomly selected biometric instances for liveness verification. As discussed above, the system 2504 can also be architected to provide a prediction on input of an encrypted feature vector, and another system or component can accept unencrypted biometrics and/or generate encrypted feature vectors, and communicate the same for processing.
According to one embodiment, the system can include a biometric processing component 2508. A biometric processing component (e.g., 2508) can be configured to crop received images, sample voice biometrics, eliminate noise from microphone captures, etc., to focus the biometric information on distinguishable features (e.g., automatically crop an image around a face, eliminate background noise for a voice sample, normalize received health data, generate samples of received health data, etc.). Various forms of pre-processing can be executed on the received biometrics, and the pre-processing can be executed to limit the biometric information to important features or to improve identification by eliminating noise, reducing an analyzed area, etc. In some embodiments, the pre-processing (e.g., via 2508) is not executed or not available. In other embodiments, only biometrics that meet quality standards are passed on for further processing.
According to further embodiments, the system can also include a plurality of neural networks that facilitate processing of plaintext authentication information and the transformation of the same into fully private or one-way encrypted authentication information.
Processed biometrics can also be used to generate additional training data, for example, to enroll a new user, and/or train a classification component/network to perform predictions.
According to one embodiment, the system 2504 can include a training generation component 2510, configured to generate new biometrics for use in training to identify a user. For example, the training generation component 2510 can be configured to create new images of the user's face or voice having different lighting, different capture angles, etc., different samples, filtered noise, introduced noise, etc., in order to build a larger training set of biometrics. In one example, the system includes a training threshold specifying how many training samples to generate from a given or received biometric. In another example, the system and/or training generation component 2510 is configured to build twenty five additional images from a picture of a user's face. Other numbers of training images, or voice samples, etc., can be used. In further examples, additional voice samples can be generated from an initial set of biometric inputs to create a larger set of training samples for training a voice network (e.g., via 2510).
In some other embodiments, the training generation component can include a plurality of helper networks configured to homogenize input identification/authentication information based on a credential modality (e.g., face biometric data, voice biometric data, behavioral data, etc.).
According to one embodiment, the system is configured to generate encrypted feature vectors from an identification/authentication information input (e.g., process images from input and/or generated training images, process voice inputs and/or voice samples and/or generated training voice data, among other options). In various embodiments, the system 2504 can include an embedding component 2512 configured to generate encrypted embeddings or encrypted feature vectors (e.g., image feature vectors, voice feature vectors, health data feature vectors, etc.). The term authentication information input can be used to refer to information used for identification, for identification and authentication, and for authentication, and each implementation is contemplated unless context requires otherwise.
According to one embodiment, component 2512 executes a convolution neural network ("CNN") to process image inputs (and for example, facial images), where the CNN includes a layer which generates geometrically (e.g., distance, Euclidean, cosine, etc.) measurable output.
The embedding component 2512 can include multiple neural networks, each tailored to specific biometric inputs and configured to generate encrypted feature vectors (e.g., for captured images, for voice inputs, for health measurements or monitoring, etc.) that are distance measurable. According to various embodiments, the system can be configured to require biometric inputs of various types, and pass each type of input to a respective neural network for processing to capture respective encrypted feature vectors, among other options. In various embodiments, one or more processing neural networks are instantiated as part of the embedding component 2512, and the respective neural network processes unencrypted biometric inputs to generate encrypted feature vectors.
In one example, the processing neural network is a convolutional neural network constructed to create encrypted embeddings from unencrypted biometric input.
In one example, encrypted feature vectors can be extracted from a neural network at the layers preceding a softmax layer (including, for example, the n-1 layer). As discussed herein, various neural networks can be used to define embeddings or feature vectors, with each tailored to an analyzed biometric (e.g., voice, image, health data, etc.), where an output of or with the model is Euclidean measurable. Some examples of these neural networks include a model having a softmax layer. Other embodiments use a model that does not include a softmax layer to generate Euclidean measurable feature vectors. Various embodiments of the system and/or embedding component are configured to generate and capture encrypted feature vectors for the processed biometrics in the layer or layers preceding the softmax layer.
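The idea of reading a feature vector from the layer preceding the softmax layer can be illustrated with a tiny forward pass written in plain Python. Everything numeric here is invented for the example: the weights are untrained toy values, the network has a single hidden layer, and the names `forward` and `params` are hypothetical; a real embedding network would be far larger and trained.

```python
import math

def dense(x, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, params):
    """Return (embedding, class_probabilities).

    The embedding is the activation of the layer just before the softmax
    layer (the n-1 layer); the softmax output is used only for classes.
    """
    embedding = relu(dense(x, params["w1"], params["b1"]))
    probs = softmax(dense(embedding, params["w2"], params["b2"]))
    return embedding, probs

# Toy, untrained parameters: 2 inputs -> 3 hidden units -> 2 classes.
params = {
    "w1": [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]],
    "b1": [0.0, 0.1, -0.1],
    "w2": [[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]],
    "b2": [0.0, 0.0],
}
embedding, probs = forward([1.0, 2.0], params)
print(len(embedding), len(probs))
```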
Optional processing of the generated encrypted biometrics can include filter operations prior to passing the encrypted biometrics to classifier neural networks (e.g., a DNN). For
-70-example, the generated encrypted feature vectors can be evaluated for distance to determine that they meet a validation threshold. In various embodiments, the validation threshold is used by the system to filter noisy or encrypted values that are too far apart.
According to one aspect, filtering of the encrypted feature vectors improves the subsequent training and prediction accuracy of the classification networks. In essence, if a set of encrypted embeddings for a user are too far apart (e.g., distances between the encrypted values are above the validation threshold) the system can reject the enrollment attempt, request new biometric measurements, generate additional training biometrics, etc.
Each set of encrypted values can be evaluated against the validation threshold and values with too great a distance can be rejected and/or trigger requests for additional/new biometric submission. In one example, the validation threshold is set so that no distance between comparisons (e.g., of face image vectors) is greater than 0.85. In another example, the threshold can be set such that no distance between comparisons is greater than 1.0. Stated broadly, various embodiments of the system are configured to ensure that a set of enrollment vectors are of sufficient quality for use with the classification DNN, and in further embodiments configured to reject enrollment vectors that are bad (e.g., too dissimilar).
According to some embodiments, the system can be configured to handle noisy enrollment conditions. For example, validation thresholds can be tailored to accept enrollment sets having an average distance greater than 0.85 but less than 1 where the minimum distance between compared vectors in an enrollment set is less than 0.06.
Different thresholds can be implemented in different embodiments, and can vary within 10%, 15%
and/or 20% of the examples provided. In further embodiments, each authentication credential instance (e.g., face, voice, retina scan, behavioral measurement, etc.) can be associated with a respective validation threshold. Additionally, the system can use identification thresholds that are more constrained than the validation threshold. For example, in the context of facial identification, the system can require a validation threshold of no greater than a Euclidean distance of 1 between enrollment face images of an entity to be identified. In one example, the system can be configured to require better precision in actual identification, and for example, that the subsequent authentication/identification measure be within 0.85 Euclidean distance to return a match.
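By way of non-limiting illustration only, the enrollment filter and the tighter identification threshold described above might be sketched as follows. The function names, the use of plain Python lists as feature vectors, and the rule that a probe matches when it falls within the identification threshold of any enrolled vector are assumptions made for the example and are not taken from the described embodiments.

```python
import math
from itertools import combinations

VALIDATION_THRESHOLD = 1.0       # example enrollment threshold from the text
IDENTIFICATION_THRESHOLD = 0.85  # tighter example threshold from the text


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def validate_enrollment(vectors, threshold=VALIDATION_THRESHOLD):
    """Accept an enrollment set only if no pairwise distance exceeds the threshold.

    Assumption: a set with fewer than two vectors cannot be validated.
    """
    distances = [euclidean(a, b) for a, b in combinations(vectors, 2)]
    return bool(distances) and max(distances) <= threshold


def is_match(probe, enrolled, threshold=IDENTIFICATION_THRESHOLD):
    """Hypothetical match rule: the probe lies within the threshold of any enrolled vector."""
    return any(euclidean(probe, v) <= threshold for v in enrolled)
```

A rejected set (a `False` result from `validate_enrollment`) would correspond to the system rejecting the enrollment attempt or requesting new biometric measurements.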
According to some embodiments, the system 2504 can include a classifier component 2514. The classifier component can include one or more deep neural networks trained on encrypted feature vector and label inputs for respective users and their biometric inputs. The trained neural network can then be used during prediction operations to return a match to a person (e.g., from among a group of labels and people (one to many matching) or from a singular person (one to one matching)) or to return a match to an unknown class.
During training of the classifier component 2514, the feature vectors from the embedding component 2512 or system 2504 are used by the classifier component 2514 to bind a user to a classification (i.e., mapping biometrics to a matchable/searchable identity).
According to one embodiment, a deep learning neural network (e.g., enrollment and prediction network) is executed as a fully connected neural network ("FCNN") trained on enrollment data.
In one example, the FCNN generates an output identifying a person or indicating an UNKNOWN individual (e.g., at 2506). Other examples can implement different neural networks for classification and return a match or unknown class accordingly.
In some examples, the classifier is a neural network but does not require a fully connected neural network.
According to various embodiments, a deep learning neural network (e.g., which can be an FCNN) must differentiate between known persons and the UNKNOWN. In some examples, the deep learning neural network can include a sigmoid function in the last layer that outputs probability of class matching based on newly input biometrics or that outputs values showing failure to match. Other examples achieve matching based on executing a hinge loss function to establish a match to a label/person or an unknown class.
In further embodiments, the system 2504 and/or classifier component 2514 are configured to generate a probability to establish when a sufficiently close match is found. In some implementations, an unknown person is determined based on negative return values (e.g., the model is tuned to return negative values for no match found). In other embodiments, multiple matches can be developed by the classifier component 2514 and voting can also be used to increase accuracy in matching.
Various implementations of the system (e.g., 2504) have the capacity to use this approach for more than one set of input. In various embodiments, the approach itself is biometric agnostic. Various embodiments employ encrypted feature vectors that are distance measurable (e.g., Euclidean, homomorphic, one-way encrypted, etc.), generation of which is handled using the first neural network or a respective first network tailored to a particular biometric.
In some embodiments, the system can invoke multiple threads or processes to handle volumes of distance comparisons. For example, the system can invoke multiple threads to accommodate an increase in user base and/or volume of authentication requests.
According to various aspects, the distance measure authentication is executed in a brute force manner. In such settings, as the user population grows so does the complexity or work required to resolve the analysis in a brute force (e.g., check all possibilities (e.g., until match)) fashion. Various embodiments are configured to handle this burden by invoking multiple threads, and each thread can be used to check a smaller segment of authentication information to determine a match.
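As a minimal sketch of the segmented brute force comparison described above, the following Python example splits a store of enrolled vectors into segments and checks each segment on its own thread. The store layout (a list of label and vector pairs), the function names, and the 0.85 default threshold are assumptions chosen for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_segment(probe, segment, threshold):
    """Return (distance, label) for the closest in-threshold entry, or None."""
    best = None
    for label, vector in segment:
        d = euclidean(probe, vector)
        if d <= threshold and (best is None or d < best[0]):
            best = (d, label)
    return best


def threaded_match(probe, store, threshold=0.85, workers=4):
    """Check every entry in the store, one segment per thread."""
    if not store:
        return None
    size = math.ceil(len(store) / workers)
    segments = [store[i:i + size] for i in range(0, len(store), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda seg: search_segment(probe, seg, threshold), segments)
        hits = [r for r in results if r is not None]
    # The smallest distance across all segments wins.
    return min(hits)[1] if hits else None
```

In CPython, threads running pure Python arithmetic do not execute in parallel on multiple cores, so a production system would more likely use processes or a vectorized numeric library; threads are used here only to mirror the structure described in the text.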
In some examples, different neural networks are instantiated to process different types of biometrics. Using that approach, the vector generating neural network may be swapped for, or used in conjunction with, a different neural network, where each is capable of creating a distance measurable encrypted feature vector based on the respective biometric. Similarly, the system may enroll on two or more biometric types (e.g., use two or more vector generating networks) and predict on the feature vectors generated for each type of biometric using the respective neural networks, which can also be done simultaneously. In one embodiment, feature vectors from each type of biometric can likewise be processed in respective deep learning networks configured to predict matches based on the feature vector inputs (or return unknown). The co-generated results (e.g., one from each biometric type) may be used to identify a user using a voting scheme, and may perform better by executing multiple predictions simultaneously. For each biometric type used, the system can execute multi-phase authentication approaches with a first generation network and distance measures in a first phase, and a network trained on encrypted feature vectors in a second phase. At various times each of the phases may be in use: for example, an enrolled user can be authenticated with the trained network (e.g., second phase), while a newly enrolling user is enrolled and/or authenticated via the generation network and distance measure phase.
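The text does not specify the voting scheme. As one hypothetical possibility, the sketch below applies a strict majority rule over the labels returned for each biometric type and falls back to an unknown result when no label wins a majority. The `vote` function and the `UNKNOWN` sentinel are illustrative names only.

```python
from collections import Counter

UNKNOWN = "UNKNOWN"


def vote(predictions):
    """Combine per-biometric predictions (type -> label) by strict majority.

    Assumption: a label must be returned by more than half of all
    classifiers to be accepted; otherwise the result is UNKNOWN.
    """
    known = [label for label in predictions.values() if label != UNKNOWN]
    if not known:
        return UNKNOWN
    label, count = Counter(known).most_common(1)[0]
    return label if count > len(predictions) / 2 else UNKNOWN
```

For instance, `vote({"face": "alice", "voice": "alice", "gait": "bob"})` yields `"alice"`, while a one-to-one split between two labels yields `UNKNOWN`.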
In some embodiments, the system can be configured to validate an unknown determination. It is realized that accurately determining that an input to the authentication system is an unknown is an unsolved problem in this space. Various embodiments leverage the deep learning construction (including, for example, the classification network) described herein to enable identification/return of an unknown result. In some embodiments, the DNN
can return a probability of match that is below a threshold probability. If the result is below the threshold, the system is configured to return an unknown result. Further embodiments leverage the distance store to improve the accuracy of the determination of the unknown result.
In one example, upon a below threshold determination output from the DNN, the system can validate the below threshold determination by performing distance comparison(s) on the authentication vectors and the vectors in the distance store for the most likely match (e.g., greatest probability of match under the threshold).
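A minimal sketch of the unknown validation described above follows. The probability threshold of 0.9, the dictionary layouts, and in particular the decision to return the most likely label when the distance check succeeds are assumptions; the text states only that the distance store is used to validate the below threshold determination.

```python
import math

UNKNOWN = "UNKNOWN"


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(probabilities, probe, distance_store,
            prob_threshold=0.9, dist_threshold=0.85):
    """Return a label or UNKNOWN.

    probabilities:  label -> classifier probability (hypothetical DNN output)
    distance_store: label -> list of enrolled feature vectors
    """
    if not probabilities:
        return UNKNOWN
    label = max(probabilities, key=probabilities.get)
    if probabilities[label] >= prob_threshold:
        return label
    # Below threshold: validate against the stored vectors of the most likely match.
    enrolled = distance_store.get(label, [])
    if any(euclidean(probe, v) <= dist_threshold for v in enrolled):
        return label  # assumption: a close distance overturns the unknown result
    return UNKNOWN
```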
According to another aspect, generating accurate (e.g., greater than 90% accuracy in example executions described below) identification is only a part of a complete authentication system. In various embodiments, identification is coupled with liveness testing to ensure that authentication credential inputs are not, for example, being recorded and replayed for verification or faked in another manner. For example, the system 2504 can include a liveness component 2518. According to one embodiment, the liveness component can be configured to generate a random set of biometric instances that the system requests a user submit. The random set of biometric instances can serve multiple purposes. For example, the biometric instances provide a biometric input that can be used for identification, and can also be used for liveness (e.g., validate matching to random selected instances). If both tests are valid, the system can provide an authentication indication or provide access or execution of a requested function. Further embodiments can require multiple types of biometric input for identification, and couple identification with liveness validation. In yet other embodiments, liveness testing can span multiple biometric inputs as well.
According to one embodiment, the liveness component 2518 is configured to generate a random set of words that provide a threshold period of voice data from a user requesting authentication. In one example, the system is configured to require a five second voice signal for processing, and the system can be configured to select the random biometric instances accordingly. Other thresholds can be used (e.g., one, two, three, four, six, seven, eight, nine seconds or fractions thereof, among other examples), each having respective random selections that are associated with a threshold period of input.
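As an illustrative sketch only, a random word set sized to a threshold period of speech might be generated as follows. The word list and the assumed speaking rate of half a second per word are hypothetical and do not come from the described embodiments.

```python
import math
import secrets

# Hypothetical word list and speaking rate; neither appears in the source text.
WORDS = [
    "river", "candle", "stone", "orange", "window", "garden", "silver", "pocket",
    "thunder", "ladder", "marble", "planet", "butter", "harbor", "violet", "copper",
]
SECONDS_PER_WORD = 0.5


def random_phrase(min_seconds=5.0, rng=None):
    """Pick enough random words to fill at least `min_seconds` of speech."""
    rng = rng or secrets.SystemRandom()  # OS-backed randomness, harder to predict
    count = math.ceil(min_seconds / SECONDS_PER_WORD)
    return [rng.choice(WORDS) for _ in range(count)]
```

Under these assumptions a five second threshold produces ten words, and other thresholds scale the word count accordingly.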
According to other embodiments, liveness validation can be the accumulation of many authentication dimensions (e.g., biometric and/or behavioral dimensions). For example, the system can be configured to test a set of authentication credentials to determine liveness. In another example, the system can build a confidence score reflecting a level of assurance certain inputs are "live" or not faked. According to various embodiments, instead of using just one measure (e.g., voice) to test liveness, the system is configured to manage an ensemble model of many dimensions. As an example, the system can be configured to have the user read a sentence from the screen (to prove he/she is alive), but by using user behavior analytics ("UBA") the system can validate on a nearly infinite number of additional metrics (additional dimensions) to determine a liveness score. In further embodiments, each factor being analyzed also contributes to the user's identity score.
Various embodiments of the system are configured to handle multiple different behavioral inputs including, for example, health profiles that are based at least in part on health readings from health sensors (e.g., heart rate, blood pressure, EEG signals, body mass scans, genome, etc.), and can, in some examples, include behavioral biometric capture/processing.
Once processed through a generation network as discussed herein, such UBA data becomes private such that no user actions or behaviors are ever transmitted across the internet in plain form.
According to various aspects, the system is configured to manage liveness determinations based on an ensemble of models. In some embodiments, the system uses a behavioral biometric model to get an identity. In various embodiments, the system is configured to bifurcate processing in the following ways: any one test is a valid liveness measure, and all the tests together give a higher measure of confidence that the system has accurately determined the user's identity. In further aspects, each test of liveness provides a certain level of confidence a user is being properly identified, and each additional test of liveness increases that level of confidence, in essence stepping up the strength of the identification. Some embodiments can require different levels of authentication confidence to permit various actions, and more secure or risky actions can require ever increasing confidence thresholds.
According to further embodiments, the system (e.g. 2504) can be configured to incorporate new identification classes responsive to receiving new biometric information. In one embodiment, the system 2504 includes a retraining component configured to monitor a number of new biometrics (e.g., per user/identification class or by a total number of new biometrics) and automatically trigger a re-enrollment with the new feature vectors derived from the new biometric information (e.g., produced by 2512). In other embodiments, the system can be configured to trigger re-enrollment on new feature vectors based on time or time period elapsing.
The system 2504 and/or retraining component 2516 can be configured to store feature vectors as they are processed, and retain those feature vectors for retraining (including for example feature vectors that are unknown to retrain an unknown class in some examples).
Various embodiments of the system are configured to incrementally retrain the classification model (e.g., classifier component 2514 and/or a DNN) on system assigned numbers of newly received biometrics. Further, once a system set number of incremental re-trainings have occurred the system is further configured to complete a full retrain of the model.
According to various aspects, the incremental retrain execution avoids the conventional approach of fully retraining a neural network to recognize new classes and generate new identifications and/or to incorporate new feature vectors as they are input.
Incremental retraining of an existing model to include a new identification without requiring a full retraining provides significant execution efficiency benefits over conventional approaches.
According to various embodiments, the variables for incremental retraining and full retraining can be set on the system via an administrative function. Some defaults include incremental retrain every 3, 4, 5, 6, etc., identifications, and full retrain every 3, 4, 5, 6, 7, 8, 9, 10, etc., incremental retrains. Additionally, this requirement may be met by using calendar time, such as retraining once a year. These operations can be performed on offline (e.g., locked) copies of the model, and once complete, the offline copy can be made live.
Additionally, the system 2504 and/or retraining component 2516 is configured to update the existing classification model with new users/identification classes. According to various embodiments, the system builds a classification model for an initial number of users, which can be based on an expected initial enrollment. The model is generated with empty or unallocated spaces to accommodate new users. For example, a fifty user base is generated as a one hundred user model. This over allocation in the model enables incremental training to be executed and to incorporate, for example, new classes without requiring a full retraining of the classification model. When a new user is added, the system and/or retraining component 2516 is configured to incrementally retrain the classification model, ultimately saving significant computation time over conventional retraining executions. Once the over allocation is exhausted (e.g., all identification classes are allocated) a full retrain with an additional over allocation can be made (e.g., fully retrain the 100 classes to a model with 150 classes). In other embodiments, an incremental retrain process can be executed to add additional unallocated slots.
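The interplay of incremental retraining, periodic full retraining, and over allocation might be tracked with bookkeeping along the following lines. This is a sketch under stated assumptions: the class name, the default counts, and the choice to let a scheduled full retrain take the place of the incremental retrain that triggered it are all illustrative, and no actual model training is performed.

```python
class RetrainScheduler:
    """Decide which retrain action each new identification class triggers."""

    def __init__(self, enrolled=50, capacity=100,
                 incremental_every=3, full_every=5, growth=50):
        self.enrolled = enrolled                    # allocated identification classes
        self.capacity = capacity                    # total slots in the over-allocated model
        self.incremental_every = incremental_every  # identifications per incremental retrain
        self.full_every = full_every                # incremental retrains per full retrain
        self.growth = growth                        # slots added when capacity is exhausted
        self.pending = 0                            # identifications since the last retrain
        self.incrementals = 0                       # incremental retrains since the last full one

    def add_identity(self):
        """Register one new class and return 'none', 'incremental', or 'full'."""
        self.enrolled += 1
        self.pending += 1
        if self.enrolled > self.capacity:
            # Over allocation exhausted: full retrain into a larger model.
            self.capacity += self.growth
            return self._full()
        if self.pending >= self.incremental_every:
            self.pending = 0
            self.incrementals += 1
            if self.incrementals >= self.full_every:
                # Assumption: the full retrain replaces this incremental step.
                return self._full()
            return "incremental"
        return "none"

    def _full(self):
        self.pending = 0
        self.incrementals = 0
        return "full"
```

With `incremental_every=3` and `full_every=2`, six new identifications would produce the sequence none, none, incremental, none, none, full.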
Even with the reduced time retraining, the system can be configured to operate with multiple copies of the classification model. One copy may be live that is used for authentication or identification. A second copy may be an update version, that is taken offline (e.g., locked from access) to accomplish retraining while permitting identification operations to continue with a live model. Once retraining is accomplished, the updated model can be made live and the other model locked and updated as well. Multiple instances of both live and locked models can be used to increase concurrency.
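One way to picture the live and offline model copies is the sketch below, in which retraining is applied to a deep copy while predictions continue against the live model, and the updated copy is then swapped in. `ToyModel` is a hypothetical stand-in for a classification network, and the class and method names are assumptions.

```python
import copy
import threading


class ToyModel:
    """Hypothetical stand-in for a classifier: knows a set of labels."""

    def __init__(self, labels):
        self.labels = set(labels)

    def predict(self, label):
        return label if label in self.labels else "UNKNOWN"


class ModelPair:
    """Serve predictions from a live model while an offline copy is retrained."""

    def __init__(self, model):
        self._live = model
        self._swap_lock = threading.Lock()

    def predict(self, x):
        with self._swap_lock:
            model = self._live  # take a reference to the current live model
        return model.predict(x)

    def retrain(self, update_fn):
        offline = copy.deepcopy(self._live)  # offline (locked) copy
        update_fn(offline)                   # retraining happens off the live path
        with self._swap_lock:
            self._live = offline             # make the updated copy live
```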
According to some embodiments, the system 2500 can receive feature vectors instead of original biometrics and processing original biometrics can occur on different systems. In these cases system 2500 may not include, for example, 2508, 2510, 2512, and instead receives feature vectors from other systems, components or processes.
Example Liveness Execution And Considerations
According to one aspect, in establishing identity and authentication an authentication system is configured to determine if the source presenting the features is, in fact, a live source.
In conventional password systems, there is no check for liveness. A typical example of a conventional approach includes a browser where the user fills in the fields for username and password or saved information is pre-filled in a form on behalf of the user.
The browser is not a live feature; rather, the entry of the password is pulled from the browser's form history and essentially replayed. This is an example of replay, and according to another aspect, presents many challenges where biometric input could be copied and replayed.
The inventors have realized that biometrics have the potential to increase security and convenience simultaneously. However, there are many issues associated with such implementation, including, for example, liveness. Some conventional approaches have attempted to introduce biometrics: applying the browser example above, an approach can replace authentication information with an image of a person's face or a video of the face. Conventional systems of this kind that do not employ liveness checks may be compromised by using a stored image of the face or stored video and replaying it for authentication.
The inventors have realized that use of biometrics (e.g., face, voice or fingerprint, etc.) carries the consequence of the biometric potentially being offered in non-live forms, thus allowing a replayed biometric to be presented to the system as a plausible offering.
Without liveness, the plausible offering will likely be accepted. The inventors have further realized that determining whether a biometric is live is an increasingly difficult problem.
Examined are some approaches for resolving the liveness problem, which are treated broadly as two classes of liveness approaches (e.g., liveness may be subdivided into active liveness and passive liveness problem domains). Active liveness requires the user to do something to prove the biometric is not a replica. Passive liveness makes no such requirement of the user, and the system alone must prove the biometric is not a replica. Various embodiments and examples are directed to active liveness validation (e.g., random words supplied by a user); however, further examples can be applied in a passive context (e.g., system triggered video capture during input of biometric information, ambient sound validation, etc.). Table X (Figs. 26A-B) illustrates example implementations that may be employed, and includes analysis of potential issues for various interactions of the example approaches. In some embodiments, various ones of the examples in Table X can be combined to reduce inefficiencies (e.g., potential vulnerabilities) in the implementation. Although some issues are present in the various comparative embodiments, the implementation can be used, for example, where the potential for the identified replay attacks can be minimized or reduced.
According to one embodiment, randomly requested biometric instances in conjunction with identity validation on the same random biometric instances provides a high level of assurance of both identity and liveness. In one example (Row 8), the random biometric instances include a set of random words selected for liveness validation in conjunction with voice based identification.
According to one embodiment, an authentication system assesses liveness by asking the user to read a few random words or a random sentence. This can be done, in various embodiments, via execution of process 2900, Fig. 27. According to various embodiments, process 2900 can begin at 2902 with a request to a user to supply a set of random biometric instances. Process 2900 continues with concurrent (or, for example, simultaneous) authentication functions, identity and liveness, at 2904. For example, an authentication system can concurrently or simultaneously process the received voice signal through two algorithms (e.g., a liveness algorithm and an identity algorithm, by executing 2904 of process 2900), returning a result in less than one second. The first algorithm (e.g., liveness) performs a speech to text function to compare the pronounced text to the requested text (e.g., random words) to verify that the words were read correctly, and the second algorithm uses a prediction function (e.g., a prediction application programming interface (API)) to perform a one-to-many (1:N) identification on a private voice biometric to ensure that the input correctly identifies the expected person. At 2908, for example, process 2900 can return an authentication value for identified and live inputs (2906 YES). If either check fails (2906 NO), process 2900 can return an invalid indicator at 2910 or alter a confidence score associated with authentication.
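The concurrent liveness and identity checks of process 2900 can be sketched as follows. The speech to text and 1:N identification functions are supplied by the caller; `fake_speech_to_text` and `fake_identify` are hypothetical stand-ins that read fields from a dictionary rather than processing real audio, and exact word-for-word comparison is an assumed matching rule.

```python
from concurrent.futures import ThreadPoolExecutor


def check_liveness(transcript, requested_words):
    """Liveness: the spoken words must equal the requested random words."""
    return transcript.lower().split() == [w.lower() for w in requested_words]


def authenticate(audio, requested_words, expected_user, speech_to_text, identify):
    """Run the liveness and identity checks concurrently (step 2904)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        live_future = pool.submit(
            lambda: check_liveness(speech_to_text(audio), requested_words))
        id_future = pool.submit(identify, audio)
        live = live_future.result()
        identity = id_future.result()
    # 2906: both checks must pass to return an authentication value (2908).
    return live and identity == expected_user


# Hypothetical stand-ins for a speech to text engine and a 1:N prediction API.
def fake_speech_to_text(audio):
    return audio["words"]


def fake_identify(audio):
    return audio["speaker"]
```

A caller would pass the captured voice signal together with the random words issued at 2902; a `False` result corresponds to the invalid indicator at 2910.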
Further embodiments implement multiple biometric factor identification with liveness to improve security and convenience. In one example, a first factor, face (e.g., image capture), is used to establish identity. In another example, the second factor, voice (e.g., via random set of words), is used to confirm identity, and establish authentication with the further benefit of confirming (or not) that the source presenting the biometric input is live. In yet other embodiments, the system can implement comprehensive models of liveness validation that span multiple authentication credentials (e.g., biometric and/or behavioral instances).
Various embodiments of private biometric systems are configured to execute liveness.
The system generates random text that is selected to take roughly 5 seconds to speak (in whatever language the user prefers, and with other example threshold minimum periods). The user reads the text and the system (e.g., implemented as a private biometrics cloud service or component) then captures the audio and performs a speech to text process, comparing the pronounced text to the requested text. The system allows, for example, a private biometric component to assert the liveness of the requestor for authentication. In conjunction with liveness, the system compares the random text voice input and performs an identity assertion on the same input to ensure the voice that spoke the random words matches the user's identity.
For example, input audio is now used for liveness and identity.
In other embodiments, liveness is determined based on multiple dimensions. For example, the system can be configured to handle multiple different behavioral biometric inputs including even health profiles that are based at least in part on health readings from health sensors (e.g., heart rate, blood pressure, EEG signals, body mass scans, genome, etc.), and can, in some examples, include behavioral biometric capture/processing. Once processed through a generation neural network such UBA data becomes private such that no user actions or behaviors are ever transmitted across the internet; rather, the encrypted form output by the generation network is used.
According to one embodiment, the solution for liveness uses an ensemble of models.
The system can initially use a behavioral biometric model to establish an identity. On authentication, the system can use a test of any one dimension in the model to determine a valid liveness measure. Based on an action being requested and/or confidence thresholds established for that action, the system can be configured to test additional dimensions until the threshold is satisfied.
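The step-up behavior described above, in which additional dimensions are tested until an action's confidence threshold is met, might look like the following sketch. The text does not say how individual confidences are combined; the rule used here (treating tests as independent, so that combined confidence is one minus the product of the individual shortfalls) is an assumption, as are the action names and threshold values.

```python
# Hypothetical per-action confidence thresholds.
ACTION_THRESHOLDS = {
    "view_balance": 0.75,
    "transfer_funds": 0.95,
    "close_account": 0.999,
}


def combined_confidence(scores):
    """Assumed combination rule: 1 - product of (1 - score) over all tests."""
    remaining = 1.0
    for s in scores:
        remaining *= (1.0 - s)
    return 1.0 - remaining


def authorize(action, tests):
    """Run liveness tests one at a time until the action's threshold is met.

    `tests` is a list of zero-argument callables, each returning a
    confidence in [0, 1]. Returns (authorized, number_of_tests_run).
    """
    scores = []
    for test in tests:
        scores.append(test())
        if combined_confidence(scores) >= ACTION_THRESHOLDS[action]:
            return True, len(scores)
    return False, len(scores)
```

Under this rule a single 0.8 confidence test suffices for the low-risk action, a second test is needed to pass 0.95, and three such tests still fall short of 0.999.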
An example flow for multiple dimension liveness testing can include any one or more of the following steps:
1. Gather plaintext behavioral biometric input (e.g., face, fingerprint, voice, UBA) and use the data as input for the first DNN to generate encrypted embeddings.
2. A second DNN (a classifier network) classifies the encrypted embeddings from (1) and returns an identity score (or, put another way, the system gathers an original behavioral biometric identity via a prediction after transmitting the embedding).
3. One example test of liveness can be executed with a spoken random liveness sentence to make sure the person making the request is active (alive). If the user's spoken words match the requested words (above a predetermined threshold), the system establishes a liveness dimension.
4. The same audio from Step #1 is employed by the system to predict an identity. If the identity from Step #1 and Step #3 are the same, we have another liveness dimension.
5. The system can then also use private UBA to determine identity and liveness. For example, current actions are input to private UBA (Step #1) to return an identity and a probability that the measurements reflect that identity. If the behavior identity is the same as the previous identity, we have an additional liveness dimension.
Example executions can include the following: acquire accelerometer and gyroscope data to determine if the user is holding the phone in the usual manner;
acquire finger tapping data to determine if the user is touching the phone in the expected manner;
and/or acquire optical heart sensor data from a watch to determine if the user's heart is beating in the expected manner.
Table XI describes various example behavioral instances that can be used as input to a generation network to output distance measurable encrypted versions of the input.
TABLE XI
Human behavioral biometrics: fingerprint; face; voice; palm; clothing; vascular scans; time history; cheek/ear; skin color/features; hair style/color; beard/moustache; eye movement (eye tracking); heart beat; gait; gestures; behavior; psychological; contextual behavior; finger tapping; location; posture.
Machine behavioral biometrics: mouse; proximity; time; GPS; network access latency; WiFi; packets; geolocation; Bluetooth; fingerprint sensor; camera (faces); magnetic field; camera average light; linear acceleration; microphone/audio; gravity; audio magnitude; orientation; touch sensor; pedometer; ambient temperature; screen state; accelerometer; log messages; device access; app usage; app access; Android configuration; cloud access; browsing history; Android apps usage; credit card payments; payment methods; health monitoring; SIM card; gyroscope; magnetometer; watch accelerometer; watch compass; location (quick); phone state (app status, battery state, WiFi availability, on the phone, time of day); environment (air pressure, humidity, temperature).
Galaxy Watch: MEMS accelerometer; MEMS gyroscope; MEMS barometer; electro-optical sensor (for heart rate monitoring); photodetector (for ambient light).
Apple Watch: GPS and GLONASS; optical heart sensor; ECG/EKG (electrical heart sensor); accelerometer; gyroscope; ambient light sensor.
According to various aspects, the system can be configured to evaluate liveness as an ensemble model of many dimensions, in addition to embodiments that evaluate single liveness measures (e.g., voice).
Thus, any confidence measure can be obtained using UBA, by evaluating a nearly infinite number of additional metrics (additional dimensions) to the liveness score. And, as described in the example steps 1-5, each UBA factor can also contribute a system generated identity score, as well.
Stated broadly, multi-dimension liveness can include one or more of the following operations: 1) a set of plaintext UBA input points are acquired as input data to a model; 2) the first DNN (e.g., a generation network tailored to the UBA input points) generates encrypted embeddings based on the plaintext input and the system operates on the embeddings such that the actual user behavior data is never transmitted. For example, the encrypted behavioral embeddings have no correlation to any user action nor can any user action data be inferred from the embeddings; and 3) the behavioral embeddings are sent for processing (e.g., from a mobile device to a server) to generate a liveness measure as a probability through a second DNN (second network or classification network/model).
Example Technical Models for UBA (e.g., Generation Network)
Various neural networks can be used to accept plaintext behavioral information as input and output distance measurable encrypted feature vectors. According to one example, the first neural network (i.e., the generation neural network) can be architected as a Long Short-Term Memory (LSTM) model, which is a type of Recurrent Neural Network (RNN). In various embodiments, the system is configured to invoke these models to process UBA, which is time series data. In other embodiments, different first or generation networks can be used to create distance measurable encrypted embeddings from behavioral inputs. For example, the system can use a Temporal Convolutional Network (TCN) as the model to process behavioral information, and in another example, a Gated Recurrent Unit (GRU) network as the model.
According to some embodiments, once the first network generates distance measurable embeddings, a second network can be trained to classify on the embeddings and return an identification label or unknown result. For example, the second DNN (e.g., classification network) can be a fully connected neural network ("FCNN"), also commonly called a feed forward neural network ("FFNN"). In various embodiments, the system is configured to implement this type of model, to facilitate processing of attribute data, as opposed to image or binary data.
According to some embodiments, the second DNN model used for classifying is a FCNN which outputs classes and probabilities. In this setting, the feature vectors are used by the classifier component to bind a user's behavioral biometrics to a classification (i.e., mapping behavioral biometrics to a matchable/searchable identity). According to one embodiment, the deep learning neural network (e.g., enrollment and prediction network) can be executed by the system as a RNN trained on enrollment data. For example, the RNN is configured to generate an output identifying a person or indicating an UNKNOWN individual. In various embodiments, the second network (e.g., classification network which can be a deep learning neural network (e.g., an RNN)) is configured to differentiate between known persons and UNKNOWN.
According to another embodiment, the system can implement this functionality as a sigmoid function in the last layer that outputs probability of class matching based on newly input behavioral biometrics or showing failure to match. In further examples, the system can be configured to achieve matching based on one or more hinge loss functions.
As discussed, the system and/or classifier component are configured to generate a probability to establish when a sufficiently close match is found. In one example, an "unknown" person is determined responsive to negative return values being generated by the classifier network. In a further example, multiple matches on a variety of authentication credentials can be developed and voting can also be used based on the identification results of each to increase accuracy in matching.
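One way to picture the voting step is a majority vote over the per-credential identification results. The sketch below is an assumption about how such voting could work; the source does not specify a voting rule, and the strict-majority requirement and the "UNKNOWN" label are illustrative choices.

```python
from collections import Counter
from typing import List

def vote(results: List[str], unknown: str = "UNKNOWN") -> str:
    """Majority vote over per-credential identification results.

    A label wins only if it is returned by more than half of all
    submitted credentials; otherwise the outcome is `unknown`.
    """
    known = [r for r in results if r != unknown]
    if not known:
        return unknown
    label, count = Counter(known).most_common(1)[0]
    return label if count > len(results) / 2 else unknown
```

A stricter deployment could require unanimity, and a more permissive one could weight each vote by the classifier's probability; both are variations on the same idea.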
According to various embodiments, the authentication system is configured to test liveness and test behavioral biometric identity using fully encrypted reference behavioral biometrics. For example, the system is configured to execute comparisons directly on the encrypted behavioral biometrics (e.g., encrypted feature vectors of the behavioral biometric or encrypted embeddings derived from unencrypted behavioral information) to determine authenticity with a learning neural network. In further embodiments, a first neural network is used to process unencrypted behavioral biometric inputs and generate distance or Euclidean measurable encrypted feature vectors or encrypted embeddings (e.g., distance measurable encrypted values - referred to as a generation network). The encrypted feature vectors are used to train a classification neural network. Multiple learning networks (e.g., deep neural networks - which can be referred to as classification networks) can be trained and used to predict matches on different types of authentication credential (e.g., behavioral biometric input (e.g., facial/feature behavioral biometrics, voice behavioral biometrics, health/biologic data behavioral biometrics, etc.)). In some examples, multiple behavioral biometric types can be processed into an authentication system to increase accuracy of identification.
Various embodiments of the system can incorporate liveness, multi-dimensional liveness and various confidence thresholds for validation. A variety of processes can be executed to support such operation.
Fig. 28 is an example process flow 3000 for executing identification and liveness validation. Process 3000 can be executed by an authentication system (e.g., 2704, Fig. 25 or 2304, Fig. 16). According to one embodiment, process 3000 begins with generation of a set of random biometric instances (e.g., set of random words) and triggering a request for the set of random words at 3002. In various embodiments, process 3000 continues under multiple threads of operation. At 3004, a first biometric type can be used for a first identification of a user in a first thread (e.g., based on images captured of a user during input of the random words). Identification of the first biometric input (e.g., facial identification) can proceed as discussed herein (e.g., process unencrypted biometric input with a first neural network to output encrypted feature vectors, predict a match on the encrypted feature vectors with a DNN, and return an identification or unknown and/or use a first phase for distance evaluation), and as described in, for example, process 2200 and/or process 2250 below. At 3005, an identity corresponding to the first biometric or an unknown class is returned. At 3006, a second biometric type can be used for a second identification of a user in a second thread. For example, the second identification can be based upon a voice biometric. According to one embodiment, processing of a voice biometric can continue at 3008 with capture of at least a threshold amount of the biometric (e.g., 5 seconds of voice). In some examples, the amount of voice data used for identification can be reduced at 3030 with biometric pre-processing. In one embodiment, voice data can be reduced with execution of pulse code modulation. Various approaches for processing voice data can be applied, including pulse code modulation, amplitude modulation, etc., to convert input voice to a common format for processing.
Some example functions that can be applied (e.g., as part of 3030) include Librosa (e.g., to eliminate background sound, normalize amplitude, etc.); pydub (e.g., to convert between .mp3 and .wav formats); Librosa (e.g., for phase shift function); SciPy (e.g., to increase low frequency); Librosa (e.g., for pulse code modulation); and/or soundfile (e.g., for read and write sound file operations).
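As an illustration of converting voice samples to a common format, the following sketch performs linear 16-bit pulse code modulation using only the Python standard library. It does not use the libraries named above; the function name and the choice of little-endian signed 16-bit output are assumptions made for the example.

```python
import struct
from typing import Iterable

def to_pcm16(samples: Iterable[float]) -> bytes:
    """Quantize float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM.

    Values outside the range are clipped before quantization.
    """
    out = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))
        out += struct.pack("<h", int(round(clipped * 32767)))
    return bytes(out)
```

Each input sample becomes exactly two bytes, so the size of the encoded audio is fixed by the sample count regardless of the original container format.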
In various embodiments, processed voice data is converted to the frequency domain via a Fourier transform (e.g., fast Fourier transform, discrete Fourier transform, etc.) which can be provided by the NumPy or SciPy libraries. Once in the frequency domain, the two dimensional frequency array can be used to generate encrypted feature vectors.
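To make the "two dimensional frequency array" concrete, the sketch below frames a signal and applies a naive discrete Fourier transform to each frame, yielding a frames-by-frequency-bins array. It is written with the standard library only so that it is self-contained; in practice an FFT routine such as `numpy.fft.rfft` would typically be used instead. The frame length, hop size, and function names are assumptions for illustration.

```python
import cmath
import math
from typing import List

def dft_magnitudes(frame: List[float]) -> List[float]:
    """Magnitudes of the first N//2 + 1 DFT bins of a real-valued frame.

    Naive O(N^2) evaluation of the DFT definition, for illustration only.
    """
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        magnitudes.append(abs(acc))
    return magnitudes

def spectrogram(signal: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Slice the signal into frames of `frame_len` samples, advancing by
    `hop` samples, and transform each frame. The result is a
    two-dimensional (frames x frequency bins) array."""
    return [dft_magnitudes(signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, hop)]
```

For example, a sine wave that completes two cycles per 16-sample frame concentrates its energy in bin 2 of every frame.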
In some embodiments, voice data is input to a pre-trained neural network to generate encrypted voice feature vectors at 3012. In one example, the frequency arrays are used as input to a pre-trained convolutional neural network ("CNN") which outputs encrypted voice feature vectors. In other embodiments, different pre-trained neural networks can be used to output encrypted voice feature vectors from unencrypted voice input. As discussed throughout, the function of the pre-trained neural network is to output distance measurable encrypted feature vectors upon voice data input. Once encrypted feature vectors are generated at 3012, the unencrypted voice data can be deleted. Some embodiments receive encrypted feature vectors for processing rather than generate them from unencrypted voice directly; in such embodiments there is no unencrypted voice to delete.
In one example, a CNN is constructed with the goal of creating embeddings and not for its conventional purpose of classifying inputs. In a further example, the CNN can employ a triplet loss function (including, for example, a hard triplet loss function), which enables the CNN to converge more quickly and accurately during training than some other implementations. In further examples, the CNN is trained on hundreds or thousands of voice inputs.
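For readers unfamiliar with the loss named above, a triplet loss compares an anchor embedding with a positive example (same speaker) and a negative example (different speaker), and penalizes the network when the positive is not closer than the negative by at least a margin. The sketch below is a generic formulation using plain Euclidean distance; the margin value, the function names, and the "hardest negative" selection rule are assumptions, not details taken from the source.

```python
import math
from typing import List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 0.2) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)

def hardest_negative(anchor: Sequence[float],
                     negatives: List[Sequence[float]]) -> Sequence[float]:
    """Hard mining: pick the negative closest to the anchor, which is the
    one most likely to be confused with the anchor's identity."""
    return min(negatives, key=lambda n: euclidean(anchor, n))
```

Some formulations use squared distances instead of plain distances; the structure of the loss is the same in either case.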
Once trained, the CNN is configured for creation of embeddings (e.g., encrypted feature vectors). In one example, the CNN accepts a two dimensional array of frequencies as an input and provides floating point numbers (e.g., 32, 64, 128, 256, 1028, ... floating point numbers) as output.
In some executions of process 3000, the initial voice capture and processing (e.g., request for random words - 3002 - 3012) can be executed on a user device (e.g., a mobile phone) and the resulting encrypted voice feature vector can be communicated to a remote service via an authentication API hosted and executed on cloud resources. In some other executions, the initial processing and prediction operations can be executed on the user device as well. Various execution architectures can be provided, including fully local authentication, fully remote authentication, and hybridization of both options.
In one embodiment, process 3000 continues with communication of the voice feature vectors to a cloud service (e.g., authentication API) at 3014. The voice feature vectors can then be processed by a fully connected neural network ("FCNN") for predicting a match to enrolled feature vectors and returning a trained label at 3016. As discussed, the input to the FCNN is an embedding generated by a first pre-trained neural network (e.g., an embedding comprising 32, 64, 128, 256, 1028, etc. floating point numbers). Prior to execution of process 3000, the FCNN is trained with a threshold number of people for identification (e.g., 500, 750, 1000, 1250, 1500 ... etc.). The initial training can be referred to as "priming" the FCNN. The priming function is executed to improve accuracy of prediction operations performed by the FCNN.
At 3018, the FCNN returns a result matching a label or an unknown class, i.e., matches to an identity from among a group of candidates or does not match to a known identity. The result is communicated for evaluation of each thread's result at 3022.
According to various embodiments, the third thread of operation is executed to determine that the input biometrics used for identification are live (i.e., not spoofed, recorded, or replayed). For example, at 3020 the voice input is processed to determine if the input words match the set of random words requested. In one embodiment, a speech recognition function is executed to determine the words input, and matching is executed against the randomly requested words to determine an accuracy of the match. If any unencrypted voice input remains in memory, the unencrypted voice data can be deleted as part of 3020. In various embodiments, processing of the third thread can be executed locally on a device requesting authorization, on a remote server, a cloud resource, or any combination. If remote processing is executed, a recording of the voice input can be communicated to a server or cloud resource as part of 3020, and the accuracy of the match (e.g., input to random words) determined remotely. Any unencrypted voice data can be deleted once encrypted feature vectors are generated and/or once matching accuracy is determined.
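The liveness check described above can be sketched in two parts: generating an unpredictable word challenge and scoring how well the transcribed words match it. The sketch assumes the speech recognition step has already produced a list of words; that step is not shown. The function names, the vocabulary, and the case-insensitive bag-of-words scoring rule are illustrative assumptions.

```python
import secrets
from collections import Counter
from typing import List, Sequence

def make_challenge(vocabulary: Sequence[str], count: int) -> List[str]:
    """Pick `count` words with a cryptographically strong random source so
    the prompt cannot be predicted and pre-recorded."""
    return [secrets.choice(vocabulary) for _ in range(count)]

def match_accuracy(requested: Sequence[str], transcribed: Sequence[str]) -> float:
    """Fraction of requested words that also appear in the transcription
    (case-insensitive, counting repeated words separately)."""
    if not requested:
        return 0.0
    want = Counter(word.lower() for word in requested)
    got = Counter(word.lower() for word in transcribed)
    matched = sum((want & got).values())
    return matched / len(requested)
```

The returned accuracy is what a downstream step would compare against a liveness threshold.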
In further embodiments, the results of each thread are joined to yield an authorization or invalidation. At 3024, the first thread returns an identity or unknown for the first biometric, the second thread returns an identity or unknown for the second biometric, and the third thread returns an accuracy of match between a random set of biometric instances and input biometric instances. At 3024, process 3000 provides a positive authentication indication where the first thread identity matches the second thread identity and one of the biometric inputs is determined to be live (e.g., above a threshold accuracy (e.g., 33% or greater, among other options)). If not positive, process 3000 can be re-executed (e.g., a threshold number of times) or a denial can be communicated.
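The join of the three threads can be sketched as below. The three callables stand in for the first identification, the second identification, and the liveness check; they are hypothetical placeholders, as are the function names and the "UNKNOWN" label. The 0.33 default mirrors the 33% example threshold mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

UNKNOWN = "UNKNOWN"

def authenticate(first_identity: str, second_identity: str,
                 liveness_accuracy: float,
                 liveness_threshold: float = 0.33) -> bool:
    """Positive only if both identifications name the same known identity
    and the liveness accuracy meets the threshold."""
    identities_agree = (first_identity != UNKNOWN
                        and first_identity == second_identity)
    return identities_agree and liveness_accuracy >= liveness_threshold

def run_process(identify_first: Callable[[], str],
                identify_second: Callable[[], str],
                check_liveness: Callable[[], float],
                liveness_threshold: float = 0.33) -> bool:
    """Run the three checks concurrently, then join their results."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        first = pool.submit(identify_first)
        second = pool.submit(identify_second)
        live = pool.submit(check_liveness)
        return authenticate(first.result(), second.result(),
                            live.result(), liveness_threshold)
```

A caller that receives a negative result could invoke `run_process` again up to some retry limit before communicating a denial.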
According to various embodiments, process 3000 can include concurrent, branched, and/or simultaneous execution of the authentication threads to return a positive authentication or a denial. In further embodiments, process 3000 can be reduced to a single biometric type such that one identification thread and one liveness thread are executed to return a positive authentication or a denial. In further embodiments, the various steps described can be executed together or in different order, and may invoke other processes (e.g., to generate encrypted feature vectors to process for prediction) as part of determining identity and liveness of biometric input. In yet other embodiments, additional biometric types can be tested to confirm identity, with at least one liveness test on one of the biometric inputs to provide assurance that submitted biometrics are not replayed or spoofed. In a further example, multiple biometric types can be used for identity and multiple biometric types can be used for liveness validation.
Example Authentication System With Liveness
In some embodiments, an authentication system interacts with any application or system needing authentication service (e.g., a Private Biometrics Web Service). According to one embodiment, the system uses private voice biometrics to identify individuals in a datastore (and provides one to many (1:N) identification) using any language in one second. Various neural networks measure the signals inside of a voice sample with high accuracy and thus allow private biometrics to replace "username" (or other authentication schemes) and become the primary authentication vehicle.
In some examples, the system employs face (e.g., images of the user's face) as the first biometric and voice as the second biometric type, providing for at least two factor authentication ("2FA"). In various implementations, the system employs voice for identity and liveness, as the voice biometric can be captured with the capture of a face biometric. Similar biometric pairings can be executed to provide a first biometric identification, a second biometric identification for confirmation, coupled with a liveness validation.
In some embodiments, an individual wishing to authenticate is asked to read a few words while looking into a camera and the system is configured to collect the face biometric and voice biometric while the user is speaking. According to various examples, the same audio that created the voice biometric is used (along with the text the user was requested to read) to check liveness and to ensure the identity of the user's voice matches the face.
Such authentication can be configured to augment security in a wide range of environments. For example, private biometrics (e.g., voice, face, health measurements, etc.) can be used for common identity applications (e.g., "who is on the phone?") and single factor authentication (1FA) by call centers, phone, watch and TV apps, physical security devices (door locks), and other situations where a camera is unavailable.
Additionally, where additional biometrics can be captured, 2FA or better can provide greater assurance of identity with the liveness validation.
Broadly stated, various aspects implement similar approaches for privacy-preserving encryption for processed biometrics (including, for example, face and voice biometrics).
Generally stated, after collecting an unencrypted biometric (e.g., voice biometric), the system creates a private biometric (e.g., encrypted feature vectors) and then discards the original unencrypted biometric template. As discussed herein, these private biometrics enable an authentication system and/or process to identify a person (i.e., authenticate a person) while still guaranteeing individual privacy and fundamental human rights by only operating on biometric data in the encrypted space.
To transform the unencrypted voice biometric into a private biometric, various embodiments are configured to pre-process the voice signal and reduce the voice data to a smaller form (e.g., without any loss). The Nyquist sampling rate for this example is two times the frequency of the signal. In various implementations, the system is configured to sample the resulting data and use this sample as input to a Fourier transform. In one example, the resulting frequencies are used as input to a pre-trained voice neural network capable of returning a set of embeddings (e.g., encrypted voice feature vectors). These embeddings, for example, sixty-four floating point numbers, provide the system with private biometrics which then serve as input to a second neural network for classification.
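The Nyquist relationship mentioned above bounds how far voice data can be reduced without losing frequency content: the sample rate has to stay at or above twice the highest frequency of interest. The short sketch below works through that arithmetic. The function names and the 4 kHz voice band used in the usage note are assumptions for illustration, and the sketch assumes the signal has already been band-limited (e.g., low-pass filtered) before any reduction.

```python
def nyquist_rate(max_frequency_hz: float) -> float:
    """Minimum sample rate needed to represent content up to max_frequency_hz."""
    return 2.0 * max_frequency_hz

def max_decimation_factor(sample_rate_hz: float, max_frequency_hz: float) -> int:
    """Largest integer factor by which the sample rate can be divided while
    staying at or above the Nyquist rate. A result of 1 means no reduction
    is available."""
    return max(1, int(sample_rate_hz // nyquist_rate(max_frequency_hz)))
```

Under these assumptions, audio captured at 48 kHz whose content of interest lies below 4 kHz could be reduced by a factor of up to 6, to 8 kHz, before the Fourier transform is applied.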
Example Validation Augmentation
Fig. 29 is an example process flow 3100 for validating an output of a classification network. According to some embodiments, a classification network can accept encrypted authentication credentials as an input and return a match or unknown result based on analyzing the encrypted authentication credential. According to one embodiment, process 3100 can be executed responsive to generation of an output by the classification network.
For example, at 3102 a classification output is tested. At 3104, the testing determines if any of the output values meet or exceed a threshold for determining a match. If yes (e.g., 3104 YES), the matching result is returned at 3106.
If the threshold is not met, 3104 NO, process 3100 continues at 3108.
According to one embodiment, a reference encrypted credential associated with the closest matches determined by the classification network can be retrieved at 3108. Although the probability of the match may be too low to return an authentication or identification result, the highest probability matches can be used to retrieve stored encrypted authentication credentials for those matches or the highest probability match. At 3110 the retrieved credentials can be compared to the input that was processed by the classification network (e.g., a new encrypted authentication credential).
According to one example, the comparison at 3110 can include a distance evaluation between the input authentication credential and the reference authentication credentials associated with known labels/entities. If the distance evaluation meets a threshold, 3112 YES, process 3100 continues at 3116 and returns a match to the known label/entity. If the threshold is not met, 3112 NO, then process 3100 continues at 3114 with a return of no match. Post classification validation can be used in cases where a threshold probability is not met, as well as cases where a threshold is satisfied (e.g., to confirm a high probability match), among other options.
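Process 3100 can be sketched as a probability check followed by a distance-based fallback. This is one reading of the flow, under several assumptions: "meets a threshold" for the distance evaluation is interpreted as the Euclidean distance being at or below a cutoff, only the top few candidates are retrieved, and all names, threshold values, and the "NO_MATCH" string are illustrative.

```python
import math
from typing import Dict, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def validate_classification(probabilities: Dict[str, float],
                            embedding: Sequence[float],
                            references: Dict[str, Sequence[float]],
                            match_threshold: float = 0.9,
                            distance_threshold: float = 0.5,
                            top_k: int = 3) -> str:
    """Return a matched label, or "NO_MATCH"."""
    if not probabilities:
        return "NO_MATCH"
    # 3104: does any classifier output meet the match threshold?
    best = max(probabilities, key=probabilities.get)
    if probabilities[best] >= match_threshold:
        return best  # 3106
    # 3108: retrieve stored references for the highest probability candidates.
    candidates = sorted(probabilities, key=probabilities.get, reverse=True)[:top_k]
    scored = [(euclidean(embedding, references[label]), label)
              for label in candidates if label in references]
    # 3110 / 3112: distance evaluation against the retrieved references.
    if scored:
        distance, label = min(scored)
        if distance <= distance_threshold:
            return label  # 3116
    return "NO_MATCH"  # 3114
```

The same distance evaluation could also be run after a high probability match, as a confirmation step, by calling it regardless of the outcome at 3104.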
The terms "program" or "software" are used herein in a generic sense to refer to any type of computer code or set of processor-executable instructions that can be employed to program a computer or other processor to implement various aspects of embodiments as discussed above. Additionally, it should be appreciated that according to one aspect, one or more computer programs that when executed perform methods of the disclosure provided herein need not reside on a single computer or processor, but may be distributed in a modular fashion among different computers or processors to implement various aspects of the disclosure provided herein.
As described herein, "authentication system" includes systems that can be used for authentication as well as systems that can be used for identification. Various embodiments describe helper networks that can be used to improve operation in either context. The various functions, processes, and algorithms can be executed in the context of identifying an entity and/or in the context of authenticating an entity.
Processor-executable instructions may be in many forms, such as program modules, executed by one or more computers or other devices. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. Typically, the functionality of the program modules may be combined or distributed as desired in various embodiments.
Also, data structures may be stored in one or more non-transitory computer-readable storage media in any suitable form. For simplicity of illustration, data structures may be shown to have fields that are related through location in the data structure.
Such relationships may likewise be achieved by assigning storage for the fields with locations in a non-transitory computer-readable medium that convey relationship between the fields. However, any suitable mechanism may be used to establish relationships among information in fields of a data structure, including through the use of pointers, tags or other mechanisms that establish relationships among data elements.
Also, various inventive concepts may be embodied as one or more processes, of which examples (e.g., the processes described with reference to Figs. 4-7, 9-11, etc.) have been provided. The acts performed as part of each process may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
All definitions, as defined and used herein, should be understood to control over dictionary definitions, and/or ordinary meanings of the defined terms. As used herein in the specification and in the claims, the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, "at least one of A and B" (or, equivalently, "at least one of A or B," or, equivalently "at least one of A and/or B") can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
The phrase "and/or," as used herein in the specification and in the claims, should be understood to mean "either or both" of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with "and/or" should be construed in the same fashion, i.e., "one or more" of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the "and/or" clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to "A and/or B", when used in conjunction with open-ended language such as "comprising" can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
Use of ordinal terms such as "first," "second," "third," etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed. Such terms are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term).
The phraseology and terminology used herein is for the purpose of description and should not be regarded as limiting. The use of "including," "comprising," "having," "containing," "involving," and variations thereof, is meant to encompass the items listed thereafter and additional items.
Having described several embodiments of the techniques described herein in detail, various modifications and improvements will readily occur to those skilled in the art. Such modifications and improvements are intended to be within the spirit and scope of the disclosure. Accordingly, the foregoing description is by way of example only, and is not intended as limiting. The techniques are limited only as defined by the following claims and the equivalents thereto.

Claims (24)

PCT/US2021/045745
1. A system for managing privacy-enabled identification or authentication, the system comprising:
at least one processor operatively connected to a memory;
an identification data gateway, executed by the at least one processor, configured to filter invalid identification information from subsequent verification, enrollment, identification, or authentication functions, the identification data gateway comprising at least:
a first pre-trained validation helper network associated with identification information of a first type, wherein the first pre-trained validation helper network is configured to:
evaluate an identification instance of the first type, responsive to input of the identification instance of the first type to the first pre-trained validation helper network, wherein the first pre-trained validation helper network is pre-trained on evaluation criteria that is independent of a subject of the identification instance seeking to be verified, enrolled, identified, or authenticated;
responsive to a determination that the identification instance meets the evaluation criteria, validate the identification instance for use in subsequent verification, enrollment, identification, or authentication;
responsive to a determination that the identification instance fails the evaluation criteria, reject the unknown information instance for use in subsequent verification, enrollment, identification, or authentication; and generate at least a binary evaluation of the identification information instance based on the determination of the evaluation criteria, wherein the at least the binary evaluation includes generation of an output probability by the first pre-trained validation helper network that the identification instance is valid or invalid.
2. The system of claim 1, wherein the identification data gateway is configured to filter bad audio data from use in subsequent processing.
3. The system of claim 2, wherein the identification data gateway is configured to accept audio data input and validate the audio input for use in transcription.
4. The system of claim 1, wherein the first pre-trained validation helper network is trained on presence data, and configured to determine the presence of a target to be evaluated.
5. The system of claim 3, wherein the first pre-trained validation helper network is configured to validate the presence data independent of the subject seeking to be enrolled, identified, or authenticated.
6. The system of claim 1, wherein the authentication data gateway further comprises a plurality of validation helper networks each associated with a respective type of identification information, wherein each of the plurality of validation helper networks generates a binary evaluation of respective identification inputs to establish validity, wherein at least a plurality of the validation helper networks are configured to validate respective identification information independent of the subject seeking to be enrolled, identified, or authenticated.
7. The system of claim 1, wherein the first pre-trained validation helper network is configured to process an image as identification information, and output a probability that the subject is wearing a mask.
8. The system of claim 7, wherein the first pre-trained validation helper network is configured to determine the mask is being worn properly by the subject.
9. The system of claim 7, wherein the first pre-trained validation helper network is configured to determine the mask is being worn properly by the subject irrespective of the subject to be identified.
10. The system of claim 1, wherein the first pre-trained validation helper network is configured to process location associated input as identification information, and output a probability that the location associated input is invalid.
11. The system of claim 1, wherein the identification data gateway further comprises a first pre-trained geometry helper network configured to:
process identification information of the first type, accept as input unencrypted identification information of the first type, and output processed identification information of the first type.
12. The system of claim 11, wherein the first pre-trained validation helper network is paired with the geometry helper network, and further configured to:
accept the output of the geometry helper neural network, and validate the input identification information of the first type or reject the identification information of the first type.
13. The system of claim 1, wherein the first pre-trained validation helper network is configured to process an image input as identification information, and output a probability that the image input is a presentation attack.
14. The system of claim 5, wherein the first pre-trained validation helper network is configured to process a video input as identification information, and output a probability that the video input is a presentation attack.
15. A computer implemented method for managing privacy-enabled identification or authentication, the method comprising:
filtering, by at least one processor, invalid identification information from subsequent verification, enrollment, identification, or authentication functions, wherein the act of filtering includes:
executing, by at least one processor, a first pre-trained validation helper network associated with identification information of a first type;
evaluating, by the first pre-trained validation helper network, an identification instance of the first type, responsive to input of the identification instance of the first type to the first pre-trained validation helper network, wherein the first pre-trained validation helper network is pre-trained on evaluation criteria that is independent of a subject of the identification instance seeking to be verified, enrolled, identified, or authenticated;
validating, by the at least one processor, the identification instance for use in subsequent verification, enrollment, identification, or authentication, in response to determining that the identification instance meets the evaluation criteria;

rejecting, by the at least one processor, the unknown information instance for use in subsequent verification, enrollment, identification, or authentication responsive to determining that the identification instance fails the evaluation criteria;
and generating, by the at least one processor, at least a binary evaluation of the identification instance based on the determination of the evaluation criteria, wherein the at least the binary evaluation includes generation of an output probability by the first pre-trained validation helper network that the identification instance is valid or invalid.
16. The method of claim 15, wherein the act of filtering includes an act of filtering bad audio data from use in subsequent processing.
17. The method of claim 16, wherein the method further comprises accepting audio data input and validating the audio input for use in transcription.
18. The method of claim 15, wherein the first pre-trained validation helper network is trained on presence data, and the method further comprises determining the presence of a valid target to be evaluated.
19. The method of claim 18, wherein the method further comprises validating the presence data independent of the subject seeking to be verified, enrolled, identified, or authenticated.
20. The method of claim 15, wherein the method further comprises:
executing a plurality of validation helper networks each associated with a respective type of identification information, wherein each of the plurality of validation helper networks generates at least a binary evaluation of respective identification inputs to establish validity;
and validating respective identification information independent of the subject seeking to be verified, enrolled, identified, or authenticated.
21. The method of claim 15, wherein the first pre-trained validation helper network is configured to process an image as identification information, and the method further comprises an act of outputting a probability that the subject is wearing a mask.
22. The method of claim 21, wherein the method further comprises determining by the first pre-trained validation helper network that the mask is being worn properly by the subject.
23. The method of claim 21, wherein the method further comprises determining by the first pre-trained validation helper network that the mask is being worn properly by the subject irrespective of the subject to be identified.
24. The method of claim 15, wherein the method further comprises processing a location associated input as identification information by the first pre-trained validation helper network and generating by the first pre-trained validation helper network a probability that the location associated input is invalid.
CA3191888A 2020-08-14 2021-08-12 Systems and methods for private authentication with helper networks Pending CA3191888A1 (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US16/993,596 2020-08-14
US16/993,596 US10938852B1 (en) 2020-08-14 2020-08-14 Systems and methods for private authentication with helper networks
US17/155,890 2021-01-22
US17/155,890 US11789699B2 (en) 2018-03-07 2021-01-22 Systems and methods for private authentication with helper networks
US17/183,950 2021-02-24
US17/183,950 US11122078B1 (en) 2020-08-14 2021-02-24 Systems and methods for private authentication with helper networks
US17/398,555 2021-08-10
US17/398,555 US11489866B2 (en) 2018-03-07 2021-08-10 Systems and methods for private authentication with helper networks
PCT/US2021/045745 WO2022036097A1 (en) 2020-08-14 2021-08-12 Systems and methods for private authentication with helper networks

Publications (1)

Publication Number Publication Date
CA3191888A1 (en) 2022-02-17

Family

ID=80248171

Family Applications (1)

Application Number Title Priority Date Filing Date
CA3191888A Pending CA3191888A1 (en) 2020-08-14 2021-08-12 Systems and methods for private authentication with helper networks

Country Status (4)

Country Link
EP (1) EP4196890A1 (en)
AU (1) AU2021325073A1 (en)
CA (1) CA3191888A1 (en)
WO (1) WO2022036097A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10270748B2 (en) * 2013-03-22 2019-04-23 Nok Nok Labs, Inc. Advanced authentication techniques and applications
US20180232508A1 (en) * 2017-02-10 2018-08-16 The Trustees Of Columbia University In The City Of New York Learning engines for authentication and autonomous applications
US11494476B2 (en) * 2018-04-12 2022-11-08 Georgia Tech Research Corporation Privacy preserving face-based authentication

Also Published As

Publication number Publication date
WO2022036097A1 (en) 2022-02-17
EP4196890A1 (en) 2023-06-21
AU2021325073A1 (en) 2023-03-16
