GB2570620A

GB2570620A - Verification method and system

Info

Publication number: GB2570620A
Application number: GB1713471.9A
Authority: GB
Inventors: Sheikh Faridul Hasan; Ben Arbia Mohamed
Original assignee: Eyn Ltd
Current assignee: Eyn Ltd
Priority date: 2017-08-22
Filing date: 2017-08-22
Publication date: 2019-08-07
Also published as: GB201713471D0

Abstract

Determining whether a biometric feature of a live human is present is done by using a sensor (106, figure 1) to capture data 180, 190, such as a video of a user’s head, related to a presented biometric feature. A stimulus such as an ultrasound pulse is applied to the presented biometric feature, and the stimulus is capable of causing an involuntary response from a biometric feature of a live human. The data indicates 20888 whether the involuntary response has occurred thereby to determine 210 whether a biometric feature of a live human is present 2102 or whether a falsified feature 2104, such as a photograph has been presented. A further sensor such as a microphone (104, figure 1) may also be used and the identification process may be used to provide access to a smartphone (100, Figure 1).

Description

VERIFICATION METHOD AND SYSTEM

The present invention relates to a method of verifying that (or determining whether) a biometric feature of a live human is present. More particularly, the present invention relates to a method of verifying that (or determining whether) a biometric feature of a live human is present for use as part of a biometric recognition system and/or method. The invention extends to a corresponding apparatus and system.

Biometric authentication and identification systems are used in a variety of applications (including surveillance, access control, gaming and virtual reality, and driver monitoring systems) as a way of determining and verifying the identity of a particular user.

Biometric systems typically involve enrolling an authorised user’s biometric feature(s) (e.g. the user’s face, fingerprints, teeth, or iris) in a database, and, at a later time, automatically matching the authorised user’s biometric feature(s) presented to the system against one or more entries in the database based on a calculated index of similarity.

Such systems may be vulnerable to ‘spoof’ or ‘presentation’ attacks, in which an attacker claims an authorised user’s identity by presenting a falsified biometric feature of the authorised user to the system, for example by use of a mask, a photograph, a video, or a virtual reality representation of the authorised user’s biometric feature. This may mean that otherwise accurate biometric systems suffer from security risks.

Existing techniques for mitigating the risks of presentation attacks often require the cooperation and/or knowledge of the user/attacker (as in the case of ‘challenge-response’ tests), which, once an attacker has knowledge of the required response, may be relatively easily overcome (i.e. any system incorporating the known techniques may be easy to ‘spoof’). Furthermore, many existing techniques require specialised hardware, which may reduce the utility of such techniques.

Aspects and embodiments of the present invention are set out in the appended claims. These and other aspects and embodiments of the invention are also described herein.

According to at least one aspect described herein, there is provided a method for determining whether a biometric feature of a live human is present, comprising: using a sensor, capturing data related to a presented biometric feature; applying a stimulus to the presented biometric feature, wherein the stimulus is capable of causing an involuntary response from a biometric feature of a live human; and determining whether the data indicates that the involuntary response has occurred thereby to determine whether a biometric feature of a live human is present.

By determining whether an involuntary response has occurred, a live biometric feature can be determined to be present without the need for any active (i.e. voluntary) response from a user.

Applying a stimulus may comprise transmitting a signal, wherein the signal may be adapted to cause an involuntary response from a biometric feature of a live human. Optionally, transmitting a signal comprises transmitting a signal in accordance with a predetermined pattern. Determining whether the data indicates that the involuntary response has occurred may comprise determining whether a pattern corresponding to the predetermined pattern is present in the captured data. The pattern may be formed from at least one pulse and at least one pause, and may be selected from a plurality of patterns. Optionally, the pattern is randomly selected.

The transmitted signal may be detected using a further sensor. Determining whether the data indicates that the involuntary response has occurred may comprise comparing data from the sensor and the further sensor and/or applying a time correction to data from the sensor and data from the further sensor. The further sensor may be a microphone, and the signal may comprise a sound wave (preferably, an ultrasound wave).

Optionally, the involuntary response is a movement of at least part of the biometric feature. The sensor may be a camera, and the captured data may be visual data. Determining whether the data indicates that the involuntary response has occurred may comprise determining whether the visual data includes rolling shutter artefacts indicative of the movement of at least part of the biometric feature. The movement may be generally elastic, and preferably has a maximum extent of less than 2 mm, more preferably less than 1 mm, still more preferably less than 0.5 mm, and most preferably less than 0.2 mm.

The biometric feature may be a head. The movement may be a movement of one or more of: hair; and facial skin. The method may further comprise identifying features on the presented biometric feature where movement is expected to occur and/or magnifying a pre-determined frequency band corresponding to a frequency of the movement of at least part of the biometric feature.

Optionally, determining whether the data indicates that the involuntary response has occurred comprises comparing data captured using the sensor while the stimulus is applied to data captured using the sensor while the stimulus is not applied.

Optionally, determining whether the data indicates that the involuntary response has occurred comprises comparing data captured using the sensor against a model, wherein the model represents an involuntary response of a live human to the stimulus. Data related to an involuntary response of a biometric feature of a live human to the stimulus and a response of a falsified biometric feature of a live human may be collected for use in the model. The model may be a trained classifier, which may be trained based on presented biometric features of live humans and presented falsified biometric features of live humans. The model may comprise a convolutional neural network. Data related to the presented biometric feature may be transmitted for remote processing.

Captured visual data may be presented to the presented biometric feature using a screen.

The method may form part of a multi-modal method for determining whether a biometric feature of a live human is present.

According to at least one aspect described herein, there is provided a method of verifying the identity of a user, comprising performing a method as described herein; and verifying the identity of the user by comparing biometric information of the user (which optionally comprises information related to the user’s biometric feature(s)) against a database of biometric information of verified users.

According to at least one aspect described herein, there is provided apparatus for determining whether a biometric feature of a live human is present, comprising: a sensor for capturing data related to a presented biometric feature; a module adapted to apply a stimulus to the presented biometric feature, wherein the stimulus is capable of causing an involuntary response from a biometric feature of a live human; and a module adapted to determine whether the data indicates that the involuntary response has occurred thereby to determine whether a biometric feature of a live human is present.

The sensor may be a camera; and the captured data may be visual data. The module adapted to transmit a signal may be a loudspeaker; and the signal may be an ultrasound signal. The apparatus may further comprise a microphone for detecting the transmitted signal. The apparatus may be in the form of one or more of: a smartphone; a laptop computer; a desktop computer; a tablet computer; an automated passport control gate; and an entry system.

According to at least one aspect described herein, there is provided a system for determining whether a biometric feature of a live human is present, comprising: a user device, comprising: a sensor for capturing data related to a presented biometric feature; and a module adapted to apply a stimulus to the presented biometric feature, wherein the stimulus is capable of causing an involuntary response from a biometric feature of a live human; and a remote determination module adapted to determine whether the data indicates that the involuntary response has occurred thereby to determine whether a biometric feature of a live human is present.

According to at least one aspect described herein, there is provided a method of perturbing an object using a user device, comprising, using a loudspeaker of the user device, transmitting an ultrasound signal towards the object. The method may further comprise, using a camera, detecting a perturbation of the object in response to the ultrasound signal.

The invention extends to methods, system and apparatus substantially as herein described and/or as illustrated with reference to the accompanying figures.

The invention also provides a computer program or a computer program product for carrying out any of the methods described herein, and/or for embodying any of the apparatus features described herein, and a computer readable medium having stored thereon a program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein.

The invention also provides a signal embodying a computer program or a computer program product for carrying out any of the methods described herein, and/or for embodying any of the apparatus features described herein, a method of transmitting such a signal, and a computer product having an operating system which supports a computer program for carrying out the methods described herein and/or for embodying any of the apparatus features described herein.

Any feature in one aspect of the invention may be applied to other aspects of the invention, in any appropriate combination. In particular, method aspects may be applied to apparatus aspects, and vice versa. As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory.

Furthermore, features implemented in hardware may generally be implemented in software, and vice versa. Any reference to software and hardware features herein should be construed accordingly.

As used herein, the term ‘biometric feature’ preferably connotes a part or characteristic of a human body which can be used to identify a particular human.

As used herein, the term ‘live human’ preferably connotes a living human being (i.e. not a recording or any other kind of indirect representation of a living human).

As used herein, the term ‘head’ preferably connotes a human head, including the face and hair. As used herein, the term ‘face’ is to be preferably understood to be interchangeable with the term ‘head’.

As used herein, the term ‘loudspeaker’ preferably connotes any electroacoustic transducer for transmitting sound waves. As used herein, the term ‘microphone’ preferably connotes any electroacoustic transducer for receiving sound waves.

As used herein, the term ‘audio’ preferably connotes sound, including both audible frequencies and ultrasound frequencies.

As used herein, the term ‘ultrasound’ preferably connotes sound having a frequency above 18 kHz (which is barely perceptible, or not perceptible, for the majority of humans), more preferably between 18 kHz and 22 kHz, or alternatively between 20 kHz and 30 MHz.

As used herein, the term ‘involuntary’ preferably connotes an action that is not consciously controlled.

It should also be appreciated that particular combinations of the various features described and defined in any aspects of the invention can be implemented and/or supplied and/or used independently.

The invention will now be described, purely by way of example, with reference to the accompanying drawings, in which:

Figure 1 is a schematic depiction of a typical portable user device in the form of a smartphone;

Figure 2 is a flowchart which illustrates the main steps of a method for determining whether a biometric feature of a live human is present;

Figure 3 is an image of a live human head;

Figure 4 is an image showing the pattern of the transmitted signal and the signals included in the audio data received by the microphone;

Figure 5 is a schematic diagram of a software architecture (including memory) of a user device adapted to implement the method;

Figure 6 is a flow diagram showing the various steps of the comparison step of the method;

Figure 7 is an image showing a method of training a binary classifier for use in the method;

Figure 8a is a schematic depiction of the user device performing the method on a live human face;

Figure 8b is a schematic depiction of the user device performing the method on a falsified human face presented on another user device; and

Figure 9 is a schematic image of a tablet computer and loudspeaker and microphone array for implementing the method.

Specific Description

Figure 1 is a schematic depiction of a typical portable user device 100 in the form of a smartphone. As is well known, the user device 100 comprises a screen 102 and a loudspeaker 108 for providing information to a user, as well as a number of sensors arranged around the screen for receiving inputs from a user. In particular, the sensors include a front-facing camera 106 (i.e. a camera which faces towards the user as the user views the screen 102 - a further rear-facing camera (not shown) may also be provided) for receiving visual data, particularly visual data relating to the user, a microphone 104 for receiving audio data, particularly from the user, and one or more buttons 110 (or similar input device) for receiving a physical input from the user.

The present invention provides a method 200 for determining whether a biometric feature of a live human is present (i.e. whether a part of a purported user presented to the sensors of the user device is actually a part of a real human) which is suited to be implemented using the user device 100. The method 200 may find particular use as an initial stage in facial recognition systems in order to defend such systems against presentation attacks.

Figure 2 is a flow diagram which illustrates the main steps of the method 200 for determining whether a biometric feature of a live human is present. As mentioned, in an embodiment, the method 200 is implemented on a user device 100 in the form of a smartphone, although it will be appreciated that other implementations are of course possible.

In a first step 202, the camera 106 of the user device is used to capture visual data of a (purported) biometric feature which is presented to the camera 106 and screen 102. The biometric feature may be a real biometric feature of a live human or a falsified or fraudulent biometric feature, such as a printed photograph (or other image), a photograph or video displayed on a screen of an external display device (such as another user device), or a fake 3D recreation of a biometric feature (such as a mask of a face, or a silicon mould of a fingerprint). In an embodiment, the biometric feature is a human head, although it will be appreciated that other implementations are of course possible. The visual data is a video of the presented head, which is continually captured.

In a second step 204, a stimulus is applied to the presented biometric feature. The stimulus is in the form of an ultrasound signal (i.e. a sound wave having a frequency above 18 kHz, such that it is barely perceptible or not perceptible to most humans), which is transmitted from the loudspeaker 108 of the user device. The signal is transmitted in accordance with a predetermined pattern, which is formed from a series of pulses (or bursts) and pauses.

The ultrasound pulses perturb objects in the vicinity, including the presented head. In the vast majority of uses, the only object of significance in the vicinity of the loudspeaker 108 is the presented head. The perturbation may cause at least parts of the presented head to make an involuntary response in the form of small movements, which are detectable via the visual data. Such small movements may be referred to as ‘micro-movements’. The micro-movements are included in the visual data that is captured via the camera 106. Micro-movements are high frequency, spatially tiny motions between frames in a video. Such movements are not detectable via the naked eye and may not be clearly identifiable via the camera itself. However, where a ‘rolling shutter’ image capture technique is used (i.e. an image is captured by rapidly scanning across a scene, rather than capturing the entire scene at a single instant in time), ‘rolling shutter artefacts’ are produced as a result of the faster movement of the micro-movements relative to the movement of the rolling shutter across the scene. The presence or absence of micro-movements may therefore be determined by the presence or absence of the relevant rolling shutter artefacts within captured visual data.

Importantly, the micro-movements are movements of a live human head or face under perturbation caused by ultrasonic bombardment, such that a falsified presented head (such as an image or a mask) does not show such micro-movements under ultrasonic bombardment (although such a falsified presented head may of course move in other ways under ultrasonic bombardment). As will be appreciated, the loudspeaker 108 is located in the user device 100 such that it is directed towards the presented head, which may improve the transmission of the ultrasound signal towards the presented head.

Figure 3 is an image of a live human head 300, where parts of the head which may exhibit detectable micro-movements are marked. Under ultrasonic bombardment (of a sufficient magnitude), the entire head is perturbed, but measurable movements only take place in more mobile parts of the head, in particular those parts which have a low mass.

Hair is typically a highly mobile part of the head which exhibits a relatively high level of movement under ultrasonic bombardment. Hair may be present in various areas of a human head and in various forms, including as head hair 302, facial hair 304 (not present on the example head shown in the figure), eyebrows 306, and eyelashes 308. As will be appreciated, different humans have hair with different characteristics, which is present in different locations (and in different volumes) on the head. Detectable micro-movements of the hair therefore vary considerably between different humans.

Different parts of the head 300 other than hair may also exhibit micro-movements under ultrasonic bombardment. In particular, facial skin 310 may also exhibit micro-movements (generally of a smaller scale than in hair) under ultrasonic bombardment. Skin in certain sensitive areas (i.e. at the corners of the user’s eyes) may exhibit relatively large micro-movements which can be detectable using the visual data.

Under ultrasonic bombardment, hair and facial skin may experience a generally oscillatory and elastic movement in accordance with the pulses and pauses in the transmitted signal. Generally, this movement is less than 0.25 mm at its maximum extent. Such a movement also generally has a frequency that is not visible to the human eye - for a 20 kHz source ultrasound pulse, a hair is expected to move at 5 kHz.

Figure 4 is an image showing the pattern of the transmitted signal and the signals included in the audio data received by the microphone 104. In a third step 206, the transmitted signal is detected using the microphone 104 of the user device 100. As such, during a ‘pulse’ 402, a peak 406 corresponding to the transmitted pulse is received, while during a ‘pause’ 404 only reflected secondary peaks 408, 410, 412 are received (such secondary peaks may not be used in the method 200). The microphone 104 is turned on throughout the time period over which the ultrasound signal is transmitted (and optionally for a further time period once the transmission of the ultrasound signal has ceased). The use of the microphone 104 produces audio data which includes the transmitted ultrasound signal, along with any other audio signals detected in the vicinity of the microphone 104 which it is recording (such signals may include ambient sound and reflected ultrasound signals from nearby objects, including the presented head).

Optionally, once the transmission of the signal has stopped, the camera 106 and the microphone 104 may be switched off in order to save on battery power and data storage requirements of the user device 100.

In a fourth step 208, the visual data and the audio data are compared. Such comparison may be performed using a processor of the user device 100, or via an external server (in which case the visual data and the audio data, or a processed version thereof, are transmitted from the user device 100 to the external server). Processing locally on the phone may allow for improved speed and reliability of processing (as there is no need to rely on a data connection), while processing on an external server may provide improved processing power.

Both the visual data and the audio data include information corresponding to the pattern of the transmitted ultrasound signal. In the visual data, the time periods during which micro-movements occur and the time periods during which micro-movements do not occur respectively correspond to the pulses and pauses in the ultrasound signal. In the audio track, pulses and pauses in the detected ultrasound signal of course correspond to the pulses and pauses in the transmitted ultrasound signal. As will be detailed later on, the information corresponding to the pattern in both the visual data and the audio track is extracted and compared. If the information from both the visual data and the audio track ‘matches’ (i.e. information corresponding to the same pattern is present in both the visual data and the audio track), this indicates that the presented head is the head of a real human, and thus that a live human is present. If no ‘match’ is found, this indicates that the presented head is a falsified representation of a human head, and thus that a live human is not present (in the sense that the ‘human’ presented to the device is not a live human - a live human may of course be presenting the falsified representation to the user device 100, for example by holding a picture in front of the user device 100).

It will be appreciated that the requirement of detecting a matching (i.e. coherent) pattern adds a further layer of security, as it protects against video attacks (which may include recorded micro-movements of a human head). An attacker will not be able to produce a matching micro-movement pattern unless they are able to predict the pattern of the transmitted ultrasound signal and replicate it faithfully (including correctly timing when the ultrasound signal is transmitted). Detecting a coherent pattern also allows several user devices 100 implementing the method 200 to be used independently in the same environment (without the risk of false positives caused by the detection of micro-movements caused by other ultrasound signals). This is particularly useful for environments such as border control or an entrance to a corporate building, where a plurality of user devices 100 (for example, tablet computers) implementing the method 200 may be provided as part of an access control system.

In a fifth step 212, an output is produced in dependence on the results of the comparison in the fourth step 208. The output may take the form of a message indicating that a real human (or a real human head) has been verified or has not been verified.

Referring to Figure 5, a schematic diagram of the software architecture 150 (including memory) of the user device 100 adapted to implement the method 200 is shown. As illustrated, this includes a control module 152 for controlling the camera 106 and the microphone 104, a stimulus module 154 (also controlled by the control module 152) for generating a pattern and for controlling the loudspeaker 108 to transmit the ultrasound signal including the pattern, a data store 156 provided in communication with the stimulus module 154, and a comparison module 156 for receiving visual data 180 from the camera 106 and audio data 190 from the microphone 104. Optionally, the visual data 180 and audio data 190 (or alternatively processed forms of said data produced by the comparison module 156) are saved into the data store 15, once received.

The camera 106 uses a rolling shutter, and preferably has a relatively high frame rate of between 100 frames/second and 1000 frames/second. A high frame rate allows for clearer visible differences (as more motion-related information is available), which may provide improved accuracy.
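To illustrate why a rolling shutter may register motion that is faster than the frame rate, the following minimal sketch computes the instant at which each sensor row is read out. The function names, the `readout_fraction` parameter and the example figures (1080 rows, 240 frames/second) are assumptions made for illustration only; they are not taken from the description.

```python
def row_capture_times(frame_index, n_rows, frame_rate_hz, readout_fraction=1.0):
    """Return the capture time (in seconds) of each row of one frame.

    Assumes rows are read out one after another at a constant rate, and that
    the readout occupies `readout_fraction` of the frame period (hypothetical).
    """
    frame_period = 1.0 / frame_rate_hz
    row_interval = readout_fraction * frame_period / n_rows
    start = frame_index * frame_period
    return [start + row * row_interval for row in range(n_rows)]


def row_sampling_rate(n_rows, frame_rate_hz, readout_fraction=1.0):
    """Rows read per second during readout, i.e. the per-row sampling rate."""
    return n_rows * frame_rate_hz / readout_fraction


if __name__ == "__main__":
    # Illustrative figures only: 1080 rows at 240 frames/second.
    print(row_sampling_rate(1080, 240.0))
```

Under these assumed figures, rows are read out far more often per second than whole frames are captured, so successive rows may record a rapidly oscillating hair at slightly different positions. This is one way of picturing how the artefacts arise.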

The control module 152 is arranged to implement the visual data capturing step 202 of the method by turning the camera on in response to a signal from a processing component of the user device 100 (or alternatively an external signal), where the signal indicates that there is a need to determine whether a biometric feature of a live human is present. The control module 152 is also arranged to implement the signal transmission step 204 in part, in that it directs the stimulus module 154 to generate and transmit a signal. Similarly, the control module 152 is also arranged to implement the signal detection step 206 in part, in that it turns the microphone 104 on (optionally, at the same time as the camera is turned on).

The stimulus module 154 selects a pattern for the signal on the basis of data relating to a plurality of patterns in the data store. The plurality of patterns may differ in the duration and/or number of the pulses and/or the pauses in a single pattern. Furthermore, a variety of different types of signal may be used as the ‘pulse’ in the pattern; options include a sine sweep (for example, from around 18 kHz to 22 kHz), a pulsed ultrasound (for example, of around 20 kHz), or a white noise. The stimulus module 154 may select the pattern in accordance with a cycle of various predetermined patterns, but preferably instead selects the pattern randomly (which may provide added security). In an alternative, the pattern is generated dynamically based on a plurality of variables (such as pause duration, pulse duration, and pulse signal type), which are determined dynamically or randomly. In all cases, the pattern of the signal (as well as the power characteristics of the signal) is selected so as to cause micro-movements of part of a live human head within a predetermined range of the user device 100.
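By way of illustration only, the dynamic generation of a pattern of pulses and pauses might be sketched as follows. The sample rate, the number of segments, the duration bounds, the amplitude and the function names are all assumptions chosen for the example; the description does not specify them. Only the pulsed tone option (around 20 kHz) is shown.

```python
import math
import secrets

SAMPLE_RATE = 48_000  # Hz; assumed output rate, not stated in the description


def random_pattern(n_segments=8, min_ms=50, max_ms=200):
    """Return a list of (is_pulse, duration_ms) tuples, alternating pulse/pause.

    Durations are drawn using the `secrets` module so that the pattern is
    hard to predict. Segment count and duration bounds are illustrative.
    """
    pattern = []
    for i in range(n_segments):
        duration = min_ms + secrets.randbelow(max_ms - min_ms + 1)
        pattern.append((i % 2 == 0, duration))  # even index: pulse, odd: pause
    return pattern


def render(pattern, freq_hz=20_000.0, amplitude=0.8):
    """Render the pattern as float samples: a sine tone in pulses, silence in pauses."""
    samples = []
    for is_pulse, duration_ms in pattern:
        n = SAMPLE_RATE * duration_ms // 1000
        if is_pulse:
            samples.extend(
                amplitude * math.sin(2 * math.pi * freq_hz * k / SAMPLE_RATE)
                for k in range(n)
            )
        else:
            samples.extend([0.0] * n)
    return samples


if __name__ == "__main__":
    pattern = random_pattern()
    print(pattern, len(render(pattern)))
```

A cryptographically strong source of randomness is used here because the added security described above depends on an attacker being unable to predict the pattern; a sine sweep or white noise pulse could be substituted in `render`.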

The particular frequency of the ultrasound signal is selected in dependence on the hardware of the user device 100. A built-in loudspeaker of a smartphone can typically generate ultrasound of frequencies between 18 kHz and 22 kHz, and a built-in microphone of a smartphone can typically detect frequencies between 18 kHz and 22 kHz. As such, where the user device 100 is a smartphone, frequencies between 18 kHz and 22 kHz may be used.

Figure 6 is a flow diagram showing the various steps of the comparison step 208 of the method 200, which is implemented using the comparison module 156, and the output step 210 of the method 200.

As mentioned, visual data 180 is gathered via the camera 106. In an initial step 2082, the visual data 180 is processed by performing Eulerian video magnification, in which a predetermined frequency band of motion in the video is magnified, where the frequency band corresponds generally to the frequency of the micro-movements of the head (i.e. corresponding to the rolling shutter artefacts caused by the micro-movements). This allows micro-movements that would not normally be visible to the naked eye to be used in subsequent processing steps. An example method of video magnification is presented in Wadhwa, Neal, et al. Phase-based video motion processing. ACM Transactions on Graphics (TOG) 32.4 (2013): 80. The video magnification in step 2082 acts to improve the quality of data that is input into the classifier, which may improve the accuracy of classification. Using video magnification also acts to filter out irrelevant movements (such as low frequency movements of the whole head, or other biometric feature).

The step 2082 further comprises applying a time correction for any delay between the pulses in the ultrasound signal and the micro-movements due to the inertia of the parts of the head which experience micro-movements.
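The description does not state how the time correction is computed. One plausible approach, sketched here purely as an assumption, is to search a small range of delays and keep the delay at which an observed binary signal agrees best with a reference binary signal. The function names and the `max_lag` parameter are hypothetical.

```python
def best_lag(reference, observed, max_lag):
    """Return the delay (in samples, 0..max_lag) by which `observed` trails `reference`.

    Each candidate delay is scored by the fraction of overlapping samples
    that agree; the smallest delay with the highest score is returned.
    """
    best, best_score = 0, -1.0
    for lag in range(max_lag + 1):
        pairs = list(zip(reference, observed[lag:]))
        if not pairs:
            break
        score = sum(1 for r, o in pairs if r == o) / len(pairs)
        if score > best_score:
            best, best_score = lag, score
    return best


def apply_lag(observed, lag):
    """Drop the first `lag` samples so that `observed` lines up with the reference."""
    return observed[lag:]


if __name__ == "__main__":
    reference = [1, 1, 0, 0] * 3
    observed = [0, 0] + reference  # response delayed by two samples
    lag = best_lag(reference, observed, max_lag=4)
    print(lag, apply_lag(observed, lag) == reference)
```

The search range should be kept small relative to the pattern length; otherwise a short overlap at a large delay could score well by chance.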

In an optional subsequent step 2084, micro-movements within the processed visual data 180 are identified and extracted. More specifically, the processed visual data corresponding to a duration n (i.e. occurring during a ‘pulse’ phase 402 of the signal, as shown in Figure 4) is transformed into a binary signal which changes with respect to time, where ‘1’ represents the presence of micro-movements and ‘0’ represents the absence thereof. A ‘1’ value is encoded if the processed visual data within the duration n shows a movement with an amplitude that is above a predetermined threshold.
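The encoding described in step 2084 may be sketched as follows. The sketch assumes that the magnified video has already been reduced to one motion amplitude value per frame and that the duration n is expressed as a whole number of frames; both are assumptions, as are the function name and the example values.

```python
def binarize_motion(amplitudes, window, threshold):
    """Encode per-frame motion amplitudes as one bit per window of `window` frames.

    A window is encoded as 1 if any amplitude within it exceeds `threshold`
    (micro-movements present) and as 0 otherwise. A trailing partial window
    is discarded.
    """
    bits = []
    for start in range(0, len(amplitudes) - window + 1, window):
        segment = amplitudes[start:start + window]
        bits.append(1 if max(segment) > threshold else 0)
    return bits


if __name__ == "__main__":
    # Illustrative amplitudes (arbitrary units), two frames per window.
    amplitudes = [0.0, 0.1, 0.9, 0.8, 0.05, 0.02, 0.7, 0.1]
    print(binarize_motion(amplitudes, window=2, threshold=0.5))
```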

At the same time, in optional step 2086, the audio data 190 received via the microphone 104 is processed to identify the pulses of the transmitted signal. Such processing may involve comparing the magnitude of the received signal against a predetermined threshold (for ultrasound frequencies). Comparison against a threshold may serve to filter out other audio sources, such as ambient sound and reflected ultrasound signals from the presented human head.

As with the visual data 180, the audio data 190 is transformed into a binary signal which changes with respect to time, where ‘1’ represents a signal pulse and ‘0’ represents a signal pause.
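As a minimal sketch of the audio processing in step 2086, the received samples may be divided into windows and the power at the ultrasound frequency compared against a threshold. The Goertzel algorithm is used here as one convenient way of measuring power at a single frequency; it is not named in the description. The sample rate, window length, threshold and function names are likewise assumptions.

```python
import math


def goertzel_power(samples, sample_rate, freq_hz):
    """Squared magnitude of `samples` at the DFT bin nearest to `freq_hz`."""
    n = len(samples)
    k = round(n * freq_hz / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def binarize_audio(samples, sample_rate, window, freq_hz=20_000.0, threshold=0.01):
    """Encode audio as one bit per window: 1 for a pulse, 0 for a pause.

    The power is normalised so that a sine of amplitude A lying exactly on
    the analysed bin yields approximately A**2. A trailing partial window
    is discarded.
    """
    bits = []
    for start in range(0, len(samples) - window + 1, window):
        block = samples[start:start + window]
        power = goertzel_power(block, sample_rate, freq_hz) / (window / 2) ** 2
        bits.append(1 if power > threshold else 0)
    return bits


if __name__ == "__main__":
    rate, window = 48_000, 480  # assumed: 10 ms windows at 48 kHz
    tone = [0.5 * math.sin(2 * math.pi * 20_000 * k / rate) for k in range(window)]
    print(binarize_audio(tone + [0.0] * window + tone, rate, window))
```

Because only the power near the transmitted frequency is measured, audible ambient sound contributes little to the result, which is in keeping with the filtering role of the threshold described above.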

In an optional subsequent step 2088, the extracted binary signals from the visual data and audio data are compared. Such comparison may be done by taking an XOR of the two binary patterns and then counting the number of zeroes (i.e. periods in which the pattern matches). A pre-determined percentage of zeroes in the data may be used as a threshold for determining whether the signals match - for example, the threshold may be 95%. If the binary signals match, a live biometric feature is present.
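The XOR comparison of step 2088 may be sketched as follows, assuming that the two binary signals have already been aligned in time and have the same length. The function names are hypothetical, and treating a match fraction exactly equal to the threshold as a match is an assumption.

```python
def match_fraction(visual_bits, audio_bits):
    """Fraction of periods in which the two binary signals agree (XOR == 0)."""
    if len(visual_bits) != len(audio_bits) or not visual_bits:
        raise ValueError("signals must be non-empty and of equal length")
    xor = [v ^ a for v, a in zip(visual_bits, audio_bits)]
    return xor.count(0) / len(xor)


def is_live(visual_bits, audio_bits, threshold=0.95):
    """Return True if the proportion of zeroes in the XOR reaches the threshold."""
    return match_fraction(visual_bits, audio_bits) >= threshold


if __name__ == "__main__":
    audio = [1, 0] * 10
    photo = [0] * 20  # a static image shows no micro-movements at all
    print(is_live(audio, audio), is_live(photo, audio))
```

The value returned by `match_fraction` could also serve as a simple confidence score alongside the binary decision.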

If a live biometric feature is identified, a positive output 2102 is issued in the output step 212. Conversely, if a live biometric feature is not identified, an output 2104 identifying the presented biometric feature as falsified is issued in the output step 212. Optionally, instead of or in addition to a binary output, a confidence score is produced.

Figure 7 is an image showing a method of training a binary classifier 700 for use in the method 200. The step 208 in which the visual data and audio data are compared is preferably implemented using a trained binary classifier, which may provide more reliable identification as compared to simple comparison of a binary pattern. In use, the classifier receives the raw audio data and visual data as an input and produces a binary output based on its training. The classifier is trained based on raw audio data and visual data alone, using feature learning with a convolutional neural network. The classifier 700 is trained on the basis of original data 702 from presented live human faces as well as ‘attack’ data 704 from falsified human faces.

As such, in this embodiment, the previously described optional steps 2084, 2086, 2088 of the comparison step 210 (in which binary signals are produced and compared) may be viewed as indicative of the processing performed by the binary classifier (which operates on the raw audio and visual data). Alternatively, the optional steps 2084, 2086, 2088 may be performed so as to test the accuracy of the output of the classifier (i.e. to test whether it is detecting the appropriate features). In a further alternative, the classifier may be arranged to perform explicitly the optional steps 2084, 2086, 2088.

As mentioned, the comparison may be performed using an external server, where the binary classifier 700 is provided on the external server. In such a case, the extracted binary signals may be transmitted to the external server over a data connection, and an output may be transmitted back to the user device once the comparison has been performed.

It will be appreciated that the classifier is not trained on the basis of explicitly identified features (or locations) within the visual data - instead, the classifier itself determines which features are important in classification. This may mean that the particular location of features exhibiting micro-movements (primarily, hair) does not affect the result of the classification, as the classifier is able to learn that the presence of micro-movements (particularly of hair) indicates that a live biometric feature is present, irrespective of the location of the micro-movements on the biometric feature.

Figure 8a is a schematic depiction of the user device 100 performing the method 200 on a live human face 300. As shown, the ultrasound signal transmitted by the loudspeaker 108 causes micro-movements of part of the presented head (in this case, the hair), which are detected via the camera 106. At the same time, the transmitted ultrasound signal is received via the microphone 104. The screen 102 of the user device 100 presents an image 112 of the presented head (where the image data is acquired via the camera) - this may assist a human attempting to verify themselves to the system to hold the user device at an appropriate orientation and distance relative to their presented face (so that all relevant parts of the head are in the camera frame and the head is within a range at which the ultrasound signal causes micro-movements).

Figure 8b is a schematic depiction of the user device 100 performing the method 200 on a falsified human face, which is presented on another user device 100. The signal is transmitted and received as before, but since no micro-movements corresponding to the transmitted pattern are caused, the falsified human face cannot be verified as a live human face.

It will be appreciated that the method 200 may act as a first stage of a broader facial or biometric verification method, where the first step (i.e. the described method 200) determines whether a live human is present. Subsequent steps can then determine whether the live human is a verified user.

Similarly, it will be appreciated that the method 200 can form one part of a multi-modal method for determining whether a live human is present, where other techniques are used as further inputs to develop a confidence ‘score’ that the presented feature is in fact a feature of a live human.

Alternatives and Extensions

In an alternative, the microphone 104 is not used to detect a transmitted audio signal. Instead, data related to the transmitted pattern (from the stimulus generation module 154) is provided directly to the comparison module 156. This data is then compared against the visual data 180 and/or the binary signal generated from the visual data, as described. Timing data related to when the signal was transmitted may also be provided from the stimulus generation module 154 to the comparison module 156 to aid in comparison. As will be appreciated, comparing detected micro-movements against an electronic record of the transmitted signal reduces the processing requirements of the method 200.
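The comparison of the binary signal generated from the visual data against an electronic record of the transmitted pattern might, purely as an illustrative sketch, look as follows. It is assumed that the visual signal is at least as long as the pattern and uses the same sample period. Where timing data is available, the offset may be supplied directly; the exhaustive search over offsets shown in `best_alignment` is an assumption for the case where it is not. All names are hypothetical.

```python
from typing import Sequence, Tuple


def match_fraction(visual: Sequence[int], pattern: Sequence[int], offset: int) -> float:
    """Fraction of pattern samples that agree with the visual signal at an offset."""
    if not pattern or offset < 0:
        raise ValueError("pattern must be non-empty and offset non-negative")
    window = visual[offset:offset + len(pattern)]
    if len(window) != len(pattern):
        raise ValueError("pattern does not fit in the visual signal at this offset")
    agree = sum(1 for v, p in zip(window, pattern) if v == p)
    return agree / len(pattern)


def best_alignment(visual: Sequence[int], pattern: Sequence[int]) -> Tuple[int, float]:
    """Try every offset and return the (offset, match fraction) with the best match."""
    if not pattern or len(pattern) > len(visual):
        raise ValueError("pattern must be non-empty and no longer than the visual signal")
    best_offset, best_fraction = 0, -1.0
    for offset in range(len(visual) - len(pattern) + 1):
        fraction = match_fraction(visual, pattern, offset)
        if fraction > best_fraction:
            best_offset, best_fraction = offset, fraction
    return best_offset, best_fraction
```

The resulting match fraction may then be compared against a pre-determined threshold in the same manner as described for the audio-based comparison.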

Optionally, a multi-class classifier may be used instead of a binary classifier.

Optionally, the micro-movements detected via the visual data may be micro-expressions (i.e. brief, involuntary facial expressions indicative of an emotional state) in response to a presented stimulus.

As an alternative to capturing video via the camera, a plurality of photographs may instead be captured and used in subsequent analysis. The photographs are captured with a sufficiently high shutter speed, and at a sufficiently high frame rate, so as to capture micro-movements of the presented head. Sufficient frame rates may be between 25 frames/second and 1000 frames/second.

Optionally, the reflected ultrasound signals from the presented human face (which are included in the audio track) are used in determining the pattern used in the transmitted signal.

Optionally, an alternative ultrasound transducer is used to transmit and/or receive ultrasound signals, rather than the loudspeaker 108 and/or microphone 104 of the user device 100. The transducer may, for example, be provided as part of separate apparatus.

Although the invention has principally been defined with reference to the transmitted signal being an ultrasound signal, it will be appreciated that a variety of alternative signals may be used, where the signal acts as a stimulus which causes an involuntary reaction from the presented human head (or other biometric representation) which is detectable via the visual data. For example, the transmitted signal may be a pulse of light (i.e. a ‘flash’) transmitted by a flash unit of the user device 100, external hardware, or by the screen 102. The sudden exposure of light caused by the flash causes a live human head to react involuntarily by partially closing the eyes. The flash may be transmitted according to a pattern, as previously described. Various other stimuli are contemplated, including audible sounds, visual stimuli presented using the screen 102, and vibrations of the user device 100.

Optionally, a time at which the stimulus is applied (i.e. when the ultrasound signal starts to be transmitted) is varied, so as to add an additional layer of security. For example, in one instance the signal may be transmitted instantly when the method 200 is used, while in another instance a delay may be applied before the signal is transmitted. Optionally, the delay used is randomly determined.
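One way in which a randomly determined delay might be chosen is sketched below. The use of a cryptographically strong random source, the millisecond granularity and the upper bound of 2000 ms are all assumptions made for illustration; the function name `choose_delay_ms` is hypothetical.

```python
import secrets


def choose_delay_ms(max_delay_ms: int = 2000) -> int:
    """Pick a delay, in whole milliseconds, between 0 and max_delay_ms inclusive.

    A value of 0 corresponds to transmitting the signal immediately.
    """
    if max_delay_ms < 0:
        raise ValueError("max_delay_ms must not be negative")
    # secrets.randbelow(n) returns an integer in the range [0, n).
    return secrets.randbelow(max_delay_ms + 1)
```

The chosen delay would then be applied before the stimulus is transmitted, and may also be recorded so that the expected timing of the response is known during comparison.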

Although the invention has principally been defined with reference to the relevant biometric feature being a human head or face, it will be appreciated that a variety of biometric features can be used, such as a hand, palm, or finger.

The method 200 may be implemented on any kind of portable user device 100 having a screen and a camera, such as a smartphone, a laptop computer, a desktop computer, or a tablet computer. Alternatively, the method 200 may be implemented using a static device, such as those that might be included as part of or in association with entry systems, doors, automated passport control gates, or any other kind of system or device (static or otherwise) implementing a facial recognition system.

Any device or apparatus implementing the described method 200 may comprise an NFC (Near Field Communication) reader adapted to read an RFID (Radio Frequency IDentification) chip provided as part of an identity-certifying document (such as a passport, national ID card or corporate employee badge) or another NFC capable device, which may allow data provided in the RFID chip via NFC to be compared to a face of the user that is verified using the method 200 (as well as optionally allowing comparison between the data in the RFID chip via NFC and any photograph provided as part of the document).

Alternatively, an array of loudspeakers and microphones may be used in addition to the typical loudspeaker and microphone available in a smartphone or tablet. Such an apparatus may help the algorithm not only to “perceive” the presence of a live human face but also to “locate” the face.

Figure 9 is a schematic image of a tablet computer 900 and loudspeaker and microphone array 902 for implementing the method 200. In one particular use case, a series of tablet computers 900 implementing the method 200 may be installed at an electronic border control (or as part of another access control system). A user may stand in front of the tablet computer 900 and present their passport (allowing the NFC chip of the passport to be scanned, and the photograph information to be compared against a photograph taken via a camera of the tablet computer). In this scenario, to determine whether the user’s face is a live biometric feature a commercially available loudspeaker and microphone array 902 may be provided in communication with the tablet computer 900, where the ultrasound frequency range for such an array 902 may be between 20 kHz and 30 MHz. The use of a loudspeaker and microphone array 902 may allow for improved accuracy.

Optionally, the method 200 may include further steps for mitigating the chances of success of a ‘historic data attack’ (i.e. an attack in which data relating to a ‘successful’ verification is obtained by an attacker by sniffing data during capture, where the historic data is resent on a different occasion in the hope that the verification will again be ‘successful’). In an example, captured visual data and audio data is encrypted and timestamped. The method 200 may then include a further step of determining whether the time stamp indicates that the data is historic data (in which case a live human feature is not present). Alternatively or additionally, a server and/or the processor of the user device 100 may generate a pseudorandom key. A watermarking pattern may be generated based on the pseudorandom key, where the pattern is applied to the captured visual data and audio data. The method 200 may then include a further step of determining whether the watermarking pattern indicates that the data is historic data (i.e. whether the appropriate watermarking pattern is present).
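The time stamp and watermark checks might be sketched as follows, by way of example only. The maximum permitted age of 30 seconds, the derivation of the watermarking pattern from a SHA-256 digest of the pseudorandom key, and all function names are assumptions. The manner in which the pattern is embedded in, and extracted from, the captured visual data and audio data is not shown.

```python
import hashlib
import hmac
import time
from typing import List, Optional, Sequence

MAX_AGE_SECONDS = 30.0  # hypothetical limit beyond which data is treated as historic


def is_fresh(capture_time: float, now: Optional[float] = None,
             max_age: float = MAX_AGE_SECONDS) -> bool:
    """Return True if the time stamp is neither in the future nor too old."""
    current = time.time() if now is None else now
    age = current - capture_time
    return 0.0 <= age <= max_age


def watermark_pattern(key: bytes, n_bits: int = 64) -> List[int]:
    """Derive a repeatable bit pattern (at most 256 bits) from the key."""
    if not 1 <= n_bits <= 256:
        raise ValueError("n_bits must be between 1 and 256")
    digest = hashlib.sha256(key).digest()
    bits = [(byte >> i) & 1 for byte in digest for i in range(8)]
    return bits[:n_bits]


def watermark_is_current(extracted: Sequence[int], key: bytes) -> bool:
    """Check an extracted pattern against the one expected for the current key."""
    expected = watermark_pattern(key, len(extracted))
    # Constant-time comparison of the two bit sequences.
    return hmac.compare_digest(bytes(extracted), bytes(expected))
```

In this sketch, data would be treated as historic data if either check fails, that is, if the time stamp is too old or the extracted pattern does not correspond to the key generated for the current verification attempt.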

It will be appreciated that alternative components to a screen may be used for presenting the stimulus, such as a flat surface on to which the stimulus is projected.

It will be understood that the invention has been described above purely by way of example, and modifications of detail can be made within the scope of the invention.

Each feature disclosed in the description, and (where appropriate) the claims and drawings may be provided independently or in any appropriate combination.

Reference numerals appearing in the claims are by way of illustration only and shall have no limiting effect on the scope of the claims.

Claims

1. A method for determining whether a biometric feature of a live human is present, comprising:

using a sensor, capturing data related to a presented biometric feature;

applying a stimulus to the presented biometric feature, wherein the stimulus is capable of causing an involuntary response from a biometric feature of a live human; and

determining whether the data indicates that the involuntary response has occurred thereby to determine whether a biometric feature of a live human is present.

2. A method according to Claim 1, wherein applying a stimulus comprises transmitting a signal, wherein the signal is adapted to cause an involuntary response from a biometric feature of a live human.

3. A method according to Claim 2, wherein transmitting a signal comprises transmitting a signal in accordance with a predetermined pattern.

4. A method according to Claim 3, wherein determining whether the data indicates that the involuntary response has occurred comprises determining whether a pattern corresponding to the predetermined pattern is present in the captured data.

5. A method according to Claim 3 or 4, wherein the pattern is formed from at least one pulse and at least one pause.

6. A method according to any of Claims 3 to 5, further comprising selecting a pattern from a plurality of patterns.

7. A method according to any of Claims 3 to 6, wherein selecting a pattern comprises randomly selecting a pattern.

8. A method according to any of Claims 2 to 7, further comprising detecting the transmitted signal using a further sensor.

9. A method according to Claim 8, wherein determining whether the data indicates that the involuntary response has occurred comprises comparing data from the sensor and the further sensor.

10. A method according to Claim 9, wherein determining whether the data indicates that the involuntary response has occurred comprises applying a time correction to data from the sensor and data from the further sensor.

11. A method according to Claim 8 or 9, wherein the further sensor is a microphone.

12. A method according to any of Claims 2 to 11, wherein the signal comprises a sound wave.

13. A method according to Claim 12, wherein the signal comprises an ultrasound wave.

14. A method according to any preceding claim, wherein the involuntary response is a movement of at least part of the biometric feature.

15. A method according to Claim 14, wherein the sensor is a camera; and the captured data is visual data.

16. A method according to Claim 15, wherein determining whether the data indicates that the involuntary response has occurred comprises determining whether the visual data includes rolling shutter artefacts indicative of the movement of at least part of the biometric feature.

17. A method according to any of Claims 14 to 16, wherein the movement is generally elastic.

18. A method according to any of Claims 14 to 17, wherein the maximum extent of the movement is less than 2 mm.

19. A method according to Claim 18, wherein the maximum extent of the movement is less than 1 mm.

20. A method according to Claim 19, wherein the maximum extent of the movement is less than 0.5 mm.

21. A method according to Claim 20, wherein the maximum extent of the movement is less than 0.2 mm.

22. A method according to any of Claims 14 to 21, wherein the biometric feature is a head.

23. A method according to Claim 22, wherein the movement is a movement of one or more of: hair; and facial skin.

24. A method according to any of Claims 14 to 23, further comprising identifying features on the presented biometric feature where movement is expected to occur.

25. A method according to any of Claims 14 to 24, further comprising magnifying a pre-determined frequency band corresponding to a frequency of the movement of at least part of the biometric feature.

26. A method according to any preceding claim, wherein determining whether the data indicates that the involuntary response has occurred comprises comparing data captured using the sensor while the stimulus is applied to data captured using the sensor while the stimulus is not applied.

27. A method according to any preceding claim, wherein determining whether the data indicates that the involuntary response has occurred comprises comparing data captured using the sensor against a model, wherein the model represents an involuntary response of a live human to the stimulus.

28. A method according to Claim 27, further comprising collecting data related to an involuntary response of a biometric feature of a live human to the stimulus and a response of a falsified biometric feature of a live human for use in the model.

29. A method according to Claim 27 or 28, wherein the model is a trained classifier.

30. A method according to Claim 29, further comprising training the model based on presented biometric features of live humans and presented falsified biometric features of live humans.

31. A method according to any of Claims 27 to 30, wherein the model comprises a convolutional neural network.

32. A method according to any of Claims 27 to 31, further comprising transmitting data related to the presented biometric feature for remote processing.

33. A method according to any preceding claim, further comprising, using a screen, presenting the captured visual data to the presented biometric feature.

34. A method according to any preceding claim, wherein the method forms part of a multi-modal method for determining whether a biometric feature of a live human is present.

35. A method of verifying the identity of a user, comprising performing the method of any of Claims 1 to 34; and verifying the identity of the user by comparing biometric information of the user against a database of biometric information of verified users.

36. A computer program product comprising software code adapted to carry out the method of any of Claims 1 to 35.

37. A client or user device in the form of a telecommunications device or handset such as a smartphone or tablet adapted to execute the computer program product of Claim 36.

38. Apparatus for determining whether a biometric feature of a live human is present, comprising:

a sensor for capturing data related to a presented biometric feature;

a module adapted to apply a stimulus to the presented biometric feature, wherein the stimulus is capable of causing an involuntary response from a biometric feature of a live human; and

a module adapted to determine whether the data indicates that the involuntary response has occurred thereby to determine whether a biometric feature of a live human is present.

39. Apparatus according to Claim 38, wherein the sensor is a camera; and the captured data is visual data.

40. Apparatus according to Claim 38 or 39, wherein the module adapted to apply a stimulus is a loudspeaker; and the stimulus is an ultrasound signal.

41. Apparatus according to any of Claims 38 to 40, further comprising a microphone for detecting the transmitted signal.

42. Apparatus according to any of Claims 38 to 41, wherein the apparatus is in the form of one or more of: a smartphone; a laptop computer; a desktop computer; a tablet computer; an automated passport control gate; and an entry system.

43. A system for determining whether a biometric feature of a live human is present, comprising:

a user device, comprising:

a sensor for capturing data related to a presented biometric feature; and

a module adapted to apply a stimulus to the presented biometric feature, wherein the stimulus is capable of causing an involuntary response from a biometric feature of a live human; and

a remote determination module adapted to determine whether the data indicates that the involuntary response has occurred thereby to determine whether a biometric feature of a live human is present.

44. A method of perturbing an object using a user device, comprising, using a loudspeaker of the user device, transmitting an ultrasound signal towards the object.

45. A method according to Claim 44, further comprising, using a camera, detecting a perturbation of the object in response to the ultrasound signal.