CN113709616B - Ear proximity detection - Google Patents

Ear proximity detection

Info

Publication number
CN113709616B
CN113709616B
Authority
CN
China
Prior art keywords
ear
biometric
audio device
threshold
ear biometric
Prior art date
Legal status
Active
Application number
CN202110993115.5A
Other languages
Chinese (zh)
Other versions
CN113709616A
Inventor
J. P. Lesso
Current Assignee
Cirrus Logic International Semiconductor Ltd
Original Assignee
Cirrus Logic International Semiconductor Ltd
Priority date
Filing date
Publication date
Priority claimed from US 15/927,824 (US10810291B2)
Application filed by Cirrus Logic International Semiconductor Ltd filed Critical Cirrus Logic International Semiconductor Ltd
Publication of CN113709616A
Application granted granted Critical
Publication of CN113709616B


Abstract

Embodiments of the present disclosure include methods, apparatuses, and computer programs for detecting the proximity of an ear relative to an audio device. In one embodiment, the present disclosure provides a system for detecting the presence of an ear in the vicinity of an audio device. The system comprises: an input for obtaining a data signal from an environment of the audio device; and an ear biometric authentication module configured to: comparing one or more ear biometric features extracted from the data signal with an ear biometric template; and generating a first output based on a comparison of the one or more extracted ear biometric features with the ear biometric template, the first output indicating whether there are any ears in the vicinity of the audio device.

Description

Ear proximity detection
This application is a divisional application of the patent application filed on March 20, 2019, with application number 201980020365.1 and entitled "Ear proximity detection".
Technical Field
Embodiments of the present disclosure relate to devices, systems, methods, and computer programs for ear proximity detection. In particular, embodiments of the present disclosure relate to apparatus, systems, methods, and computer programs for detecting whether an ear is present in the vicinity of an audio device.
Background
It is known that the acoustic properties of a user's ear, whether of the external part (known as the pinna or auricle), the ear canal, or both, differ substantially from individual to individual and can therefore be used as a biometric to identify the user. One way to achieve this is to position one or more speakers (or similar transducers) close to or within the ear to generate an acoustic stimulus, and to position one or more microphones similarly close to or within the ear to detect the acoustic response of the ear to the acoustic stimulus. One or more features may be extracted from the response signal and used to characterize the individual.
For example, the ear canal is a resonant system, so one feature that can be extracted from the response signal is the resonant frequency of the ear canal. If the measured resonant frequency (i.e., the resonant frequency in the response signal) is different from the resonant frequency stored for the user, a biometric algorithm coupled to receive and analyze the response signal may return a negative result. Other features of the response signal may be similarly extracted and used to characterize the individual. For example, the features may include one or more mel-frequency cepstral coefficients. More generally, a transfer function (or a characteristic of a transfer function) between the acoustic stimulus and the measured response signal may be determined and compared to a stored transfer function (or a characteristic of a stored transfer function), which is a characteristic of the user.
A personal audio device (such as a headset, headphone or mobile phone) may be used to generate acoustic stimuli as well as measure responses. In such personal audio devices, power consumption is critical because space is limited and thus battery size is also limited. The battery life between two consecutive charges is a key performance indicator when the user selects the device.
To reduce power consumption, many personal audio devices have a dedicated "on-ear detect" function, operable to detect the presence of an ear in the vicinity of the device. If no ear is detected, the device may be placed in a low-power state, thereby saving power; if an ear is detected, the device may be placed in a relatively high-power state.
The on-ear detection function may also be used for other purposes. For example, a mobile phone may use on-ear detection to lock its touch screen when the phone is placed close to the user's ear, thereby preventing accidental touch input during a call. As another example, a personal audio device may pause audio playback in response to detecting its removal from the user's ear, or resume audio playback upon detecting its application to the user's ear.
A variety of mechanisms for on-ear detection are known in the art. For example, infrared sensors have been used in mobile phones to detect the proximity of an ear. Light sensors have been proposed to detect the insertion of headphones and earphones into or onto a user's ear. However, all of these mechanisms have the disadvantage of requiring additional hardware in the device for on-ear detection purposes. Additional sensors, and/or additional processing circuitry to process the sensor output signals, may be required.
Disclosure of Invention
Devices, systems, methods, and computer programs for ear proximity detection are presented that attempt to alleviate or mitigate one or more of the above-stated problems.
In a first aspect, a system for detecting the presence of an ear in the vicinity of an audio device is provided. The system comprises: an input for obtaining a data signal from an environment of the audio device; and an ear biometric authentication module configured to: comparing one or more ear biometric features extracted from the data signal with an ear biometric template; and generating a first output based on a comparison of the one or more extracted ear biometric features with the ear biometric template, the first output indicating whether there are any ears in the vicinity of the audio device.
Another aspect provides an electronic device comprising the system described above.
Another aspect of the present disclosure provides a method of detecting the presence of an ear in the vicinity of an audio device. The method comprises the following steps: obtaining a data signal from an environment of the audio device; extracting one or more ear biometric features from the data signal; comparing the one or more extracted ear biometric features with an ear biometric template for an authorized user of the audio device; and generating a first output based on a comparison of the one or more extracted ear biometric features with an ear biometric template for the authorized user, the first output indicating whether there are any ears in the vicinity of the audio device.
Another aspect provides an electronic device comprising processing circuitry and a non-transitory machine-readable medium storing instructions which, when executed by the processing circuitry, cause the electronic device to implement a method as described above.
Another aspect provides a non-transitory machine-readable medium storing instructions which, when executed by a processing circuit, cause an electronic device to implement a method as described above.
Another aspect of the present disclosure provides a system for detecting the presence of an ear in the vicinity of an audio device. The system comprises: an input for obtaining a data signal from an environment of the audio device; and an ear biometric authentication module configured to: extracting one or more first ear biometric features from a first number of data frames of the data signal; calculating a first score indicative of a distance between the extracted one or more ear biometric features and an ear biometric template of an authorized user; comparing the first score to a first threshold to determine whether there are any ears in the vicinity of the audio device; extracting one or more second ear biometric features from a second number of data frames of the data signal in response to determining that there are any ears in the vicinity of the audio device, the second number of data frames being greater than the first number of data frames; calculating a second score indicative of a distance between the one or more second ear biometric features extracted and an ear biometric template of the authorized user; and comparing the second score to a second threshold to determine whether an ear of the authorized user is present in the vicinity of the audio device, the second threshold being different from the first threshold.
Another aspect of the present disclosure provides a system for detecting the presence of an ear in the vicinity of an audio device. The system comprises: an input for obtaining a data signal from an environment of the audio device; and an ear biometric authentication module configured to: comparing the one or more ear biometric features extracted from the data signal with an ear biometric template of an authorized user; calculating one or more scores based on the comparison, the one or more scores being indicative of distances between the one or more extracted ear biometric features and the ear biometric template; comparing the one or more scores to a first threshold and a second threshold, wherein the first threshold and the second threshold are different from each other; and generating a first output based on a comparison of the one or more scores to the first threshold, the first output indicating whether there is any ear in the vicinity of the audio device, and generating a second output based on a comparison of the one or more scores to the second threshold, the second output indicating whether there is an ear of an authorized user in the vicinity of the audio device.
Another aspect of the present disclosure provides the use of a biometric processor to detect the presence of any ear in the vicinity of an audio device.
Drawings
For a better understanding of embodiments of the present disclosure, and to show more clearly how these embodiments may be carried into effect, reference will now be made, by way of example only, to the following drawings in which:
FIGS. 1a to 1e illustrate embodiments of a personal audio device;
FIG. 2 shows an arrangement according to an embodiment of the present disclosure;
FIG. 3 is a schematic illustration of a biometric score according to an embodiment of the present disclosure;
FIG. 4 illustrates a system according to an embodiment of the present disclosure;
fig. 5 illustrates acquisition of an audio signal according to an embodiment of the present disclosure; and
Fig. 6 is a flow chart of a method according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure provide apparatus, systems, methods, and computer programs for ear proximity detection. In particular, embodiments utilize a biometric process based on one or more ear biometric characteristics to detect the presence or absence of an ear in the vicinity of a personal audio device.
As used herein, the term "personal audio device" refers to any electronic device that is suitable for, or configurable to, provide audio playback to substantially only a single user. Some embodiments of suitable personal audio devices are shown in fig. 1a to 1e.
Fig. 1a shows a schematic view of a user's ear, comprising the (external) pinna or auricle 12a and the (internal) ear canal 12b. The personal audio device 20 comprises a circum-aural headphone that is worn by the user over the ear. The headphone comprises a housing that substantially surrounds and encloses the pinna 12a, providing a physical barrier between the user's ear and the external environment. A cushion or pad may be provided at the edge of the housing to increase the comfort of the user and the acoustic coupling between the headphone and the user's skin (i.e. to provide a more effective barrier between the external environment and the user's ear).
The headphone comprises one or more speakers 22, positioned on the internal surface of the headphone and arranged to generate acoustic signals towards the user's ear, and in particular the ear canal 12b. The headphone further comprises one or more microphones 24, also positioned on the internal surface of the headphone, arranged to detect acoustic signals within the internal volume defined by the headphone, the pinna 12a and the ear canal 12b.
The headphones may be able to perform active noise cancellation to reduce the amount of noise experienced by the user of the headphones. Active noise cancellation operates by detecting noise (i.e., using a microphone) and generating a signal having the same amplitude but opposite phase to the noise signal (i.e., using a speaker). Thus, the generated signal destructively interferes with the noise, thereby mitigating the noise experienced by the user. Active noise cancellation may operate based on a feedback signal, a feedforward signal, or a combination of both. Feedforward active noise cancellation utilizes one or more microphones on the exterior surface of the headset that operate to detect ambient noise before it reaches the user's ear. The detected noise is processed quickly and a cancellation signal is generated to match the incoming noise when it reaches the user's ear. Feedback active noise cancellation operates to detect a combination of noise and an audio playback signal generated by one or more speakers using one or more error microphones positioned on an inner surface of the headset. This combination is used in a feedback loop with knowledge of the audio playback signal to adjust the cancellation signal generated by the speaker to reduce noise. Thus, the microphone 24 shown in fig. 1a may form part of an active noise cancellation system, e.g. as an error microphone.
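The destructive-interference principle described above can be sketched in a few lines. The following Python fragment is illustrative only and forms no part of the patented disclosure: it simply inverts the reference noise, whereas a practical feedforward system would also model the acoustic secondary path with an adaptive filter (e.g., FxLMS), and all names below are hypothetical.

```python
import numpy as np

def feedforward_cancellation(noise_frame: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Return an anti-phase signal for noise captured by an external reference microphone.

    A real ANC system would filter the reference through an adaptive model of the
    secondary path rather than simply inverting it; this is an idealised sketch.
    """
    return -gain * noise_frame

# The residual heard by the user is the sum of the noise and the cancellation signal.
rng = np.random.default_rng(0)
noise = rng.standard_normal(256)
residual = noise + feedforward_cancellation(noise)
print(float(np.max(np.abs(residual))))  # ~0 in this idealised case
```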
Fig. 1b shows an alternative personal audio device 30, comprising a supra-aural headphone. The supra-aural headphone does not surround or enclose the user's ear, but rather sits on the pinna 12a. The headphone may comprise a cushion or pad to mitigate the effects of ambient noise. As with the circum-aural headphone shown in fig. 1a, the supra-aural headphone comprises one or more speakers 32 and one or more microphones 34. The speaker 32 and the microphone 34 may form part of an active noise cancellation system, with the microphone 34 acting as an error microphone.
Fig. 1c shows a further alternative personal audio device 40, comprising an intra-concha headphone (or earphone). In use, the intra-concha headphone sits inside the user's concha cavity. It may fit loosely within the cavity, allowing air to flow into and out of the user's ear canal 12b.
As with the devices shown in figs. 1a and 1b, the intra-concha headphone comprises one or more speakers 42 and one or more microphones 44, which may form part of an active noise cancellation system.
Fig. 1d shows a further alternative personal audio device 50, comprising an in-ear headphone (or earphone), insert headphone, or earbud. This headphone is configured to be partially or fully inserted within the ear canal 12b, and may provide a relatively tight seal (i.e., it may be acoustically closed or sealed) between the ear canal 12b and the external environment. As with the other devices described above, the headphone may comprise one or more speakers 52 and one or more microphones 54, and these components may form part of an active noise cancellation system.
Since the in-ear headphones may provide a relatively tight acoustic seal around the ear canal 12b, the external noise detected by the microphone 54 (i.e., external noise from the external environment) may be low.
Fig. 1e shows a further alternative personal audio device 60, which is a mobile or cellular telephone or handset. The handset 60 comprises one or more speakers 62 for playing back audio to the user, and one or more similarly positioned microphones 64.
In use, the handset 60 is held close to the user's ear so as to provide audio playback (e.g., during a call). Although a tight acoustic seal is not achieved between the handset 60 and the user's ear, the handset is typically held close enough that an acoustic stimulus applied to the ear via the one or more speakers 62 generates a response from the ear that can be detected by the one or more microphones 64. As with the other devices, the one or more speakers 62 and one or more microphones 64 may form part of an active noise cancellation system.
Thus, all of the personal audio devices described above provide audio playback to substantially a single user when in use. Each device comprises one or more speakers and one or more microphones that may be used to generate biometric data related to the user's ear. The speaker is operable to generate an acoustic stimulus or acoustic probe wave towards the user's ear, and the microphone is operable to detect and measure the response of the user's ear to the acoustic stimulus, e.g., to measure acoustic waves reflected from the ear canal or pinna and/or to obtain other ear biometric data. The acoustic stimulus may be sonic (e.g., in the audio frequency range of, say, 20 Hz to 20 kHz), ultrasonic (e.g., greater than 20 kHz, or in the range of 20 kHz to 50 kHz), or near-ultrasonic (e.g., in the range of 15 kHz to 25 kHz). In some embodiments, the microphone signal may be processed to measure the received signal at the same frequency as that transmitted.
Another biometric marker may comprise otoacoustic noise emitted by the cochlea in response to the acoustic stimulation waveform. The otoacoustic response may comprise a mix of the frequencies in the input waveform. For example, if the input acoustic stimulus comprises two tones at frequencies f1 and f2, the otoacoustic emission may include a component at frequency 2*f1 - f2. The relative power of the frequency components of the emitted waveform has been shown to be a useful biometric indicator. Thus, in some embodiments, the acoustic stimulus may comprise tones of two or more frequencies, and the amplitude of sum or difference frequency components (mixtures of integer multiples of the stimulus frequencies) generated by otoacoustic emissions from the cochlea may be measured. Alternatively, otoacoustic emissions may be stimulated and measured using a stimulus waveform comprising a fast transient (e.g., a click).
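As an illustration of the two-tone distortion-product measurement described above (not part of the patent disclosure), the sketch below generates a two-tone stimulus and reads off the amplitude of the 2*f1 - f2 component of a microphone frame via an FFT. The frequencies, the window, and the function names are assumptions chosen for the example.

```python
import numpy as np

def tone_pair(f1: float, f2: float, fs: int, dur: float) -> np.ndarray:
    """Two-tone acoustic stimulus used to evoke distortion-product otoacoustic emissions."""
    t = np.arange(int(fs * dur)) / fs
    return np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

def dp_amplitude(mic_frame: np.ndarray, fs: int, f1: float, f2: float) -> float:
    """Amplitude of the 2*f1 - f2 distortion product in a microphone frame."""
    spectrum = np.abs(np.fft.rfft(mic_frame * np.hanning(len(mic_frame))))
    freqs = np.fft.rfftfreq(len(mic_frame), d=1.0 / fs)
    return float(spectrum[np.argmin(np.abs(freqs - (2 * f1 - f2)))])

fs = 48000
stimulus = tone_pair(1000.0, 1200.0, fs, 0.1)
# In practice mic_frame is the measured ear response; the stimulus is reused here as a stand-in.
print(dp_amplitude(stimulus, fs, 1000.0, 1200.0))
```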
Depending on the construction and usage of the personal audio device, the measured response may comprise user-specific components, i.e., biometric data, relating to the pinna 12a, the ear canal 12b, or a combination of both the pinna 12a and the ear canal 12b. For example, the circum-aural headphone shown in fig. 1a will generally acquire data relating to the pinna 12a, and potentially also to the ear canal 12b. The insert headphone shown in fig. 1d will generally acquire data relating only to the ear canal 12b.
One or more of the personal audio devices described above (or rather, the microphones within those devices) are operable to detect a bone-conducted speech signal from the user. That is, when the user speaks, sound is projected from the user's mouth through the air. However, acoustic vibrations are also carried through part of the user's skeleton or skull, such as the jawbone. These acoustic vibrations may be coupled to the ear canal 12b through the jaw or some other part of the user's skeleton or skull, and detected by the microphone. Lower-frequency sounds tend to experience stronger coupling than higher-frequency sounds, and voiced speech (i.e., sounds or phonemes generated while the vocal cords vibrate) is coupled more strongly via bone conduction than unvoiced speech (i.e., sounds or phonemes generated while the vocal cords do not vibrate). The in-ear headphone 50 may be particularly suited to detecting bone-conducted speech owing to the tight acoustic coupling around the ear canal 12b.
Another ear biometric feature relates to heart sounds, which can be extracted from an audio signal acquired at the user's ear. That is, phonocardiograms have been shown to be useful in distinguishing between individuals. See, for example, Beritelli and Serrano, "Biometric Identification Based on Frequency Analysis of Cardiac Sounds", IEEE Transactions on Information Forensics and Security, vol. 2, no. 3, pp. 596-604, 2007. One particular feature that may be used as a biometric is the variability of the R-R interval (i.e., the period between successive R peaks, where R is the point corresponding to the peak of the QRS complex of an electrocardiogram).
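Purely by way of example (and not as part of the disclosure), the following sketch computes two common R-R interval variability summaries from detected heart-sound peak times; the peak times shown are hypothetical.

```python
import numpy as np

def rr_variability(r_peak_times_s: np.ndarray) -> dict:
    """Simple R-R interval variability measures from detected heart-sound peak times.

    SDNN is the standard deviation of the intervals; RMSSD is the root mean square
    of successive differences. Both are common heart-rate-variability summaries.
    """
    rr = np.diff(r_peak_times_s)
    return {
        "sdnn": float(np.std(rr, ddof=1)),
        "rmssd": float(np.sqrt(np.mean(np.diff(rr) ** 2))),
    }

# Hypothetical peak times (in seconds) extracted from an in-ear microphone signal.
print(rr_variability(np.array([0.00, 0.82, 1.66, 2.47, 3.31, 4.12])))
```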
All of the devices shown in fig. 1 a-1 e and described above may be used to implement aspects of the present disclosure.
Fig. 2 shows an arrangement 200 according to an embodiment of the present disclosure. Arrangement 200 includes a personal audio device 202, a biometric authentication system 204, and a host electronic device 206.
Personal audio device 202 may be any device suitable or configurable to provide audio playback to substantially a single user. Personal audio device 202 typically includes one or more speakers and one or more microphones that are positioned adjacent to or within the user's ear when in use. The personal audio device may be wearable and include headphones for each ear of the user. Alternatively, the personal audio device is operable to be carried by a user and held adjacent one or more ears of the user during use. The personal audio device may comprise a headset or a mobile telephone handset as described above with respect to any of fig. 1a to 1 e.
Host electronic device 206 may include any suitable audio playback device configurable to generate an audio playback signal to be played to a user via personal audio device 202. It should be understood that where, for example, personal audio device 202 comprises a cellular telephone or similar device, host device 206 and personal audio device 202 may be identical.
Biometric system 204 is coupled to both personal audio device 202 and host electronic device 206. In some embodiments, the biometric system 204 is provided within the personal audio device 202 itself. In other embodiments, the biometric system 204 is provided within the host electronic device 206. In still other embodiments, the operation of the biometric system 204 is distributed between the personal audio device 202 and the host electronic device 206.
Biometric system 204 is coupled to personal audio device 202 and is operable to control personal audio device 202 to obtain biometric data indicative of an individual using personal audio device 202.
Thus, the personal audio device 202 may generate an acoustic stimulus for application to the user's ear and detect or measure the response of the ear to the acoustic stimulus, thereby obtaining ear biometric data. The acoustic stimulus may be in the sonic range or in the ultrasonic range, for example. In some embodiments, the acoustic stimulus may have a flat frequency spectrum over a relevant frequency range, or may be pre-processed in such a way as to emphasize (i.e., give a higher amplitude than other frequencies to) those frequencies that facilitate good discrimination between individuals. The measured response corresponds to the reflected signal received at the one or more microphones, with some frequencies being reflected at a higher amplitude than others owing to the specific response of the user's ear. Other forms of ear biometric data, such as heart rate variability and bone-conducted speech signals, may require only detection of an audio signal, without prior acoustic stimulation.
Biometric system 204 may send appropriate control signals to personal audio device 202 to initiate the acquisition of biometric data and receive data corresponding to the measured response from personal audio device 202. The biometric system 204 is operable to extract one or more features from the measured response and utilize those features as part of a biometric process.
Biometric processes include biometric enrollment and biometric authentication. Enrollment comprises acquiring and storing biometric data that is characteristic of an individual; such stored data may be referred to herein as an "ear print". Authentication (alternatively referred to as verification or identification) comprises acquiring biometric data from an individual and comparing that data to the stored ear prints of one or more enrolled or authorized users. A positive comparison (i.e., a determination that the acquired data matches or is sufficiently close to a stored ear print) results in the individual being authenticated; for example, the individual may be permitted to carry out a restricted action, or be granted access to a restricted area or device. A negative comparison (i.e., a determination that the acquired data does not match or is not sufficiently close to a stored ear print) results in the individual not being authenticated; for example, the individual may not be permitted to carry out the restricted action or be granted access to the restricted area or device.
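A minimal sketch of these two processes (illustrative only, not the patented implementation) might store the ear print as the mean feature vector over enrollment frames and use a cosine-similarity score for the comparison, cosine similarity being one of the comparison techniques mentioned later in this description. The function names and the averaging choice are assumptions for the example.

```python
import numpy as np

def enroll(feature_frames: np.ndarray) -> np.ndarray:
    """Store the 'ear print' as the mean feature vector over the enrollment frames."""
    return feature_frames.mean(axis=0)

def score(features: np.ndarray, ear_print: np.ndarray) -> float:
    """Cosine-similarity biometric score: higher means closer to the stored template."""
    return float(np.dot(features, ear_print) /
                 (np.linalg.norm(features) * np.linalg.norm(ear_print) + 1e-12))

def authenticate(features: np.ndarray, ear_print: np.ndarray, threshold: float) -> bool:
    """Positive comparison if the score meets the threshold."""
    return score(features, ear_print) >= threshold
```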
Thus, the biometric system 204 may provide an authentication result to the host electronic device 206, and if the biometric result is positive and identifies the user as an authorized user, the host electronic device 206 is configured to allow or perform one or more restricted actions.
However, in accordance with embodiments of the present disclosure, the authentication system 204 is further utilized to perform an on-ear detection function, i.e., to detect whether an ear is present in the vicinity of the personal audio device. Positive indications of the proximity of the ear to the personal audio device may be used in a variety of ways. For example, the indication may be provided to the personal audio device and used to alter the operational state of the personal audio device. The operational state may change from a relatively low power state (e.g., a sleep state or an unpowered state) to a relatively high power state (e.g., activating a digital connection between the personal audio device 202 and the host device 206, activating audio playback in the personal audio device, etc.). The indication may be provided to the host electronic device 206 for substantially the same purpose (e.g., to alter the operational state of the host electronic device 206, or to prompt the host electronic device to alter the operational state of the personal audio device 202) or for a different purpose (e.g., to lock the touch screen against input, etc.).
Authentication system 204 may perform a biometric authentication algorithm to detect whether an ear is present in the vicinity of personal audio device 202. This concept will be explained more fully below.
Fig. 3 is a schematic diagram showing the distribution of biometric authentication scores.
As described above, biometric authentication generally involves comparing a biometric input signal (in particular, one or more features extracted from the input signal) with a stored template for an authorized user. As noted above, the stored template is typically obtained during an "enrollment" process. Some biometric authentication processes may additionally involve comparing the biometric input signal (or the features extracted from it) with a "general model" describing the biometrics of the population as a whole rather than those of a particular authorized user. Suitable comparison techniques include probabilistic linear discriminant analysis (PLDA) and the computation of cosine similarity.
The output of the biometric authentication process is a score indicating the likelihood that the biometric input signal is a signal of an authorized user. For example, a relatively high score may indicate a relatively high likelihood that the biometric input signal matches an authorized user; a relatively low score may indicate a relatively low likelihood that the biometric input signal matches an authorized user. The biometric processor may make a determination of whether to authenticate a particular user as an authorized user by comparing the biometric score to a threshold. For example, if the biometric score exceeds the threshold, the user may be authenticated; if the biometric score is below the threshold, the user may not be authenticated. The value of the threshold may be constant or may be variable (e.g., depending on the level of security desired).
The inventors have realized that, in addition to determining whether the biometric input signal corresponds to an authorized user, the distribution of biometric scores may be used to make a further distinction: the biometric score may be used to determine whether an ear is present at all. Further, since the biometric characteristics of an input signal indicative of any ear differ substantially from those of an input signal indicative of no ear, a determination as to the proximity of an ear to the personal audio device 202 can be made quickly and without consuming significant power.
Thus, the distribution of biometric scores may fall into three categories 300, 302, 304. A relatively high biometric score in category 304 may indicate a biometric input signal (i.e., a match) originating from an authorized user. A relatively low biometric score 302 may indicate a biometric input signal originating from an unauthorized user (i.e., a mismatch). The lowest biometric score 300 may indicate a biometric input signal that does not correspond to the ear at all.
First and second thresholds may be set to distinguish between the three categories. For example, the first threshold T1 may be set to a value that distinguishes between a biometric score 300 that indicates no ear and scores 302, 304 that indicate any ear (whether or not that ear belongs to an authorized user). The second threshold T2 may be set to a value that distinguishes between a biometric score 302 indicative of an unauthorized ear and a score 304 indicative of an authorized ear.
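Illustratively (and assuming, as discussed later, that higher scores indicate a better match, so that T2 > T1), the two-threshold decision of fig. 3 can be sketched as follows; this is an example only, not the claimed implementation.

```python
def classify(score: float, t1: float, t2: float) -> str:
    """Map a biometric score onto the three categories of FIG. 3.

    t1 separates 'no ear' from 'some ear'; t2 separates 'some ear' from
    'authorized ear' (assuming higher scores mean a better match, so t2 > t1).
    """
    if score < t1:
        return "no ear"            # category 300
    if score < t2:
        return "unauthorized ear"  # category 302
    return "authorized ear"        # category 304
```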
The values of these thresholds may be set using a machine learning algorithm, such as a neural network. Such a machine learning algorithm may go through a training phase in which training data is input to the algorithm. The training data may comprise biometric scores together with labels assigning each score to one of the three categories identified above (e.g., as determined by human input). One way to achieve this is for the machine learning algorithm to attempt to classify each training biometric score into one of the categories, and then to receive feedback (which may be positive or negative) depending on whether the classification was correct. This feedback can then be used to adjust the thresholds that are applied. Once the thresholds are suitably trained, they can be put into practice.
A number of different ear biometric characteristics have been discussed above, including ear resonance and antiresonance, otoacoustic emissions, bone conduction speech signals, and heart rate variability. A biometric score and corresponding category or threshold may be generated based on any one or more of these features.
In the latter case, when more than one ear biometric feature is used to categorize the audio signal, different techniques may be utilized to fuse different biometrics into a single process. Embodiments of the present disclosure are not limited to any particular fusion technique.
In score-level fusion, a separate biometric algorithm is applied to each ear biometric, generating a plurality of separate biometric scores, which are then combined. One approach is to generate a single scalar score which is then compared to a scalar threshold (e.g., as shown in fig. 3). For example, a cosine similarity between the biometric measurement and the enrolled biometric template, or a predetermined biometric template, may be calculated. Another approach treats the multiple biometric scores as a vector, with the threshold then comprising a hyperplane that distinguishes between the different categories in a multidimensional space.
Decision-level fusion, by contrast, combines multiple discrete decisions, one from each biometric (i.e., a discrete decision based on each biometric score and a per-biometric threshold). Rules for combining the multiple decisions may be defined so as to yield a single overall decision regarding the class of the input biometric signal.
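As a sketch only, score-level fusion and decision-level fusion might be expressed as follows; the weighted-sum and majority-vote rules are merely example combination rules, not the only possibilities contemplated by the disclosure.

```python
import numpy as np

def score_level_fusion(scores: np.ndarray, weights: np.ndarray) -> float:
    """Combine per-biometric scores (e.g. resonance, heart sounds) into one scalar."""
    return float(np.dot(weights, scores) / weights.sum())

def decision_level_fusion(decisions, min_votes: int) -> bool:
    """Combine per-biometric accept/reject decisions with a simple voting rule."""
    return sum(bool(d) for d in decisions) >= min_votes

# Example usage with three hypothetical biometrics.
fused_score = score_level_fusion(np.array([0.7, 0.4, 0.9]), np.array([0.5, 0.2, 0.3]))
fused_decision = decision_level_fusion([True, False, True], min_votes=2)
print(fused_score, fused_decision)
```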
Fig. 4 illustrates a system 400 according to an embodiment of the present disclosure.
The system 400 includes a processing circuit 422, which processing circuit 422 may include one or more processors, such as a central processing unit or Application Processor (AP) or Digital Signal Processor (DSP). The one or more processors may perform the methods as described herein based on the data and program instructions stored in memory 424. The memory 424 may be provided as a single component or as multiple components, or as co-integrated with at least some of the processing circuitry 422. In particular, the methods described herein may be executed in processing circuitry 422 by executing instructions stored in non-transitory form in memory 424, where the program instructions are stored during manufacture of system 400 or personal audio device 202 or by uploading when the system or device is in use.
The processing circuit 422 includes a stimulus generator module 403, the stimulus generator module 403 being coupled directly or indirectly to an amplifier 404, the amplifier 404 in turn being coupled to a speaker 406.
The stimulus generator module 403 generates an electrical stimulus signal and provides it to the amplifier 404, which amplifies the signal and provides the amplified signal to the speaker 406. The speaker 406 generates a corresponding acoustic signal, which is output to one or more ears of the user. The acoustic signal may be sonic or ultrasonic, for example. The acoustic signal may have a flat frequency spectrum, or may be pre-processed in such a way as to emphasize (i.e., give a higher amplitude than other frequencies to) those frequencies that facilitate good discrimination between individuals.
As described above, the acoustic signal may be output to all or part of the user's ear (i.e., the pinna 12a or the ear canal 12b). The acoustic signal is reflected from the ear, and the reflected signal (or echo signal) is detected and received by the microphone 408. The reflected signal thus comprises data that is characteristic of the individual's ear and is suitable for use as a biometric.
The reflected data signal is transferred from the microphone 408 to an analog-to-digital converter (ADC) 410, where the reflected data signal is converted from the analog domain to the digital domain. Of course, in alternative implementations, the microphone may be a digital microphone and produce a digital data signal (thus eliminating the need to convert to the digital domain).
The signal detected by the microphone 408 is in the time domain. However, the features extracted for the purposes of the biometric process may be in the frequency domain (in that the frequency response of the user's ear is characteristic of the individual). The system 400 therefore comprises a Fourier transform module 412, which converts the reflected signal into the frequency domain. For example, the Fourier transform module 412 may implement a fast Fourier transform (FFT).
The transformed signal is then passed to a feature extraction module 414, which feature extraction module 414 extracts one or more features of the transformed signal for use in a biometric process (e.g., biometric enrollment, biometric authentication, etc.). For example, the feature extraction module 414 may extract the resonant frequency of the user's ear. For example, the feature extraction module 414 may extract one or more mel-frequency cepstral coefficients. Alternatively, the feature extraction module may determine a frequency response of the user's ear at one or more predetermined frequencies or within one or more frequency ranges. The extracted features may correspond to data of a model for the ear.
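By way of illustration only, a toy feature extractor in the spirit of module 414 might take the dominant spectral peak as a resonance-like feature together with a few band energies; the band edges, window, and sample rate used below are assumptions for the example, not values taken from the disclosure.

```python
import numpy as np

def ear_features(frame: np.ndarray, fs: int,
                 bands_hz=((500, 1500), (1500, 4000), (4000, 8000))) -> np.ndarray:
    """Toy ear-biometric features: dominant (resonance-like) frequency plus band energies."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    resonance = freqs[np.argmax(spectrum)]
    band_energy = [float(np.sum(spectrum[(freqs >= lo) & (freqs < hi)] ** 2))
                   for lo, hi in bands_hz]
    return np.array([resonance, *band_energy])

# Example: features from a synthetic frame at a 48 kHz sample rate.
fs = 48000
t = np.arange(1024) / fs
print(ear_features(np.sin(2 * np.pi * 2500.0 * t), fs))
```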
The extracted features are transferred to the biometric module 416, which biometric module 416 performs a biometric procedure on them. For example, the biometric module 416 may perform a biometric registration in which the extracted features (or parameters derived from the extracted features) are stored as part of the biometric data 418 as characteristics of the individual. Biometric data 418 may be stored within the system or remotely from the system (and may be securely accessible by biometric module 416). Such stored data 418 may be referred to as "ear prints". In another embodiment, the biometric module 416 may perform biometric authentication and compare the one or more extracted features to corresponding features for the authorized user in the stored ear print 418 (or multiple stored ear prints).
In some embodiments, the stimulation waveform may be a tone of predetermined frequency and amplitude. In other embodiments, the stimulus generator may be configurable to apply music to a speaker, e.g., normal playback operation, and the feature extraction module may be configurable to extract a response function or transfer function from any signal component contained in the stimulus waveform.
Thus, in some embodiments, the feature extraction module 414 may be designed with prior knowledge of the nature of the stimulus (e.g., knowledge of the spectrum of the applied stimulus signal), so that the response function or transfer function may be normalized appropriately. In other embodiments, the feature extraction module 414 may comprise a second input for monitoring the stimulus (e.g., playback music), so as to provide the feature extraction module with information about the stimulus signal or its spectrum; the feature extraction module 414 may then calculate the transfer function from the stimulus waveform to the received acoustic waveform, from which the desired feature parameters may be derived. In the latter case, the stimulus signal may likewise be passed to the feature extraction module 414 via the FFT module 412.
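The "monitor the stimulus and normalize by it" approach can be sketched as a standard cross-spectral transfer-function estimate, H(f) = Sxy(f) / Sxx(f). The fragment below is illustrative only; it uses SciPy's Welch-based estimators, and the segment length is an arbitrary choice.

```python
import numpy as np
from scipy.signal import csd, welch

def ear_transfer_function(stimulus: np.ndarray, mic: np.ndarray,
                          fs: int, nperseg: int = 1024):
    """Cross-spectral estimate of the stimulus-to-microphone transfer function.

    Sxy is the cross power spectral density between the monitored stimulus
    (e.g. playback music) and the microphone signal; Sxx is the power spectral
    density of the stimulus itself.
    """
    f, sxy = csd(stimulus, mic, fs=fs, nperseg=nperseg)
    _, sxx = welch(stimulus, fs=fs, nperseg=nperseg)
    return f, sxy / (sxx + 1e-12)
```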
As noted above, the microphone 408 may be operable to detect bone-conducted speech signals. In that case, the biometric algorithm performed by the biometric module 416 may comprise checking whether the bone-conducted voice signal (i.e., as detected by the microphone 408) and an air-conducted voice signal (i.e., as detected by a voice microphone) match one another to an acceptable degree. This provides an indication that the personal audio device (i.e., the device comprising the microphone 408) is being worn by the same user who is speaking into the voice microphone.
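One way such a match check could be sketched (illustrative only, not the patented method) is to band-limit both signals to the low frequencies favoured by bone conduction and require a minimum average spectral coherence between them; the band edges and coherence threshold below are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt, coherence

def wearer_is_talker(bone_mic: np.ndarray, voice_mic: np.ndarray, fs: int,
                     band_hz=(100.0, 1000.0), min_coherence: float = 0.5) -> bool:
    """Crude check that bone-conducted and air-conducted speech come from the same person.

    Bone conduction favours low frequencies, so the comparison is restricted
    to a low-frequency band before the mean coherence is evaluated.
    """
    sos = butter(4, band_hz, btype="bandpass", fs=fs, output="sos")
    b = sosfilt(sos, bone_mic)
    v = sosfilt(sos, voice_mic)
    f, coh = coherence(b, v, fs=fs, nperseg=1024)
    in_band = (f >= band_hz[0]) & (f <= band_hz[1])
    return bool(np.mean(coh[in_band]) >= min_coherence)
```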
The biometric module 416 thus generates a biometric result 428 (which may be, for example, the successful or unsuccessful generation of an ear print, or a successful or unsuccessful authentication) and outputs the result to the control module 402.
As will be apparent from the discussion above, the biometric module 416 according to embodiments of the present disclosure also performs an on-ear detection function. Thus, in accordance with embodiments of the present disclosure, the biometric module 416 further generates an output 426, the output 426 indicating whether an ear (any ear) is present in the vicinity of the speaker 406. An on-ear output 426 may also be provided to the control module 402.
Fig. 5 is a schematic diagram illustrating the acquisition and use of an audio signal 500 for the purposes of on-ear detection and ear biometric authentication in accordance with an embodiment of the present disclosure.
Because of the relatively low amplitude of the ear biometric features, the audio signals acquired by the personal audio devices described herein may inherently have a low signal-to-noise ratio. In order to reliably distinguish between the ear of an authorized user and the ear of an unauthorized user, the biometric algorithm may require a relatively large amount of data. This is because ear biometric features have a relatively low amplitude, and because they vary only slightly from individual to individual. Thus, in order to have the necessary confidence that a particular biometric input signal originates from an authorized user, a relatively large amount of data may be required (e.g., averaged over a relatively long period of time).
In contrast, the difference between a biometric input signal indicating the presence of any ear and a biometric input signal indicating the absence of any ear is much more pronounced. For example, an audio signal acquired in the absence of any ear may contain no heart beat, no resonance or antiresonance frequencies, no otoacoustic emissions, and so on. Systems and methods according to embodiments of the present disclosure may therefore be able to determine reliably whether an ear is present based on relatively little data. In other words, on-ear detection according to embodiments of the present disclosure may be performed quickly and may consume relatively little power. In a practical system, it is expected that a determination as to whether any ear is present may be made reliably based on 5 to 10 data frames, while a determination as to the presence of a particular ear (e.g., an authorized user's ear) may be made reliably based on approximately 100 data frames. In a system with a sampling rate of 100 Hz, the latter would equate to about 1 second of data. Thus, determining the presence of any ear may require only approximately 5%-10% of the computation typically required to determine the presence of a particular ear.
This concept is illustrated in fig. 5, where an input audio signal 500 comprises a series of data frames 502-n (where n is an integer). Each data frame may include one or more data samples.
Three different scenarios are illustrated. In each case, a biometric algorithm is performed on the audio signal, comprising comparing the biometric features extracted from the audio signal 500 with a template or ear print for the authorized user, and generating a biometric score indicating the likelihood that the ear of the authorized user is present. The biometric score may be based on the accumulated data in the audio signal 500, and may therefore evolve and converge towards a "true" value over time. The biometric algorithm may be based on one or more different types of ear biometric feature; where the biometric algorithm uses a plurality of different types of ear biometric feature, the ear biometric scores or determinations are fused, as described above.
In the illustrated embodiment, the biometric module first determines whether the audio signal 500 comprises ear biometric features indicating the presence of any ear. This determination may be based on relatively little data. In the illustrated embodiment, the biometric module 416 makes the determination based on a single data frame; however, any number of data frames may be used. The determination may comprise comparing the current biometric score to the threshold T1.
In scenario 1, the biometric module 416 determines that no ear is present, so that the biometric algorithm ends without further computation after the data frame 502-1. In particular, the biometric module 416 does not continue to determine whether the audio signal 500 includes an ear biometric characteristic that corresponds to an ear biometric characteristic of an authorized user. Of course, the algorithm may be repeated in the future, e.g., periodically or in response to detecting some event.
In scenario 2, the biometric module 416 determines after data frame 502-1 that an ear is present and, in response to that determination, proceeds to execute the "full" biometric algorithm to determine whether the ear belongs to an authorized user. This process may require relatively more data, such that in the illustrated embodiment the authentication determination can be made reliably only after data frame 502-5. In scenario 2, that determination is negative (i.e., the user is not authorized). Scenario 3 corresponds substantially to scenario 2, but the authentication determination is affirmative (i.e., the user is authorized). In either case, the data on which the authentication determination is based may comprise more data frames than the data on which the on-ear detection determination is based. For example, the data may be averaged over all of the data frames. The determination may comprise comparing the current biometric score to the threshold T2.
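The early-exit behaviour of scenarios 1 to 3 can be sketched as follows. This is illustrative only: the frame counts, the cosine-similarity scoring, and the simple averaging of features over frames are assumptions made for the example rather than details taken from the disclosure.

```python
import numpy as np

def detect_and_authenticate(frames, ear_print, t1, t2,
                            on_ear_frames: int = 5, auth_frames: int = 100) -> str:
    """Two-stage flow of FIG. 5: cheap on-ear check first, full authentication only if it passes."""
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    frames = list(frames)
    # Stage 1: on-ear detection from a small number of data frames.
    early = np.mean(frames[:on_ear_frames], axis=0)
    if cosine(early, ear_print) < t1:
        return "no ear"              # scenario 1: stop early, save power
    # Stage 2: authentication from a larger number of data frames.
    full = np.mean(frames[:auth_frames], axis=0)
    if cosine(full, ear_print) >= t2:
        return "authorized ear"      # scenario 3
    return "unauthorized ear"        # scenario 2
```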
Fig. 6 is a flow chart of a method according to an embodiment of the present disclosure. The method may be performed, for example, by the system 400 described above.
In step 600, the system acquires an audio signal. The audio signal may be acquired via a microphone 408 in the personal audio device, as described above. The audio signal may be acquired in conjunction with the generation of an acoustic stimulus (e.g., when detecting resonance/antiresonance frequencies, otoacoustic emissions, etc.) or without one (e.g., when detecting bone-conducted speech, heart rate variability, etc.).
In step 602, one or more ear biometric features are extracted from the audio signal. This step may be performed, for example, by the feature extraction module 414. The features may be extracted in the frequency domain, after applying a Fourier transform to the audio signal. The ear biometric features may comprise one or more of: one or more resonance frequencies; one or more antiresonance frequencies; otoacoustic emissions; heart rate variability; and a bone-conducted speech signal.
In step 604, a biometric algorithm is performed based on the audio signal, including comparing the biometric features extracted in step 602 with templates or ear prints for authorized users, and generating a biometric score indicating the likelihood that an ear of an authorized user is present. Where more than one type of ear biometric feature is employed, a biometric fusion technique may be used to fuse the ear biometric scores or determinations, as described above.
In step 606, the biometric score generated in step 604 is compared to the threshold T1, which distinguishes between a score indicating no ear and a score indicating any ear. If the comparison is negative, the method proceeds to step 608, where it ends. Alternatively, a negative on-ear output signal may be generated, indicating that no ear is in the vicinity of the personal audio device.
If the comparison in step 606 is positive (i.e., an ear is present), a positive on-ear output signal is generated. The system may respond to this output signal in many different ways, so in some embodiments the method may end at this point. That is, the biometric module 416 detects that the personal audio device has been applied to a user's ear, and the personal audio device or the host electronic device responds to that detection in its usual manner. In the illustrated embodiment, the method proceeds to step 610, where the personal audio device and/or the host electronic device is "woken up" from a low-power state (e.g., a sleep state or an off state). In other embodiments, however, the device may react, for example, by locking the touch screen to prevent further input, by resuming paused audio playback, or in any other way.
Accordingly, embodiments of the present disclosure provide methods, devices, and systems in which a biometric processor or module is used to perform an on-ear detection function.
In other embodiments of the present disclosure, the method continues, in response to the detection of ear proximity in step 606, by performing biometric authentication of the user. As described above, biometric authentication may require more data than ear proximity detection, and therefore further audio signal data is acquired in step 612. For example, one or more additional data frames of the audio signal may be acquired.
In step 614, one or more ear biometric features are extracted from the audio signal data (i.e., the audio signal data acquired in step 600 and/or step 612), and in step 616 a biometric score is generated that indicates a likelihood that the extracted features match features of a template or ear print stored for the authorized user. Steps 614 and 616 may correspond substantially to steps 602 and 604 described above. The features used to generate the biometric score in step 616 may include the features extracted in step 602 and the features extracted in step 614.
In step 618, the score is compared to the threshold T2, which distinguishes the ears of the population at large from the ear of an authorized user. The threshold T2 is different from the threshold T1 applied in step 606 and, in embodiments in which the biometric score is configured to increase as the likelihood of a match between the input and the stored template increases, the threshold T2 is higher than the threshold T1.
If the result of the comparison in step 618 is affirmative, the method proceeds to step 620, where the user is authenticated as an authorized user in step 620; if the result of the comparison in step 618 is negative, the method proceeds to step 622 where the user is not authenticated as an authorized user. Again, the system may respond to positive/negative authentication of the user in any manner. For example, a restricted action may be performed or may be prevented from being performed; settings specific to authorized users may or may not be applied. Many different possibilities exist and the disclosure is not limited in this respect.
Accordingly, the present disclosure provides methods, devices, and systems for performing on-ear detection using a biometric processor or module. By re-using the biometric processor in this way, the dedicated circuitry that would otherwise be required for on-ear detection can be omitted entirely from the personal audio device or host electronic device.
Embodiments of the present disclosure may be implemented in electronic, portable, and/or battery-powered host devices (such as smartphones, audio players, mobile phones, or cellular phones, handsets). Embodiments may be implemented on one or more integrated circuits disposed within such host devices. Embodiments may be implemented in a personal audio device (such as a smart phone, mobile or cellular phone, headset, earphone, etc.) that may be configured to provide audio playback to a single person. See fig. 1a to 1e. Again, embodiments may be implemented on one or more integrated circuits disposed within such personal audio devices. In further alternatives, embodiments may be implemented in a combination of a host device and a personal audio device. For example, embodiments may be implemented in one or more integrated circuits disposed within a personal audio device and one or more integrated circuits disposed within a host device.
It should be understood that the various operations described herein, particularly in connection with the accompanying drawings, may be implemented by other circuitry or other hardware components, as will be apparent to persons of ordinary skill in the art having the benefit of this disclosure. The order in which each operation of a given method is performed may be changed, and various elements of the systems illustrated herein may be added, reordered, combined, omitted, modified, etc. It is intended that the present disclosure cover all such modifications and variations, and accordingly the above description should be regarded as illustrative rather than restrictive.
Similarly, while this disclosure refers to particular embodiments, certain modifications and variations may be made to those embodiments without departing from the scope and coverage of this disclosure. Furthermore, no benefit, advantage, or solution to problems described herein with respect to a particular embodiment is intended to be construed as a critical, required, or essential feature or element.
Likewise, other embodiments and implementations will be apparent to those of ordinary skill in the art having the benefit of this disclosure, and such implementations should be considered to be encompassed herein. Furthermore, those of ordinary skill in the art will recognize that a variety of equivalent techniques may be applied in place of or in combination with the embodiments discussed, and that all such equivalent techniques are to be considered to be encompassed by the present disclosure.
Those of ordinary skill in the art will recognize that some aspects of the apparatus and methods described above (e.g., discovery methods and configuration methods) may be embodied as processor control code, for example, on a non-volatile carrier medium such as a magnetic disk, CD-ROM or DVD-ROM, a programmed memory such as read only memory (firmware), or on a data carrier such as an optical signal carrier or an electrical signal carrier. For many applications, embodiments of the present invention will be implemented on a DSP (digital signal processor), an ASIC (application specific integrated circuit) or an FPGA (field programmable gate array). Thus, the code may comprise conventional program code or microcode or code, for example, for setting up or controlling an ASIC or FPGA. The code may also include code for dynamically configuring a reconfigurable device, such as a reprogrammable array of logic gates. Similarly, the code may include code for a hardware description language, such as Verilog™ or VHDL (very high speed integrated circuit hardware description language). As will be appreciated by one of ordinary skill in the art, the code may be distributed among a plurality of coupled components that communicate with each other. The embodiments may also be implemented using code running on a field (re)programmable analog array or similar device to configure analog hardware, where appropriate.
Note that as used herein, the term module shall be used to refer to a functional unit or block that may be implemented at least in part by dedicated hardware components (such as custom circuits), and/or by one or more software processors or appropriate code running on a suitable general purpose processor or the like. The modules themselves may include other modules or functional units. A module may be provided by multiple components or sub-modules that do not need to be co-located and that may be disposed on different integrated circuits and/or run on different processors.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design many alternative embodiments without departing from the scope of the appended claims or embodiments. The word "comprising" does not exclude the presence of elements or steps other than those listed in a claim or embodiment, "a" or "an" does not exclude a plurality, and a single feature or other unit may fulfil the functions of several units recited in the claims or embodiments. Any reference numerals or reference signs in the claims or embodiments shall not be construed as limiting the scope of the claims or embodiments.

Claims (15)

1. A system for detecting the presence of an ear in the vicinity of an audio device, the system comprising:
an input for obtaining an audio data signal from a transducer of the audio device; and
an ear biometric authentication module configured to:
compare one or more ear biometric features extracted from the audio data signal with an ear biometric template of an authorized user;
calculate one or more scores based on the comparison, the one or more scores being indicative of distances between the one or more extracted ear biometric features and the ear biometric template;
compare the one or more scores to a first threshold and a second threshold, wherein the first threshold and the second threshold are different from each other; and
generate a first output based on a comparison of the one or more scores to the first threshold, the first output indicating whether any ear is present in the vicinity of the audio device, and generate a second output based on a comparison of the one or more scores to the second threshold, the second output indicating whether an ear of the authorized user is present in the vicinity of the audio device;
wherein the first output is calculated based on the one or more ear biometric features extracted from a first number of data frames, and wherein the ear biometric authentication module is further configured to generate the second output based on a comparison of one or more ear biometric features extracted from a second number of data frames with the ear biometric template of the authorized user, the second number being greater than the first number.
2. The system of claim 1, wherein the one or more ear biometric features are averaged over a plurality of data frames.
3. The system of claim 1, wherein the second output is generated in response to determining that any ear is located in proximity to the audio device.
4. The system of claim 1, wherein the one or more ear biometric features comprise one or more of: one or more resonance frequencies; one or more antiresonance frequencies; otoacoustic emissions; heart rate variability; and bone conduction speech signals.
5. The system of claim 1, wherein the extracted ear biometric features comprise a plurality of different types of ear biometric features, and wherein the ear biometric authentication module is operable to apply a biometric fusion technique to generate the first output.
6. The system of claim 1, wherein the data signal is an audio signal.
7. An electronic device, comprising:
The system of claim 1.
8. The electronic device of claim 7, wherein the electronic device is an audio device.
9. The electronic device of claim 7, wherein the electronic device is a host device coupled to the audio device.
10. A method of detecting the presence of an ear in the vicinity of an audio device, the method comprising:
comparing one or more ear biometric features to an ear biometric template of an authorized user, the one or more ear biometric features extracted from an audio data signal obtained from a transducer of the audio device;
calculating one or more scores based on the comparison, the one or more scores being indicative of distances between the one or more extracted ear biometric features and the ear biometric template;
comparing the one or more scores to a first threshold and a second threshold, wherein the first threshold and the second threshold are different from each other; and
generating a first output based on a comparison of the one or more scores to the first threshold, the first output indicating whether any ear is present in the vicinity of the audio device, and generating a second output based on a comparison of the one or more scores to the second threshold, the second output indicating whether an ear of the authorized user is present in the vicinity of the audio device;
wherein the first output is calculated based on the one or more ear biometric features extracted from a first number of data frames, and wherein the second output is generated based on a comparison of one or more ear biometric features extracted from a second number of data frames with the ear biometric template of the authorized user, the second number being greater than the first number.
11. The method of claim 10, wherein the second output is generated in response to determining that any ear is located in proximity to the audio device.
12. The method of claim 10, wherein the one or more ear biometric features are averaged over a plurality of data frames.
13. The method of claim 10, wherein the one or more ear biometric features comprise one or more of: one or more resonance frequencies; one or more antiresonance frequencies; otoacoustic emissions; heart rate variability; and bone conduction speech signals.
14. The method of claim 10, wherein the extracted ear biometric features comprise a plurality of different types of ear biometric features, and wherein the method further comprises applying a biometric fusion technique to generate the first output.
15. The method of claim 10, wherein the data signal is an audio signal.
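To make the two-threshold, two-frame-count logic recited in claims 1 and 10 easier to follow, here is a minimal sketch. It is a model under stated assumptions, not an implementation of the claims: Python is used for readability; the Euclidean distance, the averaging of features over frames, and the specific threshold and frame-count values are illustrative choices only; and every function and constant name is hypothetical.

```python
# Hypothetical model of the claimed first/second output generation.
import numpy as np

# Illustrative values only; a real system would tune these.
FIRST_THRESHOLD = 5.0    # looser first threshold: "some ear is nearby"
SECOND_THRESHOLD = 1.5   # stricter second threshold: "the authorized user's ear"
FIRST_NUM_FRAMES = 4     # small first number of frames -> quick proximity decision
SECOND_NUM_FRAMES = 32   # larger second number of frames -> authentication decision


def frame_score(features: np.ndarray, template: np.ndarray) -> float:
    """Distance between extracted ear biometric features and the stored template."""
    return float(np.linalg.norm(features - template))


def detect(frame_features: list, template: np.ndarray):
    """Return (any_ear_near, authorized_ear_near) from per-frame feature vectors."""
    frames = np.asarray(frame_features)  # shape: (num_frames, feature_dim)

    # First output: features averaged over a first (small) number of frames,
    # scored against the template and compared with the first threshold.
    first_score = frame_score(frames[:FIRST_NUM_FRAMES].mean(axis=0), template)
    any_ear_near = first_score < FIRST_THRESHOLD

    # Second output: generated only once some ear has been detected, using a
    # second (larger) number of frames and the stricter second threshold.
    authorized_ear_near = False
    if any_ear_near and len(frames) >= SECOND_NUM_FRAMES:
        second_score = frame_score(frames[:SECOND_NUM_FRAMES].mean(axis=0), template)
        authorized_ear_near = second_score < SECOND_THRESHOLD

    return any_ear_near, authorized_ear_near
```

Averaging the extracted features over several data frames before scoring reflects claims 2 and 12, and generating the second output only after the first output indicates that an ear is present reflects claims 3 and 11.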
CN202110993115.5A 2018-03-21 2019-03-20 Ear proximity detection Active CN113709616B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US15/927,824 US10810291B2 (en) 2018-03-21 2018-03-21 Ear proximity detection
US15/927,824 2018-03-21
GB1806967.4A GB2572227B (en) 2018-03-21 2018-04-27 Ear proximity detection
GB1806967.4 2018-04-27
CN201980020365.1A CN111903112B (en) 2018-03-21 2019-03-20 Ear proximity detection

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201980020365.1A Division CN111903112B (en) 2018-03-21 2019-03-20 Ear proximity detection

Publications (2)

Publication Number Publication Date
CN113709616A (en) 2021-11-26
CN113709616B (en) 2024-07-05

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111903112B (en) * 2018-03-21 2021-09-28 Cirrus Logic International Semiconductor Ltd Ear proximity detection

Similar Documents

Publication Publication Date Title
CN111903112B (en) Ear proximity detection
US11714888B2 (en) Methods, apparatus and systems for biometric processes
US20210165866A1 (en) Methods, apparatus and systems for authentication
CN111837180B (en) Biometric processes
US11755701B2 (en) Methods, apparatus and systems for authentication
US11700473B2 (en) Methods, apparatus and systems for authentication
US20210134318A1 (en) Methods, apparatus and systems for biometric processes
US20240061921A1 (en) Methods, apparatus and systems for audio playback
US20210303669A1 (en) Methods, apparatus and systems for biometric processes
CN112585676A (en) Biometric authentication
CN113709616B (en) Ear proximity detection
US11483664B2 (en) Methods, apparatus and systems for biometric processes
US11847200B2 (en) Methods and apparatus for system identification
GB2584497A (en) Methods, apparatus and systems for biometric processes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant