CN117731325A - Cardiac measurement using acoustic techniques - Google Patents

Info

Publication number
CN117731325A
Authority
CN
China
Prior art keywords
signal
user
microphone
output
ultrasonic waves
Prior art date
Legal status
Pending
Application number
CN202311212016.4A
Other languages
Chinese (zh)
Inventor
王礼俊
T-D·W·萨尤克斯
Current Assignee
Apple Inc
Original Assignee
Apple Inc
Priority date
Filing date
Publication date
Priority claimed from US18/460,484 external-priority patent/US20240090865A1/en
Application filed by Apple Inc filed Critical Apple Inc
Publication of CN117731325A publication Critical patent/CN117731325A/en


Abstract

The present invention relates to cardiac measurement using acoustic techniques. Ultrasonic waves are output from a speaker of the head-mounted device. A microphone signal is obtained from a microphone of the head-mounted device that senses ultrasonic waves as they are reflected from the user's ear. A heart activity, such as a heart rate of the user, is determined based at least on the microphone signal. Other aspects are described and claimed.

Description

Cardiac measurement using acoustic techniques
This patent application claims the benefit of the earlier filing dates of U.S. provisional application Ser. No. 63/376,349, filed September 20, and of U.S. non-provisional applications Ser. Nos. 18/460,457 and 18/460,484, each filed September 1, 2023.
Technical Field
The present disclosure relates to cardiac measurements using acoustic techniques.
Background
The human heart is the main organ that pumps blood through the circulatory system of the human body. It comprises four main chambers that operate in a synchronized manner to circulate blood through the body. Cardiac motion, such as contraction of the left atrium and left ventricle or the right atrium and right ventricle, and the movement of blood through the heart may be referred to as cardiac activity. Cardiac activity may include the cardiac cycle (e.g., heartbeat) of the heart, which indicates the phases of relaxation (diastole) and contraction (systole) of the heart. Cardiac activity may be indicative of a person's health, such as a risk of or propensity toward heart conditions.
Heart conditions include a range of disorders related to the human heart, such as, for example, vascular disease (e.g., coronary artery disease), heart rhythm problems (e.g., cardiac arrhythmias), heart defects (e.g., congenital heart defects), heart valve disease, myocardial disease, heart infection, or other heart conditions. The number of times the heart beats within a particular period of time (e.g., within a minute) may be referred to as the heart rate. A person's heart rate may be indicative of heart health, heart disease, and the health of the circulatory system.
Disclosure of Invention
In one aspect, a computing device includes a processing device (e.g., a processor) configured to: causing ultrasonic waves to be output from a speaker of a head-mounted device when the head-mounted device is worn on or in an ear of a user; obtaining a microphone signal of a microphone of the head-mounted device that receives reflected ultrasonic waves in response to the output ultrasonic waves; and determining heart activity (e.g., heart rate) of the user of the headset based at least on the microphone signal. For example, ultrasonic waves may reflect from the ear canal, eardrum, pinna, and/or other surfaces of the ear of the user.
Determining the cardiac activity of the user may include detecting a change in the phase of the ultrasonic waves in the microphone signal over time. The change in phase may be related to a change in the path length of the ultrasonic waves from the speaker to the microphone, caused by movement of the surface of the ear from which the waves reflect. That change in path length may in turn be related to the heart activity of the user: the path shortens and lengthens as blood pumped through the user's body raises and lowers the ear surface.
In some examples, the microphone may be positioned within the ear canal of the user when the user wears the headset. For example, a headset may include an earpiece that houses a speaker and a microphone. The speaker is driven to output ultrasonic waves, and the microphone senses reflected ultrasonic waves.
In some examples, the headset may be worn in (or over) the user's ear. When so worn, the microphone is positioned to receive ultrasonic waves reflected from the ear.
In some examples, determining cardiac activity may include heterodyning the reflected ultrasonic waves to generate a heterodyne signal having a near-zero frequency, wherein the heterodyne signal encodes the relative phase between the output and reflected ultrasonic waves, or the sensed time of flight between them, or the transfer function between them. The probe signal may comprise a sum of one or more sinusoids, each of which may be referred to as a probe tone. The frequency of each sinusoid may be fixed, or it may vary over time. To output both audio content and ultrasound through the speaker, the probe signal may be combined with an audio signal containing audio content (e.g., songs, music, podcasts, the audio of an audiovisual work, telephone calls, etc.). In this way, the heart activity of the user may be determined during normal use of the head-mounted device.
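The heterodyning step described above can be sketched in a few lines of numpy. This is a minimal illustration, not the patent's implementation: the sample rate, probe-tone frequency, and the 1.2 Hz synthetic "pulse" phase modulation are all assumed values, and a simple moving average stands in for the low-pass filter.

```python
import numpy as np

FS = 96_000      # sample rate in Hz (assumed, not from the patent)
F_TONE = 22_000  # one ultrasonic probe tone in Hz (also assumed)

def heterodyne_to_near_zero(mic, f_tone, fs):
    """Multiply the real microphone signal by a complex tone at the probe
    frequency; the reflected probe tone lands near 0 Hz, where its complex
    angle tracks the relative phase between output and reflection."""
    n = np.arange(len(mic))
    lo = np.exp(-2j * np.pi * f_tone * n / fs)  # complex "local oscillator"
    mixed = mic * lo                            # near-DC term + image near 2*f_tone
    kernel = np.ones(512) / 512                 # crude low-pass (moving average)
    return np.convolve(mixed, kernel, mode="same")

# Simulated reflection: the probe tone, phase-modulated by a 1.2 Hz "pulse".
n = np.arange(FS)                               # one second of samples
pulse_phase = 0.1 * np.sin(2 * np.pi * 1.2 * n / FS)
mic = np.sin(2 * np.pi * F_TONE * n / FS + pulse_phase)

baseband = heterodyne_to_near_zero(mic, F_TONE, FS)
recovered = np.angle(baseband) + np.pi / 2      # sin -> complex-angle offset of -pi/2
```

Away from the edges of the buffer, `recovered` tracks the injected phase modulation, i.e. the heart-rate-related motion survives the mixing and filtering while the ultrasonic carrier is removed.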
In some examples, determining the heart activity of the user includes applying a machine learning algorithm to the microphone signal to determine the heart activity of the user. The machine learning algorithm may be trained to correlate the phase changes of the sensed ultrasound signals with heart activity.
In some examples, determining the cardiac activity of the user includes processing the microphone signal through a low pass filter. The low pass filter may be applied to a combination of the microphone signal and the probe signal (e.g., the heterodyne signal) to filter out all components except those related to heart activity (e.g., heart rate).
Determining heart activity may include detecting peaks of the heart activity sensed in the microphone signal to determine a heart rate. Heart activity (e.g., heart motion) causes small deflections of the surface of the ear, and a peak in the heart activity can indicate a complete cycle (e.g., a heartbeat).
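Peak counting of this kind can be sketched as follows. The function, its parameter names, and the refractory-gap heuristic are illustrative assumptions, not the patent's method; the input is a synthetic 72 bpm waveform.

```python
import numpy as np

def heart_rate_bpm(activity, fs, min_gap_s=0.3):
    """Estimate heart rate by counting peaks in a heart-activity waveform.
    min_gap_s enforces a refractory gap (200 bpm -> 0.3 s) so that one
    beat is not counted twice."""
    min_gap = int(min_gap_s * fs)
    thresh = 0.5 * np.max(activity)
    count, last = 0, -min_gap
    for i in range(1, len(activity) - 1):
        if (activity[i] > thresh
                and activity[i] >= activity[i - 1]   # local maximum
                and activity[i] > activity[i + 1]
                and i - last >= min_gap):            # outside refractory gap
            count += 1
            last = i
    return 60.0 * count * fs / len(activity)

# Synthetic "pulse" waveform: 72 beats per minute, sampled at 100 Hz for 60 s.
fs = 100
t = np.arange(60 * fs) / fs
activity = np.maximum(0.0, np.sin(2 * np.pi * (72 / 60) * t)) ** 4
```

Calling `heart_rate_bpm(activity, fs)` on this input recovers a rate close to the injected 72 bpm.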
In some aspects, the computing device is separate from the head-mounted device. For example, the computing device may be communicatively coupled to the head-mounted device (e.g., through one or more electrical conductors or through a wireless transmitter and receiver). The computing device may be a companion device to the head mounted device, such as a smart phone, computer, tablet computer, smart speaker, server, or other computing device.
In other examples, the computing device is integral with the head-mounted device. For example, in one aspect, a headset includes a speaker and a microphone. The head-mounted device further includes a processor configured to: cause an ultrasonic wave to be output from the speaker, obtain a microphone signal generated from a microphone that senses the ultrasonic wave as it is reflected from the user's ear, and determine heart activity of the user of the headset based at least on the microphone signal.
Cardiac activity may be determined without the use of additional sensors. For example, cardiac activity may be determined based on microphone signals without an accelerometer, a photoplethysmography (PPG) sensor, or any other sensor.
The determined heart activity may be stored in a computer readable memory (e.g., a non-volatile computer readable memory) and used for various purposes. The heart activity (e.g., heart rate) may be presented to the user on a display and/or as an audible message (e.g., through a speaker of the device). The display may be integral with the headset or separate. In some aspects, cardiac activity may be used to detect a risk or indication of one or more cardiac lesions. Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.
The above summary does not include an exhaustive list of all aspects of the disclosure. It is contemplated that the present disclosure includes all systems and methods that can be practiced by all suitable combinations of the various aspects summarized above, as well as those disclosed in the detailed description below and particularly pointed out in the claims section. Such combinations may have particular advantages not specifically set forth in the foregoing summary.
Drawings
Aspects of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements. It should be noted that references to "a" or "an" aspect in this disclosure are not necessarily to the same aspect, and they mean at least one. In addition, for the sake of brevity and reducing the total number of drawings, features of more than one aspect of the disclosure may be illustrated using a given drawing, and not all elements in the drawing may be required for a given aspect.
Fig. 1 illustrates a system for detecting heart activity using a head-mounted device, according to some aspects.
Fig. 2 illustrates an example of a head mounted device that may be used to determine heart activity in accordance with some aspects.
Fig. 3 illustrates an example of using a head mounted device to determine heart activity in accordance with some aspects.
FIG. 4 illustrates an exemplary workflow for determining cardiac activity by ultrasound, according to some aspects.
Fig. 5 depicts a graph depicting an indication of heart activity using microphone signals, in accordance with some aspects.
Fig. 6 illustrates four waveforms used in an exemplary method for determining heart activity using a chirped probe.
Fig. 7 is a graph of time series of change values and resulting heartbeat values produced by a difference detector for a chirp-based method.
Fig. 8 illustrates an example of an audio processing system in accordance with some aspects.
Detailed Description
Cardiac activity may include cardiac motion, such as contraction of the left atrium and left ventricle or the right atrium and right ventricle, and the movement of blood through the heart. Cardiac activity may include the cardiac cycle (e.g., heartbeat) of the heart, which indicates the phases of relaxation (diastole) and contraction (systole) of the heart. Under normal heart activity, ventricular diastole begins with isovolumetric relaxation, followed by three sub-phases of inflow: rapid inflow, diastasis, and atrial contraction. Cardiac activity may be indicative of a potential heart condition or a risk of a heart condition. Heart conditions may include diseases or abnormalities of the heart that may reduce the heart's ability to effectively pump blood through the body. Such conditions may be identified by, or associated with, irregular cardiac activity.
In-ear headphones, headsets, and other hearing devices may be used to listen to music, reduce noise, and/or enhance hearing. In some aspects of the disclosure, these devices may be equipped with an acoustic transducer (e.g., a microphone) arranged to capture sound inside the ear (e.g., in the ear canal of the user). In some examples, the same or different microphones may be used for active noise reduction, transparency, and adaptive equalization. The acoustic transducer may sense sound (e.g., vibration) and generate a signal (e.g., a microphone signal) that varies in magnitude with time and/or frequency.
In addition, the sensors of these devices may pick up body sounds such as respiration rate, heart beat, and chewing sounds. The role of in-ear headphones, headsets or other hearing devices may be extended to support the creation of phonocardiograms and ballistocardiograms.
The head-mounted device may include one or more microphones and one or more speakers located in the ears of a user (e.g., a wearer of the device). One or more of the speakers may output ultrasonic waves that are inaudible to the human ear. The microphone may sense acoustic energy in its surroundings, such as how the ultrasonic waves output by the speaker reflect from one or more surfaces of the user's ear. The sensed acoustic energy may be characterized in a microphone signal generated by each microphone. The microphone signal may be processed to determine a change in the sensed ultrasound wave, which may be correlated to motion in the user's ear, which may then be analyzed to determine the user's heart activity.
Fig. 1 illustrates a system 100 for detecting cardiac activity using a head-mounted device 102, in accordance with some aspects. Some or all of the blocks described in this example or other examples may be performed by processing logic that may be integral to computing device 124. The processing logic may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, a processor, a processing device, a Central Processing Unit (CPU), a system-on-a-chip (SoC), a machine-readable memory, etc.), software (e.g., machine-readable instructions stored or executed by the processing logic), or a combination thereof.
In some examples, computing device 124 may be separate from headset 102. For example, computing device 124 may include a smart phone, computer, cloud-based server, smart speaker, tablet computer, or other computing device. In some examples, computing device 124 may be a wearable computing device (e.g., a wristwatch). In some examples, the computing device 124 may be partially or fully integrated within the head-mounted device 102.
The head-mounted device 102 may be worn on or in an ear 116 of the user 104. The head-mounted device 102 may include in-ear sensing technology (e.g., one or more microphones 110) and apply one or more algorithms 122 to the microphone signals 118 to detect cardiac activity 114. As used herein, the headset 102 may be worn in any suitable manner that creates a suitable seal with the user's ear, e.g., over or on the ear for an over-ear or on-ear headphone, or inserted into the ear canal for an in-ear device. For example, an earbud (in-ear earphone) may include a compressible tip (e.g., silicone or rubber) that acoustically seals the ear canal when properly worn. An over-ear (also referred to as circumaural) headphone may have a cushion that seals acoustically against the head (rather than the ear canal). An on-ear (supra-aural) headphone may include a cushion that presses and seals against the ear.
The head mounted device 102 may include headphones that are worn in or on the ears 116 of the user 104. For example, the head-mounted device 102 may include earpieces that are worn over the outer ear of the user such that the earpieces partially enter the ear canal of the user. In another example, the head-mounted device 102 may include a shell and pad combination that is worn over or on top of a user's ear. When properly worn, the headset may create a seal for the user to acoustically separate the ultrasound waves from the surrounding environment.
The head mounted device 102 may include a microphone 110 that generates a microphone signal 118. In some examples, the headset 102 may include multiple microphones, and each microphone may generate a respective microphone signal that is processed separately, as discussed.
The processing logic may cause the ultrasonic waves 128 to be output from the speaker 108 of the headset 102. For example, processing logic may provide audio signal 126 to drive speaker 108. The audio signal 126 may include an inaudible ultrasonic probe signal as well as audible audio content, such as music, telephone conversations, or other audio content. The processing logic may combine the probe signal (comprising one or more ultrasonic sinusoids) with an audio signal comprising audio content (producing audio signal 126) to output the audio content and ultrasonic waves through a speaker. The resulting audio signal 126 may be used to drive the speaker 108 to output ultrasonic waves 128 into the ear canal of the user. The ultrasonic waves 128 may be sensed by the microphone 110 of the device and processed as discussed. In this way, in the example in question, the device may be used as a hearing device for outputting content while also detecting heart activity of the user.
At signal processing block 112, processing logic obtains a microphone signal 118 generated from a microphone of the headset that senses ultrasonic waves as they are reflected from the user's ear. The processing logic may determine the heart activity 114 of the user 104 of the head mounted device 102 based at least on the microphone signal 118. The processing logic may apply one or more algorithms 122 in the processing of the microphone signal 118 to determine the heart activity 114.
As described, the cardiac activity 114 may include movement of the user's heart 106, such as contraction of the left atrium and left ventricle or right atrium and right ventricle, or movement of blood through the user's heart 106. The cardiac activity 114 may include expansion and contraction of arteries throughout the body (e.g., arteries located at or around the user's ear). The heart activity 114 may include waveforms that vary in magnitude over time and/or frequency to correspond to movement of the heart or blood. In some examples, the heart activity 114 may include a heart rate of the user.
At signal processing block 112, processing logic may detect a phase change of the ultrasonic waves sensed in the microphone signal over a period of time to determine heart activity 114 of the user. When the heart pumps blood around the ear canal of the user, the skin in the ear canal may deflect in response to the vascular pressure wave. Slight variations in the shape of the ear canal result in slight variations in the magnitude and phase response of the transfer function between the speaker 108 and microphone 110. The processing logic may correlate the phase (and/or magnitude) change with a path length change or resonance (e.g., transfer function) change of the ultrasonic wave from the speaker to the microphone as it reflects from the user's ear. The path length or resonance change may also be related to the heart activity of the user.
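The speaker-to-microphone transfer function mentioned above can be estimated per probe frequency from the ratio of FFT bins of the driving signal and the microphone signal. The sketch below is an assumption-laden toy: the sample rate and tone frequency are invented, and an integer-sample circular delay with a 0.8 gain stands in for the acoustic path to the ear and back.

```python
import numpy as np

fs = 48_000
f_tone = 20_000                 # probe tone in Hz (assumed)
n = np.arange(fs)               # one second, so FFT bins fall on integer Hz
tx = np.sin(2 * np.pi * f_tone * n / fs)

delay = 3                       # acoustic path delay, in samples (assumed)
rx = 0.8 * np.roll(tx, delay)   # circular shift: exact for a full-period tone

# Transfer function at the probe bin: H(f) = RX(f) / TX(f)
k = f_tone                      # bin index equals frequency for a 1 s window
H = np.fft.rfft(rx)[k] / np.fft.rfft(tx)[k]

magnitude = np.abs(H)           # reflection loss (0.8 here)
phase = np.angle(H)             # -2*pi*f_tone*delay/fs, wrapped into (-pi, pi]
```

A change in `delay` (skin deflection moving the reflecting surface) shows up directly as a change in `phase`, which is the quantity the processing logic tracks over time.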
In some examples, the period over which the microphone signal 118 is processed may be greater than the period of the user's heartbeat, so as to capture at least one complete heartbeat cycle. In some examples, the period may be greater than one minute, in order to capture the user's heartbeats over an entire minute.
Algorithm 122 may include combining (e.g., heterodyning) one or more features or characteristics of the microphone signal 118 with the probe signal (e.g., the one or more probe tones being output). A feature or characteristic of the probe signal (which may be a real signal) that carries over into the heterodyne signal (which may be a complex signal) is its frequency modulation. Because the microphone picks up the output probe tones, these features or characteristics appear both in the probe tones output by the speaker and in the microphone signal. This allows the heterodyning to be performed separately for each probe tone, as discussed further below in connection with fig. 4. The results of the heterodyne operation may be filtered to isolate the phase changes of the ultrasound waves that are most relevant to heart activity. For example, the output ultrasound waves may include one or more probe tones, and each corresponding reflected ultrasound probe tone may be heterodyned to generate a corresponding heterodyne signal having a near-zero frequency (i.e., between zero, or DC, and 5 Hz, or between zero and 100 Hz). Each respective heterodyne signal may be filtered to remove its components other than the corresponding reflected ultrasound probe tone. The filtering may remove frequencies other than the near-zero components, e.g., components above 100 Hz. Processing (including difference detection and peak detection, as discussed in more detail below) may then be performed on the one or more near-zero components, but not on the other components at higher frequencies, to calculate the heart rate.
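The per-tone filtering step can be sketched with a windowed-sinc FIR low-pass. All numbers here are assumptions (the patent does not specify a filter design): note that rejecting a neighboring tone's heterodyne product only 100 Hz away forces a long filter at a 96 kHz rate, which is why a practical design might decimate the near-zero signal first.

```python
import numpy as np

def lowpass_fir(cutoff_hz, fs, num_taps):
    """Windowed-sinc low-pass FIR: keeps the near-zero (< cutoff) component
    of a heterodyne signal and rejects components at higher frequencies."""
    m = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * m) * np.hamming(num_taps)
    return h / h.sum()                    # normalize for unity gain at DC

def gain_at(h, f, fs):
    """Magnitude of the filter's frequency response at frequency f."""
    w = np.exp(-2j * np.pi * f / fs * np.arange(len(h)))
    return abs(np.sum(h * w))

fs = 96_000
h = lowpass_fir(50, fs, num_taps=8001)    # ~40 Hz transition band at this length
```

With this design the near-zero component passes at unity gain while a component 100 Hz away (the spacing between adjacent probe tones) is attenuated by roughly 50 dB, so only the heart-related modulation reaches the downstream difference and peak detectors.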
In some examples, the algorithm 122 may include an artificial neural network or other machine learning model trained to detect cardiac activity 114 in the microphone signal based on the phase changes of the sensed ultrasound waves. For example, the artificial neural network may be trained with a sufficiently large data set of microphone signals containing ultrasound reflections from the inner ear (the training data) and a target output of cardiac activity (e.g., waveforms corresponding to measured cardiac activity for the training data), so that the network learns to correlate the sensed ultrasound waves in the microphone signals with cardiac activity. The training data may include ground truth data comprising true measurements of heart activity.
Training an artificial neural network may involve using an optimization algorithm to find a set of weights that best maps an input (e.g., a microphone signal with sensed ultrasound components) to an output (e.g., heart activity). These weights are values representing the strength of the connections between the nodes of the artificial neural network. During training, the model weights are adjusted to minimize the difference between a) the outputs generated by the machine learning model from the input training data and b) the target outputs associated with that training data. The input training data and the target outputs may be described as input-output pairs, and these pairs may be used to train a machine learning model in a process that may be referred to as supervised training.
Training of the machine learning model may include optimizing a cost function using linear or nonlinear regression (e.g., least squares) to reduce the error between the model's outputs and the target outputs of the training data. The error is propagated back through the machine learning model, causing an adjustment of the weights. This process may be repeated for each recording so that the error is reduced and accuracy improves, and the same set of training data may be processed multiple times to refine the weights. Training may be considered complete once the error has been reduced below a threshold, which may be determined by routine testing and experimentation. The trained model may thus correlate the phase or magnitude changes of the sensed ultrasound signals with heart activity. In some examples, the machine learning algorithm may take as input both the microphone signal and the output audio signal; it may be trained to identify the relative phase between the output ultrasound waves (in the output audio signal) and the reflected ultrasound waves (in the microphone signal) and to correlate that relative phase information with heart activity.
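The supervised loop described above (compute output, compare to target, propagate error back, adjust weights, repeat over the same data) can be illustrated in miniature. This is not the patent's model: it is a generic least-squares fit of a single linear layer by gradient descent on synthetic data, where the inputs stand in for phase-change features and the targets stand in for ground-truth heart activity.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(256, 4))            # 256 examples, 4 synthetic features
true_w = np.array([0.5, -1.0, 2.0, 0.25])
y = X @ true_w                           # target outputs (ground truth)

w = np.zeros(4)                          # weights: connection strengths
lr = 0.1                                 # learning rate
for epoch in range(500):                 # same data reused to refine weights
    err = X @ w - y                      # model output vs. target output
    grad = X.T @ err / len(X)            # least-squares cost gradient
    w -= lr * grad                       # adjust weights to reduce the error
```

After training, `w` has converged toward `true_w`; in a real system, training would stop once the error dropped below a chosen threshold rather than after a fixed epoch count.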
Fig. 2 illustrates an example of a head-mounted device 202 that may be used to determine cardiac activity, in accordance with some aspects. The head-mounted device 202 may include an earbud that fits in the ear canal 216 of the user.
The head-mounted device 202 may emit ultrasonic waves from a speaker 204 of the head-mounted device. The speaker 204 may be housed within the body of the head-mounted device 202 in an orientation and position sufficient to direct acoustic energy from the speaker 204 toward the user's ear (e.g., directly into the user's ear canal 216).
The head-mounted device 202 may include a microphone 206 that senses ultrasonic waves as they reflect from an ear (e.g., the user's ear canal 216). The microphone 206 may be an error microphone or an internal microphone, rather than an external microphone that directly senses ambient sound. The internal or error microphone may be configured or arranged to directly receive sound produced by the earphone speaker. Microphone 206 may encode the sensed sound field in the microphone signal. The head mounted device 202 may include processing logic 218 that determines cardiac activity of the user based at least on the microphone signal 220, such as described below in connection with fig. 4. The heart activity may include heart rate, e.g., "X" beats per minute (bpm).
The processing logic 218 may detect one or more changes in the phase or magnitude of the frequency response, or resonance, of the system that runs from the speaker input, through the acoustic output to the surface of the ear, through the acoustic return to the microphone, and to the microphone output. The processing logic 218 may correlate this change in phase or magnitude with a change in the length or resonance of the acoustic paths 208, 210 traveled by the ultrasonic waves from the speaker to the microphone as they reflect from the user's ear. The processing logic 218 may, in turn, correlate the change in length or resonance of the paths 208, 210 with the heart activity of the user.
For example, when the heart pumps blood around the user's ear canal, the skin 212 and 214 of the user's ear canal 216 deflects in response to the vascular pressure wave. Slight variations in the shape of the ear canal (caused by these deflections) may result in slight variations in the magnitude and/or phase response of the transfer function between the speaker 204 and the microphone 206. For example, a path length change delta_x 208 or 210 between the speaker 204 and the microphone 206 causes a corresponding change in the relative phase of the sensed ultrasonic waves. The wavelength of the sound wave can be expressed as λ = c/f, where c is the speed of sound (343 m/s) and f is the frequency, which provides:

delta_phi = 2π · delta_x / λ = 2π · f · delta_x / c
For example, at 20kHz, the skin deflection variation (delta_x) may be 1mm, corresponding to a relative phase shift of 0.366 radians at the reflected ultrasound component. The processing logic may detect such a change in relative phase and correlate such a phase change with the path length. The 10 μm (micrometer) path length variation may be correlated to 3.66 milliradians. Advantageously, this phase shift may be adequately measured by microphone 206.
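The figures quoted above can be checked directly from the formula delta_phi = 2π · delta_x / λ with λ = c/f:

```python
import math

c = 343.0        # speed of sound in m/s (from the text)
f = 20_000.0     # probe frequency in Hz (from the text)

wavelength = c / f                     # lambda = c / f, about 17.15 mm

def phase_shift(delta_x_m):
    """Relative phase shift, in radians, for a path length change delta_x."""
    return 2 * math.pi * delta_x_m / wavelength

print(round(phase_shift(1e-3), 3))     # 1 mm deflection  -> 0.366 rad
print(round(phase_shift(10e-6), 5))    # 10 um deflection -> 0.00366 rad (3.66 mrad)
```

Both values match the text, confirming the cited 0.366 rad and 3.66 mrad phase shifts.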
At the signal processing block 112, processing logic may measure the modulation of the relative phase between the emitted probe signal (through the speaker 108) and the sensed ultrasonic waves (in the microphone signal 118). The measured phase modulation is robust to noise and invariant to amplitude. Thus, sensing the ultrasound signal in the microphone signal can provide a sufficiently robust measurement of heart activity.
Furthermore, by outputting one or more ultrasonic tones, the processing logic may sense the user's heart activity without disturbing the user, since the tones lie above the normal human hearing range (e.g., >20 kHz) and are therefore inaudible. Ambient sounds, such as music and speech, typically have sparse ultrasound content. Ultrasonic tones therefore provide excellent pulse tracking, even at arbitrarily low pulse rates; typical heart rates fall between 25-200 bpm, or about 0.42-3.33 Hz.
Advantageously, the head mounted device 102 can utilize existing hardware without requiring additional sensors such as an accelerometer, a light sensor (e.g., PPG sensor), or another sensor. No additional hardware is required other than microphone 206 and speaker 204, as well as the ability to inject probing tones into the speaker output. The microphone 206 may further be used for other purposes (e.g., echo cancellation). The variation in amplitude response may be negligible and no precise frequency response is required for the speaker 204 and microphone 206 system, so long as the microphone exhibits low nonlinear distortion and adequate signal-to-noise ratio (SNR).
Fig. 3 illustrates an example of using a headset 302 to determine heart activity, according to some aspects. The headset 302 may include a non-in-ear earphone, such as an ear-covering earphone or an ear-mounted earphone. The head mounted device 302 may include a housing 304 that is worn on or over the user's ear. The housing 304 may include a gasket 320 that creates a seal between the housing 304 and the user such that the acoustic environment 306 outside the housing is substantially separated from the acoustic environment 308 inside the housing.
Similar to other examples, the head mounted device 302 includes a speaker 310 that may be housed within the housing 304 in an orientation and position sufficient to direct acoustic energy from the speaker 310 toward the user's ear (e.g., into the user's ear canal 314). Speaker 310 may emit ultrasonic waves. The head mounted device 302 may include a microphone 312 that senses ultrasonic waves as they reflect from an ear (e.g., the ear canal 314 of a user). Microphone 312 may be an internal microphone or an error microphone disposed in housing 304 to capture sound from speaker 310 as well as reflected sound from the user's ear. The processing logic 316 may determine heart activity (e.g., heart rate or heartbeat) of the user based at least on the microphone signal 318, as described in other portions. In some examples, processing logic 316 is integral with head mounted device 302. In other examples, the processing logic 316 may be partially or completely separate from the head mounted device 302.
FIG. 4 illustrates an exemplary workflow 400 for determining cardiac activity by ultrasound, according to some aspects. The operations and blocks of the described workflow 400 may be performed by processing logic and head-mounted devices corresponding to other examples.
The probe signal generator 402 generates a probe signal 422 comprising one or more ultrasound components. In some examples, the probe signal 422 may include a plurality of ultrasonic sinusoids added together by the probe signal generator 402. The ultrasonic sinusoids may each have a fixed frequency and be sufficiently spaced apart in the combined probe signal. For example, the kth probe tone may be

s_k(n) = a_k · sin(2π · f_k · n / f_s + phi_k)

where a_k is its amplitude and phi_k is a corresponding phase, and the probe signal may be

tx(n) = sum over k = 1 … N_f of s_k(n)

where N_f is the number of probe tones.
Due to the narrow bandwidth resulting from low pass filtering (described further below), the sinusoidal probe tones may be placed at intervals of 80-120 Hz (e.g., 100 Hz apart). For an ultrasonic frequency band between 20-40 kHz, the processing logic may combine up to 200 different probe tones at such spacing. For example, the processing logic may set f_k = k · f_spacing + f_base, where f_base is the lowest frequency in the range (e.g., 20 kHz) and f_spacing is the spacing, or minimum spacing, between the probe tones (e.g., f_spacing = 100 Hz). To mitigate a high crest factor, the relative phases phi_k may be chosen at random.
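A probe signal of this shape can be generated as follows. The sample rate and tone count here are assumptions (fewer than the ~200 tones the text allows, for brevity); only f_base = 20 kHz and f_spacing = 100 Hz come from the text.

```python
import numpy as np

fs = 96_000          # sample rate (assumed; must exceed twice the highest tone)
f_base = 20_000      # lowest probe frequency, from the text
f_spacing = 100      # spacing between probe tones, from the text
num_tones = 20       # illustrative; the text allows up to ~200

rng = np.random.default_rng(1)
phases = rng.uniform(0, 2 * np.pi, num_tones)       # random phases cut the crest factor

n = np.arange(fs)                                   # one second of samples
freqs = f_base + f_spacing * np.arange(num_tones)   # f_k = k * f_spacing + f_base
tx = sum(np.sin(2 * np.pi * f * n / fs + p)
         for f, p in zip(freqs, phases)) / num_tones
```

A spectrum of `tx` shows energy only at the chosen probe bins (20.0, 20.1, 20.2 kHz, ...), with nothing in the audible band, which is what lets the probe ride alongside normal audio content.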
In other examples, the probe signal may include a click, a chirp, pseudo-random noise (such as a maximum-length sequence), or a Gray code.
At operation 404, the processing logic may combine the probe signal 422 with an audio signal of audio content 406, which may include podcasts, music, telephone calls, the soundtrack of an audiovisual work, or other audio content. The resulting audio signal 428 may include the audio content 406 and the probe signal 422. The processing logic may drive the speaker 408 with this signal to output the audio content together with ultrasonic waves 426. The ultrasonic waves 426 may be inaudible to a listener. The ultrasonic waves (which may include multiple ultrasonic components) are sensed by the microphone 410 as they reflect from a surface 424 of the user's ear. Microphone 410 may be an internal microphone or an error microphone. Surface 424 may include an interior portion of the user's ear, such as the user's ear canal or eardrum.
To determine cardiac activity 434, at signal processor 416 the processing logic measures changes in the transfer function (of phase and/or magnitude) based on the microphone signal 430; that is, it measures the modulation of the relative phase between the transmitted probe signal (emitted through speaker 408) and the sensed ultrasonic waves (picked up in microphone signal 430). In one example, the received signal rx(n) in the microphone signal may have the form

rx(n) = Σ_k a_k(n)·sin(2π·f_k·n/f_s + φ_k(n))

where a_k(n) is the time-varying amplitude of the k-th received sinusoid at f_k, and φ_k(n) is the corresponding time-varying phase.
Combiner 412 may combine, or heterodyne, microphone signal 430 with heterodyning signal 423 to produce combined signal 418 (also referred to herein as a heterodyned signal). As will be seen below, this operation isolates the reflected probe signal contained in the microphone signal 430. Heterodyning signal 423 may be a complex-valued function described as a "matching" ultrasonic signal that matches the real-valued function of probe signal 422 being output by (or driving) the speaker, where "matching" means that their phase changes or timing (including frequency or frequency modulation) are synchronized. For example, heterodyning signal 423 may be a complex-valued version or copy of probe signal 422 (or the probe tones), as shown by the dashed line in fig. 4. In another example, heterodyning signal 423 may be a separately generated complex-valued function, generated by receiver signal processor 416 to be synchronized with probe signal 422. For example, a signal detector in signal processor 416 may detect a timing offset in microphone signal 430 and then generate the heterodyning signal corrected according to that timing offset. The timing offset may be detected from a frequency deviation delta_f in the case of, for example, a linear chirp, where delta_t = delta_f/γ and γ is the predetermined linear-chirp frequency sweep rate in Hz per second. The signal processor 416 then combines the heterodyning signal with the real microphone signal 430 to produce the heterodyned signal.
Where probe signal 422 includes a plurality of fixed-frequency sinusoids, microphone signal 430 may be combined, or heterodyned, with each of the probe tones respectively to generate a plurality of resulting heterodyned signals, one per probe tone. Heterodyning may refer to multiplying a signal by the complex conjugate of the heterodyning signal. Heterodyning signal 423 may be a fixed-frequency sinusoid, a chirp, a maximum-length sequence, or another decoding sequence. For example, if a signal that includes a sinusoidal component at frequency f_m is multiplied by the complex conjugate of a pure complex sinusoid at frequency f_k, then the result includes a sinusoidal component at f_m − f_k. Specifically, when f_m = f_k, the resulting heterodyned signal component is at frequency 0 and can be used to determine the path length of the acoustic wave. Changes in the path length of the acoustic wave relate to heart activity. Thus, combined signal 418 may consist of a set of one or more heterodyned signals or signal components. The combined signal 418 may include one heterodyned signal per carrier frequency in the output ultrasound. Some of the resulting signals may exhibit cardiac activity better than others. Those signals may be selected for use in determining heart activity, while signals with lower SNR may be discarded or ignored. Thus, in some examples, the combined signal 418 may exclude components that exhibit noise or that do not sufficiently exhibit cardiac activity.
In some aspects, processing logic may use heterodyne demodulators and low-pass filter 414 to isolate each sensed or received ultrasonic component. Each heterodyne demodulator may be expressed as

h_m(n) = exp(j·(2π·f_m·n/f_s + φ_m))

and, for each m, the corresponding filtered partial probe signal sensed in the microphone signal may be expressed as the heterodyned signal

q_m(n) = LPF(rx(n)·h_m*(n))

where LPF is low-pass filter 414 and all frequency components except the component at DC (where f_m = f_k) have been filtered out.
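A minimal sketch of one such demodulator follows. It assumes a simulated received tone whose phase carries a small 1 Hz ripple standing in for the heartbeat; the 10 ms moving average is a deliberate simplification of low-pass filter 414, and all names and parameter values are illustrative:

```python
import numpy as np

fs = 96_000
n = np.arange(fs)                            # 1 second of samples
f_m = 31_500.0                               # one probe-tone frequency (example)

# Simulated received tone whose phase carries a 1 Hz "pulse" ripple
phase_mod = 0.005 * np.sin(2 * np.pi * 1.0 * n / fs)
rx = np.sin(2 * np.pi * f_m * n / fs + phase_mod)

# Heterodyne demodulator h_m(n); mixing shifts the tone to ~0 Hz and ~2*f_m
h_m = np.exp(1j * 2 * np.pi * f_m * n / fs)
mixed = rx * np.conj(h_m)

# Crude LPF (10 ms moving average) removes the 2*f_m image, keeps near-DC
win = int(0.010 * fs)
q_m = np.convolve(mixed, np.ones(win) / win, mode="same")

# Recovered phase tracks the simulated ripple (up to a constant -pi/2 offset)
ripple = np.angle(q_m[win:-win])
```

The recovered `ripple` reproduces the milliradian-scale phase modulation, which is exactly the quantity the difference detector later turns into a heart rate.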
In another aspect, referring now to fig. 6, a heart rate measurement method uses output ultrasonic waves that span a sequence of frames, where each frame includes a probe tone whose frequency varies within the frame. In the example of the top waveform in fig. 6, each frame is 0.05 seconds long and the frequency of the single probe tone increases linearly from about 20 kHz to 40 kHz, repeating for each successive frame, as shown. In other words, the probe signal p has a time-varying instantaneous frequency f(n); the received signal rx contains its reflections, and the demodulator h and the filtered heterodyned signal q can be expressed as

h(n) = exp(j·θ(n)), where θ(n) = 2π·Σ_{i≤n} f(i)/f_s is the accumulated phase of the instantaneous frequency, and

q(n) = LPF(rx(n)·h*(n))
Examples of such time-varying frequency modulation include the periodically varying sawtooth frequency modulation depicted in the probe-tone waveform at the top of fig. 6. It can be expressed mathematically as

f(n) = f_lo + (f_hi − f_lo)·((n mod N_p)/N_p)

where f_lo and f_hi are the low and high values of the frequency-modulation range and N_p is the number of samples in the period. In the example shown, the ultrasonic chirp is periodic (period 5-10 milliseconds) and the frequency modulation (or chirp) sweeps linearly from 20 kHz to 40 kHz, although other combinations of frame lengths, frequency endpoints, and sweep profiles are possible.
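The sawtooth instantaneous frequency above can be sketched as follows; the frame length, band edges, and variable names are assumptions based on the figure description, and the sweep rate γ is the quantity used earlier to map a frequency deviation to a timing offset (delta_t = delta_f/γ):

```python
import numpy as np

fs = 96_000
f_lo, f_hi = 20_000.0, 40_000.0        # low/high values of the modulation range
frame_s = 0.050                        # one frame of the top waveform in fig. 6
N_p = int(frame_s * fs)                # number of samples in the period
n = np.arange(4 * N_p)                 # four repeated frames

# Periodic sawtooth instantaneous frequency: f_lo -> f_hi within each frame
f_inst = f_lo + (f_hi - f_lo) * ((n % N_p) / N_p)

# Probe tone: its phase is the running sum (integral) of instantaneous frequency
theta = 2.0 * np.pi * np.cumsum(f_inst) / fs
p = np.sin(theta)

# Linear-chirp sweep rate gamma in Hz per second, for delta_t = delta_f/gamma
gamma = (f_hi - f_lo) / frame_s
```

Because the phase is accumulated rather than computed per frame, it stays continuous across the frequency reset at each frame boundary, which avoids audible (or spectral) discontinuities.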
Next, fig. 6 shows the received signal rx picked up by a microphone (labeled "microphone" in fig. 6). The microphone signal (generated by the internal microphone of the headset) contains reflected ultrasonic waves responsive to the output ultrasonic waves, i.e., to the probe tone p(n), where the reflected ultrasonic waves are reflected from the surface of the user's ear. The microphone signal has a blurred spectrum because it contains the probe tone (Tx) as well as its reflections (e.g., convolutions with the ear-canal transfer function).
Next in fig. 6 is a plot of the heterodyned signal, which in the equations above is rx(n) multiplied by the complex conjugate of h(n): the rx(n) signal (which contains the reflected ultrasonic waves, i.e., reflections of the probe tone, in the microphone signal) is multiplied by h*(n). The signal or function h(n) may be an analytic signal generated by adding, as an imaginary quadrature component, the Hilbert transform of the probe tone p(n); the probe tone p(n) is then the real component of the heterodyning signal h(n). More generally, the reflected ultrasonic waves are heterodyned (to generate the heterodyned signal) by a matching time-varying-frequency signal, as that term is described above.
The heterodyned signal has near-zero-frequency components and other components at higher frequencies. As seen in the example shown in the third graph of fig. 6, the heterodyned signal has a flat frequency spectrum with the desired component q(n) at near-zero frequency, which can thus be isolated by applying a low-pass filter (LPF) to the heterodyned signal, resulting in the graph at the bottom of fig. 6.
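A toy end-to-end sketch of this chirp heterodyning follows. It assumes the "reflection" is simply a scaled copy of the probe tone, so the near-DC component is a known constant; a short moving average stands in for the LPF, and all values are illustrative:

```python
import numpy as np

fs = 96_000
f_lo, f_hi = 20_000.0, 40_000.0
N_p = int(0.005 * fs)                          # 5 ms chirp frames
n = np.arange(20 * N_p)                        # 100 ms of signal

f_inst = f_lo + (f_hi - f_lo) * ((n % N_p) / N_p)
theta = 2.0 * np.pi * np.cumsum(f_inst) / fs
h = np.exp(1j * theta)                         # matching complex-valued chirp

# Toy "reflection": a scaled copy of the real probe tone p(n) = Re{h(n)}
rx = 0.8 * np.real(h)

# Heterodyne: a near-DC term of 0.4 plus an image chirp far from DC
mixed = rx * np.conj(h)

# Short moving average stands in for the LPF, isolating the near-DC q(n)
win = 64
q = np.convolve(mixed, np.ones(win) / win, mode="same")
```

With a real reflection the near-DC component is not constant; its slow phase/magnitude wobble is the quantity the difference detector examines.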
In other examples of frequency modulation that may be used (instead of the sawtooth instantaneous frequency depicted in fig. 6), the sequence of frames in the output ultrasonic waves may have a triangle-wave or sine-wave instantaneous frequency.
Each of the filtered signals 420 may provide some of the detail used to determine heart activity 434. In addition, the low-pass filter 414 removes interference such as the audio content 406 from the speaker 408, speech, or other environmental noise. The low-pass filter 414 may be designed as a trade-off between wider bandwidth (more detail, more noise) and narrower bandwidth (less detail, less noise). For example, for a typical heart rate of 60 BPM the fundamental frequency is 1 Hz, and a filter with a bandwidth as wide as 10 Hz may provide sufficient detail about the periodic structure of the signal while rejecting interference. The filter may have a stop frequency that rejects noise and avoids channel overlap. In some examples, filter 414 may have a 10 Hz bandwidth and a 50 Hz stop band. The filter 414 may provide, for example, −100 dB of stop-band attenuation to satisfactorily reject noise. Digital low-pass filter designs include Butterworth, elliptic, Chebyshev, and others.
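As a hedged illustration of this filtering trade-off, here is a simple Blackman-windowed-sinc FIR low-pass standing in for the Butterworth/elliptic/Chebyshev designs the text names. Note that a windowed-sinc of this length reaches roughly −74 dB stop bands, not the −100 dB cited, so a production design would need a sharper filter; the sample rate and tap count are assumptions:

```python
import numpy as np

def windowed_sinc_lpf(fs, cutoff_hz, num_taps):
    """Blackman-windowed-sinc FIR low-pass: a simple stand-in for the
    Butterworth/elliptic/Chebyshev designs mentioned in the text."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / fs * n)       # ideal low-pass impulse response
    h *= np.blackman(num_taps)                  # window controls stopband leakage
    return h / h.sum()                          # normalize for unity gain at DC

fs = 400.0                                      # rate of the demodulated channel
h = windowed_sinc_lpf(fs, cutoff_hz=10.0, num_taps=101)

# Inspect the response: pass ~1 Hz heart-rate ripple, reject 50 Hz interference
H = np.abs(np.fft.rfft(h, 4096))
w = np.fft.rfftfreq(4096, 1.0 / fs)
```

The 10 Hz passband keeps several harmonics of a 1 Hz pulse (preserving the waveform's periodic structure) while everything at the 50 Hz stop band and beyond is attenuated far below the signal level.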
In one aspect of a method for measuring heart rate, referring now to fig. 4, a difference detector 432 is used to calculate heart rate as follows. The time-varying phase φ_m(n) of each filtered signal q_m(n) 420 appearing at the output of the low-pass filter 414 is obtained. This may be done by performing a complex arg() operation, e.g., φ_m(n) = arg(q_m(n)). The purpose is to measure the ripple of the phase signal. To avoid phase-wrapping problems, it is more convenient to work with the change in phase, e.g.,

Δφ_m(n) = arg(q_m(n)·q_m*(n−Δt))

which expresses the instantaneous phase change with the global phase removed. Next, peak detection is performed, wherein difference detector 432 may find the period of the pulse waveform by determining when each peak in the phase signal occurs. In some examples, this may include calculating the zero crossings of the derivative of Δφ_m(n). Other techniques may be applied to detect the peaks and/or period of the pulse waveform. Once the peak positions at times T_l are determined, where T_l is the time of the l-th peak, the instantaneous pulse period is the time distance, or time interval, between successive peaks, ΔT_l = T_l − T_{l−1}. The heart rate may then be determined as 60/ΔT_l beats per minute.
In another aspect of the method for measuring or calculating heart rate, referring now to the equation above for the response q(n) and the chirp implementation of fig. 6, difference detector 432 (see fig. 4) is configured to calculate the change in the response q(n) relative to a past time n − Δt, i.e., the difference Δq(n). One way is to measure Δq(n) as a subtraction:

Δq(n) = q(n) − q(n−Δt)

Another way is to measure the response change as a division or ratio:

Δq(n) = q(n)/q(n−Δt)

For convenience, the response difference may also be normalized, for example by the magnitude of q(n−Δt).
In some aspects, it is convenient to use Δt = N_p (the period of the frequency modulation), so that each sample of q(n) is compared to the corresponding sample one period in the past. In one aspect, q(n) is organized as χ[m, k], computed at each time index m for each respective frequency index k, where index m is the frame time and index k indexes the samples within a frame of length N_p. Δt need not be constant, but for practical convenience it may be.
Then, more generally, the change in the response q(n) is measured (by difference detector 432) to produce a time series of differences, such as CQ[m, k], where each difference represents a phase difference or magnitude difference of the heterodyned signal between one frame and an earlier frame. FIG. 7 shows a plot of CQ[m, k] for an exemplary quotient phase and a plot of CQ[m, k] for an exemplary quotient magnitude, where the time series covers at least two seconds (here, five seconds as an example). The difference detector 432 then detects peaks in the time series of differences, which can be seen in the power plot at the bottom of fig. 7. The heart rate is then determined from the time interval separating a pair of adjacent peaks (e.g., 60/ΔT beats per minute).
Viewed another way, the difference detector 432 calculates a time series of frequency-response or spectral differences. Each difference is the difference between i) the frequency response or spectrum calculated for one frame of the heterodyned signal and ii) the frequency response or spectrum calculated for an earlier frame. The difference detector 432 then detects peaks in the time series and provides the heart rate as a number derived from the time interval separating one or more adjacent peak pairs.
Viewed another way, difference detector 432 generates a time series of change values, where each change value represents a change in the heterodyned signal between corresponding (e.g., adjacent) frames; a plurality of peaks is detected in the time series of change values, and the heart rate is then given or output based on the time interval separating one or more adjacent peak pairs among the plurality of peaks. For example, generating the time series of change values includes: for a current chirp frame, calculating a plurality of differences, each indicating a difference in the frequency response at a respective single frequency between the current frame and a previous frame; and summing the plurality of differences (e.g., across all frequencies in the chirp frame) to produce a sum representing one of the change values in the time series. If this sum is small enough (less than some threshold, near zero), it can be interpreted to mean that there is no change between the previous frame and the current frame; if the sum is greater than the threshold, it is interpreted to mean that there is a change. In both cases the change is quantified and stored as the sample for the current frame. This process is repeated for adjacent pairs of frames, resulting in a sequence of quantified change samples (e.g., power values) such as shown in the bottom graph of fig. 7. Peak detection is performed on this sequence to detect peaks that are interpreted as representing heartbeats.
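The frame-difference change series can be sketched as follows. The per-frame "frequency responses" here are synthetic (noise plus a brief perturbation at each simulated beat), and the threshold and burst-grouping rules are illustrative assumptions rather than the patented detector:

```python
import numpy as np

rng = np.random.default_rng(1)
frame_rate = 100.0                      # 10 ms chirp frames
n_frames, n_bins = 500, 32              # 5 s of frames, 32 frequency bins

# Synthetic per-frame frequency responses: small noise, plus a brief
# perturbation at each simulated beat (ear-surface motion changes the response)
chi = rng.normal(0.0, 0.001, (n_frames, n_bins)).astype(complex)
beat_frames = np.arange(20, n_frames, 80)        # a beat every 0.8 s (75 BPM)
chi[beat_frames] += 0.05

# Change value per adjacent frame pair: sum over k of per-frequency differences
cq = np.abs(np.diff(chi, axis=0)).sum(axis=1)

# Threshold the change series; each beat shows up as a short burst of samples
thresh = 5.0 * np.median(cq)
hot = np.flatnonzero(cq > thresh)
beats = hot[np.r_[True, np.diff(hot) > 2]]       # keep first sample of each burst

bpm = 60.0 * frame_rate / np.diff(beats).mean()
```

The median-based threshold adapts to the noise floor, so frames with only measurement noise fall below it while beat-induced response changes stand out by more than an order of magnitude.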
In some aspects, the cardiac activity 434 is heart rate. In some aspects, the cardiac activity 434 may be displayed or presented to the user. Additionally or alternatively, the heart activity 434 may be used with other algorithms to detect an indication (e.g., an elevated risk) of a heart condition. Heart conditions may include aortic murmur, bradycardia, tachycardia, aortic stenosis, mitral regurgitation, aortic regurgitation, mitral stenosis, patent ductus arteriosus, or other heart conditions. A heart condition may involve abnormal heart activity, heart rhythm, or heartbeat, such as heart activity that deviates from normal or healthy heart activity over one or more cardiac cycles.
In this way, the processing logic may utilize a multi-tone approach. Each probe response can be band-pass isolated to a bandwidth of <50 Hz, owing to the narrowband carriers and the slowly time-varying phase modulation. Each demodulated carrier band may be subsampled to 100 Hz (from 96 kHz). Up to 200 or more simultaneous probe tones may be used to measure the pulse, with statistically independent noise per channel. A maximum-likelihood technique may be used to combine the independent pulse estimates across the many probe carrier frequencies. The techniques may thus be more robust, or less susceptible, to noise in the microphone or in the ear-canal frequency response when used to detect heart activity.
Fig. 5 is a graph 500 depicting an indication of heart activity measured using microphone signals, in accordance with some aspects described herein. Graph 500 shows a phase difference Δφ (although magnitude may alternatively be used), which varies in microradians (on the Y-axis) over time (on the X-axis). This difference may be extracted at difference detector 432 as described above in connection with fig. 4. One aspect of the technique is that the phase signal Δφ is small, on the order of milliradians. Although small, the signal-to-noise ratio is empirically sufficient (e.g., measurable) at nominal operating conditions. In this graph of measurement data, a phase deviation of about 5 milliradians was measured from the 31.5 kHz carrier probe tone. This translates to a path-length change of 8.6 microns. The magnitude of Δφ is expected to be small; however, in the presence of discontinuities, noise, or no signal, the phase becomes random and the RMS phase difference jumps many orders of magnitude above the nominal level. One or more such conditions may therefore cause heart-activity tracking to fail. This allows a detection heuristic that requires the RMS value of Δφ to be less than a small threshold, such as 1e-6 or 1e-5. If it is larger, this may indicate that an anomaly is present, and the sensed microphone signal (or the given heterodyne probe tone) will not be relied upon.
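The RMS-threshold sanity check described here can be sketched as follows; the function name, the threshold choice, and the two test signals are illustrative assumptions:

```python
import numpy as np

def channel_is_trustworthy(dphi, threshold=1e-5):
    """Heuristic from the text: a healthy channel has a tiny RMS phase
    difference; noise or signal loss makes it jump by orders of magnitude."""
    rms = np.sqrt(np.mean(np.square(dphi)))
    return rms < threshold

# Healthy channel: microradian-scale ripple. Broken channel: random phase.
rng = np.random.default_rng(0)
t = np.arange(1000) / 100.0
healthy = 1e-6 * np.sin(2 * np.pi * 1.0 * t)
broken = rng.uniform(-np.pi, np.pi, 1000)
```

Because the healthy and anomalous regimes are separated by several orders of magnitude, the exact threshold value is not critical.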
As described above, a computing device has a processor configured to: cause ultrasonic waves to be output from a speaker of a head-mounted device when the head-mounted device is worn on or in an ear of a user; obtain a microphone signal of a microphone of the head-mounted device that receives reflected ultrasonic waves in response to the output ultrasonic waves; and determine heart activity of the user of the head-mounted device based at least on the microphone signal. Heart activity may be detected by detecting surface motion of the ear, which in turn may be detected based on a change in the frequency response of the system in which the output ultrasonic waves and the received reflected ultrasonic waves are generated and detected. The frequency response may be measured using clicks, chirps, or pseudo-random noise (in the output ultrasound). Heart activity may be detected by heterodyning the reflected ultrasonic waves with a heterodyning signal to generate a heterodyned signal having a near-zero frequency component. The heterodyned signal embodies the relative phase between the output ultrasonic waves and the reflected ultrasonic waves, or the sensed time of flight between them, or the frequency response (e.g., transfer function) of the system in which they are generated and detected. In one aspect, the output ultrasonic waves may include one or more probe tones, and heterodyning is performed for each corresponding reflected ultrasonic probe tone to generate a respective heterodyned signal having a near-zero frequency. Each respective heterodyned signal is filtered to remove components other than its near-zero component, and difference detection over time is then performed on the near-zero component to determine heart activity. In one aspect, the one or more probe tones include at least one of: a plurality of fixed-frequency sinusoids or one or more frequency-swept tones.
In another aspect, causing ultrasonic waves to be output from a speaker of a head-mounted device is accomplished by combining one or more probe tones with audio content to produce an audio signal and driving the speaker with the audio signal. The heart rate of the user may be determined by detecting adjacent peaks in the relative phase between the output ultrasonic waves and the reflected ultrasonic waves and determining the heart rate based on the time interval between the peaks.
In another aspect of the disclosure also described above, a headset has a speaker, a microphone, and a processor configured to: cause ultrasonic waves to be output from the speaker; obtain a microphone signal of the microphone that senses reflected ultrasonic waves in response to the output ultrasonic waves; and determine heart activity of a user of the headset based at least on the reflected ultrasonic waves characterized in the microphone signal. The speaker and microphone may be arranged in an earbud of the head-mounted device. Alternatively, the speaker and microphone may be arranged in a housing worn on or over the user's ear.
Fig. 8 illustrates an example of an audio processing system 600 according to some aspects. The audio processing system may be a device such as, for example, a desktop computer, a tablet, a smart phone, a laptop, a smart speaker, a media player, a home appliance, a headset, a head-mounted display (HMD), a watch, smart glasses, an infotainment system for an automobile or other vehicle, or another computing device. The system may be configured to perform the methods and processes described in this disclosure.
Although various components of an audio processing system are shown that may be incorporated into headphones, speaker systems, microphone arrays, and entertainment systems, this illustration is merely one example of a particular implementation of the types of components that may be present in an audio processing system. This example is not intended to represent any particular architecture or manner of interconnecting the components, as such details are not germane to the aspects described herein. It should also be appreciated that other types of audio processing systems having fewer or more components than shown may also be used. Thus, the processes described herein are not limited to use with the hardware and software shown.
The audio processing system may include one or more buses 616 for interconnecting the various components of the system. One or more processors 602 are coupled to the bus as is known in the art. The one or more processors may be a microprocessor or special purpose processor, a system on a chip (SOC), a central processing unit, a graphics processing unit, a processor created by an Application Specific Integrated Circuit (ASIC), or a combination thereof. Memory 608 may include Read Only Memory (ROM), volatile memory, and nonvolatile memory or combinations thereof coupled to the bus using techniques known in the art. The sensor 614 may include an IMU and/or one or more cameras (e.g., RGB camera, RGBD camera, depth camera, etc.) or other sensors described herein. The audio processing system can also include a display 612 (e.g., an HMD or touch-screen display).
Memory 608 may be connected to the bus and may include DRAM, a hard drive, flash memory, a magneto-optical drive, magnetic memory, an optical drive, or another type of memory system that maintains data even after the system is powered down. In one aspect, the processor 602 retrieves computer program instructions stored in a machine-readable storage medium (memory) and executes the instructions to perform the operations described herein.
Although not shown, audio hardware may be coupled to one or more buses to receive audio signals to be processed and output by speaker 606. The audio hardware may include digital-to-analog converters and/or analog-to-digital converters. The audio hardware may also include audio amplifiers and filters. The audio hardware may also be connected to a microphone 604 (e.g., a microphone array) to receive audio signals (whether analog or digital), digitize them as appropriate, and transmit these signals to the bus.
The communication module 610 may communicate with remote devices and networks through wired or wireless interfaces. For example, the communication module may communicate via known technologies such as TCP/IP, Ethernet, Wi-Fi, 3G, 4G, 5G, Bluetooth, ZigBee, or other equivalent technologies. The communication module may include wired or wireless transmitters and receivers that can communicate (e.g., receive and transmit data) with networked devices such as a server (e.g., in the cloud) and/or other devices such as remote speakers and remote microphones.
It should be appreciated that aspects disclosed herein may utilize memory that is remote from the system, such as a network storage device coupled to the audio processing system through a network interface, such as a modem or ethernet interface. Buses may be connected to each other through various bridges, controllers and/or adapters as is well known in the art. In one aspect, one or more network devices may be coupled to the bus. The network device may be a wired network device (e.g., ethernet) or a wireless network device (e.g., wi-Fi, bluetooth). In some aspects, various aspects described (e.g., simulation, analysis, estimation, modeling, object detection, etc.) may be performed by a networked server in communication with the capture device.
Various aspects described herein may be at least partially embodied in software. That is, the techniques may be implemented in an audio processing system in response to its processor executing sequences of instructions contained in a storage medium, such as a non-transitory machine-readable storage medium (e.g., DRAM or flash memory). In various aspects, hard-wired circuitry may be used in combination with software instructions to implement the techniques described herein. Thus, these techniques are not limited to any specific combination of hardware circuitry and software nor to any particular source for the instructions executed by the audio processing system.
In this specification, certain terms are used to describe features of various aspects. For example, in some cases, the terms "module," "processor," "unit," "renderer," "system," "device," "filter," "engine," "block," "detector," "isolator," "extractor," "generator," "model," and "component" represent hardware and/or software configured to perform one or more processes or functions. For example, examples of "hardware" include, but are not limited to, integrated circuits such as processors (e.g., digital signal processors, microprocessors, application specific integrated circuits, microcontrollers, etc.). Thus, as will be appreciated by those skilled in the art, different combinations of hardware and/or software may be implemented to perform the processes or functions described by the above terms. Of course, the hardware may alternatively be implemented as a finite state machine or even as combinatorial logic elements. Examples of "software" include executable code in the form of an application, applet, routine or even a series of instructions. As described above, the software may be stored in any type of machine-readable medium.
Some portions of the preceding detailed description have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the audio processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to those quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as those set forth in the claims below refer to the actions and processes of an audio processing system, or similar electronic device, that manipulates and transforms data represented as physical (electronic) quantities within the system's registers and memories into other data similarly represented as physical quantities within the system's memories or registers or other such information storage, transmission, or display devices.
The processes and blocks described herein are not limited to the specific examples described, and are not limited to the specific order used herein as examples. Rather, any of the processing blocks may be reordered, combined, or removed, performed in parallel, or serially, as desired, to achieve the results described above. The processing blocks associated with implementing the audio processing system may be executed by one or more programmable processors executing one or more computer programs stored on a non-transitory computer readable storage medium to perform the functions of the system. All or part of the audio processing system may be implemented as dedicated logic circuits, e.g., FPGAs (field programmable gate arrays) and/or ASICs (application specific integrated circuits). All or part of the audio system may be implemented with electronic hardware circuitry comprising electronic devices such as, for example, at least one of a processor, memory, programmable logic device, or logic gate. In addition, the processes may be implemented in any combination of hardware devices and software components.
In some aspects, the disclosure may include language such as "[element A] and [element B]." This language may refer to one or more of these elements. For example, "at least one of A and B" may refer to "A," "B," or "A and B." Specifically, "at least one of A and B" may refer to "at least one of A and at least one of B," or "at least either A or B." In some aspects, the disclosure may include language such as "[element A], [element B], and/or [element C]." This language may refer to any one of these elements or any combination thereof. For example, "A, B, and/or C" may refer to "A," "B," "C," "A and B," "A and C," "B and C," or "A, B, and C."
While certain aspects have been described and shown in the accompanying drawings, it is to be understood that such aspects are merely illustrative of and not restrictive on this disclosure, and that this disclosure is not to be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art.
To assist the patent office and any readers of any patent issued on this application in interpreting the appended claims, the applicant wishes to note that they do not intend any of the appended claims or claim elements to invoke 35 U.S.C. 112(f) unless the words "means for" or "step for" are explicitly used in the particular claim.
It is well known that the use of personally identifiable information should follow privacy policies and practices that are believed to meet or exceed industry or government requirements for maintaining user privacy. In particular, personally identifiable information data should be managed and processed to minimize the risk of inadvertent or unauthorized access or use, and the nature of authorized use should be specified to the user.

Claims (40)

1. A method for measuring heart rate of a user, the method comprising:
causing output ultrasonic waves to be output from a speaker of a head-mounted device when the head-mounted device is worn on or in an ear of the user;
obtaining a microphone signal of a microphone of the head-mounted device that receives reflected ultrasonic waves in response to the output ultrasonic waves; and
determining a heart rate of the user of the head-mounted device based at least on the microphone signal.
2. The method of claim 1, wherein determining the heart rate of the user comprises detecting surface motion of the ear of the user based on the reflected ultrasonic waves in the microphone signal, wherein the surface motion of the ear is related to heart activity of the user.
3. The method of claim 2, wherein the surface motion of the ear is detected based on a change over time in a relative phase between the output ultrasonic wave and the reflected ultrasonic wave in the microphone signal, and wherein the change in the relative phase relates to a change in resonance of the reflected ultrasonic wave from the speaker to the microphone as it is reflected from the ear of the user.
4. The method of claim 2, wherein the surface motion of the ear is detected based on a change in time of flight between the output ultrasonic wave and the reflected ultrasonic wave.
5. The method of claim 2, wherein the surface motion of the ear is detected based on determining a change over time in a transfer function or frequency response between the output ultrasound wave and the reflected ultrasound wave.
6. The method of claim 5, wherein determining the change in the transfer function comprises:
heterodyning the reflected ultrasonic waves with a matching ultrasonic signal to generate a heterodyne signal having a near-zero frequency; and
detecting a change over time in a phase or magnitude of the heterodyne signal, or a change over time in the frequency response.
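The heterodyne-to-near-zero-frequency step of claims 6 and 8 can be sketched as follows. This is a minimal illustration, not the patented implementation: the sample rate, probe frequency, and simulated ear-surface motion are all assumed values.

```python
import numpy as np

fs = 48_000          # assumed sample rate (Hz)
f_probe = 20_000     # assumed ultrasonic probe tone (Hz)
t = np.arange(fs) / fs   # 1 s of samples

# Simulated reflection: the probe tone with a slowly varying phase, a
# stand-in for ear-surface motion at ~1.2 Hz (about 72 bpm).
phase_motion = 0.5 * np.sin(2 * np.pi * 1.2 * t)
mic = np.cos(2 * np.pi * f_probe * t + phase_motion)

# Heterodyne with a matching complex tone: the product has a near-zero-
# frequency component plus a high-frequency image component.
lo = np.exp(-2j * np.pi * f_probe * t)
mixed = mic * lo

# Crude low-pass filter (moving average) removes the image component.
kernel = np.ones(400) / 400
baseband = np.convolve(mixed, kernel, mode="same")

# The angle of the baseband signal recovers the motion-induced phase.
recovered = np.unwrap(np.angle(baseband))
```

Detecting a change over time in the phase of `recovered` (claim 6's final step) then amounts to tracking this slowly varying baseband quantity.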
7. The method of claim 5, wherein the transfer function is measured using ticks, chirps, or pseudo-random noise.
8. The method of claim 1, wherein determining the heart rate comprises heterodyning the reflected ultrasonic waves to generate a heterodyne signal having a near-zero frequency, wherein the heterodyne signal comprises a relative phase between the output ultrasonic waves and the reflected ultrasonic waves, a sensed time of flight between the output ultrasonic waves and the reflected ultrasonic waves, or a transfer function between the output ultrasonic waves and the reflected ultrasonic waves.
9. The method of claim 8, wherein the output ultrasonic waves include a plurality of probe tones, and heterodyning the reflected ultrasonic waves generates a plurality of heterodyne signals having near-zero frequencies.
10. The method of claim 9, further comprising low-pass filtering each of the plurality of heterodyne signals and then performing peak detection to determine the heart rate.
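The low-pass-filter-then-peak-detect step of claim 10 can be sketched on one synthetic baseband channel. The sample rate, pulse shape, noise level, and 72 bpm rate are assumptions for illustration only:

```python
import numpy as np

fs = 200                      # assumed baseband sample rate after decimation (Hz)
t = np.arange(10 * fs) / fs   # 10 s of signal
bpm_true = 72

# Stand-in for one heterodyne signal: one pulse per heartbeat plus noise.
sig = np.maximum(0.0, np.sin(2 * np.pi * bpm_true / 60 * t)) ** 8
sig += 0.05 * np.random.default_rng(0).standard_normal(sig.size)

# Low-pass filter (moving average) to suppress noise before peak picking.
k = np.ones(15) / 15
smooth = np.convolve(sig, k, mode="same")

# Simple peak detection: local maxima above a threshold...
thr = 0.5 * smooth.max()
peaks = [i for i in range(1, len(smooth) - 1)
         if smooth[i] > thr and smooth[i] >= smooth[i - 1] and smooth[i] > smooth[i + 1]]

# ...collapsing runs of nearby maxima to one detection per beat.
merged = [p for j, p in enumerate(peaks) if j == 0 or p - peaks[j - 1] > fs // 4]

intervals = np.diff(merged) / fs          # seconds between beats
bpm_est = 60.0 / intervals.mean()
```

With multiple probe tones, the same filtering and peak detection would run per heterodyne channel.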
11. The method of claim 9, wherein the plurality of probe tones are a plurality of fixed-frequency sinusoids or one or more frequency-swept tones.
12. The method of claim 9, wherein causing the output ultrasonic waves to be output from the speaker of the head-mounted device comprises combining the plurality of probe tones with audio content to generate an audio signal and driving the speaker with the audio signal.
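Claim 12's combination of probe tones with audio content can be sketched as a simple mix. The tone frequencies and levels here are assumed, and a real device would additionally manage headroom and filtering:

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs    # 1 s

# Audible program material (stand-in): a 440 Hz tone at half scale.
audio = 0.5 * np.sin(2 * np.pi * 440 * t)

# Inaudible probe tones above the hearing range (assumed frequencies/levels).
probe_freqs = [20_500, 21_500, 22_500]
probes = sum(0.05 * np.sin(2 * np.pi * f * t) for f in probe_freqs)

# Drive signal for the speaker: the probe tones ride on top of the audio.
driver_signal = audio + probes
```

Because the probe tones sit above the audible band, the user hears only the program material while the microphone can still pick up the ultrasonic reflections.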
13. The method of claim 1, wherein determining the heart rate of the user comprises applying a machine learning algorithm to the microphone signal to determine the heart rate of the user.
14. The method of claim 1, wherein determining the heart rate of the user comprises detecting peaks in relative phase between the output ultrasound waves and the reflected ultrasound waves, and determining the heart rate based on a time interval between the peaks.
15. A computing device comprising a processor configured to:
cause output ultrasonic waves to be output from a speaker of a head-mounted device while the head-mounted device is worn on or in an ear of a user;
obtain a microphone signal of a microphone of the head-mounted device that receives reflected ultrasonic waves in response to the output ultrasonic waves; and
determine a heart activity of the user of the head-mounted device based at least on the microphone signal.
16. The computing device of claim 15, wherein determining the heart activity of the user comprises:
detecting a surface motion of the ear of the user based on the reflected ultrasonic waves in the microphone signal, wherein the surface motion of the ear is related to the heart activity of the user.
17. The computing device of claim 16, wherein the surface motion of the ear is detected based on a change in relative phase over time between the output ultrasonic wave and the reflected ultrasonic wave in the microphone signal, and wherein the change in the relative phase relates to a change in path length or resonance of the reflected ultrasonic wave from the speaker to the microphone as it is reflected from the ear of the user.
18. The computing device of claim 16, wherein the surface motion of the ear is detected based on a change in time of flight between the output ultrasonic wave and the reflected ultrasonic wave.
19. The computing device of claim 16, wherein the surface motion of the ear is detected based on determining a change over time in a transfer function between the output ultrasound wave and the reflected ultrasound wave.
20. The computing device of claim 19, wherein determining the change in the transfer function is based on a change in a phase of the transfer function over time.
21. A method for measuring heart rate of a user, the method comprising:
causing output ultrasonic waves to be output from a speaker of a head-mounted device while the head-mounted device is worn on or in an ear of a user, wherein the output ultrasonic waves comprise a sequence of frames, each frame comprising a probe tone whose frequency varies within the frame;
obtaining a microphone signal of a microphone of the head-mounted device, wherein the microphone signal includes reflected ultrasonic waves responsive to the output ultrasonic waves, wherein the reflected ultrasonic waves are reflected from a surface of the ear of the user;
heterodyning the reflected ultrasonic waves in the microphone signal to produce a heterodyne signal; and
calculating a heart rate of the user of the head-mounted device based on the heterodyne signal.
22. The method of claim 21, wherein the sequence of frames in the output ultrasonic waves has an instantaneous frequency that varies as a triangle wave, a sawtooth wave, or a sinusoid.
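The three sweep shapes recited in claim 22 can be sketched by integrating an instantaneous-frequency profile into phase. The sample rate, frame length, and 20–30 kHz sweep range are assumed values:

```python
import numpy as np

def sweep_phase(f_inst, fs):
    """Integrate an instantaneous-frequency series (Hz) into phase (rad)."""
    return 2 * np.pi * np.cumsum(f_inst) / fs

fs, frame = 96_000, 4096
n = np.arange(frame) / frame        # normalized position 0..1 within a frame

# Three assumed instantaneous-frequency profiles between 20 and 30 kHz:
saw = 20_000 + 10_000 * n                          # sawtooth-like sweep
tri = 20_000 + 10_000 * (1 - np.abs(2 * n - 1))    # triangle-like sweep
sin_ = 25_000 + 5_000 * np.sin(2 * np.pi * n)      # sinusoid-like sweep

tones = {name: np.cos(sweep_phase(f, fs))
         for name, f in [("saw", saw), ("tri", tri), ("sin", sin_)]}
```

Each dictionary entry is one frame of a probe tone whose frequency varies within the frame, as the claim requires.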
23. The method of claim 21, wherein heterodyning the reflected ultrasonic waves in the microphone signal comprises:
heterodyning the reflected ultrasonic waves with a matching time-varying frequency signal to produce the heterodyne signal, wherein the heterodyne signal has a near-zero-frequency component and other components at higher frequencies.
24. The method of claim 23, further comprising:
generating the matching time-varying frequency signal using a copy of the probe tone that drives the speaker to produce the output ultrasonic waves.
25. The method of claim 23, further comprising:
detecting a timing offset in the microphone signal; and
generating the matching time-varying frequency signal based on the timing offset.
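The de-chirping of claims 23–25 can be sketched with a linear chirp and a simulated echo. All parameters (sample rate, sweep range, and the integer-sample echo delay) are assumptions: when a delayed copy of the chirp is multiplied by a matching complex chirp, the product has a near-zero-frequency beat whose frequency is proportional to the delay.

```python
import numpy as np

fs, frame_len = 96_000, 4096
n = np.arange(frame_len)

# Linear chirp sweeping 20 -> 30 kHz within one frame (assumed range).
f0, f1 = 20_000, 30_000
phase = 2 * np.pi * (f0 * n / fs + (f1 - f0) * n**2 / (2 * frame_len * fs))
tx = np.cos(phase)

# Simulated echo: the same chirp delayed by d samples (speaker -> ear -> mic).
d = 12
rx = np.zeros(frame_len)
rx[d:] = tx[:frame_len - d]

# De-chirp: multiply by a matching complex chirp (a copy of the probe tone).
lo = np.exp(-1j * phase)
mixed = rx * lo

# Low-pass filter to keep the near-zero-frequency component; its frequency is
# proportional to the echo delay, so frame-to-frame changes track motion.
k = np.ones(256) / 256
baseband = np.convolve(mixed, k, mode="same")
```

If the microphone capture has a timing offset relative to the speaker output (claim 25), the local chirp `lo` would be time-shifted by that detected offset before mixing.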
26. The method of claim 23, wherein calculating the heart rate comprises:
generating a time series of differences, wherein each difference represents a phase difference or magnitude difference of the heterodyne signal between one frame and an earlier frame of the heterodyne signal; and
detecting a plurality of peaks in the time series of differences, wherein the heart rate is proportional to a time interval separating peak pairs of the plurality of peaks.
27. The method of claim 23, wherein calculating the heart rate comprises:
calculating a time series of frequency responses or spectral differences, wherein each frequency response or spectral difference is a difference between a frequency response or spectrum calculated for one frame of the heterodyne signal and the frequency response or spectrum calculated for an earlier frame of the heterodyne signal; and
detecting a plurality of peaks of the time series, wherein the heart rate is proportional to a time interval separating peak pairs of the plurality of peaks.
28. The method of claim 23, wherein calculating the heart rate comprises:
generating a time series of variation values, wherein each variation value represents a variation in the heterodyne signal between corresponding frames of the heterodyne signal; and
detecting a plurality of peaks in the time series of variation values, wherein the heart rate is proportional to a time interval separating peak pairs of the plurality of peaks.
29. The method of claim 28, wherein generating a time series of variation values comprises:
for a current frame, calculating a plurality of differences, wherein each difference indicates a difference in frequency response at a respective single frequency between the current frame and a previous frame; and
summing the plurality of differences to produce a sum, wherein the sum represents one of the variation values in the time series.
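The variation-value computation of claims 28–29 (sum, over single frequencies, of the frame-to-frame difference in frequency response, followed by peak detection) can be sketched on synthetic per-frame magnitude responses. The frame rate, spectrum shape, noise level, and 72 bpm perturbation model are all assumptions:

```python
import numpy as np

frame_rate = 46.875        # assumed frames per second (48 kHz / 1024 samples)
n_frames = 600             # about 12.8 s of frames
n_beats = 15
rng = np.random.default_rng(1)

# Stand-in per-frame magnitude responses over 64 single frequencies.
freqs = np.arange(64)
baseline = np.exp(-freqs / 20.0)

# One brief spectral perturbation per heartbeat at an assumed 72 bpm.
beat_frames = np.round(14 + np.arange(n_beats) * frame_rate * 60 / 72).astype(int)
gain = np.zeros(n_frames)
gain[beat_frames] = 0.2

spectra = baseline[None, :] * (1.0 + gain[:, None])
spectra += 1e-4 * rng.standard_normal(spectra.shape)

# Variation value per frame: sum over the single frequencies of the
# difference in frequency response relative to the previous frame.
variation = np.abs(np.diff(spectra, axis=0)).sum(axis=1)

# Peaks in the variation series mark heartbeats; peak spacing gives the rate.
thr = 0.5 * variation.max()
interior = variation[1:-1]
peaks = np.flatnonzero((interior > thr)
                       & (interior >= variation[:-2])
                       & (interior > variation[2:])) + 1
bpm = 60.0 * frame_rate / np.diff(peaks).mean()
```

Collapsing the per-frequency differences into one scalar per frame is what makes the heartbeat-induced changes stand out as a clean time series for peak picking.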
30. An apparatus for measuring a heart rate of a user, comprising:
a signal processor configured to:
cause output ultrasonic waves to be output from a speaker of the head-mounted device while the head-mounted device is worn on or in an ear of the user, wherein the output ultrasonic waves span a sequence of frames and include a probe tone whose frequency varies within each frame;
obtain a microphone signal of a microphone of the head-mounted device, wherein the microphone signal includes reflected ultrasonic waves responsive to the output ultrasonic waves, wherein the reflected ultrasonic waves are reflected from a surface of the ear of the user;
heterodyne the reflected ultrasonic waves in the microphone signal to produce a heterodyne signal; and
calculate a heart rate of the user of the head-mounted device based on the heterodyne signal.
31. The device of claim 30, wherein the signal processor is configured to heterodyne the reflected ultrasonic waves in the microphone signal by:
heterodyning the reflected ultrasonic waves with a matching time-varying frequency signal to produce the heterodyne signal, wherein the heterodyne signal has a near-zero-frequency component and other components at higher frequencies, and wherein the heart rate is calculated by processing the near-zero-frequency component and not the other components.
32. The apparatus of claim 30, wherein the signal processor is configured to generate the matching time-varying frequency signal using a copy of a probe tone that drives the speaker to produce the output ultrasonic waves, or by detecting a timing offset in the microphone signal and generating the matching time-varying frequency signal based on the timing offset.
33. The device of claim 30, wherein the signal processor is configured to calculate the heart rate by:
generating a time series of variation values, wherein each variation value represents a variation in the heterodyne signal measured between a respective pair of frames in the series of frames or between a current frame and a previous frame;
detecting a plurality of peaks in the time series of variation values; and
outputting the heart rate proportional to a time interval separating peak pairs of the plurality of peaks.
34. The apparatus of claim 33, wherein generating the time series of change values comprises:
for a current frame, calculating a plurality of differences, wherein each difference indicates a difference in frequency response at a respective single frequency between the current frame and a previous frame; and
summing the plurality of differences to produce a sum, wherein the sum represents one of the variation values in the time series.
35. The device of claim 30, wherein the head-mounted device is an earbud.
36. The apparatus of claim 35, wherein the signal processor is integrated into a smart phone or tablet computer.
37. The device of claim 30, wherein the signal processor is integrated into the head-mounted device.
38. A machine-readable medium storing instructions that configure a processor to:
cause output ultrasonic waves to be output from a speaker of a head-mounted device while the head-mounted device is worn on or in an ear of a user, wherein the output ultrasonic waves comprise a sequence of frames, each frame comprising a probe tone whose frequency varies within the frame;
obtain a microphone signal of a microphone of the head-mounted device, wherein the microphone signal includes reflected ultrasonic waves responsive to the output ultrasonic waves, wherein the reflected ultrasonic waves are reflected from a surface of the ear of the user;
heterodyne the reflected ultrasonic waves in the microphone signal to produce a heterodyne signal; and
calculate a heart rate of the user of the head-mounted device based on the heterodyne signal.
39. The machine-readable medium of claim 38, wherein the stored instructions configure the processor to heterodyne the reflected ultrasonic waves in the microphone signal with a matching time-varying frequency signal to produce the heterodyne signal, wherein the heterodyne signal has a near-zero-frequency component and other components at higher frequencies, and wherein the heart rate is calculated by processing the near-zero-frequency component and not the other components.
40. The machine-readable medium of claim 39, wherein the stored instructions configure the processor to generate the matching time-varying frequency signal using a copy of a probe tone that drives the speaker to produce the output ultrasonic waves, or by detecting a timing offset in the microphone signal and generating the matching time-varying frequency signal based on the timing offset.
CN202311212016.4A 2022-09-20 2023-09-20 Cardiac measurement using acoustic techniques Pending CN117731325A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US63/376,349 2022-09-20
US18/460,484 2023-09-01
US18/460,484 US20240090865A1 (en) 2022-09-20 2023-09-01 Heart Measurement Using Time-Varying Frequency Acoustic Techniques
US18/460,457 2023-09-01

Publications (1)

Publication Number Publication Date
CN117731325A 2024-03-22

Family

ID=90251478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311212016.4A Pending CN117731325A (en) 2022-09-20 2023-09-20 Cardiac measurement using acoustic techniques

Country Status (1)

Country Link
CN (1) CN117731325A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination