WO2021147018A1

WO2021147018A1 - Electronic device activation based on ambient noise

Info

Publication number: WO2021147018A1
Application number: PCT/CN2020/073882
Authority: WO
Inventors: Xiaoming Bao; Jingbin Wang
Original assignee: Qualcomm Incorporated
Priority date: 2020-01-22
Filing date: 2020-01-22
Publication date: 2021-07-29

Abstract

A method performed by an electronic device is described. The method includes determining an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal. The method also includes comparing the ambient noise level with a noise threshold. The method additionally includes selecting, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user. The method further includes determining whether to enter an active mode based on the selected verification threshold.

Description

ELECTRONIC DEVICE ACTIVATION BASED ON AMBIENT NOISE

FIELD OF DISCLOSURE

The present disclosure relates generally to electronic devices. More specifically, the present disclosure relates to systems and methods for electronic device activation based on ambient noise.

BACKGROUND

In the last several decades, the use of electronic devices has become common. In particular, advances in electronic technology have reduced the cost of increasingly complex and useful electronic devices. Cost reduction and consumer demand have proliferated the use of electronic devices such that they are practically ubiquitous in modern society. As the use of electronic devices has expanded, so has the demand for new and improved features of electronic devices. More specifically, electronic devices that perform new functions and/or that perform functions faster, more efficiently or with higher quality are often sought after.

Some electronic devices (e.g., cellular phones, smartphones, audio recorders, camcorders, computers, etc. ) utilize audio signals. These electronic devices may encode, store and/or transmit the audio signals. For example, a smartphone may obtain, encode, and transmit a speech signal for a phone call, while another smartphone may receive and decode the speech signal. Improving audio signal usage in electronic devices may be beneficial.

SUMMARY

A method performed by an electronic device is described. The method includes determining an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal. The method also includes comparing the ambient noise level with a noise threshold. The method further includes selecting, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user. The method additionally includes determining whether to enter an active mode based on the selected verification threshold.

Selecting the verification threshold may include selecting among different verification thresholds based on whether the comparison indicates a noisy condition. Determining whether to enter the active mode may include comparing a verification metric with the selected verification threshold. The method may include entering the active mode in response to determining that the verification metric satisfies the selected verification threshold.

Selecting the verification threshold may include selecting the verification threshold from a first verification threshold and a second verification threshold. The first verification threshold may be greater than the second verification threshold. Selecting the verification threshold may include selecting the first verification threshold in response to determining that the ambient noise level is not less than the noise threshold. Determining whether to enter the active mode may include comparing a verification metric with the first verification threshold in response to determining that the ambient noise level is not less than the noise threshold. The method may include entering the active mode in response to determining that the verification metric is greater than the first verification threshold.

Selecting the verification threshold may include selecting the second verification threshold in response to determining that the ambient noise level is less than the noise threshold. Determining whether to enter the active mode may include comparing a verification metric with the second verification threshold in response to determining that the ambient noise level is less than the noise threshold. The method further may include entering the active mode in response to determining that the verification metric is greater than the second verification threshold.

The method may include performing noise suppression on the audio signal. The method may also include detecting a keyword with an associated verification metric based on the noise suppressed audio signal. The method may include providing a first level of device access in response to determining that a verification metric satisfies a first verification threshold or may include providing a second level of device access in response to determining that the verification metric satisfies a second verification threshold.

An electronic device is also described. The electronic device includes a memory. The electronic device also includes a processor in electronic communication with the memory. The processor is configured to determine an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal. The processor is also configured to compare the ambient noise level with a noise threshold. The processor is further configured to select, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user. The processor is additionally configured to determine whether to enter an active mode based on the selected verification threshold.

A non-transitory tangible computer-readable medium storing computer-executable code is also described. The computer-readable medium includes code for causing a processor to determine an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal. The computer-readable medium also includes code for causing the processor to compare the ambient noise level with a noise threshold. The computer-readable medium further includes code for causing the processor to select, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user. The computer-readable medium additionally includes code for causing the processor to determine whether to enter an active mode based on the selected verification threshold.

An apparatus is also described. The apparatus includes means for determining an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal. The apparatus also includes means for comparing the ambient noise level with a noise threshold. The apparatus further includes means for selecting, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user. The apparatus additionally includes means for determining whether to enter an active mode based on the selected verification threshold.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram illustrating one example of an electronic device in which systems and methods for electronic device activation based on ambient noise may be implemented;

Figure 2 is a flow diagram illustrating one configuration of a method for controlling electronic device activation based on ambient noise;

Figure 3 is a flow diagram illustrating a more specific example of a method for controlling electronic device activation based on ambient noise;

Figure 4 is a flow diagram illustrating another more specific example of a method for controlling electronic device activation based on ambient noise;

Figure 5 is a flow diagram illustrating another more specific example of a method for controlling electronic device activation based on ambient noise;

Figure 6 is a flow diagram illustrating another more specific example of a method for controlling electronic device activation based on ambient noise;

Figure 7 is a state diagram illustrating an example of modes and transitions that may be implemented in accordance with some configurations of the systems and methods described herein;

Figure 8 is a state diagram illustrating another example of modes and transitions that may be implemented in accordance with some configurations of the systems and methods described herein;

Figure 9 is a block diagram illustrating an example of elements or components that may be implemented in accordance with some configurations of the systems and methods disclosed herein; and

Figure 10 illustrates certain components that may be included within an electronic device configured to implement various configurations of the systems and methods disclosed herein.

DETAILED DESCRIPTION

Some configurations of the systems and methods disclosed herein relate to electronic device activation based on ambient noise. Electronic devices may include devices configured to operate using electronic circuitry. Examples of electronic circuitry include integrated circuits, processors, memory, application specific integrated circuits (ASICs) , etc. Some examples of electronic devices include smartphones, tablet devices, computing devices, remote controllers, smart appliances, autonomous vehicles, vehicle electronics, aircraft, etc.

Some electronic devices may be configured to receive speech and/or voice signals. For example, some electronic devices may provide a voice user interface (UI) to operate in response to received speech and/or voice signals. For instance, voice UI may be a feature on smartphones. In some scenarios (e.g., in a driving car) , voice UI may be convenient. However, voice UI may not always be enabled due to power consumption and/or user privacy concerns. Accordingly, it may be beneficial to have voice UI enabled just when requested in some examples.

Voice activation may be utilized to activate an electronic device and/or to activate voice UI. For example, voice activation may include techniques in which a user’s voice and/or speech may be utilized for activating an electronic device and/or voice UI. For instance, activating an electronic device and/or voice UI may include transitioning the electronic device to an active mode from a passive mode (e.g., low-power mode, sleep mode, hibernate mode, locked mode, power-save mode, etc. ) . Additionally or alternatively, activating an electronic device and/or voice UI may include activating voice control of the electronic device.

Some examples of voice activation may utilize a keyword and/or user verification. A keyword may be a word, phrase, term, speech, audio signal, and/or sound that may be utilized to trigger a function (e.g., voice activation) of an electronic device. A keyword may be predefined in some approaches. For example, a keyword may be set before use by an electronic device manufacturer and/or user. In some examples of the systems and methods described herein, an electronic device may perform keyword detection and user verification. In some examples, keyword detection may include detecting a keyword in an audio signal (e.g., an utterance of a keyword) . Detecting the keyword may be performed with or without regard for the identity of the speaker. For example, an electronic device may detect a keyword in an audio signal by comparing the audio signal with a keyword model. The keyword model may include a template and/or one or more aspects (e.g., phonemes, timing, etc. ) corresponding to a keyword. For instance, an electronic device may utilize a speech recognition technique such as a hidden Markov model (HMM) , dynamic time warping, and/or machine learning (e.g., deep neural network (s) , artificial neural network (s) , recurrent neural network (s) , etc. ) to detect a keyword in an audio signal.

In some examples, user verification may include recognizing and/or identifying a designated user. User verification may be a technique that allows an electronic device to limit responding to a particular user or users (and/or to ignore one or more other people) . For instance, a smartphone that utilizes keyword detection and user verification may attempt to allow only an owner (and/or designated user (s) ) to activate an electronic device (e.g., unlock the smartphone, activate voice UI, etc. ) using a spoken keyword, while disallowing others. In some examples, low-power voice activation may be enabled in electronic devices, such as smartphones.

In some examples of the techniques described herein, a spoken keyword may be detected and a user may be verified for a designated user to be recognized. In some approaches to keyword (e.g., pre-defined keyword) detection and user verification for voice activation, an electronic device may request a designated user to speak a keyword several times. The electronic device may capture the utterances to train a user sound model (e.g., to obtain a voiceprint) and/or a keyword model. The designated user’s voiceprint characteristics may be saved in the user model. An electronic device may compare an audio signal (e.g., a microphone input) with the content in a user sound model. In some examples, the audio signal may be compared with the keyword model and/or the user sound model (e.g., template data) in order to detect the keyword and to perform user verification. For instance, the particular keyword may be detected and the designated user may be verified.

In some approaches, both the keyword may need to be detected and the user may need to be verified to activate an electronic device. In some examples, an electronic device may not be activated even if the designated user’s voice in the audio signal matches the voiceprint, unless the keyword is also detected. Also, the electronic device may not be activated even if the non-designated user utters the keyword and the keyword is detected in the audio signal. For instance, both keyword detection and user verification may need to be successful to activate the electronic device. In some approaches, keyword detection and user verification may be performed jointly. For example, template data may include or indicate one or more aspects (e.g., phonemes, timing, etc. ) used for keyword detection and/or one or more aspects (e.g., voiceprint, vocal characteristics, etc. ) used for user verification. In some approaches, keyword detection and user verification may be performed separately. For example, keyword detection may be performed, and verification metric assessment may then be performed on a detected keyword. Additionally or alternatively, verification metric assessment may be performed on an audio signal, and keyword detection may then be performed on the audio signal. In some examples, a detected keyword may have an associated verification metric. For example, if a template (e.g., template data) matches a portion (e.g., keyword) of the audio signal, a verification metric may be associated with the detected keyword, where the verification metric may indicate a degree of matching or confidence that the detected keyword was uttered by a designated user and/or that an utterance by a designated user is a keyword. In some examples, user verification may be performed on a detected keyword in order to produce the verification metric associated with the detected keyword, or a keyword may be detected on a portion of an audio signal corresponding to a designated user.

Some approaches to voice activation may suffer in noisy environments. For example, in noisy environments (e.g., especially in environments with non-stationary noise) with relatively low signal-to-noise ratio (SNR) , voice activation may fail frequently. In some experiments, for instance, in environments with SNR > 9 decibels (dB) in television (TV) program noise, some approaches to voice activation with user verification may function properly for approximately 95%of voice activation attempts. In environments with SNR < 9dB in TV program noise, some approaches to voice activation with user verification may function properly for approximately 50%of voice activation attempts. Accordingly, some approaches to voice activation with user verification may not work well in low-SNR environments. For instance, some approaches to user verification may cause more rejections, which may cause voice activation to fail frequently.

Users may expect voice activation to function (e.g., consistently function) in a variety of environments, including noisy environments. If voice activation fails frequently (e.g., if a smartphone often does not wake up when a keyword is spoken) , a user may become frustrated, lose interest in voice activation, and/or stop attempting to use voice activation.

In some approaches, noise-suppression may be performed before inputting a signal to voice activation. Performing noise suppression may increase the SNR of the signal input to voice activation. In some examples (e.g., in a smartphone use case) , noise suppression may provide limited improvement.

In some approaches, user verification requirements may be reduced for voice activation. This may increase a wake-up rate in noisy environments, but may result in decreasing user verification performance in other conditions (e.g., higher SNR environments) . For instance, user verification requirements may be reduced such that user verification can only distinguish between male and female voices, which may result in voice activation based on falsely verifying non-designated people (e.g., “imposters” ) . In some approaches, user verification requirements may be changed manually. For example, different user verification levels may be provided in a user interface. This may allow a user to select user verification requirements to increase a wake-up rate in noisy environments. However, if the user neglects to change user verification requirements for different environments, this may result in decreasing user verification performance in differing conditions (e.g., increased imposter detection in higher SNR environments or reduced wake-up rate in low SNR environments) .

Some examples of the techniques described herein may improve electronic device activation based on ambient noise. For instance, some approaches may improve user verification by increasing robustness in a low-SNR environment. Examples of automatic user verification to improve a voice activation detection rate in noisy environments are described herein.

Various configurations are now described with reference to the Figures, where like reference numbers may indicate functionally similar elements. The systems and methods as generally described and illustrated in the Figures herein could be arranged and designed in a wide variety of different configurations. Thus, the following more detailed description of several configurations, as represented in the Figures, is not intended to limit scope, as claimed, but is merely representative of the systems and methods.

Figure 1 is a block diagram illustrating one example of an electronic device 102 in which systems and methods for electronic device activation based on ambient noise may be implemented. The electronic device 102 may be an apparatus for performing a function or functions. Examples of the electronic device 102 include smartphones, tablet devices, computing devices, computers (e.g., desktop computers, laptop computers, etc. ) , cameras, virtual reality devices (e.g., headsets) , augmented reality devices (e.g., headsets) , mixed reality devices, vehicles (e.g., semi-autonomous vehicles, autonomous vehicles, etc. ) , automobiles, robots, aircraft, drones, unmanned aerial vehicles (UAVs) , servers, network devices, healthcare equipment, gaming consoles, smart appliances, etc. In some configurations, the electronic device 102 may be integrated into one or more devices (e.g., vehicles, drones, mobile devices, etc. ) . The electronic device 102 may include one or more components or elements. One or more of the components or elements may be implemented in hardware (e.g., circuitry) or a combination of hardware and instructions (e.g., a processor with software stored in memory) .

In some configurations, the electronic device 102 may include a processor 110, a memory 120, one or more displays 128, one or more microphones 104, and/or one or more communication interfaces 106. The processor 110 may be coupled to (e.g., in electronic communication with) the memory 120, display (s) 128, microphone (s) 104, and/or communication interface (s) 106. It should be noted that one or more of the elements illustrated in Figure 1 may be optional. In particular, the electronic device 102 may not include one or more of the elements illustrated in Figure 1 in some configurations. For example, the electronic device 102 may or may not include a display 128. Additionally or alternatively, the electronic device 102 may or may not include a communication interface 106.

The memory 120 may store instructions and/or data. The processor 110 may access (e.g., read from and/or write to) the memory 120. Examples of instructions and/or data that may be stored by the memory 120 may include audio data 122 (e.g., audio samples, audio files, audio waveforms, etc. ) , noise threshold data 124, verification threshold data 126, noise level determiner 112 instructions, threshold selector 114 instructions, noise level comparator 116 instructions, mode controller 118 instructions, and/or instructions for other elements, etc.

The communication interface 106 may enable the electronic device 102 to communicate with one or more other electronic devices. For example, the communication interface 106 may provide an interface for wired and/or wireless communications. In some configurations, the communication interface 106 may be coupled to one or more antennas 108 for transmitting and/or receiving radio frequency (RF) signals. For example, the communication interface 106 may enable one or more kinds of wireless (e.g., cellular, wireless local area network (WLAN) , personal area network (PAN) , etc. ) communication. Additionally or alternatively, the communication interface 106 may enable one or more kinds of cable and/or wireline (e.g., Universal Serial Bus (USB) , Ethernet, High Definition Multimedia Interface (HDMI) , fiber optic cable, etc. ) communication.

In some configurations, multiple communication interfaces 106 may be implemented and/or utilized. For example, one communication interface 106 may be a cellular (e.g., 3G, Long Term Evolution (LTE) , CDMA, 5G, etc. ) communication interface 106, another communication interface 106 may be an Ethernet interface, another communication interface 106 may be a universal serial bus (USB) interface, and yet another communication interface 106 may be a wireless local area network (WLAN) interface (e.g., Institute of Electrical and Electronics Engineers (IEEE) 802.11 interface) . In some configurations, the communication interface (s) 106 may send information (e.g., audio information, noise information, verification information, etc. ) to and/or receive information from another electronic device (e.g., a microphone (s) , transducer (s) , a vehicle, a smartphone, a camera, a display, a robot, a remote server, etc. ) .

In some configurations, the electronic device 102 may include one or more displays 128. A display 128 may be a screen or panel for presenting images. In some examples, the display (s) 128 may be implemented with one or more display technologies, such as liquid crystal display (LCD) , light-emitting diode (LED) , organic light-emitting diode (OLED) , plasma, cathode ray tube (CRT) , etc. The display (s) 128 may present content. Examples of content may include one or more interactive controls, one or more frames, video, still images, graphics, virtual environments, three-dimensional (3D) image content, 3D models, symbols, characters, etc. In some configurations, information, data, and/or images based on audio signal (s) being captured by microphone (s) 104 may be presented on the display 128.

The display (s) 128 may be integrated into the electronic device 102 or may be linked to the electronic device 102. In some examples, the display (s) 128 may be a monitor with a desktop computer, a display on a laptop, a touch screen on a tablet device, an OLED panel in a smartphone, etc. In another example, the electronic device 102 may be a virtual reality headset with integrated displays 128. In another example, the electronic device 102 may be a computer that is coupled to a virtual reality headset with the displays 128.

In some configurations, the electronic device 102 may present a user interface 130 on the display 128. For example, the user interface 130 may enable a user to interact with the electronic device 102. In some configurations, the display 128 may be a touchscreen that receives input from physical touch (by a finger, stylus, or other tool, for example) . Additionally or alternatively, the electronic device 102 may include or be coupled to another input interface. For example, the electronic device 102 may include a camera and may detect user gestures (e.g., hand gestures, arm gestures, eye tracking, eyelid blink, etc. ) . In another example, the electronic device 102 may be linked to a mouse and may detect a mouse click. In yet another example, the electronic device 102 may be linked to one or more other controllers (e.g., game controllers, joy sticks, touch pads, motion sensors, etc. ) and may detect input from the one or more controllers.

In some configurations, the electronic device 102 may include one or more microphones 104. Examples of microphone (s) 104 may include microelectromechanical system (MEMS) microphones, dynamic microphones, condenser microphones, ribbon microphones, etc. The microphone (s) 104 may convert sound or acoustic energy into electrical signal (s) (e.g., electronic audio signal (s) ) . As used herein, the term “microphone” and variations thereof may additionally or alternatively refer to audio transducer (s) and/or mechanical transducer (s) . For example, an audio transducer may be a device that converts sound or acoustic energy into electrical signal (s) (e.g., electronic audio signal (s) ) . A mechanical transducer may be a device that converts mechanical energy (e.g., vibration) into electrical signals (s) (e.g., electronic audio signal (s) ) . For instance, mechanical vibrations of the electronic device 102 from movement of the electronic device 102 and/or movement of another body (e.g., vehicle mount, seat, table, floor, clothing, user limb, etc. ) in contact with the electronic device 102 may cause vibrations in the microphone (s) 104 that may be captured as audio signal (s) .

In some examples, the microphone (s) 104 may capture one or more audio signal (s) . The audio signal (s) may be stored as audio data 122 in memory 120 in some examples. The audio signal (s) may indicate sound from the environment of the electronic device 102. For example, the audio signal (s) may indicate ambient noise (e.g., environmental noise, interfering noise, stationary noise, non-stationary noise, etc. ) , voice, speech, and/or sounds, etc. For instance, the audio signal (s) may include sound (e.g., voice, speech, etc. ) from one or more designated users (e.g., owners, authorized users, etc. ) of the electronic device 102 and/or ambient noise.

In some examples, the microphone (s) 104 may be included in (or mechanically coupled to) the electronic device 102 or another electronic device. For instance, the microphone (s) 104 may be included in a smartphone or a remote smartphone. In some configurations, the microphone (s) 104 may be linked to the electronic device 102 via a wired and/or wireless link.

In some configurations, the electronic device 102 may request and/or receive one or more audio signals from another device. For example, the electronic device 102 may receive one or more audio signals from one or more external microphones linked to the electronic device 102. In some configurations, the electronic device 102 may request and/or receive the one or more audio signals via the communication interface 106. For example, the electronic device 102 may or may not include microphone 104 and may receive audio signal (s) (e.g., audio data) from one or more remote devices.

In some configurations, the electronic device 102 (e.g., processor 110, memory 120) may obtain and/or receive audio data 122. Examples of audio data 122 include audio samples, audio frames, audio waveforms, audio files, etc. In some examples, audio frames may be captured at regular periods, semi-regular periods, or aperiodically. The audio data 122 may indicate one or more audio signals. The audio data 122 may be stored in memory 120. For example, the memory 120 may buffer and/or store a stream of audio data 122 from a microphone and/or from another device (e.g., network device, smartphone, computing device, etc. ) .

The processor 110 may include and/or implement a noise level determiner 112, athreshold selector 114, a noise level comparator 116, and/or a mode controller 118. In some examples, one or more of the elements illustrated in the electronic device 102 and/or processor 110 may be excluded (e.g., not implemented and/or not included) , may be combined, and/or may be divided. For example, the electronic device 102 may not include the microphone (s) 104, communication interface (s) 106, antenna (s) 108, and/or display (s) 128 in some configurations. In some examples, the noise level determiner 112, threshold selector 114, and/or mode controller 118 may be combined. In some examples, the threshold selector 114 and noise level comparator 116 may be divided or separated. Additionally or alternatively, one or more of the elements illustrated in the processor 110 may be implemented separately from the processor 110 (e.g., in other circuitry, on another processor, on a separate electronic device, etc. ) . For example, the electronic device 102 may include multiple processors 110 and/or multiple memories 120, and one or more of the elements described herein may be distributed across multiple processors 110 and/or multiple memories 120.

The processor 110 may include and/or implement a noise level determiner 112. For example, the processor 110 may execute noise level determiner 112 instructions stored in the memory 120 to implement the noise level determiner 112. The noise level determiner 112 may determine an ambient noise level based on an audio signal. An ambient noise level may be an indication and/or estimate of a degree of noise (e.g., acoustic and/or mechanical noise) in the environment of the electronic device 102. For example, noise may be sounds, vibrations, and/or acoustic waves, etc., besides target audio. In some configurations, target audio may be sound, voice, speech, etc., of a designated user or users (e.g., a sought-for signal over interfering signals or noise) . The ambient noise may interfere with the target audio. In some examples, the ambient noise level may be expressed in terms of an amount of noise (e.g., ambient noise level may increase with increased noise) or may be expressed in terms inverse to an amount of noise. For example, the ambient noise level may be expressed relative to target audio (e.g., may decrease with increased noise and/or increased target audio) .

In some configurations, the ambient noise level may be expressed as a signal-to-noise ratio (SNR) . For example, the noise level determiner 112 may determine (e.g., estimate) a noise level (e.g., noise estimate, average noise amplitude, peak noise amplitude, etc. ) from portions of the audio signal without target audio. The noise level determiner 112 may determine (e.g., estimate) a “signal” level or target audio level (e.g., average target audio amplitude, peak target audio amplitude, etc. ) from portions of the audio signal that may include target audio. The portions of the audio signal with and/or without the target audio may be detected (e.g., estimated) using voice activity detection (VAD) in some configurations. For example, the electronic device 102 (e.g., processor 110) may include and/or implement a voice activity detector. The voice activity detector may indicate whether voice activity (e.g., speech) is detected. For example, the voice activity detector may provide a voice activity indicator (e.g., VAD flag) that indicates whether voice activity is detected (in a portion of the audio signal and/or within a period of time, for example) . The noise level determiner 112 may divide a target audio level estimate by the noise level estimate to determine the SNR (as an expression of the ambient noise level, for instance) . In some configurations, the ambient noise level may be expressed as noise power, noise energy, signal power divided by noise power, average noise magnitude, etc.

In some configurations, the noise level determiner 112 may determine the ambient noise level by performing noise suppression on the audio signal. For example, the electronic device 102 (e.g., processor 110) may include and/or implement one or more noise suppressors. For instance, the electronic device 102 (e.g., processor 110, noise level determiner 112, noise suppressor (s) , etc. ) may perform noise suppression to determine an ambient noise level (e.g., SNR) .

In some examples, the electronic device 102 (e.g., processor 110, noise level determiner 112, noise suppressor (s) , etc. ) may perform single-microphone or multi-microphone audio signal processing. The audio signal processing may produce the voice activity indicator (e.g., VAD flag) , target audio (e.g., target audio level estimate, speech reference, etc. ) , and/or noise estimation (e.g., noise estimate, noise level estimate, average noise amplitude, peak noise amplitude, etc. ) . In some examples, the audio signal processing (e.g., noise suppression) may be accomplished by performing Weiner filtering, beamforming, improved minima controlled recursive averaging (IMCRA) , power level differences (PLD) (between microphones, for example) , spectral subtraction, stationary noise suppression, non-stationary noise suppression, deep learning, and/or a voiceprint algorithm. The SNR may be calculated based on the target audio (e.g., target audio level estimate, speech reference, etc. ) and noise estimation (e.g., noise level estimate, etc. ) .

Some approaches to noise suppression may include single-microphone noise suppression approaches. For example, a minimum statistics algorithm or improved minimum statistics algorithm (e.g., IMCRA) may be utilized to perform noise estimation. Some approaches to noise suppression may include multi-microphone noise suppression approaches. For example, beamforming and/or PLD may provide improved non-stationary noise estimation. Some approaches to noise suppression may include deep learning noise suppression approaches. For example, one or more deep learning-based (e.g., deep neural network) approaches may be utilized by the electronic device 102 (e.g., mobile device, smartphone, computer, etc. ) . Some deep learning-based approaches may work with single microphone or multiple microphones, may provide a separate noise reference (e.g., noise level estimate) and speech reference (e.g., target audio level estimate) , and/or may provide good noise estimation. In some cases, some approaches for voice call noise suppression may not work well for keyword detection, as aggressive noise-suppression may introduce distortion.

The processor 110 may include and/or implement a threshold selector 114. For example, the processor 110 may execute threshold selector 114 instructions stored in the memory 120 to implement the threshold selector 114. The threshold selector 114 may be configured to select a verification threshold. A verification threshold may be a threshold for determining whether at least a portion of the audio signal (e.g., an utterance) corresponds to a designated (e.g., authorized) user. For example, the verification threshold may be used for verifying a user or not (e.g., for determining whether an audio signal corresponds to a designated user or not) .

In some configurations, the threshold selector 114 may include a noise level comparator 116. In some configurations, the noise level comparator 116 may be separate from the threshold selector 114. In some examples, the processor 110 may execute noise level comparator 116 instructions stored in the memory 120 to implement the noise level comparator 116. The noise level comparator 116 may compare the ambient noise level with one or more noise thresholds. For example, one or more noise thresholds may be stored in the memory 120 as noise threshold data 124. The noise threshold (s) may be predetermined and/or may be set based on a user input. For instance, the noise threshold (s) may be stored as noise threshold data 124 during manufacture and/or calibration. Additionally or alternatively, a user may set and/or adjust the noise threshold (s) (via the user interface 130, for instance) .

In some examples, a noise threshold may indicate a level of ambient noise at which to change user verification. For instance, if a level of ambient noise is below a noise threshold, a certain verification threshold may be utilized, or if a level of ambient noise is above a noise threshold, a different verification threshold may be utilized. In some examples, a noise threshold may be expressed in terms of SNR. Examples of a noise threshold may include 4 dB, 5 dB, 7 dB, 9 dB, 10 dB (SNR) , etc. In some examples, one noise threshold may be utilized to establish two ambient noise level ranges for different verification thresholds. In some examples, two or more noise thresholds may be utilized to establish three or more ambient noise level ranges for different verification thresholds. The noise level comparator 116 may compare the ambient noise level to the noise threshold (or to noise thresholds) to determine a relationship between the ambient noise level and the noise threshold (s) . For instance, the noise level comparator 116 may determine whether the ambient noise level is greater than, equal to, or less than the noise threshold (or one or more of a set of noise thresholds) .

In some configurations, the comparison (of ambient noise level to noise threshold) may indicate whether a noisy condition is met. A noisy condition may be a condition in which the ambient noise level may cause an increased degree of user verification rejections. In some examples, a noisy condition may be met if the ambient noise level has a particular relationship with the noise threshold (s) . For example, if an SNR (which may reflect the ambient noise level, for instance) is less than a noise threshold, then the noisy condition may be met and/or indicated.

The threshold selector 114 may select a verification threshold based on comparing the ambient noise level with the noise threshold (s) . For example, two or more verification thresholds may be stored in the memory 120 as verification threshold data 126. The verification thresholds may be predetermined and/or may be set based on a user input. For instance, the verification thresholds may be stored as verification threshold data 126 during manufacture and/or calibration. For example, a calibration may be performed to determine and/or tune one or more verification thresholds, such that the one or more verification thresholds provide (s) a target wake-up rate and/or imposter rejection rate. Additionally or alternatively, a user may set and/or adjust the verification thresholds (via the user interface 130, for instance) . In some configurations, each of the verification thresholds may be associated with an ambient noise level range that is established by the noise threshold (s) . For instance, a first verification threshold may be associated with an ambient noise level range that is greater than or equal to a noise threshold, and a second verification threshold may be associated with an ambient noise level range that is less than the noise threshold. In another example, a first verification threshold may be associated with an ambient noise level range that is greater than or equal to a first noise threshold, a second verification threshold may be associated with an ambient noise level range that is less than the first noise threshold and greater than or equal to a second noise threshold, and a third verification threshold may be associated with an ambient noise level range that is less than the second noise threshold. Other numbers of noise thresholds and/or verification thresholds may be utilized in other examples.

The threshold selector 114 may select the verification threshold according to the relationship between the ambient noise level and the noise threshold. For example, selecting the verification threshold may include selecting among different verification thresholds based on whether the comparison indicates a noisy condition. For instance, if the ambient noise level is greater than or equal to a noise threshold, the threshold selector 114 may select a first verification threshold, or may select a second verification threshold if the ambient noise level is less than the noise threshold. Additionally or alternatively, the threshold selector 114 may select the verification threshold associated with an ambient noise level range where the ambient noise level resides. For instance, if the ambient noise level is within a first ambient noise level range, the threshold selector 114 may select a first verification threshold, or may select a second verification threshold if the ambient noise level is within a second ambient noise level range.

In some examples, the threshold selector 114 may select a verification threshold based on a voice activity condition. A voice activity condition may be an indication of voice activity. Examples of a voice activity condition may include a voice activity indicator (e.g., an indicator of voice activity provided by voice activity detection and/or a voice activity detector, a VAD flag, etc. ) and/or a voice activity measurement (e.g., SNR) . A voice activity measurement may be a measurement of an audio signal that may indicate voice activity (e.g., speech) in the audio signal. In some examples, the threshold selector 114 may select a verification threshold based on whether a voice activity indicator (e.g., VAD flag) indicates voice activity. For instance, if the voice activity indicator indicates that the audio signal does not include voice activity (e.g., the VAD flag is false) , then a first verification threshold may be selected. Alternatively, if the voice activity indicator indicates that the audio signal includes voice activity and the noisy condition is met, then a second verification threshold may be selected.

In some examples, the threshold selector 114 may select a verification threshold based on a voice activity measurement (e.g., SNR) . For instance, when voice activity is not included in an audio signal (e.g., when the VAD flag is false) , the calculated SNR of the audio signal may be relatively low (e.g., -20 dB) . For example, a low voice activity measurement may be calculated for portions of the audio signal in which little or no target audio is included. In some examples, the threshold selector 114 may utilize a voice activity threshold (e.g., -20 dB, -10 dB, -5 dB, etc. ) to select a user verification threshold. For instance, the user verification threshold may be selected based on whether a voice activity measurement satisfies a voice activity threshold. For example, the threshold 114 selector may select a user verification threshold based on whether the voice activity measurement (e.g., SNR) is greater than the voice activity threshold. For instance, if the SNR is within a range between the voice activity threshold and a noise threshold (e.g., -20 dB < SNR < 9 dB) , the threshold selector may select a second verification threshold (e.g., a lower verification threshold) . Otherwise, the threshold selector 114 may select a first verification threshold (e.g., if SNR ≤ -20 dB or if SNR ≥ 9 dB) . The first verification threshold may be greater than the second verification threshold. In some examples, the voice activity measurement (e.g., SNR) may be the ambient noise level (e.g., SNR) .

The processor 110 may include and/or implement a mode controller 118. For example, the processor 110 may execute mode controller 118 instructions stored in the memory 120 to implement the mode controller 118. The mode controller 118 may control a mode of the electronic device 102. For example, the mode controller 118 may control whether the electronic device 102 is in an active mode or a passive mode. A passive mode may be a mode of operation where electronic device activity is reduced and/or limited. Examples of passive mode may include low-power mode, sleep mode, hibernate mode, locked mode, power-save mode, etc. When in passive mode, the electronic device 102 may perform limited operations and/or may be responsive to limited inputs. For instance, the electronic device 102 may respond to limited inputs for triggering a transition to active mode, for charging a battery of the electronic device 102, for performing an emergency call, etc. In some examples, the electronic device 102 may adjust operation in passive mode to conserve power. For instance, the display (s) 128 and/or touchscreen (s) may be deactivated, the processor 110 and/or memory 120 may operate more slowly, and/or the communication interface (s) 106 may reduce communication (e.g., transmission/reception) when the electronic device 102 is in passive mode.

An active mode may be a mode of operation in which the electronic device 102 allows more operations and/or is responsive to more inputs (than when in passive mode, for instance) . When in active mode, for example, the electronic device 102 may respond to more inputs (e.g., voice commands, clicks, taps, motion, button presses, etc. ) for interacting with applications, for triggering a transition to passive mode, for charging the electronic device 102, for performing calls, sending text messages, playing games, etc. In some examples, the electronic device 102 may consume more power in active mode than in passive mode. For instance, the display (s) 128 and/or touchscreen (s) may be activated, the processor 110 and/or memory 120 may operate more quickly, and/or the communication interface (s) 106 may allow more communication (e.g., transmission/reception) when the electronic device 102 is in active mode.

The mode controller 118 may determine whether to enter an active mode based on the selected verification threshold. For example, determining whether to enter the active mode may include comparing a verification metric with the selected verification threshold. A verification metric may be a value indicating a degree of certainty or confidence that the audio signal (e.g., detected keyword) is from (e.g., was spoken by and/or corresponds to) a designated user. For example, the electronic device 102 may detect a keyword from the audio signal and determine an associated verification metric that indicates a degree of certainty or confidence that the detected keyword was spoken by a designated user (e.g., authorized user/owner of the electronic device 102) . In some examples, the mode controller 118 may compare an audio signal (e.g., a portion of the audio signal) to template data corresponding to a designated user to detect the keyword and/or produce the verification metric. Performing a cosine similarity procedure may be an example of comparing the audio signal to the template data. In some configurations, the memory 120 may store template data corresponding to the keyword.

Template data may be audio data and/or other data (e.g., features, pitch, spectral envelope, filter coefficients, and/or timing, etc. ) that characterize (s) a designated user’s voice, speech, and/or utterance of the keyword. In some approaches, the electronic device 102 may receive and/or determine the template data during an enrollment (e.g., a user verification setup) procedure. For example, the user may speak the keyword, which may be captured by the microphone (s) 104 and used (e.g., analyzed) to produce and/or store the template data. For instance, the electronic device 102 may analyze the audio signal with the keyword using a hidden Markov model, Gaussian mixture model, neural network, frequency estimation, vector quantization, and/or linear predictive coding, etc., to produce the template data, which may be stored in the memory 120. In some examples, the electronic device 102 may receive the template data from another (e.g., remote) device. The electronic device 102 may compare a detected keyword to the template data to determine a degree of matching between the detected keyword and the template data. In some examples, the degree of matching may be determined by calculating a correlation, error, mean squared error, difference, distance (e.g., Euclidean distance, vector distance, etc. ) , and/or number of matching aspects, etc., between the audio signal (e.g., a portion of the audio signal, the detected keyword, detected aspect (s) of the audio signal, metric (s) corresponding to the audio signal, pitch of the audio signal, detected phonemes in the audio signal, filter coefficients, etc. ) and the template data. In some examples, a higher amount of correlation and/or higher number of matching aspects may indicate a higher degree of matching. Additionally or alternatively, a lower amount of error, mean squared error, difference, and/or distance may indicate a higher degree of matching. The degree of matching may be an example of the verification metric.

In some configurations, the electronic device 102 (e.g., processor 110, mode controller 118) may enter the active mode in response to determining that the verification metric satisfies the selected verification threshold. For example, the mode controller 118 may compare the verification metric to the selected verification threshold. In a case that the verification metric satisfies the selected verification threshold, the electronic device 102 may enter the active mode. For example, if the verification metric indicates that the uttered keyword corresponds to a designated user with a degree of certainty and/or confidence according to the selected verification threshold, the electronic device 102 may enter the active mode. Entering the active mode may include activating and/or increasing the activity of one or more components (e.g., display (s) 128, processor 110, memory 120, communication interface (s) 106, etc. ) , and/or may include allowing and/or responding to increased interaction (e.g., voice commands, clicks, taps, gestures, and/or motions, etc. ) .

In some configurations, one or more of the components or elements described in connection with Figure 1 may be combined and/or divided. For example, the noise level determiner 112, threshold selector 114, and/or mode controller 118 may be combined into an element that performs the functions of the noise level determiner 112, threshold selector 114, and/or mode controller 118. In another example, the threshold selector 114 may be divided into a number of separate components or elements that perform a subset of the functions associated with the threshold selector 114.

Figure 2 is a flow diagram illustrating one configuration of a method 200 for controlling electronic device activation based on ambient noise. The method 200 may be performed by the electronic device 102 described in connection with Figure 1. The electronic device 102 may determine 202 an ambient noise level based on an audio signal. This may be accomplished as described in connection with Figure 1. For example, the electronic device 102 may determine an SNR based on an audio signal.

One or more approaches for determining the SNR may be implemented as described above. For example, the electronic device 102 may determine an SNR in a scenario with car noise, train noise, bus noise, indoor noise (e.g., home noise, kitchen noise, office noise, etc. ) . There may be some scenarios in which an inaccurate high SNR may be measured even when an actual target audio SNR is low. For instance, an interfering speaker may be louder than a target voice utterance. In some cases, single-microphone noise suppression may be limited in detecting a correct SNR in scenarios where an interfering speaker is louder than a target speaker. Multi-microphone beamforming noise suppression may avoid this issue, though scenarios where interfering speech and target speech are from the same direction may cause difficulties in accurately measuring the SNR. In some examples, deep learning-based (e.g., recurrent neural network model-based) noise suppression may obtain an accurate SNR using a voiceprint with a target utterance recognition. In some examples, the electronic device 102 may not trigger a lower user verification threshold in cases with high (but inaccurate SNR) . For instance, an inaccurate higher SNR may be caused by a non-designated speaker (e.g., imposter) in some cases. A higher user verification threshold may be utilized in these cases, which may help to avoid allowing a higher wake-up rate with a non-designated speaker (e.g., imposter) .

The electronic device 102 may compare 204 the ambient noise level with a noise threshold. This may be accomplished as described in connection with Figure 1. For example, the electronic device 102 may compare the ambient noise level with a noise threshold to determine whether the ambient noise level is less than, equal to, or greater than the noise threshold. In some examples, the electronic device 102 may compare the ambient noise level (e.g., SNR, a voice activity measurement) with a voice activity threshold.

The electronic device 102 may select 206 a verification threshold based on comparing the ambient noise level with the noise threshold. This may be accomplished as described in connection with Figure 1. For example, the electronic device 102 may select a verification threshold from among different verification thresholds based on the comparison. For instance, the electronic device 102 may select a verification criterion or criteria (to determine whether a portion of an audio signal corresponds to a designated user) based on the quality (e.g., ambient noise level, SNR, etc. ) of the audio signal.

The electronic device 102 may determine 208 whether to enter an active mode based on the selected verification threshold. This may be accomplished as described in connection with Figure 1. For example, the electronic device 102 may compare a verification metric to the selected verification threshold to determine whether to enter the active mode. In some examples, the electronic device 102 may enter the active mode in response to determining 208 to enter the active mode.

Figure 3 is a flow diagram illustrating a more specific example of a method 300 for controlling electronic device activation based on ambient noise. The method 300 may be performed by the electronic device 102 described in connection with Figure 1. The electronic device 102 may receive 302 an audio signal. This may be accomplished as described in connection with Figure 1. For example, the electronic device 102 may capture the audio signal using one or more microphones and/or may receive the audio signal from another device.

The electronic device 102 may determine 304 an ambient noise level (e.g., SNR) based on an audio signal. This may be accomplished as described in connection with one or more of Figures 1–2.

The electronic device 102 may determine 306 whether a noisy condition is indicated. This may be accomplished as described in connection with one or more of Figures 1–2. For example, the electronic device 102 may compare the ambient noise level with a noise threshold. The comparison may indicate a noisy condition if a relationship between the ambient noise level corresponds to a noisy condition. For instance, a noise threshold may be 9 dB (SNR) in some configurations. Ambient noise levels (SNR) less than 9 dB may correspond to a noisy condition. Accordingly, if the ambient noise level is less than 9 dB, a noisy condition may be indicated. Otherwise, a noisy condition may not be indicated for this example. Additional or alternative comparisons may be utilized to determine whether a noisy condition is indicated. For instance, an ambient noise (e.g., noise power) that is greater than a noise threshold (e.g., noise power threshold) may indicate a noisy condition in some approaches.

In a case that a noisy condition is not indicated, the electronic device 102 may select 308 a first verification threshold. For example, the first verification threshold may be utilized for scenarios with less noise and/or greater target audio strength (e.g., SNR ≥ 9 dB) . In this example, the first verification threshold may be more stringent for user verification. For instance, the first verification threshold may be satisfied with a higher verification metric (e.g., better matching between a keyword and template data, greater confidence that the keyword was uttered by a designated user, etc. ) . Some examples of the first verification threshold may be 30, 35, 38, 40, 43, 45, 50, 60, 70, 75, 80, 90, 0.3, 0.4, 0.5, 0.6, 0.75, 0.8, 0.9, 0.95, etc. In some examples, the first verification threshold may be expressed as a percentage, proportion, or degree, etc.

In a case that a noisy condition is indicated, the electronic device 102 may select 310 a second verification threshold. For example, the second verification threshold may be utilized for scenarios with more noise and/or less target audio strength (e.g., SNR < 9 dB) . In this example, the second verification threshold may be less stringent for user verification. For instance, the second verification threshold may be satisfied with a lower verification metric (e.g., less stringent matching between a keyword and template data, less confidence that the keyword was uttered by a designated user, etc. ) . Some examples of the second verification threshold may be 7, 10, 15, 20, etc. In some examples, the second verification threshold may be expressed as a percentage, proportion, or degree, etc. The second verification threshold may be less than the first verification threshold. In some examples, selecting a verification threshold may include selecting a verification threshold from the first verification threshold and the second verification threshold, where the first verification threshold is different from (e.g., greater than, or less than) the second verification threshold.

The electronic device 102 may determine 312 whether to enter an active mode based on the selected verification threshold. This may be accomplished as described in connection with one or more of Figures 1–2. For example, the electronic device 102 may compare a verification metric to the first verification threshold or to the second verification threshold to determine whether to enter the active mode. The electronic device 102 may determine 312 to enter the active mode in a case that the verification metric satisfies the selected verification threshold. In a case that the verification metric does not satisfy the selected verification threshold (e.g., the first verification threshold or the second verification threshold) , operation may end 314.

In a case that the electronic device 102 determines 312 to enter the active mode, the electronic device 102 may enter 316 the active mode. This may be accomplished as described in connection with one or more of Figures 1–2. For example, the electronic device 102 may transition to the active mode from a passive mode. In some examples, the electronic device 102 may enter the active mode by enabling and/or allowing more operations and/or more inputs (than when in passive mode, for instance) .

Figure 4 is a flow diagram illustrating another more specific example of a method 400 for controlling electronic device activation based on ambient noise. The method 400 may be performed by the electronic device 102 described in connection with Figure 1. The electronic device 102 may receive 402 an audio signal. This may be accomplished as described in connection with one or more of Figures 1 or 3.

The electronic device 102 may determine 404 an ambient noise level (e.g., SNR) based on an audio signal. This may be accomplished as described in connection with one or more of Figures 1–3.

The electronic device 102 may detect 406 a keyword with an associated verification metric based on the audio signal. This may be accomplished as described in connection with Figure 1. For example, the electronic device 102 may detect a keyword by comparing the audio signal (e.g., a portion of the audio signal) to template data corresponding to a designated user or designated users. The comparison may indicate a degree of matching between the audio signal and the template data, which may indicate the verification metric. In some examples, detecting 406 the keyword and the associated verification metric may include performing a cosine similarity procedure, which may indicate a degree of matching (e.g., similarity) between the audio signal and the template data. In a case that the degree of matching satisfies a detection threshold, a keyword may be detected. The degree of matching for a detected keyword may be an example of the verification metric.

The electronic device 102 may determine 408 whether a noisy condition is indicated. This may be accomplished as described in connection with one or more of Figures 1–3.

In a case that a noisy condition is not indicated, the electronic device 102 may select 410 a first verification threshold. This may be accomplished as described in connection with Figure 3.

The electronic device 102 may determine 412 whether the verification metric satisfies the first verification threshold. For example, the electronic device 102 may compare the verification metric to the first verification threshold. In some examples, the first verification threshold may be satisfied if the verification metric indicates, relative to the first verification threshold, that the detected keyword was uttered by a designated user. The criterion or criteria for satisfying the first verification threshold may vary depending on configuration. In some examples, the first verification threshold may be satisfied if the verification metric is greater than the first verification threshold. In some examples, the first verification threshold may be satisfied if the verification metric is less than the first verification threshold. In some configurations, determining 412 whether the verification metric satisfies the first verification threshold may be an example of determining whether to enter an active mode. In a case that the verification metric does not satisfy the first verification threshold, operation may end 414. In a case that the verification metric satisfies the first verification threshold, the electronic device 102 may enter 420 an active mode. This may be accomplished as described in connection with one or more of Figures 1–3.

In a case that a noisy condition is indicated, the electronic device 102 may select 416 a second verification threshold. This may be accomplished as described in connection with Figure 3.

The electronic device 102 may determine 418 whether the verification metric satisfies the second verification threshold. For example, the electronic device 102 may compare the verification metric to the second verification threshold. In some examples, the second verification threshold may be satisfied if the verification metric indicates, relative to the second verification threshold, that the detected keyword was uttered by a designated user. The criterion or criteria for satisfying the second verification threshold may vary depending on configuration. In some examples, the second verification threshold may be satisfied if the verification metric is greater than the second verification threshold. In some examples, the second verification threshold may be satisfied if the verification metric is less than the second verification threshold. In some configurations, determining 418 whether the verification metric satisfies the second verification threshold may be an example of determining whether to enter an active mode. In a case that the verification metric does not satisfy the second verification threshold, operation may end 414. In a case that the verification metric satisfies the second verification threshold, the electronic device 102 may enter 420 an active mode. This may be accomplished as described in connection with one or more of Figures 1–3.

Variations of the method 400 may be implemented. For example, the template data may include multiple templates or references for a designated user. In some cases, a user may tend to speak differently in loud environments. For example, a user may alter one or more vocal characteristics (e.g., loudness, pitch, rate, syllable duration, vocal energy, accent, etc. ) in a loud environment in accordance with the Lombard effect. The vocal characteristic (s) may be altered relative to vocal characteristic (s) of the user in other environments and/or scenarios. In some configurations, the template data may include a first template (e.g., a default template according to a default user sound model) and a second template (e.g., a modified template according to a modified user sound model) . For instance, the electronic device 102 may be trained with a modified user sound model, where the modified user sound model may provide better detection performance in a noisy environment and/or scenario. Other numbers of templates may be utilized in other examples.

Relative to a default user sound model, the modified user sound model may be trained with different training data (e.g., recording files) . For example, a default user sound model may be trained with a user’s voice when speaking in a low-noise environment and/or scenario (e.g., in typical life) . For the modified user sound model training, for example, the electronic device 102 may utilize the user’s voice in a low-noise environment and/or scenario, the user’s voice in one or more noisy environments and/or scenarios (e.g., TV, car, indoor noise, etc. ) , and/or a combination of template data (e.g., recordings) of the user’s voice in low-noise and noisy environment (s) and/or scenario (s) .

In some examples, a modified user sound model may enable better detection performance in a noisy environment and/or scenario relative to the default user sound model. A modified user sound model may have higher imposter false alarm in other scenarios. In some configurations, the modified user sound model (e.g., second template, modified template) may be utilized for noisy environments, noisy scenarios, and/or for accessing non-secure functions and/or applications.

In some configurations, the electronic device 102 may select a template (e.g., user sound model, etc. ) based on the ambient noise level (e.g., SNR) . In some approaches, the electronic device 102 may select and/or utilize a first template (e.g., default template, default user sound model) in scenarios with higher SNR. For example, if the SNR is greater than a template threshold (e.g., 8 dB, 9 dB, etc. ) , the electronic device 102 may select the default template and/or may compare the default template with the detected keyword to produce the verification metric. In some approaches, the electronic device 102 may select and/or utilize a second template (e.g., modified template, modified user sound model) in scenarios with lower SNR. For example, if the SNR is less than or equal to the template threshold (e.g., 8 dB, 9 dB, etc. ) , the electronic device 102 may select the modified template and/or may compare the modified template with the detected keyword to produce the verification metric. The verification metric may be compared with a verification threshold as described herein to determine whether to enter 420 an active mode.

In some approaches, the electronic device 102 may select and/or utilize a first template (e.g., default template, default user sound model) in scenarios with higher SNR. For example, if the SNR is greater than a template threshold (e.g., 8 dB, 9 dB, etc. ) , the electronic device 102 may select the default template and/or may compare the default template with the detected keyword to produce the verification metric. In some approaches, the electronic device 102 may select and/or utilize a first template (e.g., default template, default user sound model) and a second template (e.g., modified template, modified user sound model) in scenarios with lower SNR. For example, if the SNR is less than or equal to the template threshold (e.g., 8 dB, 9 dB, etc. ) , the electronic device 102 may compare the default template with the detected keyword to produce a first verification metric and may compare the modified template with the detected keyword to produce a second verification metric. If the first verification metric satisfies a verification threshold (e.g., the first verification threshold) or if the second verification metric satisfies a verification threshold (e.g., the second verification threshold) , the electronic device 102 may enter 416 the active mode.

Figure 5 is a flow diagram illustrating another more specific example of a method 500 for controlling electronic device activation based on ambient noise. The method 500 may be performed by the electronic device 102 described in connection with Figure 1. The electronic device 102 may receive 502 an audio signal. This may be accomplished as described in connection with one or more of Figures 1 or 3–4.

The electronic device 102 may determine 504 a SNR based on an audio signal. This may be accomplished as described in connection with one or more of Figures 1–4.

The electronic device 102 may perform 506 noise suppression on the audio signal. This may be accomplished as described in connection with Figure 1. For example, the electronic device 102 may perform Weiner filtering, beamforming, and/or spectral subtraction, etc., to reduce and/or remove noise from the audio signal. For instance, the electronic device 102 may determine an estimate of a noise spectrum (e.g., average noise spectrum) and may subtract the noise spectrum from the audio signal. In some examples, the noise suppression may include stationary noise suppression and/or non-stationary noise suppression. Performing 506 noise suppression may increase a SNR of the audio signal.

The electronic device 102 may detect 508 a keyword with an associated verification metric based on the noise suppressed audio signal. This may be accomplished as described in connection with one or more of Figures 1 or 4. For example, the electronic device 102 may detect a keyword by comparing the noise suppressed audio signal (e.g., a portion of the noise suppressed audio signal) to template data corresponding to a designated user or designated users.

The electronic device 102 may determine 510 whether the SNR is less than the noise threshold. This may be accomplished as described in connection with one or more of Figures 1–4. For example, the electronic device 102 may compare the SNR to the noise threshold to determine if the SNR is less than the noise threshold or not less than (e.g., greater than or equal to) the noise threshold.

In a case that the SNR is not less than the noise threshold, the electronic device 102 may determine 512 whether the verification metric is greater than the first verification threshold. This may be accomplished as described in connection with Figure 4 in some configurations. For example, the electronic device 102 may compare the verification metric to the first verification threshold to determine if the verification metric is greater than the first verification threshold or not greater than (e.g., less than or equal to) the first verification threshold. In some configurations, determining 512 whether the verification metric is greater than the first verification threshold may be an example of determining whether to enter an active mode. In a case that the verification metric is not greater than the first verification threshold, operation may end 514. In a case that the verification metric is greater than the first verification threshold, the electronic device 102 may enter 516 an active mode. This may be accomplished as described in connection with one or more of Figures 1–4.

The electronic device 102 may provide 518 a first level of device access. A level of device access may be an indication of electronic device 102 functions and/or information (e.g., applications, operations, data, etc. ) that a user may access. For example, a first level of device access may be an unrestricted level of access, where a user may access all or virtually all functions and/or information of the electronic device 102, including sensitive functions and/or information (e.g., contacts list, financial information, stored media, user identification information, financial applications, social media applications, file browsing, sensitive data control (e.g., reading, writing, transferring, etc. ) , communication, messaging, email applications, etc. In some examples, another access level or levels may be utilized. For example, a second level of device access may restrict some of the function (s) and/or information of the electronic device 102, while allowing access to some function (s) and/or information. For instance, a second level of device access may allow access to non-sensitive function (s) and/or information. Examples of non-sensitive function (s) and/or information may include time, calendar, calculator, timer, stopwatch, volume, screen brightness, maps, navigation (without access to previously visited locations/addresses, etc. ) , Global Positioning System (GPS) location, emergency communication (e.g., emergency service calls and/or messaging) , power down, and/or image capture (without access to previously captured images or videos, etc. ) , etc.

In some examples, additional levels of device access may be implemented. In another example, a first level (where a user is verified with high confidence, for instance) may allow access to all device functionality, including account settings, financial transactions, etc. A second level (where a user is verified with medium confidence, for instance) may allow for calls to be placed and access to (but not modification of) photos, contacts, etc., and may not allow access to account information, financial information, or transaction functions, etc. A third level (where a user is verified with low confidence, for instance) may allow access only to non-secure functions (e.g., blank calendar, navigation, etc. ) . A fourth level (where a user is not verified, but an emergency keyword is detected) may allow access to emergency calling (and not other functions, for instance) . The verification thresholds corresponding to the levels may be predetermined and/or designated based on user input. Other examples of different numbers of levels with corresponding functions may be implemented in accordance with the systems and methods described herein.

In a case that the SNR is less than the noise threshold, the electronic device 102 may determine 520 whether the verification metric is greater than the second verification threshold. This may be accomplished as described in connection with Figure 4 in some configurations. For example, the electronic device 102 may compare the verification metric to the second verification threshold to determine if the verification metric is greater than the second verification threshold or not greater than (e.g., less than or equal to) the second verification threshold. In some configurations, determining 520 whether the verification metric is greater than the second verification threshold may be an example of determining whether to enter an active mode. In a case that the verification metric is not greater than the second verification threshold, operation may end 514. In a case that the verification metric is greater than the second verification threshold, the electronic device 102 may enter 522 an active mode. This may be accomplished as described in connection with one or more of Figures 1–4.

The electronic device 102 may determine 524 whether the verification metric is greater than the first verification threshold. This may be accomplished as described in connection with Figure 4 in some configurations. In a case that the verification metric is greater than the first verification threshold, the electronic device 102 may provide 518 a first level of device access. In some configurations, the electronic device may present a notification or message (e.g., visual notification or message on a display and/or audio notification or message via one or more speakers) indicating that sensitive data and/or application access is being restricted due to insufficient verification and/or ambient noise interference. In some configurations, the electronic device may additionally or alternatively present a notification or message (e.g., visual notification or message on a display and/or audio notification or message via one or more speakers) indicating that sensitive data and/or application (s) may be accessed with repeated and/or improved user verification. For example, the electronic device may allow additional and/or repeated verification to improve the verification metric (e.g., confidence) that the keyword corresponds to a designated user.

In a case that the verification metric is not greater than the first verification threshold, the electronic device 102 may provide 526 a second level of device access. In some examples, the second level of device access may be more restrictive than the first level of device access. For example, a second level of device access may restrict some of the function (s) and/or information of the electronic device 102, while allowing access to some function (s) and/or information as described above. In some configurations, two or more levels of device access may be utilized, where more stringent (e.g., higher) verification thresholds may correspond to greater device access, and/or where less stringent (e.g., lower) verification thresholds may correspond to less device access.

Figure 6 is a flow diagram illustrating another more specific example of a method 600 for controlling electronic device activation based on ambient noise. The method 600 may be performed by the electronic device 102 described in connection with Figure 1. The electronic device 102 may receive 602 an audio signal. This may be accomplished as described in connection with one or more of Figures 1 or 3–5.

The electronic device 102 may determine 604 a SNR based on an audio signal. This may be accomplished as described in connection with one or more of Figures 1–5.

The electronic device 102 may perform 606 noise suppression on the audio signal. This may be accomplished as described in connection with one or more of Figures 1 or 5.

The electronic device 102 may detect 608 a keyword with an associated verification metric based on the noise suppressed audio signal. This may be accomplished as described in connection with one or more of Figures 1 or 4–5.

The electronic device 102 may determine 610 whether the SNR is less than the noise threshold. This may be accomplished as described in connection with one or more of Figures 1–5.

In a case that the SNR is not less than the noise threshold, the electronic device 102 may determine 612 whether the verification metric is greater than the first verification threshold. This may be accomplished as described in connection with one or more of Figures 4 or 5 in some configurations. In a case that the verification metric is not greater than the first verification threshold, operation may end 614. In a case that the verification metric is greater than the first verification threshold, the electronic device 102 may enter 616 an active mode. This may be accomplished as described in connection with one or more of Figures 1–5.

In a case that the SNR is less than the noise threshold, the electronic device 102 may determine 620 whether the verification metric is greater than the second verification threshold. This may be accomplished as described in connection with one or more of Figures 4 or 5 in some configurations. In a case that the verification metric is not greater than the second verification threshold, operation may end 614. In a case that the verification metric is greater than the second verification threshold, the electronic device 102 may enter 616 an active mode. This may be accomplished as described in connection with one or more of Figures 1–5.

The electronic device 102 may determine 618 whether the verification metric is greater than a security threshold. For example, the electronic device may compare the verification metric with a security threshold or thresholds to determine a level of device access. The security threshold (s) may be different from one or more of the verification thresholds. For example, verification thresholds may be directly utilized to determine a level of device access in some configurations. In the example of Figure 6, a separate security threshold may be utilized to determine a level of device access separately from the determination of whether to enter the active mode. A separate security threshold may provide greater control and/or customization for device access in conjunction with voice activation.

In a case that the verification metric is greater than the security threshold, the electronic device 102 may provide 622 a first level of device access. This may be accomplished as described in connection with Figure 5.

In a case that the verification metric is not greater than the security threshold, the electronic device 102 may provide 624 a second level of device access. This may be accomplished as described in connection with Figure 5.

Figure 7 is a state diagram illustrating an example of modes and transitions that may be implemented in accordance with some configurations of the systems and methods described herein. Figure 7 illustrates a passive mode 732 and an active mode 734. The passive mode 732 and the active mode 734 may be examples of the passive mode and active mode described herein. As illustrated in Figure 7, an electronic device (e.g., electronic device 102 described in connection with Figure 1) may operate in a passive mode 732. In this example, the electronic device may transition from the passive mode 732 to the active mode 734 when a first verification threshold is satisfied 736 or when a second verification threshold is satisfied 738 for a noisy condition. The electronic device may transition from an active mode 734 to a passive mode 732 when a passive mode transition is triggered 740. In some examples, the passive mode transition may be triggered 740 based on an inactivity timer or a user input (e.g., button press, speech command, touchscreen tap, mouse click, etc. ) .

Figure 8 is a state diagram illustrating another example of modes and transitions that may be implemented in accordance with some configurations of the systems and methods described herein. Figure 8 illustrates a passive mode 832, active mode A 848, and active mode B 844. The passive mode 832, active mode A 848, and/or active mode B 844 may be examples of the passive mode and/or active mode described herein. Active mode A 848 may be an active mode with a first level of device access (e.g., unrestricted access) . Active mode B 844 may be an active mode with a second level of device access (e.g., restricted access) . As illustrated in Figure 8, an electronic device (e.g., electronic device 102 described in connection with Figure 1) may operate in a passive mode 832. In this example, the electronic device may transition from the passive mode 832 to active mode A 848 when a first verification threshold is satisfied 836. In this example, the electronic device may transition from the passive mode 832 to active mode B 844 when a second verification threshold is satisfied 846 for a noisy condition. The electronic device may transition from active mode B 844 to active mode A 848 when the first verification threshold is satisfied 842 for the noisy condition. The electronic device may transition from an active mode A 848 or active mode B 844 to a passive mode 832 when a passive mode transition is triggered 840a, 840b. In some examples, the passive mode transition may be triggered 840a, 840b based on an inactivity timer or a user input (e.g., button press, speech command, touchscreen tap, mouse click, etc. ) .

Figure 9 is a block diagram illustrating an example of elements or components that may be implemented in accordance with some configurations of the systems and methods disclosed herein. In some examples, one or more the elements or components described in connection with Figure 9 may be implemented in the electronic device 102 described in connection with Figure 1. For example, one or more of the elements or components described in connection with Figure 9 may be implemented in hardware (e.g., circuitry, ASICs, etc. ) and/or in a combination of hardware and software (e.g., a processor with instructions or code) . In some examples, one or more of the elements and/or components described in connection with Figure 9 may perform one or more of the functions and/or operations described in connection with one or more of Figures 1–8.

A microphone 950 may capture an audio signal 952. For example, the microphone 950 may convert an acoustic signal into an analog or digital electronic audio signal 952. The audio signal 952 may be provided to a noise suppressor 954. The noise suppressor 954 may produce a noise-suppressed audio signal 956 and an SNR 964. For example, the noise suppressor 954 may perform noise suppression and may determine the SNR 964 as described in connection with Figure 1. The noise-suppressed audio signal 956 may be provided to a keyword detector 958, and the SNR 964 may be provided to a verification threshold selector 966.

The keyword detector 958 may detect a keyword 960 based on the noise-suppressed audio signal 956. For example, the keyword detector 958 may detect the keyword 960 as described in connection with one or more of Figures 1 or 5–6. The keyword 960 may be provided to a mode controller 962.

The verification threshold selector 966 may select a verification threshold 968. For example, the verification threshold selector 966 may select a verification threshold as described in connection with one or more of Figures 1–6. The selected verification threshold 968 may be provided to the mode controller 962.

The mode controller 962 may control a mode of the electronic device. For example, the mode controller 962 may control whether the electronic device remains in a passive mode or transitions to an active mode based on the keyword 960 and the verification threshold 968. Additionally or alternatively, the mode controller 962 may control whether the electronic device remains in a passive mode or transitions to an active mode with an access level and/or transitions between active modes with different access levels. In some examples, the mode controller 962 may control the mode as described in connection with one or more of Figures 1–8. The mode controller 962 may produce one or more controls signals 970. The control signal (s) 970 may control one or more components of an electronic device to control the mode. For example, the control signal (s) 970 may control whether a display is activated, whether the electronic device will respond to additional input (e.g., voice commands, clicks, taps, motion, button presses, etc. ) , whether one or more applications and/or functions of the electronic device are accessible and/or operative, etc. For instance, the mode controller 962 may send the control signal (s) 970 to one or more electronic device components (e.g., display, memory, communication interface, camera, etc. ) to activate and/or increase the operations of the component (s) . Additionally or alternatively, the control signal (s) 970 may enable increased interactivity when transitioning to an active mode. For example, the control signal (s) 970 may enable the electronic device to listen for additional voice commands (e.g., in addition to the keyword for activation) .

Some configurations of the systems and methods described herein may provide a SNR check and/or user verification adjustment that automatically adjusts user verification based on environmental noise conditions. Some configurations may achieve an improved user experience for voice activation in a low-SNR and other conditions (e.g., better SNR conditions) . For example, a SNR may be determined using the noise suppressor. Once determined, if the SNR is < 9 dB (or another threshold value that can be adjusted based on sound model performance, for example) , the electronic device may switch to using user verification with a lower threshold automatically. Or, if the SNR is ≥ 9 dB (or another threshold value that can be adjusted based on sound model performance, for example) , the electronic device may automatically switch to using user verification with a higher or default threshold automatically.

Some tests of a user verification approach were conducted with the following results. In a TV program noise environment, the wake-up rate was 96.7%for a SNR of 10 dB and a wake-up rate of 56.7%for a SNR of 2 dB. Some tests of configurations of the systems and methods described were performed with the following results. In a simulation, the wake-up rate was 96.7%for a SNR of 10 dB and a wake-up rate of 90.0%for a SNR of 2 dB. As can be observed, the wake-up rate was increased from 56.7%to 90.0%in a low SNR environment. This may improve user experience.

Some benefits of some examples of the systems and methods described herein may include improved wake-up rates in low-SNR environments and automatic verification threshold adjustment based on ambient noise. Additionally or alternatively, the user verification performance may not be impacted in environments with higher SNR. In some examples, the systems and methods disclosed herein may be implemented on a variety of platforms to improve user experience.

Figure 10 illustrates certain components that may be included within an electronic device 1002 configured to implement various configurations of the systems and methods disclosed herein. Examples of the electronic device 1002 may include servers, cameras, video camcorders, digital cameras, cellular phones, smartphones, computers (e.g., desktop computers, laptop computers, etc. ) , tablet devices, media players, televisions, vehicles, automobiles, wearable cameras, virtual reality devices (e.g., headsets) , augmented reality devices (e.g., headsets) , mixed reality devices (e.g., headsets) , action cameras, robots, aircraft, drones, unmanned aerial vehicles (UAVs) , gaming consoles, personal digital assistants (PDAs) , smart appliances, etc. The electronic device 1002 may be implemented in accordance with one or more of the electronic devices (e.g., electronic device 102) described herein.

The electronic device 1002 includes a processor 1021. The processor 1021 may be a general purpose single-or multi-chip microprocessor (e.g., an ARM) , a special purpose microprocessor (e.g., a digital signal processor (DSP) ) , a microcontroller, a programmable gate array, etc. The processor 1021 may be referred to as a central processing unit (CPU) . Although just a single processor 1021 is shown in the electronic device 1002, in an alternative configuration, a combination of processors (e.g., an ARM and DSP) could be implemented.

The electronic device 1002 also includes memory 1001. The memory 1001 may be any electronic component capable of storing electronic information. The memory 1001 may be embodied as random access memory (RAM) , read-only memory (ROM) , magnetic disk storage media, optical storage media, flash memory devices in RAM, on-board memory included with the processor, EPROM memory, EEPROM memory, registers, and so forth, including combinations thereof.

Data 1005a and instructions 1003a may be stored in the memory 1001. The instructions 1003a may be executable by the processor 1021 to implement one or more of the methods, procedures, steps, and/or functions described herein. Executing the instructions 1003a may involve the use of the data 1005a that is stored in the memory 1001. When the processor 1021 executes the instructions 1003, various portions of the instructions 1003b may be loaded onto the processor 1021 and/or various pieces of data 1005b may be loaded onto the processor 1021.

The electronic device 1002 may also include a transmitter 1011 and/or a receiver 1013 to allow transmission and reception of signals to and from the electronic device 1002. The transmitter 1011 and receiver 1013 may be collectively referred to as a transceiver 1015. One or more antennas 1009a-b may be electrically coupled to the transceiver 1015. The electronic device 1002 may also include (not shown) multiple transmitters, multiple receivers, multiple transceivers and/or additional antennas.

The electronic device 1002 may include a digital signal processor (DSP) 1017. The electronic device 1002 may also include a communication interface 1019. The communication interface 1019 may allow and/or enable one or more kinds of input and/or output. For example, the communication interface 1019 may include one or more ports and/or communication devices for linking other devices to the electronic device 1002. In some configurations, the communication interface 1019 may include the transmitter 1011, the receiver 1013, or both (e.g., the transceiver 1015) . Additionally or alternatively, the communication interface 1019 may include one or more other interfaces (e.g., touchscreen, keypad, keyboard, microphone, camera, etc. ) . For example, the communication interface 1019 may enable a user to interact with the electronic device 1002.

The various components of the electronic device 1002 may be coupled together by one or more buses, which may include a power bus, a control signal bus, a status signal bus, a data bus, etc. For the sake of clarity, the various buses are illustrated in Figure 10 as a bus system 1007.

Some configurations of the systems and methods described herein may be beneficial. For example, some of the techniques described herein may improve a voice activation wake-up rate in a noisy and/or low-SNR environment, while not impacting user verification performance in other conditions (e.g., higher-SNR environments, etc. ) . For voice activation, for instance, some of the techniques may utilize a SNR check and/or user verification adjustment that automatically adjusts a user verification threshold based on environmental noise conditions. For example, in high-SNR conditions, user verification may be performed with a high threshold, to ensure good user verification performance. Upon detecting a noisy and/or low-SNR environment, operation may switch to utilize user verification with a lower threshold, to ensure a good wake-up rate and/or to provide a better user experience. In some examples in a TV program noise environment with a SNR of 2 dB, the wake-up rate may increase from 56.7%to 90.0%.

Some examples of the techniques described herein may be utilized for providing access at different security levels. For instance, a high user verification threshold may be applied to provide access to secure functions and/or applications (e.g., contacts list, stored media, etc. ) , which may ensure that the access corresponds to a designated user. Additionally or alternatively, a lower user verification threshold may be applied to provide access to non-secure functions and/or applications (e.g., general calendar, navigation, etc. ) , which may ensure a high access acceptance rate and/or may provide good user experience for noisy scenarios.

The term “determining” encompasses a wide variety of actions and, therefore, “determining” can include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database or another data structure) , ascertaining and the like. Also, “determining” can include receiving (e.g., receiving information) , accessing (e.g., accessing data in a memory) and the like. Also, “determining” can include resolving, selecting, choosing, establishing, and the like.

The phrase “based on” does not mean “based only on, ” unless expressly specified otherwise. In other words, the phrase “based on” describes both “based only on” and “based at least on. ”

The term “processor” should be interpreted broadly to encompass a general purpose processor, a central processing unit (CPU) , a microprocessor, a digital signal processor (DSP) , a controller, a microcontroller, a state machine, and so forth. Under some circumstances, a “processor” may refer to an application specific integrated circuit (ASIC) , a programmable logic device (PLD) , a field programmable gate array (FPGA) , etc. In some examples, the term “processor” may refer to a combination of processing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The term “memory” should be interpreted broadly to encompass any electronic component capable of storing electronic information. The term memory may refer to various types of processor-readable media such as random access memory (RAM) , read-only memory (ROM) , non-volatile random access memory (NVRAM) , programmable read-only memory (PROM) , erasable programmable read-only memory (EPROM) , electrically erasable PROM (EEPROM) , flash memory, magnetic or optical data storage, registers, etc. Memory is said to be in electronic communication with a processor if the processor can read information from and/or write information to the memory. Memory that is integral to a processor is in electronic communication with the processor.

The terms “instructions” and “code” should be interpreted broadly to include any type of computer-readable statement (s) . For example, the terms “instructions” and “code” may refer to one or more programs, routines, sub-routines, functions, procedures, etc. “Instructions” and “code” may comprise a single computer-readable statement or many computer-readable statements.

The functions described herein may be implemented in software or firmware being executed by hardware. The functions may be stored as one or more instructions on a computer-readable medium. The terms “computer-readable medium” or “computer-program product” refer to any tangible storage medium that can be accessed by a computer or a processor. By way of example, and not limitation, a computer-readable medium may comprise RAM, ROM, EEPROM, compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Disk and disc, as used herein, includes compact disc (CD) , laser disc, optical disc, digital versatile disc (DVD) , floppy disk, and

disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. It should be noted that a computer-readable medium may be tangible and non-transitory. The term “computer-program product” refers to a computing device or processor in combination with code or instructions (e.g., a “program” ) that may be executed, processed, or computed by the computing device or processor. As used herein, the term “code” may refer to software, instructions, code, or data that is/are executable by a computing device or processor.

Software or instructions may also be transmitted over a transmission medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL) , or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio and microwave are included in the definition of transmission medium.

The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is required for proper operation of the method that is being described, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims. It should also be noted that one or more steps and/or actions may be added to the method (s) and/or omitted from the method (s) in some configurations of the systems and methods disclosed herein. In some configurations, one or more elements of a method described herein may be combined with one or more elements of another method described herein.

Further, it should be appreciated that modules and/or other appropriate means for performing the methods and techniques described herein, can be downloaded, and/or otherwise obtained by a device. For example, a device may be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, various methods described herein can be provided via a storage means (e.g., random access memory (RAM) , read-only memory (ROM) , a physical storage medium such as a compact disc (CD) or floppy disk, etc. ) , such that a device may obtain the various methods upon coupling or providing the storage means to the device.

As used herein, the term “and/or” should be interpreted to mean one or more items. For example, the phrase “A, B, and/or C” should be interpreted to mean any of: only A, only B, only C, A and B (but not C) , B and C (but not A) , A and C (but not B) , or all of A, B, and C. As used herein, the phrase “at least one of” should be interpreted to mean one or more items. For example, the phrase “at least one of A, B, and C” or the phrase “at least one of A, B, or C” should be interpreted to mean any of: only A, only B, only C, A and B (but not C) , B and C (but not A) , A and C (but not B) , or all of A, B, and C. As used herein, the phrase “one or more of” should be interpreted to mean one or more items. For example, the phrase “one or more of A, B, and C” or the phrase “one or more of A, B, or C” should be interpreted to mean any of: only A, only B, only C, A and B (but not C) , B and C (but not A) , A and C (but not B) , or all of A, B, and C.

It is to be understood that the claims are not limited to the precise configuration and components illustrated above. Various modifications, changes, and variations may be made in the arrangement, operation, and details of the systems, methods, and electronic device described herein without departing from the scope of the claims. For example, one or more functions or operations of the techniques described herein may be reordered.

Claims

A method performed by an electronic device, comprising:

determining an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal;

comparing the ambient noise level with a noise threshold;

selecting, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user; and

determining whether to enter an active mode based on the selected verification threshold.
The method of claim 1, wherein selecting the verification threshold comprises selecting among different verification thresholds based on whether the comparison indicates a noisy condition.
The method of claim 2, wherein determining whether to enter the active mode comprises comparing a verification metric with the selected verification threshold, and wherein the method further comprises entering the active mode in response to determining that the verification metric satisfies the selected verification threshold.
The method of claim 1, wherein selecting the verification threshold comprises selecting the verification threshold from a first verification threshold and a second verification threshold, wherein the first verification threshold is greater than the second verification threshold.
The method of claim 4, wherein selecting the verification threshold comprises selecting the first verification threshold in response to determining that the ambient noise level is not less than the noise threshold.
The method of claim 5, wherein determining whether to enter the active mode comprises comparing a verification metric with the first verification threshold in response to determining that the ambient noise level is not less than the noise threshold, and wherein the method further comprises entering the active mode in response to determining that the verification metric is greater than the first verification threshold.
The method of claim 4, wherein selecting the verification threshold comprises selecting the second verification threshold in response to determining that the ambient noise level is less than the noise threshold.
The method of claim 7, wherein determining whether to enter the active mode comprises comparing a verification metric with the second verification threshold in response to determining that the ambient noise level is less than the noise threshold, and wherein the method further comprises entering the active mode in response to determining that the verification metric is greater than the second verification threshold.
The method of claim 1, further comprising:

performing noise suppression on the audio signal; and

detecting a keyword with an associated verification metric based on the noise suppressed audio signal.
The method of claim 1, further comprising providing a first level of device access in response to determining that a verification metric satisfies a first verification threshold or providing a second level of device access in response to determining that the verification metric satisfies a second verification threshold.
An electronic device, comprising:

a memory;

a processor in electronic communication with the memory, wherein the processor is configured to:

determine an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal;

compare the ambient noise level with a noise threshold;

select, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user; and

determine whether to enter an active mode based on the selected verification threshold.
The electronic device of claim 11, wherein the processor is configured to select among different verification thresholds based on whether the comparison indicates a noisy condition.
The electronic device of claim 12, wherein the processor is configured to compare a verification metric with the selected verification threshold, and enter the active mode in response to determining that the verification metric satisfies the selected verification threshold.
The electronic device of claim 11, wherein the processor is configured to select the verification threshold from a first verification threshold and a second verification threshold, wherein the first verification threshold is greater than the second verification threshold.
The electronic device of claim 14, wherein the processor is configured to select the first verification threshold in response to determining that the ambient noise level is not less than the noise threshold.
The electronic device of claim 15, wherein the processor is configured to compare a verification metric with the first verification threshold in response to determining that the ambient noise level is not less than the noise threshold, and enter the active mode in response to determining that the verification metric is greater than the first verification threshold.
The electronic device of claim 14, wherein the processor is configured to select the second verification threshold in response to determining that the ambient noise level is less than the noise threshold.
The electronic device of claim 17, wherein the processor is configured to compare a verification metric with the second verification threshold in response to determining that the ambient noise level is less than the noise threshold, and enter the active mode in response to determining that the verification metric is greater than the second verification threshold.
The electronic device of claim 11, wherein the processor is configured to:

perform noise suppression on the audio signal; and

detect a keyword with an associated verification metric based on the noise suppressed audio signal.
The electronic device of claim 11, wherein the processor is configured to provide a first level of device access in response to determining that a verification metric satisfies a first verification threshold or is configured to provide a second level of device access in response to determining that the verification metric satisfies a second verification threshold.
A non-transitory tangible computer-readable medium storing computer-executable code, comprising:

code for causing a processor to determine an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal;

code for causing the processor to compare the ambient noise level with a noise threshold;

code for causing the processor to select, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user; and

code for causing the processor to determine whether to enter an active mode based on the selected verification threshold.
The computer-readable medium of claim 21, wherein the code for causing the processor to select the verification threshold comprises code for causing the processor to select among different verification thresholds based on whether the comparison indicates a noisy condition.
The computer-readable medium of claim 22, wherein the code for causing the processor to determine whether to enter the active mode comprises code for causing the processor to compare a verification metric with the selected verification threshold, and wherein the computer-readable medium further comprises code for causing the processor to enter the active mode in response to determining that the verification metric satisfies the selected verification threshold.
The computer-readable medium of claim 21, further comprising:

code for causing the processor to perform noise suppression on the audio signal; and

code for causing the processor to detect a keyword with an associated verification metric based on the noise suppressed audio signal.
The computer-readable medium of claim 21, further comprising code for causing the processor to provide a first level of device access in response to determining that a verification metric satisfies a first verification threshold or to provide a second level of device access in response to determining that the verification metric satisfies a second verification threshold.
An apparatus, comprising:

means for determining an ambient noise level based on a target audio level estimate and a noise level estimate of an audio signal;

means for comparing the ambient noise level with a noise threshold;

means for selecting, based on comparing the ambient noise level with the noise threshold, a verification threshold for determining whether at least a portion of the audio signal corresponds to a designated user; and

means for determining whether to enter an active mode based on the selected verification threshold.
The apparatus of claim 26, wherein the means for selecting the verification threshold comprises means for selecting among different verification thresholds based on whether the comparison indicates a noisy condition.
The apparatus of claim 27, wherein the means for determining whether to enter the active mode comprises means for comparing a verification metric with the selected verification threshold, and wherein the apparatus further comprises means for entering the active mode in response to determining that the verification metric satisfies the selected verification threshold.
The apparatus of claim 26, further comprising:

means for performing noise suppression on the audio signal; and

means for detecting a keyword with an associated verification metric based on the noise suppressed audio signal.
The apparatus of claim 26, further comprising means for providing a first level of apparatus access in response to determining that a verification metric satisfies a first verification threshold or providing a second level of apparatus access in response to determining that the verification metric satisfies a second verification threshold.