US20170214997A1 - Dynamic frequency-dependent sidetone generation - Google Patents
- Publication number
- US20170214997A1
- Authority
- US
- United States
- Prior art keywords
- microphone
- sidetone
- signal
- speech
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R1/00—Details of transducers, loudspeakers or microphones
- H04R1/10—Earpieces; Attachments therefor; Earphones; Monophonic headphones
- H04R1/1041—Mechanical or electronic switches, or control elements
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/58—Anti-side-tone circuits
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/60—Substation equipment, e.g. for use by subscribers including speech amplifiers
- H04M1/6008—Substation equipment, e.g. for use by subscribers including speech amplifiers in the transmitter circuit
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/02—Circuits for transducers, loudspeakers or microphones for preventing acoustic reaction, i.e. acoustic oscillatory feedback
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/04—Circuits for transducers, loudspeakers or microphones for correcting frequency response
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/225—Feedback of the input speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02165—Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/02—Constructional features of telephone sets
- H04M1/03—Constructional features of telephone transmitters or receivers, e.g. telephone hand-sets
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/41—Electronic components, circuits, software, systems or apparatus used in telephone systems using speaker recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/74—Details of telephonic subscriber devices with voice recognition means
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2410/00—Microphones
- H04R2410/05—Noise reduction with a separate noise microphone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2460/00—Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
- H04R2460/01—Hearing devices using active noise cancellation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2460/00—Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
- H04R2460/05—Electronic compensation of the occlusion effect
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W88/00—Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
- H04W88/02—Terminal devices
Definitions
- The instant disclosure relates to personal audio devices. More specifically, portions of this disclosure relate to frequency-dependent sidetone generation in personal audio devices.
- Audio devices such as mobile/cellular telephones, in which users need to hear their own voice during use, are increasing in prevalence. Audio of a user's own voice can be injected into a speaker output being provided to a user. Such audio can be referred to as a sidetone. Sidetones are presented such that the user's voice is heard by the user in the headphones or other speaker as if the speaker and housing were not covering the ear. For example, due to the obstruction provided by the speaker and housing, one or both ears may be partially or totally blocked, which can result in distortion and attenuation of the user's voice in the ambient acoustic environment.
- Such effects are usually termed occlusion effects because they can result from occlusion of an ear, such as by a headphone, earphone, or earbud.
- Sidetones have been used to at least partially remedy the occlusion problem.
- Conventional sidetones, however, do not always provide a natural sound, especially under changing conditions, such as changes in the speaker type or position or changes in the environment.
- FIG. 1 provides an example schematic block diagram illustrating a conventional sidetone generation system according to the prior art.
- One drawback of the system of FIG. 1 is that the sidetone generation path is fixed. Thus, the generation of sidetones cannot be adapted to have different characteristics for different applications.
- The overall performance and power utilization of an audio device may be improved with an adaptive sidetone generation system that generates sidetones selected for different application-specific problems.
- Systems that include sidetone generation capabilities may include numerous microphones from which information may be received and processed to generate sidetones.
- The information from the microphones may be used to receive and/or determine the audio device's operating mode.
- The information from the microphones and the received and/or determined mode may then be used to generate a sidetone for the particular mode and particular conditions in which the audio device is operating.
- The audio signal quality may thereby be improved, reducing the amount of subsequent audio processing required and resulting in improved performance, improved power utilization, and improved user experience.
- An apparatus may include a first microphone configured to generate a first microphone signal; a second microphone configured to generate a second microphone signal; a sidetone circuit configured to perform steps comprising: receiving a mode of operation of a user device; and generating a sidetone signal based, at least in part, on the first microphone signal, the second microphone signal, and the received mode of operation; and/or a transducer for reproducing an audio signal and the sidetone signal.
- The first microphone is configured to receive speech input.
- The sidetone circuit is configured to generate the sidetone signal by mixing a combination of the first microphone signal and the second microphone signal to recover high frequencies in the received speech input.
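The mixing described above can be illustrated as a simple two-band crossover: low frequencies are taken from the in-ear microphone, while the high frequencies attenuated by the occluded ear are recovered from the external voice microphone. The following is a minimal numpy sketch; the FFT-based brick-wall crossover and the 1 kHz crossover frequency are illustrative assumptions, not the patent's circuit.

```python
import numpy as np

def mix_for_high_frequency_recovery(ext_mic, in_ear_mic, fs, crossover_hz=1000.0):
    """Combine the low band of the in-ear microphone signal with the high
    band of the external (voice) microphone signal to recover high
    frequencies lost to occlusion. Brick-wall FFT crossover for clarity;
    a product would use proper crossover filters."""
    n = len(ext_mic)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    ext_spec = np.fft.rfft(ext_mic)
    in_spec = np.fft.rfft(in_ear_mic)
    low_band = freqs < crossover_hz
    # Take in-ear bins below the crossover, external-mic bins above it.
    mixed_spec = np.where(low_band, in_spec, ext_spec)
    return np.fft.irfft(mixed_spec, n)
```

For example, if the in-ear microphone captures only a 200 Hz component of the user's voice while the external microphone also captures a 3 kHz component, the mixed output contains both.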
- The sidetone circuit is further configured: to detect speech based on at least one of the first microphone signal and the second microphone signal; and/or to determine the mode of operation is a phone call mode when speech is detected.
- The received mode of operation includes at least one of Phone Call, Speaker Recognition, and Automatic Speech Recognition.
- The sidetone circuit is configured to generate the sidetone to improve voice characteristics, including at least one of louder speech and an enhanced signal-to-noise ratio, when the received mode of operation is Phone Call.
- The sidetone circuit may also be configured to cancel bone-conducted speech in an output of the transducer when the mode of operation is Phone Call.
- The sidetone circuit may also be configured to generate the sidetone based, at least in part, on an automatic speech recognition (ASR) algorithm when no speech is detected and the audio signal is generated by an audio playback application; and/or otherwise, to generate the sidetone based, at least in part, on a speaker recognition (SR) algorithm when no speech is detected.
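The mode-selection logic in the bullets above can be expressed as a small decision function. This is a hedged sketch: the energy-threshold voice-activity detector and the function names are illustrative stand-ins for whatever detector and control logic an implementation actually uses.

```python
import numpy as np

PHONE_CALL = "Phone Call"
ASR = "Automatic Speech Recognition"
SR = "Speaker Recognition"

def detect_speech(mic_signal, threshold=0.01):
    """Toy energy-based voice-activity detector (a stand-in for a real VAD)."""
    return float(np.mean(np.square(mic_signal))) > threshold

def select_mode(mic1_signal, mic2_signal, playback_active):
    """Pick the sidetone mode per the scheme above: speech on either
    microphone selects Phone Call; otherwise ASR when an audio-playback
    application is running, else Speaker Recognition."""
    if detect_speech(mic1_signal) or detect_speech(mic2_signal):
        return PHONE_CALL
    return ASR if playback_active else SR
```

The selected mode would then parameterize the sidetone processing path (gain, frequency shaping, cancellation) for that application.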
- The first microphone is configured to receive speech input.
- The second microphone is configured to receive in-ear audio.
- The sidetone circuit is further configured to: compare a frequency response of speech captured by the first microphone and the second microphone; track the compared frequency response over a period of time; and/or apply a compensation filter to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone.
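The compare-track-compensate loop just described could be sketched as follows: compute the per-frequency magnitude ratio between the two microphones, smooth it over frames, and apply it as a compensation gain. The frame-based FFT analysis and the exponential smoothing factor are assumptions for illustration, not the patent's filter design.

```python
import numpy as np

class CompensationTracker:
    """Track the per-frequency magnitude ratio between two microphone
    signals over time and apply it as a compensation filter (sketch)."""

    def __init__(self, frame_len, alpha=0.9):
        self.alpha = alpha                       # exponential smoothing factor
        self.ratio = np.ones(frame_len // 2 + 1)  # tracked response difference

    def update(self, mic1_frame, mic2_frame, eps=1e-12):
        """Compare the frame spectra and fold the result into the tracked ratio."""
        m1 = np.abs(np.fft.rfft(mic1_frame))
        m2 = np.abs(np.fft.rfft(mic2_frame))
        instantaneous = m1 / (m2 + eps)
        self.ratio = self.alpha * self.ratio + (1 - self.alpha) * instantaneous
        return self.ratio

    def compensate(self, mic2_frame):
        """Scale the second microphone's spectrum to match the first's."""
        spec = np.fft.rfft(mic2_frame) * self.ratio
        return np.fft.irfft(spec, len(mic2_frame))
```

After enough frames, applying `compensate` to the second microphone signal minimizes the response difference between the two capture points.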
- The sidetone circuit is further configured to compensate for an occlusion effect, such as by processing sound to match a frequency response of the transducer to simulate a frequency response of an open ear.
- The sidetone circuit may also be configured to cancel low-frequency air-conducted speech.
- A method for frequency-dependent sidetone generation in personal audio devices may include receiving a first microphone signal from a first microphone; receiving a second microphone signal from a second microphone; receiving a mode of operation of a user device; and/or generating a sidetone signal based, at least in part, on the first microphone signal, the second microphone signal, and the received mode of operation.
- The method may also include reproducing, at a transducer, a combination of an audio signal and the sidetone signal.
- Receiving the first microphone signal includes receiving speech input.
- Generating the sidetone signal includes mixing a combination of the first microphone signal and the second microphone signal to recover high frequencies in the received speech input.
- The step of receiving the mode of operation includes detecting speech based on at least one of the first microphone signal and the second microphone signal; and/or determining the mode of operation is a phone call mode when speech is detected.
- The received mode of operation includes at least one of Phone Call, Speaker Recognition, and Speech Recognition.
- The method may include generating the sidetone to improve voice characteristics, including at least one of louder speech and an enhanced signal-to-noise ratio, when the received mode of operation is Phone Call.
- The method may further include cancelling bone-conducted speech when the mode of operation is Phone Call.
- The method may also include at least one of: generating the sidetone based, at least in part, on a speaker recognition (SR) algorithm when no speech is detected; and/or generating the sidetone based, at least in part, on an automatic speech recognition (ASR) algorithm when no speech is detected and the audio signal is generated by an audio playback application.
- The first microphone signal includes speech input.
- The second microphone signal includes in-ear audio.
- The method further includes comparing a frequency response of speech captured by the first microphone and the second microphone; tracking the compared frequency response over a period of time; and/or applying a compensation filter to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone.
- The method may include generating the sidetone to compensate for an occlusion effect.
- The step of compensating for an occlusion effect may include processing sound to match a frequency response of the transducer to simulate a frequency response of an open ear.
- An apparatus includes a controller configured to perform steps including: receiving a first microphone signal from a first microphone; receiving a second microphone signal from a second microphone; determining a mode of operation of a user device; and/or generating a sidetone signal based, at least in part, on the first microphone signal, the second microphone signal, and the determined mode of operation.
- The controller may be further configured to perform the step of causing reproduction, at a transducer, of a combination of an audio signal and the sidetone signal.
- Receiving the first microphone signal includes receiving speech input.
- The step of generating the sidetone signal includes mixing a combination of the first microphone signal and the second microphone signal to recover high frequencies in the received speech input.
- The step of determining a mode of operation includes: detecting speech based on at least one of the first microphone signal and the second microphone signal; and/or determining the mode of operation is a phone call mode when speech is detected.
- The determined mode of operation includes at least one of Phone Call, Speaker Recognition, and Speech Recognition.
- The controller is further configured to perform a step of generating the sidetone to improve voice characteristics, including at least one of louder speech and an enhanced signal-to-noise ratio, when the determined mode of operation is a phone call mode.
- The controller may also be configured to cancel bone-conducted speech when the mode of operation is Phone Call.
- The controller may be further configured to perform at least one of the steps of: generating the sidetone based, at least in part, on a speaker recognition (SR) algorithm when no speech is detected; and generating the sidetone based, at least in part, on an automatic speech recognition (ASR) algorithm when no speech is detected and the audio signal is generated by an audio playback application.
- The first microphone signal comprises speech input and the second microphone signal comprises in-ear audio.
- The controller is further configured to perform steps including: comparing a frequency response of speech captured by the first microphone and the second microphone; tracking the compared frequency response over a period of time; and/or applying a compensation filter to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone.
- The controller is further configured to generate the sidetone to compensate for an occlusion effect.
- The step of compensating for an occlusion effect may include processing sound to match a frequency response of the transducer to simulate a frequency response of an open ear.
- FIG. 1 is an example schematic block diagram illustrating a conventional sidetone generation system according to the prior art.
- FIG. 2A is an example illustration of a personal audio system according to one embodiment of the disclosure.
- FIG. 2B is another example illustration of a personal audio system according to one embodiment of the disclosure.
- FIG. 3 is an example schematic block diagram illustrating a sidetone generation system according to one embodiment of the disclosure.
- FIG. 4 is an example schematic block diagram illustrating another sidetone generation system according to one embodiment of the disclosure.
- FIG. 5 is an example schematic block diagram illustrating another sidetone generation system according to one embodiment of the disclosure.
- FIG. 6 is an example flow chart illustrating a method for frequency-dependent sidetone generation in personal audio devices according to one embodiment of the disclosure.
- FIG. 7 is an example flow chart illustrating another method for frequency-dependent sidetone generation in personal audio devices according to one embodiment of the disclosure.
- A personal audio device may be a wireless headphone, a wireless telephone, an Internet protocol (IP) or other telephone handset, a gaming headset, or a communications headset for aircraft, motorcycle, or automotive systems.
- The personal audio device may include a sidetone generation circuit that has one or more adjustable parameters that may be selected for the particular equipment, configuration, physical position, and/or ambient environment to improve users' perception of their own voice via the sidetone information. The selection may be performed dynamically in response to a user command or in response to a voice-activity detector (VAD) indicating whether or not near speech is present.
- Frequency shaping to generate the sidetone may be included in the form of low-pass, high-pass, and/or band-pass filtering of the user's speech and other captured audio. Frequency shaping may also include low-frequency cutoff filtering that compensates for a low-frequency enhancement provided by bone conduction from the transducer(s) to the inner ear.
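The low-pass, high-pass, and band-pass shaping described above can be sketched with first-order filters. These one-pole designs and the cutoff values in the test are illustrative simplifications; an actual device would likely use higher-order IIR or FIR designs.

```python
import numpy as np

def one_pole_lowpass(x, fs, cutoff_hz):
    """First-order IIR low-pass filter (illustrative one-pole design)."""
    dt = 1.0 / fs
    rc = 1.0 / (2.0 * np.pi * cutoff_hz)
    a = dt / (rc + dt)                 # smoothing coefficient from cutoff
    y = np.zeros(len(x))
    acc = 0.0
    for i, sample in enumerate(x):
        acc += a * (sample - acc)
        y[i] = acc
    return y

def one_pole_highpass(x, fs, cutoff_hz):
    """High-pass as the complement of the low-pass; this is the kind of
    low-frequency cutoff that offsets bone-conducted low-frequency boost."""
    return np.asarray(x, dtype=float) - one_pole_lowpass(x, fs, cutoff_hz)

def bandpass(x, fs, low_hz, high_hz):
    """Band-pass by cascading the low-pass and high-pass stages."""
    return one_pole_highpass(one_pole_lowpass(x, fs, high_hz), fs, low_hz)
```

Applying `one_pole_highpass` to the sidetone path attenuates the low band that bone conduction already delivers to the inner ear, while `bandpass` can confine the sidetone to the speech band.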
- The sidetone may be presented, along with playback audio, such as downlink audio, by a stereo headset.
- The stereo headset may include two monaural earphones, each having a speaker, for outputting the sidetone and playback audio.
- The stereo headset may also include a first microphone to capture the voice of the user and a second microphone to capture sounds reaching the user's ear.
- A sidetone-generating apparatus may operate on the signals generated by the microphones to select a sound level and frequency content of the user's voice that is heard by the user via feedback output to the speaker.
- The voice microphone may be a single microphone provided near the user's mouth, for example, on a boom or a lanyard.
- The sidetone may be presented by a wireless telephone having a transducer on the housing of the wireless telephone, with a first microphone to capture the user's voice and a second microphone for capturing the output of the transducer to approximate the sound heard by the user's ear.
- The sidetone-generating apparatus in any of the above configurations may be implemented with or without active noise cancellation (ANC) circuits, which can use the microphones to form part of the ambient noise and ANC error measurements.
- One or more of the parameters derived for ANC operation, such as a secondary-path response estimate, may be used in determining the gain and/or frequency response to be applied to the sidetone signal.
- Ambient noise reduction can be provided by the monaural earphones sealing the ear canal or sealing over the ear.
- The sidetone-generating apparatus may equalize the sound level of the user's voice as detected by the first and second microphones and may include an additional pre-set gain offset appropriate to the method of noise reduction and the position of the microphone that detects the sound reaching the user's ear.
- The sidetone-generating apparatus may equalize the sound level of the user's voice as detected by the first and second microphones and further allow for manual user control of the gain offset in order to achieve the most desirable sidetone level.
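The level equalization with a preset or user-controlled offset described above reduces to a single gain computation. This is a minimal sketch; the RMS level matching and the dB offset convention are assumptions about how such an equalizer might be realized.

```python
import numpy as np

def sidetone_gain(voice_mic, ear_mic, offset_db=0.0, eps=1e-12):
    """Linear gain that matches the level of the user's voice at the ear
    microphone to its level at the voice microphone, plus a pre-set or
    user-controlled offset expressed in dB."""
    rms_voice = np.sqrt(np.mean(np.square(voice_mic)))
    rms_ear = np.sqrt(np.mean(np.square(ear_mic)))
    match = rms_voice / (rms_ear + eps)        # equalize the two levels
    return match * 10.0 ** (offset_db / 20.0)  # apply the dB offset
```

The resulting gain would scale the sidetone signal before it is mixed into the speaker output; a 20 dB offset multiplies the matched gain by ten.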
- FIG. 2A shows a wireless telephone 10 and a pair of earbuds EB1 and EB2, each inserted in a corresponding ear 5A, 5B of a listener.
- Illustrated wireless telephone 10 is an example of a device that may include a sidetone-generating apparatus, but it is understood that not all of the elements or configurations illustrated in wireless telephone 10, or in the circuits depicted in subsequent illustrations, are required. In particular, some or all of the circuits illustrated below as being within wireless telephone 10 may alternatively be implemented in a cord-mounted module that interconnects earbuds EB1, EB2 in a wired configuration, or implemented within earbuds EB1, EB2 themselves.
- Wireless telephone 10 may be connected to earbuds EB1, EB2 by a wired or wireless connection, e.g., a BLUETOOTH™ connection (BLUETOOTH is a trademark of Bluetooth SIG, Inc.).
- Each of the earbuds EB1 and EB2 may have a corresponding transducer, such as speakers SPKR1 and SPKR2, to reproduce audio, which may include distant speech received from wireless telephone 10, ringtones, stored audio program material, and a sidetone, which is an injection of near-end speech, i.e., the speech of the user of wireless telephone 10.
- The source audio may also include any other audio that wireless telephone 10 is required to reproduce, such as source audio from web pages or other network communications received by wireless telephone 10 and audio indications such as battery low and other system event notifications.
- First microphones M1A, M1B for receiving the speech of the user may be provided on a surface of the housing of respective earbuds EB1, EB2, may alternatively be mounted on a boom, or may alternatively be located within a cord-mounted module 7.
- First microphones M1A, M1B may also serve as reference microphones for measuring the ambient acoustic environment.
- Second microphones M2A, M2B may be provided in order to measure the audio reproduced by respective speakers SPKR1, SPKR2 close to corresponding ears 5A, 5B when earbuds EB1, EB2 are inserted in the outer portion of ears 5A, 5B, so that the listener's perception of the sound reproduced by speakers SPKR1, SPKR2 can be more accurately modeled.
- The determination of the response of sidetone information as heard by the user is utilized in the circuits described below.
- Second microphones M2A, M2B may function as error microphones in embodiments that include ANC, as described below, providing a measure of the ambient-noise-canceling performance of the ANC system in addition to estimating the sidetone as heard by the user.
- Wireless telephone 10 includes circuits and features performing the sidetone generation described below, in addition to optionally providing ANC functionality.
- A circuit 14 within wireless telephone 10 may include an audio integrated circuit 20 that receives the signals from first microphones M1A, M1B and second microphones M2A, M2B and interfaces with other integrated circuits, such as an RF integrated circuit 12 containing the wireless telephone transceiver.
- An alternative location places a microphone M1C on the housing of wireless telephone 10 or a microphone M1D on cord-mounted module 7.
- A wireless telephone 10A includes the first and second microphones, the speaker, and the sidetone calibration. Equalization may be performed by an integrated circuit within wireless telephone 10.
- The sidetone circuits will be described as provided within wireless telephone 10, but the above variations will be understood by a person of ordinary skill in the art, and the signals required between earbuds EB1, EB2, wireless telephone 10, and a third module, if required, can be readily determined for those variations.
- FIG. 2B shows an example wireless telephone 10A, which includes a speaker SPKR held in proximity to a human ear 5.
- Illustrated wireless telephone 10A is an example of a device that may include a sidetone-generating apparatus, but it is understood that not all of the elements or configurations embodied in illustrated wireless telephone 10A, or in the circuits depicted in subsequent illustrations, are required.
- Wireless telephone 10A includes a transducer, such as a speaker SPKR, that reproduces distant speech received by wireless telephone 10A along with other local audio events, such as ringtones, stored audio program material, near-end speech, sources from web pages or other network communications received by wireless telephone 10, and audio indications, such as battery low and other system event notifications.
- A microphone M1 is provided to capture near-end speech, which is transmitted from wireless telephone 10A to the other conversation participant(s).
- Wireless telephone 10A includes sidetone circuits that inject an anti-noise signal into speaker SPKR to improve intelligibility of the distant speech and other audio reproduced by speaker SPKR.
- FIG. 2B illustrates various acoustic paths and points of reference that are also present in the system of FIG. 2A but are illustrated only in FIG. 2B for clarity. Therefore, the discussion below is also applicable to the system of FIG. 2A and is understood to apply to earphone-based applications as well as housing-mounted-transducer applications.
- A second microphone, microphone M2, is provided in order to measure the audio reproduced by speaker SPKR close to ear 5 when wireless telephone 10 is in close proximity to ear 5, in order to perform sidetone calibration and, in ANC applications, to provide an error signal indicative of the ambient audio sounds as heard by the user.
- The sidetone signal is optimized for the best frequency response and gain at a drum reference position DRP, which represents the sound heard by the listener.
- Microphone M2 measures the audio at an error reference position ERP, and the sidetone can be calibrated to obtain a desired result at error reference position ERP.
- Wireless telephone 10A also includes audio integrated circuit 20, which receives the signals from a reference microphone REF, microphone M1, and microphone M2 and interfaces with other integrated circuits, such as RF integrated circuit 12.
- The circuits and techniques disclosed herein may be incorporated in a single integrated circuit that contains control circuits and other functionality for implementing the entirety of the personal audio device, such as an MP3 player-on-a-chip integrated circuit.
- A third microphone, reference microphone REF, is optionally provided for measuring the ambient acoustic environment in ANC applications and is positioned away from the typical position of a user's mouth so that the near-end speech is minimized in the signal produced by reference microphone REF.
- A primary acoustic path P(z) illustrates the response that is modeled adaptively in an ANC system in order to cancel ambient acoustic noise at error reference position ERP, and a secondary electro-acoustic path S(z) illustrates the response, modeled in the instant disclosure for both sidetone equalization and ANC operations, that represents the transfer function from audio integrated circuit 20 through speaker SPKR and through microphone M2.
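The secondary path S(z) referenced above, from the audio circuit through speaker SPKR to microphone M2, is commonly identified adaptively. The following is a textbook least-mean-squares (LMS) system-identification sketch, not the patent's circuit; the FIR length and step size are illustrative assumptions.

```python
import numpy as np

def estimate_secondary_path(speaker_drive, m2_capture, taps=8, mu=0.01):
    """LMS identification of the secondary path S(z): speaker_drive is the
    signal sent to the speaker, m2_capture is what microphone M2 measured.
    Returns an FIR estimate S_hat of the electro-acoustic transfer function."""
    w = np.zeros(taps)      # adaptive FIR weights (S_hat)
    buf = np.zeros(taps)    # delay line of recent speaker samples
    for n in range(len(speaker_drive)):
        buf = np.roll(buf, 1)
        buf[0] = speaker_drive[n]
        y = w @ buf                      # model's prediction of M2
        e = m2_capture[n] - y            # prediction error
        w += mu * e * buf                # stochastic-gradient weight update
    return w
```

The resulting estimate can then inform the gain and frequency response applied to the sidetone signal, as the bullet on secondary-path parameters suggests.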
- FIG. 3 is an example schematic block diagram illustrating a sidetone generation system according to one embodiment of the disclosure. Specifically, FIG. 3 illustrates a sidetone generation scheme that can be implemented in a personal audio device.
- The sidetone generation system 300 may be implemented in audio integrated circuit 20 illustrated in FIGS. 2A and 2B. In some embodiments, sidetone generation system 300 may be implemented with or without adaptive noise cancellation.
- Sidetone generation system 300 includes at least sidetone processing block 310, sidetone processing block 320, and adaptive sidetone control block 330.
- The sidetone generation system 300 may receive information from a first microphone 340, a second microphone 350, an audio source 360, and/or a transducer 370.
- Audio from the audio source 360 may include distant speech received by a personal audio device, such as wireless telephones 10 and 10A illustrated in FIGS. 2A and 2B, along with other local audio events, such as ringtones, stored audio program material, near-end speech, sources from web pages or other network communications received by the personal audio device, and audio indications, such as low battery and other system event notifications.
- First microphone 340 may correspond to any of microphones M1, M1C, or M1D illustrated in FIGS. 2A and 2B.
- Second microphone 350 may correspond to any of microphones M1A, M1B, M2A, M2B, or M2 illustrated in FIGS. 2A and 2B.
- The sidetone generation system 300 may output an audio signal, such as an audio signal including audio from the audio source and a generated sidetone, to a transducer 370.
- Both the second microphone 350 and the transducer 370 may be in close proximity to a human ear 380.
- The second microphone 350 and the transducer 370 may be located in an earphone, headphone, earbud, or other component capable of being placed in or around a human ear 380.
- Audio M from audio source 360 may be received by an audio processing block, such as sidetone generation block 300, which provides the audio to transducer 370 to be audibly reproduced for reception by a user's ear 380.
- The audible content received by a human's ear 380 includes more than the audio M from the audio source 360.
- a human ear 380 may hear undesired audio from other sources.
- FIG. 3 includes some undesirable audio typically heard by a human's ear 380, such as ambient noise N in-ear captured by ear 380, air-conducted speech made up of low frequency air-conducted speech component S air-LF and high frequency air-conducted speech component S air-HF, and bone-conducted speech S bone.
- the undesired audio may degrade the quality of the desired audio heard by the user, thus necessitating quality enhancement via audio processing, such as processing by a sidetone generation system 300 .
- a sidetone generation system 300 includes sidetone processing block 310 , which may be used to generate a sidetone to improve the quality of the audio ultimately heard by the user.
- sidetone processing block 310 receives a first microphone signal 311 from first microphone 340 .
- the first microphone signal 311 may include ambient noise N AMB and air-conducted speech S air .
- sidetone processing block 310 may also receive a first feedback signal 313 from the transducer 370 .
- the first feedback signal 313 may include residual feedback, such as any signal that is fed back to sidetone processing block 310 as a result of the electrical configuration of sidetone processing block 310 or other electrical components of sidetone generation system 300 and that is still present after feedback cancellation.
- Sidetone processing block 310 may include a first processing block 312 to process the signals received by sidetone processing block 310 .
- First processing block 312 may be configured to perform high-pass filtering (HPF), feedback suppression (FBS), and ambient noise reduction (ANR). Accordingly, sound captured from first microphone 340 may be processed by first processing block 312 to remove ambient noise N AMB , boost high frequency speech that is passively attenuated before reaching the human's ear 380 , and remove residual feedback still present in the signal.
- first processing block 312 may include a minimum phase filter configured to perform some of its processing.
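As a rough illustration only (not the patent's implementation), the high-pass-filtering and high-frequency-boost behavior attributed to first processing block 312 can be sketched with a first-order high-pass filter followed by a gain; the cutoff frequency, boost factor, and sample rate below are all assumptions:

```python
import numpy as np

def highpass_sidetone(x, fs=16000, fc=1500.0, boost=2.0):
    """First-order high-pass filter (hypothetical cutoff fc) followed by a
    gain on the retained band, sketching the HPF + high-frequency boost
    attributed to first processing block 312."""
    # One-pole/one-zero high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])
    a = 1.0 / (1.0 + 2.0 * np.pi * fc / fs)
    y = np.zeros_like(x)
    prev_x, prev_y = 0.0, 0.0
    for n in range(len(x)):
        prev_y = a * (prev_y + x[n] - prev_x)
        prev_x = x[n]
        y[n] = prev_y
    return boost * y

fs = 16000
t = np.arange(fs // 10) / fs
low = np.sin(2 * np.pi * 100 * t)    # low-frequency component (suppressed)
high = np.sin(2 * np.pi * 4000 * t)  # high-frequency speech (kept and boosted)
out_low = highpass_sidetone(low, fs)
out_high = highpass_sidetone(high, fs)
```

The filter passes the 4 kHz tone at roughly full amplitude while strongly attenuating the 100 Hz tone, mirroring the idea of reproducing only the high-frequency speech component S air-HF as sidetone.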
- Sidetone processing block 310 may also include a second processing block 314 to process the signals received by sidetone processing block 310 .
- the second processing block 314 may be configured to perform feedback cancellation so as to cancel as much of first feedback signal 313 as possible.
- second processing block 314 may perform the feedback cancellation by generating a signal that gets subtracted from the first microphone signal 311 , for example, by subtraction block 315 , to cancel out as much feedback as possible from the transducer.
- the output of the subtraction block 315 may be received by the first processing block 312 to suppress some of the residual feedback still present in the signal.
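The feedback-cancellation arrangement just described, in which second processing block 314 generates an estimate of the transducer feedback that subtraction block 315 removes from the first microphone signal, can be sketched with a normalized LMS adaptive filter. The toy feedback path, step size, and signal statistics below are assumptions for illustration, not values from the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical playback signal reaching the transducer and a toy 3-tap
# acoustic feedback path coupling it back into the first microphone.
playback = rng.standard_normal(4000)
feedback_path = np.array([0.5, 0.3, 0.1])
speech = 0.05 * rng.standard_normal(4000)            # near-end speech content
mic = np.convolve(playback, feedback_path)[:4000] + speech

# NLMS adaptive canceller (a stand-in for second processing block 314); its
# output is subtracted from the microphone signal (subtraction block 315).
w = np.zeros(3)
err = np.zeros(4000)
for n in range(3, 4000):
    ref = playback[n - 2:n + 1][::-1]    # most recent reference samples
    y_hat = w @ ref                      # estimated feedback component
    e = mic[n] - y_hat                   # residual after cancellation
    w += 0.5 * e * ref / (ref @ ref + 1e-8)
    err[n] = e

residual = np.mean(err[2000:] ** 2)                       # post-convergence power
uncancelled = np.mean((mic[2000:] - speech[2000:]) ** 2)  # feedback power alone
```

After convergence the filter weights approximate the feedback path and the residual is dominated by the near-end speech, which is the portion the sidetone path is meant to keep.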
- Sidetone processing block 310 may also include memory elements.
- sidetone processing block 310 includes a first memory element 312 A for the storage of the results of the first processing block 312 .
- the first memory element 312 A may not store the results of the first processing block 312 , but may instead be capable of being manipulated by first processing block 312 .
- Sidetone processing block 310 also includes a second memory element 314 A for the storage of the results of the second processing block 314 .
- the second memory element 314 A may not store the results of the second processing block 314 , but may instead be capable of being manipulated by second processing block 314 .
- the sidetone generated by sidetone processing block 310, i.e., the signal that results after processing by sidetone processing block 310, may consist primarily of the boosted high-frequency speech S air-HF.
- the sidetone S air-HF generated by sidetone processing block 310 may be subsequently combined with the audio signal M received from audio source 360 and the sidetone generated by sidetone processing block 320 , for example, by adding, using addition block 319 , the sidetone S air-HF to the audio signal M received from audio source 360 and the sidetone generated by sidetone processing block 320 .
- the combined signal may be transferred to transducer 370 for audible reproduction.
- sidetone generation system 300 also includes sidetone processing block 320 to further improve the quality of the audio ultimately heard by the user.
- sidetone processing block 320 may be used to reduce the effects of bone-conducted speech S bone and ambient noise N in-ear captured by a human's ear 380 as well as to boost high frequency speech that is passively attenuated before reaching the human's ear 380 .
- sidetone processing block 320 receives a second feedback signal 323 from the transducer 370 .
- the second feedback signal 323 may include residual feedback, such as any signal that is fed back to sidetone processing block 320 as a result of the electrical configuration of sidetone processing block 320 or other electrical components of sidetone generation system 300 and that is still present after feedback cancellation.
- sidetone processing block 320 receives a second input signal 328 that is a combination of audio signal M from audio source 360 and a second microphone signal 326 received from second microphone 350 .
- the second microphone signal 326 received from second microphone 350 may include audio signal M in-ear captured by a human's ear 380 , ambient noise N in-ear captured by a human's ear 380 , air-conducted speech S air , and bone-conducted speech S bone .
- the audio signal M in-ear captured by a human's ear 380 may be subtracted from audio signal M to obtain a signal 328 that includes primarily N in-ear , S air , and S bone .
- Signal 328 may be subsequently processed by sidetone processing block 320 to generate a sidetone to further improve the quality of the audio heard by the user.
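The derivation of signal 328 can be sketched as a simple subtraction: an estimate of the playback audio as captured in the ear (here modeled with a hypothetical flat coupling gain, which the disclosure does not specify) is removed from the second microphone signal, leaving the in-ear noise and speech components:

```python
import numpy as np

# Sketch of forming signal 328: the playback audio M as captured in the ear
# (M in-ear, modeled here as M scaled by an assumed coupling gain) is
# subtracted from the in-ear microphone signal, leaving N in-ear + S terms.
rng = np.random.default_rng(1)
M = rng.standard_normal(1000)                        # playback audio signal
coupling = 0.8                                       # assumed ear coupling gain
noise_and_speech = 0.2 * rng.standard_normal(1000)   # N in-ear + S air + S bone
mic2 = coupling * M + noise_and_speech               # second microphone signal 326

signal_328 = mic2 - coupling * M                     # playback removed
```

With a perfect coupling estimate the playback cancels exactly; in practice the estimate would be imperfect, motivating the additional feedback suppression described for first processing block 322.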
- Sidetone processing block 320 may include a first processing block 322 to process the signals received by sidetone processing block 320 .
- First processing block 322 may be configured to perform high-pass filtering (HPF), feedback suppression (FBS), and ambient noise reduction (ANR). Accordingly, sound captured from second microphone 350 may be processed by first processing block 322 to remove N in-ear, S bone, and S air-LF, boost high frequency speech that is passively attenuated before reaching the human's ear 380, and remove residual feedback still present in the signal.
- first processing block 322 may include and employ a minimum phase filter to perform some of its processing.
- Sidetone processing block 320 also includes a second processing block 324 to process the signals received by sidetone processing block 320 .
- second processing block 324 may be configured to perform feedback cancellation so as to cancel as much of second feedback signal 323 as possible.
- second processing block 324 may perform the feedback cancellation by generating a signal that gets subtracted from signal 328, for example, by subtraction block 327, to cancel out as much feedback from the transducer as possible.
- the output of the subtraction block 327 may be received by the first processing block 322 to suppress some of the residual feedback still present in the signal.
- Sidetone processing block 320 may also include memory elements.
- sidetone processing block 320 includes a first memory element 322 A for the storage of the results of the first processing block 322 .
- the first memory element 322 A may not store the results of the first processing block 322 , but may instead be manipulated by first processing block 322 .
- Sidetone processing block 320 also includes a second memory element 324 A for the storage of the results of the second processing block 324 .
- the second memory element 324 A may not store the results of the second processing block 324 , but may instead be manipulated by second processing block 324 .
- the sidetone generated by sidetone processing block 320, i.e., the signal that results after processing by sidetone processing block 320, may consist primarily of the boosted high-frequency speech S air-HF.
- the sidetone S air-HF generated by sidetone processing block 320 may be subsequently combined with the audio signal M received from audio source 360 and the sidetone generated by sidetone processing block 310 , for example, by adding, using addition block 319 , the sidetone S air-HF to the audio signal M received from audio source 360 and the sidetone generated by sidetone processing block 310 .
- the combined signal may be transferred to transducer 370 for audible reproduction.
- sidetone generation system 300 also includes adaptive sidetone control block 330 .
- the adaptive sidetone control block 330 may be used to adapt sidetone processing blocks 310 and 320 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to recover the high frequencies in a user's voice and generate an optimized sidetone.
- signal processing block 310 receives a first microphone signal 311 from first microphone 340 and signal processing block 320 receives a second input signal 328 that is a combination of audio signal M from audio source 360 and a second microphone signal 326 received from second microphone 350 .
- the adaptive sidetone control block 330 may adapt sidetone processing block 310 and sidetone processing block 320 such that the majority of the sidetone S air-HF transferred to transducer 370 is provided by the sidetone S air-HF generated by sidetone processing block 310 . In other embodiments, such as when there is a significant amount of noise or wind in the environment, the adaptive sidetone control block 330 may adapt sidetone processing block 310 and sidetone processing block 320 so that the majority of the sidetone S air-HF transferred to transducer 370 is provided by the sidetone S air-HF generated by sidetone processing block 320 .
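One way to picture this balancing is a pair of mixing weights selected from an estimate of environmental noise; the threshold and weight values below are illustrative assumptions, as the disclosure does not specify a particular rule:

```python
def sidetone_mix_weights(noise_level, threshold=0.1):
    """Hypothetical balancing rule for adaptive sidetone control block 330:
    in quiet conditions favor the sidetone from block 310 (external mic);
    in noisy or windy conditions favor block 320 (in-ear path, which is
    shielded from wind). Returns (w_310, w_320)."""
    if noise_level < threshold:
        return 0.8, 0.2
    return 0.2, 0.8
```

A real controller would likely adapt these weights smoothly and per frequency band rather than switching between two fixed pairs.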
- Adaptive sidetone control block 330 may determine how to balance the processing between sidetone processing block 310 and sidetone processing block 320 based on numerous factors, such as the mode in which the personal audio device is operating.
- adaptive sidetone control block 330 may receive a first microphone signal from a first microphone, such as microphone 340 , and a second microphone signal from a second microphone, such as microphone 350 . Based on processing of the first microphone signal and the second microphone signal, adaptive sidetone control block 330 may determine a mode of operation of the personal audio device. For example, adaptive control block 330 may determine whether the personal audio device is operating in a Phone Call, Speaker Recognition, and/or Speech Recognition mode.
- the adaptive sidetone control block 330 may detect speech based on at least one of the first microphone signal and the second microphone signal, and then determine that the mode of operation is Phone Call mode when speech is detected. Based on the determined mode of operation, adaptive control block 330 may adapt sidetone processing blocks 310 and 320 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to generate an optimized sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the determined mode of operation.
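The speech-detection-to-mode decision described above can be sketched with a toy energy-based voice-activity detector; the energy threshold and the fallback mode are assumptions made for illustration:

```python
import numpy as np

def detect_speech(frame, energy_thresh=0.01):
    """Toy energy-based voice-activity detector (a stand-in for any VAD)."""
    return np.mean(np.asarray(frame) ** 2) > energy_thresh

def determine_mode(mic1_frame, mic2_frame):
    """Sketch of the mode decision: speech on either microphone implies
    Phone Call mode; otherwise fall back to Speaker Recognition mode.
    Mode names follow the description; the decision logic is assumed."""
    if detect_speech(mic1_frame) or detect_speech(mic2_frame):
        return "Phone Call"
    return "Speaker Recognition"
```

For example, a voiced frame on the first microphone alone is enough to select Phone Call mode, matching the description that speech on at least one of the two signals triggers that mode.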
- the adaptive sidetone control block 330 may adapt the processing of sidetone processing blocks 310 and 320 based on audio recognition algorithms.
- sidetone generation system 300 may generate the sidetone that gets transferred to transducer 370 along with audio signal M from audio source 360 based, at least in part, on a speaker recognition (SR) algorithm.
- sidetone generation may be based on an SR algorithm when no speech is detected.
- sidetone generation system 300 may generate the sidetone that gets transferred to transducer 370 along with audio signal M from audio source 360 based, at least in part, on an automatic speech recognition (ASR) algorithm.
- sidetone generation may be based on an ASR algorithm when no speech is detected and the audio signal is generated by an audio playback application.
- the adaptive sidetone control block 330 may also be configured to monitor the frequency of received speech signals and adapt sidetone processing blocks 310 and 320 to generate an optimized sidetone signal.
- the first microphone signal 311 may include speech input and the second microphone signal 326 may include in-ear audio.
- adaptive sidetone control block 330 may be configured to compare a frequency response of speech captured by the first microphone and the second microphone and to track the compared frequency response over a period of time. Adaptive sidetone control block 330 may then adapt sidetone processing blocks 310 and 320 to apply compensation filtering to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone.
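The comparison-and-compensation step can be sketched as averaging magnitude spectra of speech frames from the two microphones over time and taking their per-bin ratio as compensation gains; the FFT size, frame count, and ratio-based method are assumptions, since the disclosure does not fix the exact filtering technique:

```python
import numpy as np

def compensation_gains(mic1_frames, mic2_frames, n_fft=256, eps=1e-12):
    """Average the speech magnitude spectra captured by each microphone over
    several frames and derive per-bin gains that, applied to the second
    microphone path, minimize the spectral difference between the two."""
    spec1 = np.mean([np.abs(np.fft.rfft(f, n_fft)) for f in mic1_frames], axis=0)
    spec2 = np.mean([np.abs(np.fft.rfft(f, n_fft)) for f in mic2_frames], axis=0)
    return spec1 / (spec2 + eps)

rng = np.random.default_rng(2)
frames1 = [rng.standard_normal(256) for _ in range(20)]
# Toy case: mic2 hears the same frames through a flat 0.5 attenuation.
frames2 = [0.5 * f for f in frames1]
gains = compensation_gains(frames1, frames2)
```

In this toy flat-attenuation case every compensation gain comes out to 2.0; with a real occluded-ear response the gains would vary across frequency, which is the point of tracking the compared response over time.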
- the adaptive sidetone control block 330 may also be configured to receive the mode of operation of the personal audio device.
- another component of the personal audio device such as an application processor, which may also include a voice-activity detector (VAD), may also receive a first microphone signal from a first microphone and a second microphone signal from a second microphone and determine, based on processing of the first microphone signal and the second microphone signal, the mode of operation of the personal audio device.
- the other component of the personal audio device which determines the mode of operation may also determine the mode of operation based on processing of information that does not include the first and/or second microphone signals. For example, the mode of operation may be determined by a component of the personal audio device based on input provided by a user. Regardless of how a component of personal audio device determines the mode of operation, adaptive sidetone control block 330 may subsequently be informed of the mode of operation.
- adaptive sidetone control block 330 may adapt sidetone processing blocks 310 and 320 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to generate an optimized sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the received mode of operation.
- adaptive sidetone control block 330 may also receive instructions from another component of the personal audio device.
- a component of the personal audio device such as audio integrated circuit 20 illustrated in FIGS. 2A and 2B or a component including audio integrated circuit 20 illustrated in FIGS. 2A and 2B , may receive the first microphone signal 311 that includes speech input and the second microphone signal 326 that includes in-ear audio.
- the component may compare a frequency response of speech captured by the first microphone and the second microphone and track the compared frequency response over a period of time.
- Adaptive sidetone control block 330 may then be informed of the results of the comparing and tracking and instructed to adapt sidetone processing blocks 310 and 320 to apply compensation filtering to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone.
- FIG. 4 is an example schematic block diagram illustrating another sidetone generation system according to one embodiment of the disclosure. Specifically, FIG. 4 illustrates a sidetone generation scheme that can be implemented in a personal audio device.
- the sidetone generation system 400 may be implemented in audio integrated circuit 20 illustrated in FIGS. 2A and 2B . In some embodiments, sidetone generation system 400 may be implemented with or without adaptive noise cancellation.
- Sidetone generation system 400 may be similar to sidetone generation system 300 .
- sidetone generation system 400 includes at least sidetone processing block 410 , sidetone processing block 420 , and adaptive sidetone control block 430 .
- the sidetone generation system 400 may receive information from at least the first microphone 340 , the second microphone 350 , and the audio source 360 .
- the sidetone generation system 400 may output an audio signal, such as an audio signal including audio signal M from the audio source 360 and a generated sidetone, to a transducer 370 .
- Sidetone generation system 400 includes sidetone processing blocks 410 and 420 .
- sidetone processing blocks 410 and 420 may perform the same functions as sidetone processing blocks 310 and 320 illustrated in FIG. 3 with the exception that sidetone processing blocks 410 and 420 may forego reception of and processing of feedback signals from the transducer, such as feedback signal 313 or feedback signal 323 illustrated in FIG. 3 .
- FIG. 4 illustrates additional features that may be incorporated into a sidetone generation system to generate optimized sidetones to further improve the quality of the audio heard by a user.
- a feed forward path 401 may be included through which undesired audio heard by a user may be canceled.
- the undesired audio that may be canceled or reduced in magnitude may include at least bone-conducted speech S bone , ambient noise N in-ear captured by a human's ear 380 , and low frequency speech S air-LF that may have been amplified before reaching the human's ear 380 .
- the sidetone processing block 420 receives a second input signal 328 that is a combination of audio signal M from audio source 360 and a second microphone signal 326 received from second microphone 350 .
- the second microphone signal 326 received from second microphone 350 may include audio signal M in-ear captured by a human's ear 380 , ambient noise N in-ear captured by a human's ear 380 , air-conducted speech S air , and bone-conducted speech S bone .
- the audio signal M in-ear captured by a human's ear 380 may be subtracted from audio signal M to obtain a signal 328 that includes primarily N in-ear , S air , and S bone .
- Signal 328 may be subsequently processed by sidetone processing block 320 to generate a sidetone to further improve the quality of the audio heard by the user.
- Signal 328 which includes N in-ear , S air , and S bone , may also be fed forward and combined with the signal being transferred to transducer 370 in order to directly cancel the undesired audio consisting of N in-ear , S air , and S bone heard by the user.
- the signal 328 may be fed forward via feed forward path 401 to subtraction block 402 .
- signal 328 including N in-ear , S air , and S bone may be subtracted from the combined signal including the sidetone signals generated by signal processing blocks 410 and 420 to be combined with the audio M from audio source 360 to obtain a final signal to be transferred to transducer 370 for audible reproduction.
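The feed-forward combination of FIG. 4 can be sketched directly: the sum of the audio signal M and the two generated sidetones (addition block 319) has signal 328 subtracted from it (subtraction block 402) before reaching the transducer. All signal values below are synthetic placeholders:

```python
import numpy as np

# Sketch of the FIG. 4 feed-forward combination: signal 328 (N in-ear + S
# components) is fed forward via path 401 and subtracted from the combined
# audio-plus-sidetone signal before it reaches transducer 370.
rng = np.random.default_rng(3)
M = rng.standard_normal(500)                   # audio from audio source 360
sidetone_410 = 0.1 * rng.standard_normal(500)  # sidetone from block 410
sidetone_420 = 0.05 * rng.standard_normal(500) # sidetone from block 420
signal_328 = 0.3 * rng.standard_normal(500)    # undesired N in-ear + S content

combined = M + sidetone_410 + sidetone_420     # addition block output
to_transducer = combined - signal_328          # subtraction block 402
```

The subtracted copy of signal 328 is intended to acoustically cancel the same undesired components the user would otherwise hear directly in the ear.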
- Adaptive sidetone control block 430 may operate similar to adaptive sidetone control block 330 .
- adaptive sidetone control block 430 may include the additional feature of processing signal 328 to further optimize the processing by sidetone processing blocks 410 and 420 to generate an optimized sidetone signal.
- adaptive sidetone control block 430 may receive signal 328 , which includes N in-ear , S air , and S bone , and, based on processing of signal 328 , adapt sidetone processing blocks 410 and 420 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to generate an optimized sidetone signal.
- adaptive control block 430 may determine that the high frequency speech signals S air-HF output by one or both of the signal processing blocks 410 and 420 need to be further amplified and may thus instruct signal processing blocks 410 and 420 to further amplify the high frequency speech signals S air-HF they output.
- FIG. 5 is an example schematic block diagram illustrating another sidetone generation system according to one embodiment of the disclosure. Specifically, FIG. 5 illustrates a sidetone generation scheme that can be implemented in a personal audio device.
- the sidetone generation system 500 may be implemented in audio integrated circuit 20 illustrated in FIGS. 2A and 2B . In some embodiments, sidetone generation system 500 may be implemented with or without adaptive noise cancellation.
- Sidetone generation system 500 is similar to sidetone generation system 400 , but includes additional features that may be incorporated into a sidetone generation system to generate optimized sidetones to further improve the quality of the audio heard by a user.
- FIG. 5 illustrates another feed forward path 503 through which undesired audio heard by a user may be further canceled.
- the additional undesired audio which may be canceled or reduced in magnitude may include at least ambient noise N in-ear captured by a human's ear 380 , and low frequency speech S air-LF that may have been amplified before reaching the human's ear 380 .
- signal 328 may also be fed forward and combined with the signal being transferred to transducer 370 in order to further directly cancel the undesired audio consisting of N in-ear and S air heard by the user.
- the signal 328 may be fed forward via feed forward path 401 to subtraction block 402 .
- FIG. 5 illustrates that N in-ear and S air may also be fed forward to subtraction block 402 via feed forward path 503 to further subtract N in-ear and S air from the signal that reaches transducer 370 .
- signal 328 including N in-ear , S air , and S bone fed forward via feed forward path 401 and signal components N in-ear and S air fed forward via feed forward path 503 may be subtracted from the combined signal including the sidetone signals generated by signal processing blocks 410 and 420 to be combined with the audio M from audio source 360 to obtain a final signal to be transferred to transducer 370 for audible reproduction.
- adaptive sidetone control block 530 illustrated in FIG. 5 may also include the additional feature of processing signal 328 to further optimize the processing by sidetone processing blocks 410 and 420 to generate an optimized sidetone signal.
- adaptive sidetone control block 530 may receive signal 328 , which includes N in-ear , S air , and S bone , and, based on processing of signal 328 , adapt sidetone processing blocks 410 and 420 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to generate an optimized sidetone signal.
- FIGS. 3-5 illustrate different features of a sidetone generation system, and a sidetone generation system may be configured to perform any one of the adaptation schemes illustrated in FIGS. 3-5.
- a sidetone generation system may be configured to use an adaptive sidetone control block to adapt sidetone processing blocks in accordance with the manner in which sidetone processing blocks 310 and 320 are adapted in FIG. 3 .
- the sidetone generation system may use an adaptive sidetone control block to adapt sidetone processing blocks in accordance with the manner in which sidetone processing blocks 410 and 420 are adapted in FIG. 4 or FIG. 5, utilizing either scheme illustrated in those figures.
- adaptation may be based on numerous factors. For example, as disclosed throughout this specification, adaptation may be based on the mode of operation in which the audio device is operating. In particular, each mode of operation may be optimized utilizing different signal enhancement features. For example, in one mode, speech enhancement may be the primary feature to be optimized. In another mode, ambient noise cancellation may be the primary feature to be optimized. Accordingly, a sidetone generation system may use any of the sidetone generation schemes described above to optimize the generation of sidetones for a particular mode in which an audio device is operating.
- FIG. 6 is an example flow chart illustrating a method for frequency-dependent sidetone generation in personal audio devices according to one embodiment of the disclosure.
- Method 600 may be implemented with the systems described with respect to FIGS. 2-5 .
- Method 600 includes, at block 602 , receiving a first microphone signal from a first microphone, and, at block 604 , receiving a second microphone signal from a second microphone.
- receiving the first microphone signal such as at block 602 , may include receiving speech input.
- Method 600 includes, at block 606 , receiving a mode of operation of a user device.
- the modes of operation may include Phone Call, Speaker Recognition, and/or Speech Recognition modes.
- receiving the mode of operation may include detecting speech based on at least one of the first microphone signal and the second microphone signal, and then determining that the mode of operation is Phone Call mode when speech is detected.
- Method 600 includes, at block 608 , generating a sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the received mode of operation.
- a sidetone generation system may generate the sidetone based, at least in part, on a speaker recognition (SR) algorithm when no speech is detected.
- generating the sidetone signal may include mixing a combination of the first microphone signal and the second microphone signal to recover high frequencies in the received speech input.
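One plausible reading of this mixing step is a two-band combination: low frequencies taken from the in-ear microphone (where voiced speech is strong) and high frequencies taken from the external microphone (where they are not occluded). The FFT-domain split and the crossover frequency below are assumptions, not the disclosed method:

```python
import numpy as np

def recover_speech(mic1, mic2, fs=16000, fc=1000.0):
    """Hypothetical two-band mix: lows from the in-ear microphone (mic2),
    highs from the external microphone (mic1), via FFT-domain band
    splitting at an assumed crossover frequency fc."""
    n = len(mic1)
    f = np.fft.rfftfreq(n, 1.0 / fs)
    X1, X2 = np.fft.rfft(mic1), np.fft.rfft(mic2)
    mixed = np.where(f < fc, X2, X1)   # lows from mic2, highs from mic1
    return np.fft.irfft(mixed, n)
```

When each microphone captures only one band of a two-tone signal, the mix reconstructs the full signal, illustrating the recovery of high frequencies in the received speech input.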
- After the sidetone has been generated, it may be combined with an audio signal and transferred to a transducer. Upon reception, the transducer may reproduce the combined audio signal and sidetone signal, yielding higher quality audio and an improved user experience for consumer devices, such as personal audio players and mobile phones.
- Generating a sidetone may enhance the quality of the audio heard by a user. For example, generating the sidetone may improve voice characteristics including at least one of louder speech and enhanced signal-to-noise when the received and/or determined mode of operation is Phone Call mode. In one embodiment, the sidetone generation system may yield such improvements by cancelling bone-conducted speech when the mode of operation is Phone Call mode. In another embodiment, generating the sidetone may also compensate for an occlusion effect. Compensating for an occlusion effect may include processing sound to match a frequency response of the transducer to simulate a frequency response of an open ear.
- the first microphone signal may include speech input, such as speech input obtained via microphone 340 illustrated in FIGS. 3-5
- the second microphone signal may include in-ear audio, such as audio obtained via microphone 350 illustrated in FIGS. 3-5
- a sidetone generation system or a processing block in communication with the sidetone generation system, may be configured to compare a frequency response of speech captured by the first microphone and the second microphone and to track the compared frequency response over a period of time. Based on the comparison and tracking, the sidetone generation system may be configured to apply compensation filtering to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone, as discussed above with respect to adaptive sidetone control block 330 .
- FIG. 7 is an example flow chart illustrating another method for frequency-dependent sidetone generation in personal audio devices according to one embodiment of the disclosure.
- Method 700 may be implemented with the systems described with respect to FIGS. 2-5 .
- method 700 may be implemented with or without adaptive noise cancellation.
- Method 700 includes, at block 702, detecting the mode of operation and signal quality associated with a use of an audio device.
- the mode of operation may be detected by an adaptive sidetone control block, or other processing component of an audio device, as discussed with reference to block 606 illustrated in FIG. 6 .
- the step of detecting may include detecting when someone is talking with a reasonable signal-to-noise ratio (SNR).
- the detection may be based on microphone signals, such as signals from microphones on either ear, which may provide high correlation, microphones in an ear, or microphones on the personal audio device.
- the signals from a microphone in an ear may be received prior to cancellation.
- method 700 includes removing noise from a speech signal.
- the noise may be removed from a speech signal captured from a combination of microphones not in an ear piece and microphones in an ear piece.
- noise may be removed utilizing any one of the sidetone generation systems 300 , 400 , or 500 .
- the removal of noise may be accomplished using an ultra-low delay (ULD) filter.
- method 700 includes measuring the in-ear SNRs and creating a resulting signal based on a maximum SNR. For example, the ratio of the in-ear signal to noise may be measured for each microphone in close proximity to each ear, such as for each microphone in an ear piece. The signals may be processed to create higher-quality signals based on the maximum SNR. In other words, the amount of improvement in the signal quality may be limited by the maximum attainable SNR.
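The per-ear SNR measurement and selection can be sketched as follows; the power-ratio SNR estimate and the pick-the-best-ear policy are illustrative assumptions about how block 706 might be realized:

```python
import numpy as np

def pick_best_ear(ear_signals, ear_noises):
    """Sketch of block 706: estimate the SNR at each in-ear microphone and
    build the result from the one with the maximum SNR, since the attainable
    improvement is bounded by that maximum."""
    snrs = [np.mean(s ** 2) / (np.mean(n ** 2) + 1e-12)
            for s, n in zip(ear_signals, ear_noises)]
    best = int(np.argmax(snrs))
    return best, snrs
```

A device could also weight and combine both ears rather than selecting one outright; the hard selection here is simply the most direct reading of "based on a maximum SNR."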
- the measuring may be performed by an adaptive sidetone control block disclosed herein or other processing component of an audio device in communication with a sidetone generation system disclosed herein.
- the resulting signal may be combined with an audio file, such as a media file, and transferred to a transducer for audible reproduction.
- the resulting signal may be combined with the audio file in a manner similar to the manner in which resulting signals from sidetone processing blocks illustrated in FIGS. 3-5 are combined with media signals, in which the signals are combined using addition block 319 .
- Method 700 may proceed to block 710 , wherein the frequency responses of speech captured by external microphones may be compared to speech captured by internal microphones.
- the comparison may be performed by an in-ear monitor (IEM) after cancellation of media audio.
- the compared frequency response may be tracked over a period of time, such as at block 712 .
- a compensation filter may be utilized to minimize the difference between the frequency responses of the captured speech signals as indicated by the comparison performed at block 710 .
- the comparison, tracking, and compensation filtering may be performed by a sidetone generation system described above, such as a combination of one or more of sidetone generation systems 300 , 400 , and 500 .
- method 700 may include determining whether to switch between filters.
- a sidetone generation system may determine the mode in which the audio device is operating, such as by performing the determination step at block 702 or receiving an indication of the mode of operation. If the system determines that the device is in an ambient listening mode and that the compensation scheme currently being utilized for sidetone generation is optimizing audio processing for voice correction, which is different than optimization required for an ambient listening mode, the sidetone generation system may switch the processing performed by filters within the sidetone generation system to optimize the generated sidetones for an ambient listening mode.
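The filter-switching decision at block 714 can be reduced to a mode-to-compensation lookup; the mode and filter names below are illustrative stand-ins rather than identifiers from the disclosure:

```python
def select_compensation(mode):
    """Hypothetical filter switch for block 714: voice correction for Phone
    Call mode, transparency-style compensation for ambient listening."""
    return {"Phone Call": "voice_correction",
            "Ambient Listening": "ambient_transparency"}.get(mode, "default")
```

If the detected mode changes from Phone Call to Ambient Listening while voice-correction filtering is active, the lookup would direct the system to switch to the ambient-listening compensation scheme.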
- FIGS. 6 and 7 are generally set forth as a logical flow chart diagrams. As such, the depicted orders and labeled steps are indicative of aspects of the disclosed methods. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated methods. Additionally, the formats and symbols employed are provided to explain the logical steps of the methods and are understood not to limit the scope of the methods. Although various arrow types and line types may be employed in the flow chart diagrams, they are understood not to limit the scope of the corresponding methods. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the methods. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted methods. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown.
- Computer-readable media includes physical computer storage media.
- a storage medium may be any available medium that can be accessed by a computer.
- such computer-readable media can comprise random access memory (RAM), read-only memory (ROM), electrically-erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
- Disk and disc include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks, and Blu-ray discs. Generally, disks reproduce data magnetically, and discs reproduce data optically. Combinations of the above should also be included within the scope of computer-readable media.
- instructions and/or data may be provided as signals on transmission media included in a communication apparatus.
- a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims.
Abstract
Description
- This application is related to subject matter disclosed in U.S. patent application Ser. No. 14/197,814 to Kaller et al., filed on Mar. 5, 2014 and entitled “Frequency-dependent sidetone calibration,” which is published as U.S. Patent Application Publication No. 2015/0256660, and which is hereby incorporated by reference.
- The instant disclosure relates to personal audio devices. More specifically, portions of this disclosure relate to frequency-dependent sidetone generation in personal audio devices.
- Audio devices, such as mobile/cellular telephones, in which users need to hear their own voice during use, are increasing in prevalence. Audio of a user's own voice can be injected into a speaker output being provided to a user. Such audio can be referred to as a sidetone. Sidetones are presented such that the user's voice is heard by the user in the headphones or other speaker as if the speaker and housing were not covering the ear. For example, due to the obstruction provided by the speaker and housing, one or both ears may be partially or totally blocked, which can result in distortion and attenuation of the user's voice in the ambient acoustic environment. These effects are usually termed occlusion effects because they can result from occlusion of an ear, such as by a headphone, earphone, earbud, and the like. Sidetones have been used to at least partially remedy the occlusion problem. However, conventional sidetones do not always provide a natural sound, especially under changing conditions, such as with changes in the speaker type or position or changes in the environment.
- To illustrate the current state of the art, FIG. 1 provides an example schematic block diagram illustrating a conventional sidetone generation system according to the prior art. One drawback of the system of FIG. 1 is that the sidetone generation path is fixed. Thus, the generation of sidetones cannot be adapted to have different characteristics for different applications.
- Shortcomings mentioned here are only representative and are included simply to highlight that a need exists for improved electrical components, particularly for sidetone generation systems employed in personal audio devices, such as mobile phones. Embodiments described herein address certain shortcomings but not necessarily each and every one described here or known in the art.
- The overall performance and power utilization of an audio device may be improved with an adaptive sidetone generation system that generates sidetones selected for different application-specific problems. In particular, systems that include sidetone generation capabilities may include numerous microphones from which information may be received and processed to generate sidetones. The information from the microphones may be used to receive and/or determine the audio device's operating mode. The information from the microphones and the received and/or determined mode may then be used to generate a sidetone for the particular mode and particular conditions in which the audio device is operating. Through the dynamic generation of sidetones, rather than the conventional fixed sidetones, the audio signal quality may be improved, thus reducing the amount of subsequent audio processing required, and resulting in improved performance, improved power utilization, and improved user experience.
- According to one embodiment, an apparatus may include a first microphone configured to generate a first microphone signal; a second microphone configured to generate a second microphone signal; a sidetone circuit configured to perform steps comprising: receiving a mode of operation of a user device; and generating a sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the received mode of operation; and/or a transducer for reproducing an audio signal and the sidetone signal.
- In certain embodiments, the first microphone is configured to receive speech input, and the sidetone circuit is configured to generate the sidetone signal by mixing a combination of the first microphone signal and the second microphone signal to recover high frequencies in the received speech input. In addition, in some embodiments, the sidetone circuit is further configured: to detect speech based on at least one of the first microphone signal and the second microphone signal; and/or to determine the mode of operation is a phone call mode when speech is detected. According to an embodiment, the received mode of operation includes at least one of Phone Call, Speaker Recognition, and Automatic Speech Recognition.
- In another embodiment, the sidetone circuit is configured to generate the sidetone to improve voice characteristics including at least one of louder speech and enhanced signal-to-noise when the received mode of operation is phone call. The sidetone circuit may also be configured to cancel bone conducted speech in an output of the transducer when the mode of operation is phone call. The sidetone circuit may also be configured to generate the sidetone based, at least in part, on an automatic speech recognition (ASR) algorithm when no speech is detected and the audio signal is generated by an audio playback application; and/or otherwise, generate the sidetone based, at least in part, on a speaker recognition (SR) algorithm when no speech is detected.
- According to an embodiment, the first microphone is configured to receive speech input, the second microphone is configured to receive in-ear audio, and the sidetone circuit is further configured to: compare a frequency response of speech captured by the first microphone and the second microphone; track the compared frequency response over a period of time; and/or apply a compensation filter to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone.
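The compare/track/compensate sequence in this embodiment can be illustrated with a short sketch. All names, the FFT size, and the smoothing constant below are assumptions for illustration, not values from the disclosure.

```python
import numpy as np

def magnitude_response(block, n_fft=256):
    # Magnitude spectrum with a small floor to avoid divide-by-zero.
    return np.abs(np.fft.rfft(block, n_fft)) + 1e-12

class Compensator:
    def __init__(self, n_fft=256, alpha=0.9):
        self.n_fft = n_fft
        self.alpha = alpha                    # smoothing for tracking over time
        self.ratio = np.ones(n_fft // 2 + 1)  # compared response, per FFT bin

    def update(self, voice_mic_block, inear_mic_block):
        # Compare the two responses and track the per-bin ratio over time.
        r = (magnitude_response(voice_mic_block, self.n_fft)
             / magnitude_response(inear_mic_block, self.n_fft))
        self.ratio = self.alpha * self.ratio + (1 - self.alpha) * r

    def compensate(self, inear_mic_block):
        # Scale the in-ear spectrum toward the voice mic's, minimizing the
        # difference between the two captured speech responses.
        spec = np.fft.rfft(inear_mic_block, self.n_fft) * self.ratio
        return np.fft.irfft(spec, self.n_fft)

rng = np.random.default_rng(0)
comp = Compensator()
speech = rng.standard_normal(256)
comp.update(speech, 0.5 * speech)     # in-ear mic hears the speech 6 dB down
restored = comp.compensate(0.5 * speech)
```

The leaky average plays the role of the tracking step: a single update moves each bin's ratio only one tenth of the way toward the newly measured value.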
- In some embodiments, the sidetone circuit is further configured to compensate for an occlusion effect, such as by processing sound to match a frequency response of the transducer to simulate a frequency response of an open ear. The sidetone circuit may also be configured to cancel low frequency air conducted speech.
- According to another embodiment, a method for frequency-dependent sidetone generation in personal audio devices may include receiving a first microphone signal from a first microphone; receiving a second microphone signal from a second microphone; receiving a mode of operation of a user device; and/or generating a sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the received mode of operation. In addition, in some embodiments, the method may also include reproducing, at a transducer, a combination of an audio signal and the sidetone signal.
- In certain embodiments, receiving the first microphone signal includes receiving speech input, and generating the sidetone signal includes mixing a combination of the first microphone signal and the second microphone signal to recover high frequencies in the received speech input. In addition, in some embodiments, the step of receiving the mode of operation includes detecting speech based on at least one of the first microphone signal and the second microphone signal; and/or determining the mode of operation is a phone call mode when speech is detected. According to an embodiment, the received mode of operation includes at least one of Phone Call, Speaker Recognition, and Speech Recognition.
- In another embodiment, the method may include generating the sidetone to improve voice characteristics including at least one of louder speech and enhanced signal-to-noise when the received mode of operation is phone call. The method may further include cancelling bone-conducted speech when the mode of operation is Phone Call. The method may also include at least one of: generating the sidetone based, at least in part, on a speaker recognition (SR) algorithm when no speech is detected; and/or generating the sidetone based, at least in part, on an automatic speech recognition (ASR) algorithm when no speech is detected and the audio signal is generated by an audio playback application.
- According to an embodiment, the first microphone signal includes speech input, the second microphone signal includes in-ear audio, and the method further includes comparing a frequency response of speech captured by the first microphone and the second microphone; tracking the compared frequency response over a period of time; and/or applying a compensation filter to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone.
- In some embodiments, the method may include generating the sidetone to compensate for an occlusion effect. The step of compensating for an occlusion effect may include processing sound to match a frequency response of the transducer to simulate a frequency response of an open ear.
- According to yet another embodiment, an apparatus includes a controller configured to perform the steps including: receiving a first microphone signal from a first microphone; receiving a second microphone signal from a second microphone; determining a mode of operation of a user device; and/or generating a sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the determined mode of operation. In addition, the controller may be further configured to perform the step of causing reproduction, at a transducer, of a combination of an audio signal and the sidetone signal.
- In certain embodiments, receiving the first microphone signal includes receiving speech input, and the step of generating the sidetone signal includes mixing a combination of the first microphone signal and the second microphone signal to recover high frequencies in the received speech input. In addition, in some embodiments, the step of determining a mode of operation includes: detecting speech based on at least one of the first microphone signal and the second microphone signal; and/or determining the mode of operation is a phone call mode when speech is detected. According to an embodiment, the determined mode of operation includes at least one of Phone Call, Speaker Recognition, and Speech Recognition.
- In another embodiment, the controller is further configured to perform a step of generating the sidetone to improve voice characteristics including at least one of louder speech and enhanced signal-to-noise when the determined mode of operation is a phone call mode. The controller may also be configured to cancel bone-conducted speech when the mode of operation is phone call. The controller may be further configured to perform at least one of the steps of: generating the sidetone based, at least in part, on a speaker recognition (SR) algorithm when no speech is detected; and generating the sidetone based, at least in part, on an automatic speech recognition (ASR) algorithm when no speech is detected and the audio signal is generated by an audio playback application.
- According to an embodiment, the first microphone signal comprises speech input and the second microphone signal comprises in-ear audio, and the controller is further configured to perform steps including: comparing a frequency response of speech captured by the first microphone and the second microphone; tracking the compared frequency response over a period of time; and/or applying a compensation filter to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone.
- In some embodiments, the controller is further configured to generate the sidetone to compensate for an occlusion effect. The step of compensating for an occlusion effect may include processing sound to match a frequency response of the transducer to simulate a frequency response of an open ear.
- The foregoing has outlined rather broadly certain features and technical advantages of embodiments of the present invention in order that the detailed description that follows may be better understood. Additional features and advantages will be described hereinafter that form the subject of the claims of the invention. It should be appreciated by those having ordinary skill in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same or similar purposes. It should also be realized by those having ordinary skill in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims. Additional features will be better understood from the following description when considered in connection with the accompanying figures. It is to be expressly understood, however, that each of the figures is provided for the purpose of illustration and description only and is not intended to limit the present invention.
- For a more complete understanding of the disclosed system and methods, reference is now made to the following descriptions taken in conjunction with the accompanying drawings.
-
FIG. 1 is an example schematic block diagram illustrating a conventional sidetone generation system according to the prior art. -
FIG. 2A is an example illustration of a personal audio system according to one embodiment of the disclosure. -
FIG. 2B is another example illustration of a personal audio system according to one embodiment of the disclosure. -
FIG. 3 is an example schematic block diagram illustrating a sidetone generation system according to one embodiment of the disclosure. -
FIG. 4 is an example schematic block diagram illustrating another sidetone generation system according to one embodiment of the disclosure. -
FIG. 5 is an example schematic block diagram illustrating another sidetone generation system according to one embodiment of the disclosure. -
FIG. 6 is an example flow chart illustrating a method for frequency-dependent sidetone generation in personal audio devices according to one embodiment of the disclosure. -
FIG. 7 is an example flow chart illustrating another method for frequency-dependent sidetone generation in personal audio devices according to one embodiment of the disclosure. - Sidetones described throughout this application may be used in personal audio devices, which may include one or more transducers such as a speaker. A personal audio device may be a wireless headphone, a wireless telephone, an Internet protocol (IP) or other telephone handset, a gaming headset, or a communications headset for aircraft, motorcycle, or automotive systems. The personal audio device may include a sidetone generation circuit that has one or more adjustable parameters that may be selected for the particular equipment, configuration, physical position, and/or ambient environment to improve users' perception of their own voice via the sidetone information. The selection may be performed dynamically in response to a user command or in response to a voice-activity detector (VAD) indicating whether or not near speech is present. Frequency shaping to generate the sidetone may be included in the form of low-pass, high-pass, and/or band-pass filtering of the user's speech and other captured audio. Frequency shaping may also include low-frequency cutoff filtering that compensates for a low-frequency enhancement provided by bone conduction from the transducer(s) to the inner ear.
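The low-frequency cutoff filtering mentioned above can be sketched as a simple one-pole high-pass filter that attenuates the band boosted by bone conduction; the cutoff frequency and sample rate below are illustrative assumptions.

```python
import math

def one_pole_highpass(samples, fs=16000.0, fc=300.0):
    # y[n] = a * (y[n-1] + x[n] - x[n-1]), a standard one-pole high-pass.
    a = 1.0 / (1.0 + 2.0 * math.pi * fc / fs)
    out, y, x_prev = [], 0.0, 0.0
    for x in samples:
        y = a * (y + x - x_prev)
        x_prev = x
        out.append(y)
    return out

# DC (the extreme low-frequency case) is almost completely removed,
# while a transient edge passes through nearly unchanged.
filtered = one_pole_highpass([1.0] * 500)
```

A deployed system would likely use a higher-order filter, but the structure is the same: the low band that bone conduction already delivers to the inner ear is cut from the sidetone before playback.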
- The sidetone may be presented, along with playback audio, such as downlink audio, by a stereo headset. The stereo headset may include two monaural earphones, each having a speaker, for outputting the sidetone and playback audio. The stereo headset may also include a first microphone to capture the voice of the user and a second microphone to capture sounds reaching the user's ear. A sidetone-generating apparatus may operate on the signals generated by the microphones to select a sound level and frequency content of the user's voice that is heard by the user via feedback output to the speaker. Alternatively, instead of providing a microphone on each earphone to capture the voice of the user, the voice microphone may be a single microphone provided near the user's mouth, for example, on a boom or a lanyard. In another alternative embodiment, the sidetone may be presented by a wireless telephone having a transducer on the housing of the wireless telephone, and with a first microphone to capture the user's voice and a second microphone for capturing the output of the transducer to approximate the sound heard by the user's ear.
- The sidetone-generating apparatus in any of the above configurations may be implemented with or without active noise cancellation (ANC) circuits, which can use the microphones to form part of the ambient noise and ANC error measurements. One or more of the parameters derived for ANC operation, such as a secondary-path response estimate, may be used in determining the gain and/or frequency response to be applied to the sidetone signal. Alternatively, or in combination, ambient noise reduction can be provided by the monaural earphones sealing the ear canal or sealing over the ear. The sidetone-generating apparatus may equalize the sound level of the user's voice as detected by the first and second microphones and may include an additional pre-set gain offset appropriate to the method of noise reduction and the position of the microphone that detects the sound reaching the user's ear. As yet another alternative, the sidetone-generating apparatus may equalize the sound level of the user's voice as detected by the first and second microphones and further allow for manual user control of gain offset in order to achieve the most desirable sidetone level.
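The level-equalization-plus-offset idea can be sketched as follows; the RMS measure and the 3 dB offset are illustrative assumptions, not values from the disclosure.

```python
import math

def rms(block):
    # Root-mean-square level of a block of samples.
    return math.sqrt(sum(x * x for x in block) / len(block))

def sidetone_gain(voice_block, inear_block, offset_db=3.0):
    # Gain matching the in-ear voice level to the voice mic's level, plus a
    # pre-set offset chosen for the noise-reduction method and mic position.
    ratio = rms(voice_block) / max(rms(inear_block), 1e-12)
    return ratio * 10.0 ** (offset_db / 20.0)

# The in-ear mic hears the voice at half amplitude (-6 dB), so the computed
# gain is 2x plus the 3 dB offset.
g = sidetone_gain([0.4, -0.4, 0.4, -0.4], [0.2, -0.2, 0.2, -0.2])
```

For the manual-control variant described above, `offset_db` would simply be driven by a user setting instead of a preset.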
-
FIG. 2A shows a wireless telephone 10 and a pair of earbuds EB1 and EB2, each inserted in a corresponding ear. Illustrated wireless telephone 10 is an example of a device that may include a sidetone-generating apparatus, but it is understood that not all of the elements or configurations illustrated in wireless telephone 10, or in the circuits depicted in subsequent illustrations, are required. In particular, some or all of the circuits illustrated below as being within wireless telephone 10 may alternatively be implemented in a cord-mounted module that interconnects earbuds EB1, EB2 in a wired configuration, or implemented within earbuds EB1, EB2 themselves. Wireless telephone 10 may be connected to earbuds EB1, EB2 by a wired or wireless connection, e.g., a BLUETOOTH™ connection (BLUETOOTH is a trademark of Bluetooth SIG, Inc.). Each of the earbuds EB1 and EB2 may have a corresponding transducer, such as speakers SPKR1 and SPKR2, to reproduce audio, which may include distant speech received from wireless telephone 10, ringtones, stored audio program material, and a sidetone, which is an injection of near-end speech, i.e., the speech of the user of wireless telephone 10. The source audio may also include any other audio that wireless telephone 10 is required to reproduce, such as source audio from web-pages or other network communications received by wireless telephone 10 and audio indications such as battery low and other system event notifications. - First microphones M1A, M1B for receiving the speech of the user may be provided on a surface of the housing of respective earbuds EB1, EB2, may alternatively be mounted on a boom, or alternatively located within a cord-mounted module 7. In embodiments that include adaptive noise-canceling (ANC) as described below, first microphones M1A, M1B may also serve as reference microphones for measuring the ambient acoustic environment.
Second microphones M2A, M2B may be provided in order to measure the audio reproduced by respective speakers SPKR1, SPKR2 close to the corresponding ears. -
Wireless telephone 10 includes circuits and features performing the sidetone generation as described below, in addition to optionally providing ANC functionality. A circuit 14 within wireless telephone 10 may include an audio integrated circuit 20 that receives the signals from first microphones M1A, M1B and second microphones M2A, M2B and interfaces with other integrated circuits such as an RF integrated circuit 12 containing the wireless telephone transceiver. An alternative location places a microphone M1C on the housing of wireless telephone 10 or a microphone M1D on cord-mounted module 7. In other implementations, the circuits and techniques disclosed herein may be incorporated in a single integrated circuit that contains control circuits and other functionality for implementing the entirety of the personal audio device, such as an MP3 player-on-a-chip integrated circuit or a wireless telephone implemented within a single one of earbuds EB1, EB2. In other embodiments, as illustrated in FIG. 2B below, a wireless telephone 10A includes the first and second microphones, the speaker, and the sidetone calibration. Equalization may be performed by an integrated circuit within wireless telephone 10. For the purposes of illustration, the sidetone circuits will be described as provided within wireless telephone 10, but the above variations are understandable by a person of ordinary skill in the art, and the consequent signals that are required between earbuds EB1, EB2, wireless telephone 10, and a third module, if required, can be easily determined for those variations. -
FIG. 2B shows an example wireless telephone 10A, which includes a speaker SPKR held in proximity to a human ear 5. Illustrated wireless telephone 10A is an example of a device that may include a sidetone-generating apparatus, but it is understood that not all of the elements or configurations embodied in illustrated wireless telephone 10A, or in the circuits depicted in subsequent illustrations, are required. Wireless telephone 10A includes a transducer, such as a speaker SPKR, that reproduces distant speech received by wireless telephone 10A along with other local audio events, such as ringtones, stored audio program material, near-end speech, sources from web-pages or other network communications received by wireless telephone 10A, and audio indications, such as battery low and other system event notifications. A microphone M1 is provided to capture near-end speech, which is transmitted from wireless telephone 10A to the other conversation participant(s). -
Wireless telephone 10A includes sidetone circuits that inject an anti-noise signal into speaker SPKR to improve intelligibility of the distant speech and other audio reproduced by speaker SPKR. Further, FIG. 2B illustrates various acoustic paths and points of reference that are also present in the system of FIG. 2A, but are illustrated only in FIG. 2B for clarity. Therefore, the discussion below is also applicable in the system of FIG. 2A and is understood to apply to earphone-based applications as well as housing-mounted-transducer applications. A second microphone, microphone M2, is provided in order to measure the audio reproduced by speaker SPKR close to ear 5, when wireless telephone 10A is in close proximity to ear 5, in order to perform sidetone calibration and, in ANC applications, to provide an error signal indicative of the ambient audio sounds as heard by the user. Ideally, the sidetone signal is optimized for the best frequency response and gain at a drum reference position DRP, which represents the sound heard by the listener. Microphone M2 measures the audio at an error reference position ERP, and the sidetone can be calibrated to obtain a desired result at error reference position ERP. Fixed equalization can be used to adjust the sidetone response to optimize the sidetone present at drum reference position DRP, and to additionally compensate for bone conduction due to contact between earbuds EB1, EB2 in the system of FIG. 2A or contact with the housing of wireless telephone 10A in the system of FIG. 2B. Wireless telephone 10A also includes audio integrated circuit 20 that receives the signals from a reference microphone REF, microphone M1, and microphone M2 and interfaces with other integrated circuits such as RF integrated circuit 12.
In other implementations, the circuits and techniques disclosed herein may be incorporated in a single integrated circuit that contains control circuits and other functionality for implementing the entirety of the personal audio device, such as an MP3 player-on-a-chip integrated circuit. A third microphone, reference microphone REF, is optionally provided for measuring the ambient acoustic environment in ANC applications and is positioned away from the typical position of a user's mouth, so that the near-end speech is minimized in the signal produced by reference microphone REF. A primary acoustic path P(z) illustrates the response that is modeled adaptively in an ANC system in order to cancel ambient acoustic noise at error reference position ERP, and a secondary electro-acoustic path S(z) illustrates the response that is modeled in the instant disclosure for both sidetone equalization and for ANC operations and that represents the transfer function from audio integrated circuit 20 through speaker SPKR and through microphone M2. -
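One common way to obtain an estimate of a secondary path such as S(z) is system identification with an LMS adaptive FIR filter; the sketch below illustrates that general technique, not the patented method. The filter length, step size, and the "true" path used to generate the demo data are assumptions.

```python
import numpy as np

def lms_identify(x, d, n_taps=4, mu=0.05, n_iter=2000):
    # Adapt FIR weights w so that w applied to the speaker signal x
    # reproduces the signal d observed at the measurement microphone.
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, min(len(x), n_iter)):
        x_vec = x[n - n_taps + 1 : n + 1][::-1]   # newest sample first
        e = d[n] - np.dot(w, x_vec)               # residual at microphone M2
        w += mu * e * x_vec                       # LMS coefficient update
    return w

rng = np.random.default_rng(1)
true_s = np.array([0.8, -0.3, 0.1, 0.05])   # assumed speaker-to-mic response
x = rng.standard_normal(4000)               # probe signal driven through SPKR
d = np.convolve(x, true_s)[: len(x)]        # audio observed at microphone M2
w_hat = lms_identify(x, d)                  # converges toward true_s
```

Once `w_hat` approximates S(z), it can inform the gain and frequency response applied to the sidetone signal, as described above for ANC-derived parameters.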
FIG. 3 is an example schematic block diagram illustrating a sidetone generation system according to one embodiment of the disclosure. Specifically, FIG. 3 illustrates a sidetone generation scheme which can be implemented in a personal audio device. For example, the sidetone generation system 300 may be implemented in audio integrated circuit 20 illustrated in FIGS. 2A and 2B. In some embodiments, sidetone generation system 300 may be implemented with or without adaptive noise cancellation. -
Sidetone generation system 300 includes at least sidetone processing block 310, sidetone processing block 320, and adaptive sidetone control block 330. The sidetone generation system 300 may receive information from a first microphone 340, a second microphone 350, an audio source 360, and/or a transducer 370. Audio from the audio source 360 may include distant speech received by a personal audio device, such as wireless telephones 10 and 10A illustrated in FIGS. 2A and 2B, along with other local audio events, such as ringtones, stored audio program material, near-end speech, sources from web-pages or other network communications received by the personal audio device, and audio indications, such as low battery and other system event notifications. In some embodiments, first microphone 340 may correspond to any of microphones M1, M1C, or M1D illustrated in FIGS. 2A and 2B, and second microphone 350 may correspond to any of microphones M1A, M1B, M2A, M2B, or M2 illustrated in FIGS. 2A and 2B. - The
sidetone generation system 300 may output an audio signal, such as an audio signal including audio from the audio source and a generated sidetone, to a transducer 370. As illustrated in FIG. 3, both the second microphone 350 and the transducer 370 may be in close proximity to a human ear 380. For example, the second microphone 350 and the transducer 370 may be located in an earphone, headphone, earbud, or other component capable of being placed in or around a human ear 380. - In operation, audio M from
audio source 360 may be received by an audio processing block, such as sidetone generation system 300, which provides the audio to transducer 370 to be audibly reproduced for audible reception by a user's ear 380. Ideally, no processing of the received audio to enhance quality is necessary, and the human hears only the desired audio. However, the audible content received by a human's ear 380 includes more than the audio M from the audio source 360. For example, as illustrated in FIG. 3, a human ear 380 may hear undesired audio from other sources. FIG. 3 includes some undesirable audio typically heard by a human's ear 380, such as ambient noise N_in-ear captured by ear 380, air-conducted speech made up of low-frequency air-conducted speech component S_air-LF and high-frequency air-conducted speech component S_air-HF, and bone-conducted speech S_bone. The undesired audio may degrade the quality of the desired audio heard by the user, thus necessitating quality enhancement via audio processing, such as processing by a sidetone generation system 300. - A
sidetone generation system 300 includes sidetone processing block 310, which may be used to generate a sidetone to improve the quality of the audio ultimately heard by the user. In particular, sidetone processing block 310 receives a first microphone signal 311 from first microphone 340. The first microphone signal 311 may include ambient noise N_AMB and air-conducted speech S_air. In addition, sidetone processing block 310 may also receive a first feedback signal 313 from the transducer 370. The first feedback signal 313 may include residual feedback, such as any signal that is fed back to sidetone processing block 310 as a result of the electrical configuration of sidetone processing block 310 or other electrical components of sidetone generation system 300 and that is still present after feedback cancellation. -
Sidetone processing block 310 may include a first processing block 312 to process the signals received by sidetone processing block 310. First processing block 312 may be configured to perform high-pass filtering (HPF), feedback suppression (FBS), and ambient noise reduction (ANR). Accordingly, sound captured from first microphone 340 may be processed by first processing block 312 to remove ambient noise N_AMB, boost high-frequency speech that is passively attenuated before reaching the human's ear 380, and remove residual feedback still present in the signal. In some embodiments, first processing block 312 may include a minimum phase filter configured to perform some of its processing. -
Sidetone processing block 310 may also include a second processing block 314 to process the signals received by sidetone processing block 310. The second processing block 314 may be configured to perform feedback cancellation so as to cancel as much of first feedback signal 313 as possible. In some embodiments, second processing block 314 may perform the feedback cancellation by generating a signal that gets subtracted from the first microphone signal 311, for example, by subtraction block 315, to cancel out as much feedback as possible from the transducer. The output of the subtraction block 315 may be received by the first processing block 312 to suppress some of the residual feedback still present in the signal. -
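The feedback cancellation performed by second processing block 314 and subtraction block 315 amounts to subtracting an estimate of the leaked transducer audio from the first microphone signal. A minimal sketch, assuming the leak can be modeled as a single gain (a real path would need an adaptive filter):

```python
def cancel_feedback(mic_samples, speaker_samples, leak_gain=0.25):
    # Subtract the estimated transducer leak from the microphone feed.
    return [m - leak_gain * s for m, s in zip(mic_samples, speaker_samples)]

speech = [0.1, 0.2, -0.1, 0.3]                     # near-end speech at the mic
spk = [0.4, -0.4, 0.8, 0.0]                        # audio driven to the transducer
mic = [s + 0.25 * p for s, p in zip(speech, spk)]  # mic hears speech + leak
clean = cancel_feedback(mic, spk)                  # leak removed before block 312
```

With a perfect leak model the subtraction recovers the speech exactly; in practice the residual left by an imperfect model is what first processing block 312's feedback suppression addresses.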
Sidetone processing block 310 may also include memory elements. For example, sidetone processing block 310 includes a first memory element 312A for the storage of the results of the first processing block 312. In some embodiments, the first memory element 312A may not store the results of the first processing block 312, but may instead be capable of being manipulated by first processing block 312. Sidetone processing block 310 also includes a second memory element 314A for the storage of the results of the second processing block 314. Like the first memory element 312A, the second memory element 314A may not store the results of the second processing block 314, but may instead be capable of being manipulated by second processing block 314. - The sidetone generated by
sidetone processing block 310, i.e., the signal that results after processing by sidetone processing block 310, may consist primarily of the boosted high-frequency speech Sair-HF. The sidetone Sair-HF generated by sidetone processing block 310 may be subsequently combined with the audio signal M received from audio source 360 and the sidetone generated by sidetone processing block 320, for example, by adding, using addition block 319, the sidetone Sair-HF to the audio signal M received from audio source 360 and the sidetone generated by sidetone processing block 320. The combined signal may be transferred to transducer 370 for audible reproduction. - As illustrated in
FIG. 3, sidetone generation system 300 also includes sidetone processing block 320 to further improve the quality of the audio ultimately heard by the user. In particular, sidetone processing block 320 may be used to reduce the effects of bone-conducted speech Sbone and ambient noise Nin-ear captured by a human's ear 380 as well as to boost high frequency speech that is passively attenuated before reaching the human's ear 380. - In
FIG. 3, sidetone processing block 320 receives a second feedback signal 323 from the transducer 370. The second feedback signal 323 may include residual feedback, such as any signal that is fed back to sidetone processing block 320 as a result of the electrical configuration of sidetone processing block 320 or other electrical components of sidetone generation system 300 and that is still present after feedback cancellation. However, rather than receiving a microphone signal from the first microphone like sidetone processing block 310, sidetone processing block 320 receives a second input signal 328 that is a combination of audio signal M from audio source 360 and a second microphone signal 326 received from second microphone 350. The second microphone signal 326 received from second microphone 350 may include audio signal Min-ear captured by a human's ear 380, ambient noise Nin-ear captured by a human's ear 380, air-conducted speech Sair, and bone-conducted speech Sbone. At subtraction block 327, audio signal M may be subtracted from the second microphone signal 326 to remove the captured audio signal Min-ear and obtain a signal 328 that includes primarily Nin-ear, Sair, and Sbone. Signal 328 may be subsequently processed by sidetone processing block 320 to generate a sidetone to further improve the quality of the audio heard by the user. -
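On a per-sample basis, the derivation of signal 328 at subtraction block 327 amounts to removing the known media component from the in-ear capture. The sketch below is an idealized sample-level model (it ignores the acoustic transfer function and latency between audio source 360 and the in-ear capture, which a real system would have to estimate); the function names are illustrative.

```python
def mix(*components):
    # Acoustic superposition of in-ear components:
    # second microphone signal 326 = M_in-ear + N_in-ear + S_air + S_bone.
    return [sum(vals) for vals in zip(*components)]

def derive_signal_328(mic2, media_in_ear):
    # Subtraction block 327: remove the media estimate from the capture,
    # leaving primarily N_in-ear + S_air + S_bone.
    return [m - est for m, est in zip(mic2, media_in_ear)]
```

Given the component model above, subtracting the media estimate recovers the sum of the remaining components exactly in this idealized setting.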
Sidetone processing block 320 may include a first processing block 322 to process the signals received by sidetone processing block 320. First processing block 322 may be configured to perform high-pass filtering (HPF), feedback suppression (FBS), and ambient noise reduction (ANR). Accordingly, sound captured from second microphone 350 may be processed by first processing block 322 to remove Nin-ear, Sbone, and Sair-LF, boost high frequency speech that is passively attenuated before reaching the human's ear 380, and remove residual feedback still present in the signal. In some embodiments, first processing block 322 may include and employ a minimum phase filter to perform some of its processing. -
Sidetone processing block 320 also includes a second processing block 324 to process the signals received by sidetone processing block 320. Specifically, second processing block 324 may be configured to perform feedback cancellation so as to cancel as much of second feedback signal 323 as possible. In some embodiments, second processing block 324 may perform the feedback cancellation by generating a signal that gets subtracted from signal 328, for example, by subtraction block 327, to cancel out as much feedback as possible from the transducer. The output of the subtraction block 327 may be received by the first processing block 322 to suppress some of the residual feedback still present in the signal. -
Sidetone processing block 320 may also include memory elements. For example, sidetone processing block 320 includes a first memory element 322A for the storage of the results of the first processing block 322. In some embodiments, the first memory element 322A may not store the results of the first processing block 322, but may instead be manipulated by first processing block 322. Sidetone processing block 320 also includes a second memory element 324A for the storage of the results of the second processing block 324. Like the first memory element 322A, the second memory element 324A may not store the results of the second processing block 324, but may instead be manipulated by second processing block 324. - As illustrated in
FIG. 3, the sidetone generated by sidetone processing block 320, i.e., the signal that results after processing by sidetone processing block 320, may consist primarily of the boosted high-frequency speech Sair-HF. The sidetone Sair-HF generated by sidetone processing block 320 may be subsequently combined with the audio signal M received from audio source 360 and the sidetone generated by sidetone processing block 310, for example, by adding, using addition block 319, the sidetone Sair-HF to the audio signal M received from audio source 360 and the sidetone generated by sidetone processing block 310. The combined signal may be transferred to transducer 370 for audible reproduction. - As illustrated in
FIG. 3, sidetone generation system 300 also includes adaptive sidetone control block 330. The adaptive sidetone control block 330 may be used to adapt sidetone processing blocks 310 and 320 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to recover the high frequencies in a user's voice and generate an optimized sidetone. For example, as illustrated in FIG. 3, signal processing block 310 receives a first microphone signal 311 from first microphone 340 and signal processing block 320 receives a second input signal 328 that is a combination of audio signal M from audio source 360 and a second microphone signal 326 received from second microphone 350. The adaptive sidetone control block 330 may adapt sidetone processing block 310 and sidetone processing block 320 such that the majority of the sidetone Sair-HF transferred to transducer 370 is provided by the sidetone Sair-HF generated by sidetone processing block 310. In other embodiments, such as when there is a significant amount of noise or wind in the environment, the adaptive sidetone control block 330 may adapt sidetone processing block 310 and sidetone processing block 320 so that the majority of the sidetone Sair-HF transferred to transducer 370 is provided by the sidetone Sair-HF generated by sidetone processing block 320. - Adaptive sidetone control block 330 may determine how to balance the processing between
sidetone processing block 310 and sidetone processing block 320 based on numerous factors, such as the mode in which the personal audio device is operating. In one embodiment, adaptive sidetone control block 330 may receive a first microphone signal from a first microphone, such as microphone 340, and a second microphone signal from a second microphone, such as microphone 350. Based on processing of the first microphone signal and the second microphone signal, adaptive sidetone control block 330 may determine a mode of operation of the personal audio device. For example, adaptive control block 330 may determine whether the personal audio device is operating in a Phone Call, Speaker Recognition, and/or Speech Recognition mode. The adaptive sidetone control block 330 may detect speech based on at least one of the first microphone signal and the second microphone signal, and then determine that the mode of operation is Phone Call mode when speech is detected. Based on the determined mode of operation, adaptive control block 330 may adapt sidetone processing blocks 310 and 320 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to generate an optimized sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the determined mode of operation. - The adaptive sidetone control block 330 may adapt the processing of sidetone processing blocks 310 and 320 based on audio recognition algorithms. For example,
sidetone generation system 300 may generate the sidetone that gets transferred to transducer 370 along with audio signal M from audio source 360 based, at least in part, on a speaker recognition (SR) algorithm. According to one embodiment, sidetone generation may be based on an SR algorithm when no speech is detected. In another embodiment, sidetone generation system 300 may generate the sidetone that gets transferred to transducer 370 along with audio signal M from audio source 360 based, at least in part, on an automatic speech recognition (ASR) algorithm. For example, sidetone generation may be based on an ASR algorithm when no speech is detected and the audio signal is generated by an audio playback application. - The adaptive sidetone control block 330 may also be configured to monitor the frequency of received speech signals and adapt sidetone processing blocks 310 and 320 to generate an optimized sidetone signal. For example, the
first microphone signal 311 may include speech input and the second microphone signal 326 may include in-ear audio. In such embodiments, adaptive sidetone control block 330 may be configured to compare a frequency response of speech captured by the first microphone and the second microphone and to track the compared frequency response over a period of time. Adaptive sidetone control block 330 may then adapt sidetone processing blocks 310 and 320 to apply compensation filtering to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone. - The adaptive sidetone control block 330 may also be configured to receive the mode of operation of the personal audio device. For example, another component of the personal audio device, such as an application processor, which may also include a voice-activity detector (VAD), may also receive a first microphone signal from a first microphone and a second microphone signal from a second microphone and determine, based on processing of the first microphone signal and the second microphone signal, the mode of operation of the personal audio device. For example, a component of the personal audio device, such as audio
integrated circuit 20 illustrated in FIGS. 2A and 2B or a component including audio integrated circuit 20 illustrated in FIGS. 2A and 2B, may detect speech based on at least one of the first microphone signal and the second microphone signal, and then determine that the mode of operation is Phone Call mode when speech is detected. In some embodiments, the other component of the personal audio device which determines the mode of operation may also determine the mode of operation based on processing of information that does not include the first and/or second microphone signals. For example, the mode of operation may be determined by a component of the personal audio device based on input provided by a user. Regardless of how a component of the personal audio device determines the mode of operation, adaptive sidetone control block 330 may subsequently be informed of the mode of operation. Based on the received mode of operation, adaptive sidetone control block 330 may adapt sidetone processing blocks 310 and 320 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to generate an optimized sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the received mode of operation. - In addition to receiving an indication of the mode of operation of the audio device, adaptive sidetone control block 330 may also receive instructions from another component of the personal audio device. For example, a component of the personal audio device, such as audio
integrated circuit 20 illustrated in FIGS. 2A and 2B or a component including audio integrated circuit 20 illustrated in FIGS. 2A and 2B, may receive the first microphone signal 311 that includes speech input and the second microphone signal 326 that includes in-ear audio. In such embodiments, the component may compare a frequency response of speech captured by the first microphone and the second microphone and track the compared frequency response over a period of time. Adaptive sidetone control block 330 may then be informed of the results of the comparing and tracking and instructed to adapt sidetone processing blocks 310 and 320 to apply compensation filtering to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone. -
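The compare-track-compensate sequence described above can be sketched in two small steps. The per-band magnitude representation, the smoothing factor, and the gain formula are illustrative assumptions; the disclosure specifies only that the compared responses are tracked over time and that filtering minimizes their difference.

```python
def track_response(prev, current, alpha=0.9):
    # Exponentially smooth the compared frequency response over time,
    # one value per analysis band.
    return [alpha * p + (1 - alpha) * c for p, c in zip(prev, current)]

def compensation_gains(ext_bands, inear_bands, eps=1e-9):
    # Per-band gains that drive the in-ear speech response toward the
    # external-microphone response, minimizing their difference.
    return [e / max(i, eps) for e, i in zip(ext_bands, inear_bands)]
```

Applying the returned gains to the in-ear path makes its per-band magnitudes match the external microphone's, which is the "minimize a difference of the frequency response" objective stated above.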
FIG. 4 is an example schematic block diagram illustrating another sidetone generation system according to one embodiment of the disclosure. Specifically, FIG. 4 illustrates a sidetone generation scheme that can be implemented in a personal audio device. For example, the sidetone generation system 400 may be implemented in audio integrated circuit 20 illustrated in FIGS. 2A and 2B. In some embodiments, sidetone generation system 400 may be implemented with or without adaptive noise cancellation. -
Sidetone generation system 400 may be similar to sidetone generation system 300. For example, like sidetone generation system 300, sidetone generation system 400 includes at least sidetone processing block 410, sidetone processing block 420, and adaptive sidetone control block 430. The sidetone generation system 400 may receive information from at least the first microphone 340, the second microphone 350, and the audio source 360. The sidetone generation system 400 may output an audio signal, such as an audio signal including audio signal M from the audio source 360 and a generated sidetone, to a transducer 370. -
Sidetone generation system 400 includes sidetone processing blocks 410 and 420. In some embodiments, sidetone processing blocks 410 and 420 may perform the same functions as sidetone processing blocks 310 and 320 illustrated in FIG. 3 with the exception that sidetone processing blocks 410 and 420 may forego reception of and processing of feedback signals from the transducer, such as feedback signal 313 or feedback signal 323 illustrated in FIG. 3. -
FIG. 4 illustrates additional features that may be incorporated into a sidetone generation system to generate optimized sidetones to further improve the quality of the audio heard by a user. For example, a feedforward path 401 may be included through which undesired audio heard by a user may be canceled. The undesired audio that may be canceled or reduced in magnitude may include at least bone-conducted speech Sbone, ambient noise Nin-ear captured by a human's ear 380, and low frequency speech Sair-LF that may have been amplified before reaching the human's ear 380. - The
sidetone processing block 420 receives a second input signal 328 that is a combination of audio signal M from audio source 360 and a second microphone signal 326 received from second microphone 350. The second microphone signal 326 received from second microphone 350 may include audio signal Min-ear captured by a human's ear 380, ambient noise Nin-ear captured by a human's ear 380, air-conducted speech Sair, and bone-conducted speech Sbone. At subtraction block 327, audio signal M may be subtracted from the second microphone signal 326 to remove the captured audio signal Min-ear and obtain a signal 328 that includes primarily Nin-ear, Sair, and Sbone. Signal 328 may be subsequently processed by sidetone processing block 420 to generate a sidetone to further improve the quality of the audio heard by the user. -
Signal 328, which includes Nin-ear, Sair, and Sbone, may also be fed forward and combined with the signal being transferred to transducer 370 in order to directly cancel the undesired audio consisting of Nin-ear, Sair, and Sbone heard by the user. For example, after sidetone processing blocks 410 and 420 output their sidetone signals to be combined with the audio M from audio source 360 at addition block 319, the signal 328 may be fed forward via feedforward path 401 to subtraction block 402. Specifically, at subtraction block 402, signal 328, including Nin-ear, Sair, and Sbone, may be subtracted from the combined signal, which includes the sidetone signals generated by signal processing blocks 410 and 420 and the audio M from audio source 360, to obtain a final signal to be transferred to transducer 370 for audible reproduction. - Adaptive sidetone control block 430 may operate similarly to adaptive
sidetone control block 330. However, adaptive sidetone control block 430 may include the additional feature of processing signal 328 to further optimize the processing by sidetone processing blocks 410 and 420 to generate an optimized sidetone signal. In other words, adaptive sidetone control block 430 may receive signal 328, which includes Nin-ear, Sair, and Sbone, and, based on processing of signal 328, adapt sidetone processing blocks 410 and 420 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to generate an optimized sidetone signal. For example, based on the processing of signal 328, adaptive control block 430 may determine that the high frequency speech signals Sair-HF output by one or both of the signal processing blocks 410 and 420 need to be further amplified and may thus instruct signal processing blocks 410 and 420 to further amplify the high frequency speech signals Sair-HF they output. -
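The signal flow through addition block 319 and subtraction block 402 on feedforward path 401 reduces, in an idealized model, to simple per-sample arithmetic. The sketch below assumes perfectly aligned, unit-gain paths; a real implementation would account for latency and the acoustic and electrical transfer functions between blocks.

```python
def output_to_transducer(st410, st420, media, sig328):
    # Addition block 319 sums the two sidetones with media audio M;
    # subtraction block 402 then removes fed-forward signal 328
    # (N_in-ear + S_air + S_bone) from the combined signal.
    return [a + b + m - s for a, b, m, s in zip(st410, st420, media, sig328)]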
FIG. 5 is an example schematic block diagram illustrating another sidetone generation system according to one embodiment of the disclosure. Specifically, FIG. 5 illustrates a sidetone generation scheme that can be implemented in a personal audio device. For example, the sidetone generation system 500 may be implemented in audio integrated circuit 20 illustrated in FIGS. 2A and 2B. In some embodiments, sidetone generation system 500 may be implemented with or without adaptive noise cancellation. -
Sidetone generation system 500 is similar to sidetone generation system 400, but includes additional features that may be incorporated into a sidetone generation system to generate optimized sidetones to further improve the quality of the audio heard by a user. For example, FIG. 5 illustrates another feedforward path 503 through which undesired audio heard by a user may be further canceled. The additional undesired audio which may be canceled or reduced in magnitude may include at least ambient noise Nin-ear captured by a human's ear 380, and low frequency speech Sair-LF that may have been amplified before reaching the human's ear 380. - Some components of
signal 328, such as Nin-ear and Sair, may also be fed forward and combined with the signal being transferred to transducer 370 in order to further directly cancel the undesired audio consisting of Nin-ear and Sair heard by the user. For example, as illustrated in FIG. 4, after sidetone processing blocks 410 and 420 output their sidetone signals to be combined with the audio M from audio source 360 at addition block 319, the signal 328 may be fed forward via feedforward path 401 to subtraction block 402. FIG. 5 illustrates that Nin-ear and Sair may also be fed forward to subtraction block 402 via feedforward path 503 to further subtract Nin-ear and Sair from the signal that reaches transducer 370. Specifically, at subtraction block 402, signal 328, including Nin-ear, Sair, and Sbone, fed forward via feedforward path 401, and signal components Nin-ear and Sair, fed forward via feedforward path 503, may be subtracted from the combined signal, which includes the sidetone signals generated by signal processing blocks 410 and 420 and the audio M from audio source 360, to obtain a final signal to be transferred to transducer 370 for audible reproduction. - As with adaptive sidetone control block 430 illustrated in
FIG. 4, adaptive sidetone control block 530 illustrated in FIG. 5 may also include the additional feature of processing signal 328 to further optimize the processing by sidetone processing blocks 410 and 420 to generate an optimized sidetone signal. In other words, adaptive sidetone control block 530 may receive signal 328, which includes Nin-ear, Sair, and Sbone, and, based on processing of signal 328, adapt sidetone processing blocks 410 and 420 to mix in a combination of signals from the first microphone 340 and the second microphone 350 to generate an optimized sidetone signal. - Selection and optimization of sidetones generated for audio signal enhancement may be effectuated by a combination of the schemes illustrated in
FIGS. 3-5. In other words, FIGS. 3-5 illustrate different features of a sidetone generation system which may be configured to perform any one of the adaptation schemes illustrated in FIGS. 3-5. For example, a sidetone generation system may be configured to use an adaptive sidetone control block to adapt sidetone processing blocks in accordance with the manner in which sidetone processing blocks 310 and 320 are adapted in FIG. 3. In another example, the sidetone generation system may use an adaptive sidetone control block to adapt sidetone processing blocks in accordance with the manner in which sidetone processing blocks 410 and 420 are adapted in FIG. 4 or FIG. 5. - The foregoing adaptation may be based on numerous factors. For example, as disclosed throughout this specification, adaptation may be based on the mode of operation in which the audio device is operating. In particular, each mode of operation may be optimized utilizing different signal enhancement features. For example, in one mode, speech enhancement may be the primary feature to be optimized. In another mode, ambient noise cancellation may be the primary feature to be optimized. Accordingly, a sidetone generation system may use any of the sidetone generation schemes described above to optimize the generation of sidetones for a particular mode in which an audio device is operating.
- In view of the systems shown and described herein, methodologies that may be implemented in accordance with the disclosed subject matter will be better appreciated with reference to various functional block diagrams. While, for purposes of simplicity of explanation, methodologies are shown and described as a series of acts/blocks, it is to be understood and appreciated that the claimed subject matter is not limited by the number or order of blocks, as some blocks may occur in different orders and/or at substantially the same time with other blocks from what is depicted and described herein. Moreover, not all illustrated blocks may be required to implement methodologies described herein. It is to be appreciated that functionality associated with blocks may be implemented by software, hardware, a combination thereof or any other suitable means (e.g. device, system, process, or component). Additionally, it should be further appreciated that methodologies disclosed throughout this specification are capable of being stored on an article of manufacture to facilitate transporting and transferring such methodologies to various devices. Those skilled in the art will understand and appreciate that a methodology could alternatively be represented as a series of interrelated states or events, such as in a state diagram.
-
FIG. 6 is an example flow chart illustrating a method for frequency-dependent sidetone generation in personal audio devices according to one embodiment of the disclosure. Method 600 may be implemented with the systems described with respect to FIGS. 2-5. Method 600 includes, at block 602, receiving a first microphone signal from a first microphone, and, at block 604, receiving a second microphone signal from a second microphone. In some embodiments, receiving the first microphone signal, such as at block 602, may include receiving speech input. -
Method 600 includes, at block 606, receiving a mode of operation of a user device. The modes of operation may include Phone Call, Speaker Recognition, and/or Speech Recognition modes. In some embodiments, receiving the mode of operation may include detecting speech based on at least one of the first microphone signal and the second microphone signal, and then determining that the mode of operation is Phone Call mode when speech is detected. -
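The mode decision at block 606 can be sketched as follows. A crude energy-threshold detector stands in for a real voice-activity detector, and the threshold value and mode strings are illustrative assumptions; the speech-to-Phone-Call mapping and the SR/ASR conditions follow the description given earlier for adaptive sidetone control block 330.

```python
def determine_mode(mic1, mic2, playback_active, threshold=0.01):
    # Crude energy-based speech detection over both microphone signals;
    # a real implementation would use a proper VAD.
    def energy(sig):
        return sum(s * s for s in sig) / max(len(sig), 1)

    speech_detected = max(energy(mic1), energy(mic2)) > threshold
    if speech_detected:
        return "Phone Call"            # speech present -> Phone Call mode
    if playback_active:
        return "Speech Recognition"    # no speech, playback app running
    return "Speaker Recognition"       # no speech, no playback
```

For example, a frame of speech-level samples on either microphone yields Phone Call mode, while silent input falls through to the recognition modes depending on whether a playback application is active.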
Method 600 includes, at block 608, generating a sidetone signal based, at least in part, on the first microphone signal and the second microphone signal and the received mode of operation. For example, a sidetone generation system may generate the sidetone based, at least in part, on a speaker recognition (SR) algorithm when no speech is detected. In another embodiment, a sidetone generation system may generate the sidetone based, at least in part, on an automatic speech recognition (ASR) algorithm when no speech is detected and the audio signal is generated by an audio playback application. In some embodiments, generating the sidetone signal may include mixing a combination of the first microphone signal and the second microphone signal to recover high frequencies in the received speech input. - After the sidetone has been generated, it may be combined with an audio signal and transferred to a transducer. Upon reception, the transducer may reproduce the combined audio signal and sidetone signal, yielding higher quality audio and improved user experience for consumer devices, such as personal audio players and mobile phones.
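The mixing described at block 608 can be sketched as a weighted blend of the high-frequency speech recovered from each microphone path. The single scalar weight is an assumed control knob, consistent with the earlier description in which the adaptive control block shifts the balance toward the in-ear path in noisy or windy conditions.

```python
def mix_sidetone(ext_hf, inear_hf, w_ext):
    # w_ext in [0, 1]: fraction of the sidetone drawn from the external
    # (voice) microphone path; the remainder comes from the in-ear path.
    return [w_ext * e + (1.0 - w_ext) * i for e, i in zip(ext_hf, inear_hf)]
```

With w_ext near 1 the sidetone is dominated by the external microphone's recovered high frequencies; lowering w_ext hands the majority of the sidetone to the in-ear path.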
- Generating a sidetone, such as at
block 608, may enhance the quality of the audio heard by a user. For example, generating the sidetone may improve voice characteristics, including at least one of louder speech and an enhanced signal-to-noise ratio, when the received and/or determined mode of operation is Phone Call mode. In one embodiment, the sidetone generation system may yield such improvements by cancelling bone-conducted speech when the mode of operation is Phone Call mode. In another embodiment, generating the sidetone may also compensate for an occlusion effect. Compensating for an occlusion effect may include processing sound to match a frequency response of the transducer to simulate a frequency response of an open ear. - In some embodiments, the first microphone signal may include speech input, such as speech input obtained via
microphone 340 illustrated in FIGS. 3-5, and the second microphone signal may include in-ear audio, such as audio obtained via microphone 350 illustrated in FIGS. 3-5. In such embodiments, a sidetone generation system, or a processing block in communication with the sidetone generation system, may be configured to compare a frequency response of speech captured by the first microphone and the second microphone and to track the compared frequency response over a period of time. Based on the comparison and tracking, the sidetone generation system may be configured to apply compensation filtering to minimize a difference of the frequency response of speech captured by the first microphone and the second microphone, as discussed above with respect to adaptive sidetone control block 330. -
FIG. 7 is an example flow chart illustrating another method for frequency-dependent sidetone generation in personal audio devices according to one embodiment of the disclosure. Method 700 may be implemented with the systems described with respect to FIGS. 2-5. In some embodiments, method 700 may be implemented with or without adaptive noise cancellation. Method 700 includes, at block 702, detecting the mode of operation and signal quality associated with a use of an audio device. For example, the mode of operation may be detected by an adaptive sidetone control block, or other processing component of an audio device, as discussed with reference to block 606 illustrated in FIG. 6. According to an embodiment, the step of detecting may include detecting when someone is talking with a reasonable signal-to-noise ratio (SNR). In some embodiments, the detection may be based on microphone signals, such as signals from microphones on either ear, which may provide high correlation, microphones in an ear, or microphones on the personal audio device. According to another embodiment, the signals from a microphone in an ear may be received prior to cancellation. - At
block 704, method 700 includes removing noise from a speech signal. In particular, the noise may be removed from a speech signal captured from a combination of microphones not in an ear piece and microphones in an ear piece. For example, noise may be removed utilizing any one of the sidetone generation systems - At
block 706, method 700 includes measuring the in-ear SNRs and creating a resulting signal based on a maximum SNR. For example, the ratio of the in-ear signal to noise may be measured for each microphone in close proximity to each ear, such as for each microphone in an ear piece. The signals may be processed to create higher-quality signals based on the maximum SNR. In other words, the amount of improvement in the signal quality may be limited by the maximum attainable SNR. In some embodiments, the measuring may be performed by an adaptive sidetone control block disclosed herein or other processing component of an audio device in communication with a sidetone generation system disclosed herein. At block 708, the resulting signal may be combined with an audio file, such as a media file, and transferred to a transducer for audible reproduction. For example, the resulting signal may be combined with the audio file in a manner similar to the manner in which resulting signals from sidetone processing blocks illustrated in FIGS. 3-5 are combined with media signals, in which the signals are combined using addition block 319. - Method 700 may proceed to block 710, wherein the frequency responses of speech captured by external microphones may be compared to speech captured by internal microphones. For example, the comparison may be performed by an in-ear monitor (IEM) after cancellation of media audio. In addition to comparing the frequency responses, the compared frequency response may be tracked over a period of time, such as at
block 712. At block 714, a compensation filter may be utilized to minimize the difference between the frequency responses of the captured speech signals as indicated by the comparison performed at block 710. In some embodiments, the comparison, tracking, and compensation filtering may be performed by a sidetone generation system described above, such as a combination of one or more of sidetone generation systems - At
block 716, method 700 may include determining whether to switch between filters. For example, a sidetone generation system may determine the mode in which the audio device is operating, such as by performing the determination step at block 702 or receiving an indication of the mode of operation. If the system determines that the device is in an ambient listening mode and that the compensation scheme currently being utilized for sidetone generation is optimizing audio processing for voice correction, which is different from the optimization required for an ambient listening mode, the sidetone generation system may switch the processing performed by filters within the sidetone generation system to optimize the generated sidetones for an ambient listening mode. - The schematic flow chart diagrams of
FIGS. 6 and 7 are generally set forth as logical flow chart diagrams. As such, the depicted orders and labeled steps are indicative of aspects of the disclosed methods. Other steps and methods may be conceived that are equivalent in function, logic, or effect to one or more steps, or portions thereof, of the illustrated methods. Additionally, the formats and symbols employed are provided to explain the logical steps of the methods and are understood not to limit the scope of the methods. Although various arrow types and line types may be employed in the flow chart diagrams, they are understood not to limit the scope of the corresponding methods. Indeed, some arrows or other connectors may be used to indicate only the logical flow of the methods. For instance, an arrow may indicate a waiting or monitoring period of unspecified duration between enumerated steps of the depicted methods. Additionally, the order in which a particular method occurs may or may not strictly adhere to the order of the corresponding steps shown. - If implemented in firmware and/or software, functions described above may be stored as one or more instructions or code on a computer-readable medium. Examples include non-transitory computer-readable media encoded with a data structure and computer-readable media encoded with a computer program. Computer-readable media includes physical computer storage media. A storage medium may be any available medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise random access memory (RAM), read-only memory (ROM), electrically-erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Disk and disc include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks, and Blu-ray discs. Generally, disks reproduce data magnetically, and discs reproduce data optically. Combinations of the above should also be included within the scope of computer-readable media.
- In addition to storage on a computer-readable medium, instructions and/or data may be provided as signals on transmission media included in a communication apparatus. For example, a communication apparatus may include a transceiver having signals indicative of instructions and data. The instructions and data are configured to cause one or more processors to implement the functions outlined in the claims.
- Although the present disclosure and certain representative advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the present disclosure, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.
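The mode-dependent filter switching described above for block 716 can be sketched in code. This is a minimal illustration under stated assumptions, not the patented implementation: the mode names, FIR tap values, and class structure are all hypothetical, and a real device would select among frequency-dependent filters tuned for its acoustics.

```python
from enum import Enum, auto

class Mode(Enum):
    VOICE_CORRECTION = auto()   # hypothetical: e.g., occlusion-effect correction
    AMBIENT_LISTENING = auto()  # hypothetical: hear-through of ambient sound

# Hypothetical frequency-shaping filters, one set of FIR taps per mode.
SIDETONE_FILTERS = {
    Mode.VOICE_CORRECTION: [0.5, 0.3, 0.2],   # placeholder tap values
    Mode.AMBIENT_LISTENING: [0.2, 0.3, 0.5],  # placeholder tap values
}

class SidetoneGenerator:
    def __init__(self, mode):
        self.mode = mode

    def maybe_switch(self, detected_mode):
        """Block 716 analogue: switch filters when the detected mode differs
        from the mode the current compensation scheme is optimized for."""
        if detected_mode != self.mode:
            self.mode = detected_mode
            return True
        return False

    def process(self, mic_samples):
        """Apply the active mode's FIR taps to microphone samples
        (direct-form convolution) to produce the sidetone signal."""
        taps = SIDETONE_FILTERS[self.mode]
        out = []
        for n in range(len(mic_samples)):
            acc = 0.0
            for k, h in enumerate(taps):
                if n - k >= 0:
                    acc += h * mic_samples[n - k]
            out.append(acc)
        return out
```

In use, the generator would be polled whenever the mode determination of block 702 runs; only when `maybe_switch` reports a change does the filter set actually swap, so steady-state processing is unaffected.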
Claims (33)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/005,974 US9729957B1 (en) | 2016-01-25 | 2016-01-25 | Dynamic frequency-dependent sidetone generation |
GB1603392.0A GB2549065B (en) | 2016-01-25 | 2016-02-26 | Frequency-dependent sidetones for improved automatic speech recognition, speaker recognition, and occlusion effect correction |
GB1606838.9A GB2546563B (en) | 2016-01-25 | 2016-04-19 | Dynamic frequency-dependent sidetone generation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/005,974 US9729957B1 (en) | 2016-01-25 | 2016-01-25 | Dynamic frequency-dependent sidetone generation |
Publications (2)
Publication Number | Publication Date |
---|---|
US20170214997A1 true US20170214997A1 (en) | 2017-07-27 |
US9729957B1 US9729957B1 (en) | 2017-08-08 |
Family
ID=55806994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/005,974 Active US9729957B1 (en) | 2016-01-25 | 2016-01-25 | Dynamic frequency-dependent sidetone generation |
Country Status (2)
Country | Link |
---|---|
US (1) | US9729957B1 (en) |
GB (2) | GB2549065B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10062373B2 (en) * | 2016-11-03 | 2018-08-28 | Bragi GmbH | Selective audio isolation from body generated sound system and method |
US10110997B2 (en) * | 2017-02-17 | 2018-10-23 | 2236008 Ontario, Inc. | System and method for feedback control for in-car communications |
US11206003B2 (en) * | 2019-07-18 | 2021-12-21 | Samsung Electronics Co., Ltd. | Personalized headphone equalization |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2969862B2 (en) | 1989-10-04 | 1999-11-02 | 松下電器産業株式会社 | Voice recognition device |
JP2974423B2 (en) | 1991-02-13 | 1999-11-10 | シャープ株式会社 | Lombard Speech Recognition Method |
DE4322372A1 (en) | 1993-07-06 | 1995-01-12 | Sel Alcatel Ag | Method and device for speech recognition |
US5742928A (en) | 1994-10-28 | 1998-04-21 | Mitsubishi Denki Kabushiki Kaisha | Apparatus and method for speech recognition in the presence of unnatural speech effects |
US8019050B2 (en) | 2007-01-03 | 2011-09-13 | Motorola Solutions, Inc. | Method and apparatus for providing feedback of vocal quality to a user |
US8363820B1 (en) | 2007-05-17 | 2013-01-29 | Plantronics, Inc. | Headset with whisper mode feature |
JP4530051B2 (en) | 2008-01-17 | 2010-08-25 | 船井電機株式会社 | Audio signal transmitter / receiver |
US8290537B2 (en) | 2008-09-15 | 2012-10-16 | Apple Inc. | Sidetone adjustment based on headset or earphone type |
EP2362678B1 (en) | 2010-02-24 | 2017-07-26 | GN Audio A/S | A headset system with microphone for ambient sounds |
US9491306B2 (en) * | 2013-05-24 | 2016-11-08 | Broadcom Corporation | Signal processing control in an audio device |
US9369557B2 (en) | 2014-03-05 | 2016-06-14 | Cirrus Logic, Inc. | Frequency-dependent sidetone calibration |
-
2016
- 2016-01-25 US US15/005,974 patent/US9729957B1/en active Active
- 2016-02-26 GB GB1603392.0A patent/GB2549065B/en active Active
- 2016-04-19 GB GB1606838.9A patent/GB2546563B/en active Active
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10896682B1 (en) * | 2017-08-09 | 2021-01-19 | Apple Inc. | Speaker recognition based on an inside microphone of a headphone |
US11450097B2 (en) | 2019-04-29 | 2022-09-20 | Cirrus Logic, Inc. | Methods, apparatus and systems for biometric processes |
US10970575B2 (en) | 2019-04-29 | 2021-04-06 | Cirrus Logic, Inc. | Methods, apparatus and systems for biometric processes |
GB2583543B (en) * | 2019-04-29 | 2021-08-25 | Cirrus Logic Int Semiconductor Ltd | Methods, apparatus and systems for biometric processes |
GB2583543A (en) * | 2019-04-29 | 2020-11-04 | Cirrus Logic Int Semiconductor Ltd | Methods, apparatus and systems for biometric processes |
US11483664B2 (en) | 2019-04-29 | 2022-10-25 | Cirrus Logic, Inc. | Methods, apparatus and systems for biometric processes |
US11531738B2 (en) | 2019-04-29 | 2022-12-20 | Cirrus Logic, Inc. | Methods, apparatus and systems for biometric processes |
US11700473B2 (en) | 2019-04-29 | 2023-07-11 | Cirrus Logic, Inc. | Methods, apparatus and systems for authentication |
US11934506B2 (en) | 2019-04-29 | 2024-03-19 | Cirrus Logic Inc. | Methods, apparatus and systems for biometric processes |
US20210390972A1 (en) * | 2020-06-11 | 2021-12-16 | Apple Inc. | Self-voice adaptation |
US11715483B2 (en) * | 2020-06-11 | 2023-08-01 | Apple Inc. | Self-voice adaptation |
WO2022119752A1 (en) * | 2020-12-02 | 2022-06-09 | HearUnow, Inc. | Dynamic voice accentuation and reinforcement |
US11581004B2 (en) | 2020-12-02 | 2023-02-14 | HearUnow, Inc. | Dynamic voice accentuation and reinforcement |
Also Published As
Publication number | Publication date |
---|---|
GB2549065B (en) | 2019-07-03 |
GB2546563B (en) | 2020-01-08 |
GB2549065A (en) | 2017-10-11 |
GB201603392D0 (en) | 2016-04-13 |
GB2546563A (en) | 2017-07-26 |
US9729957B1 (en) | 2017-08-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9729957B1 (en) | Dynamic frequency-dependent sidetone generation | |
KR102266080B1 (en) | Frequency-dependent sidetone calibration | |
KR102153277B1 (en) | An integrated circuit for implementing at least a portion of a personal audio device, a method for canceling ambient audio sounds in the proximity of a transducer of the personal audio device, and the personal audio device | |
KR102196012B1 (en) | Systems and methods for enhancing performance of audio transducer based on detection of transducer status | |
JP5400166B2 (en) | Handset and method for reproducing stereo and monaural signals | |
EP2847760B1 (en) | Error-signal content controlled adaptation of secondary and leakage path models in noise-canceling personal audio devices | |
US11026041B2 (en) | Compensation of own voice occlusion | |
KR102303693B1 (en) | Frequency domain adaptive noise cancellation system | |
KR20160144461A (en) | Frequency-shaped noise-based adaptation of secondary path adaptive response in noise-canceling personal audio devices | |
US11922917B2 (en) | Audio system and signal processing method for an ear mountable playback device | |
CN112889297B (en) | Auricle proximity detection | |
US10720138B2 (en) | SDR-based adaptive noise cancellation (ANC) system | |
US10249283B2 (en) | Tone and howl suppression in an ANC system | |
US20230328462A1 (en) | Method, device, headphones and computer program for actively suppressing the occlusion effect during the playback of audio signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: CIRRUS LOGIC INTERNATIONAL SEMICONDUCTOR LTD., UNI Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KALLER, ROY SCOTT;HENDRIX, JON;SHILTON, ANTHONY;AND OTHERS;SIGNING DATES FROM 20160203 TO 20160210;REEL/FRAME:037751/0868 |
|
AS | Assignment |
Owner name: CIRRUS LOGIC, INC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CIRRUS LOGIC INTERNATIONAL SEMICONDUCTOR LTD.;REEL/FRAME:042852/0961 Effective date: 20150407 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |