WO2022151156A1 - Method and system for headphone with anc - Google Patents

Method and system for headphone with anc

Info

Publication number
WO2022151156A1
WO2022151156A1 PCT/CN2021/071768 CN2021071768W WO2022151156A1 WO 2022151156 A1 WO2022151156 A1 WO 2022151156A1 CN 2021071768 W CN2021071768 W CN 2021071768W WO 2022151156 A1 WO2022151156 A1 WO 2022151156A1
Authority
WO
WIPO (PCT)
Prior art keywords
vad flag
vad
flag
snr
headphone
Prior art date
Application number
PCT/CN2021/071768
Other languages
French (fr)
Inventor
Xiang DENG
Zhe Chen
Shaomin Peng
Songcun Chen
Zhuo Chen
Original Assignee
Harman International Industries, Incorporated
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harman International Industries, Incorporated
Priority to CN202180090307.3A (CN116803100A)
Priority to PCT/CN2021/071768 (WO2022151156A1)
Publication of WO2022151156A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00: Details of transducers, loudspeakers or microphones
    • H04R1/10: Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1041: Mechanical or electronic switches, or control elements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16: Sound input; Sound output
    • G06F3/165: Management of the audio stream, e.g. setting of volume, audio stream path
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2460/00: Details of hearing devices, i.e. of ear- or headphones covered by H04R1/10 or H04R5/033 but not provided for in any of their subgroups, or of hearing aids covered by H04R25/00 but not provided for in any of its subgroups
    • H04R2460/01: Hearing devices using active noise cancellation

Definitions

  • the present disclosure relates to a method and system for a headphone with Active Noise Cancellation (ANC), and specifically to a method and system for Hybrid-VAD-based intelligent ambient sound control for a headphone with ANC.
  • ANC Active Noise Cancellation
  • Headphone products, of which the most popular types are in-ear, on-ear and over-the-ear, are widely used.
  • the headphone with ANC becomes popular in the consumer market. If the headphone is equipped with an ANC system, it can significantly reduce the low frequency noise, which creates a quiet environment for a user but causes inconvenient issues at the same time, especially in some scenarios in which the user who wears the headphone/earphones/earbuds needs to notice the ambient sound. For example, in a conversation scenario, the user has to take off headphones before talking to others, as the ANC system cancels the human speech in ambient sound and reduces the speech intelligibility in conversation.
  • a method is provided for a headphone with ANC, wherein the headphone may comprise at least one feedforward microphone and at least one feedback microphone or sensor.
  • the method may comprise: receiving a feedback signal from the feedback microphone or sensor of the headphone; determining a feedback (FB) voice activity detection (VAD) flag based on the received feedback signal; generating a control signal based on the value of the determined FB VAD flag; and automatically adjusting a transition of the headphone between an ANC mode and a transparency mode based on the control signal.
  • FB feedback
  • VAD voice activity detection
  • the method may further comprise: receiving a feedforward signal from the feedforward microphone of the headphone; determining a feedforward (FF) voice activity detection (VAD) flag based on the received feedforward signal; and determining a combination VAD flag based on the FF VAD flag and the FB VAD flag.
  • FF feedforward
  • a system is provided for a headphone with ANC, wherein the headphone may comprise at least one feedforward microphone and at least one feedback microphone or sensor.
  • the system may comprise a processor.
  • the processor may be configured to: receive a feedback signal from the feedback microphone or sensor of the headphone; determine a feedback (FB) voice activity detection (VAD) flag based on the received feedback signal; generate a control signal based on the value of the determined FB VAD flag; and automatically adjust a transition of the headphone between an ANC mode and a transparency mode based on the control signal.
  • the processor may be further configured to: receive a feedforward signal from the feedforward microphone of the headphone; determine a feedforward (FF) voice activity detection (VAD) flag based on the received feedforward signal; and determine a combination VAD flag based on the FF VAD flag and the FB VAD flag.
  • the system and method described in the present disclosure may automatically adjust the ANC module, function or operation of the headphone based on an estimate of the presence or absence of the user’s voice, which makes the headphone more convenient to use.
  • the VAD detection mechanism provided by the method and system of the disclosure can improve the accuracy with which the user’s voice is detected in a noisy environment.
  • the method and system of the present disclosure can make the headphone’s switching between different operation modes, such as between the ANC mode and the transparency mode, relatively smooth, and improve the user’s hearing experience.
  • FIG. 1 illustrates a schematic diagram in accordance with one or more embodiments of the present disclosure.
  • FIG. 2 illustrates a schematic diagram in accordance with one or more embodiments of the present disclosure.
  • FIG. 3 illustrates a schematic diagram of VAD mechanism corresponding to FIG. 2 in accordance with one or more embodiments of the present disclosure.
  • FIG. 4 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
  • FIG. 5 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
  • FIG. 6 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
  • FIG. 7 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
  • FIG. 8 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
  • FIG. 9 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
  • FIG. 1 illustrates a schematic diagram of the general operation concept in accordance with one or more embodiments of the present disclosure.
  • ambient sound, including human speech and background noise, is present around a user.
  • the ANC module, function or operation in a headphone can invert the phase of the ambient sound signal, then use the phase-inverted signal to cancel the noise.
  • the term “headphone” in the present disclosure may include in-ear, on-ear and over-the-ear headphones, and may be equivalently replaced by the word “earphone”, “earbud” or “headset”.
  • the headphone may be a wireless or wired headphone.
  • the present disclosure provides a method and a system which may adjust the ANC module, function or operation of the headphone based on an estimate of the presence or absence of the user’s voice, i.e., the voice of the person who is wearing the headphone and speaking.
  • FIG. 1 shows an operation scenario wherein a Voice Activity Detection (VAD) module, function or technique may be adopted to detect the user’s voice and the detection result may be used to adjust the ANC of the headphone.
  • a conventional VAD system works in a quiet environment but struggles in a noisy environment where the signal-to-noise ratio (SNR) drops.
  • the method and system provided by the present disclosure contains additional microphones and/or sensors (FIG.
  • the additional microphones and/or sensors may be arranged in the side of headphone facing and close to the ear canal.
  • the additional microphone may be, but is not limited to, a feedback microphone that is part of the ANC system.
  • the additional sensor may be, but is not limited to, an accelerometer, a bone conduction sensor or another general vibration sensor.
  • a control signal may be generated to switch the headphone from the ANC mode to a transparency mode, which completely or partially allows the ambient sound to reach the user’s ear. That is, the operation mode of the headphone may transition from the ANC mode to the transparency mode.
  • a control signal may be generated to switch the headphone from the transparency mode to the ANC mode, in which the ambient sound may be cancelled or reduced.
  • FIG. 2 illustrates a schematic diagram of a further general operation concept in accordance with another one or more embodiments of the present disclosure.
  • FIG. 2 differs from FIG. 1 in that the VAD detection is based on both the input of the additional microphones and/or sensors and an input from the at least one microphone (such as feedforward microphone) .
  • the arrow from the user to VAD may be understood as the input from the feedforward microphone to the VAD.
  • the details regarding the VAD detection mechanism thereof will be described below in reference to FIG. 3.
  • FIG. 3 illustrates a schematic diagram of VAD detection mechanism corresponding to FIG. 2 in accordance with one or more embodiments of the present disclosure.
  • the headphone may comprise a feedforward microphone (FF mic) and a feedback microphone (FB mic) .
  • FIG. 3 shows only one example, to illustrate the principle of the VAD detection mechanism.
  • the sensors may include without limitation an accelerometer, a bone conduction sensor, other general vibration sensor or combination thereof.
  • the FF microphone is arranged in the side of the headphone toward the outside environment, and the FB microphone is arranged in the side of the headphone facing and close to the ear canal.
  • both the FF microphone and the FB microphone may capture the user voice respectively as a FF signal and a FB signal. Then, a FF VAD detection and a FB VAD detection may be separately performed based on the FF signal from the FF microphone and the FB signal from the FB microphone. A FF VAD flag and a FB VAD flag can be obtained as a result of the FF VAD detection and the FB VAD detection. The FF VAD flag and the FB VAD flag indicate the possibility that the user who wears the headphone is speaking, respectively based on the FF signal and the FB signal.
  • a combination VAD can be determined as a final detection result which indicates the possibility that the user wearing the headphone is speaking.
  • a control signal is generated to automatically adjust a transition between an ANC mode and a transparency mode.
  • the adjustment could be performed for the FF microphone and the FB microphone together or separately.
  • the ANC module/function/operation (hereinafter simply referred to as ANC) may be turned off for both the FF microphone and the FB microphone, i.e., a transition from the ANC mode to the transparency mode.
  • the ambient sound may be delivered to the user.
  • the ANC may be turned off only for the FF microphone, and the ANC for the FB microphone may be kept on.
  • the ANC may be turned off for the FF microphone and partially turned off, by a certain ratio, for the FB microphone. In the reverse transition, i.e., the transition from the transparency mode to the ANC mode, the adjustment could also be performed for the FF microphone and the FB microphone separately.
  • FIG. 4 illustrates a flowchart of a method for controlling the headphone with ANC in accordance with one or more embodiments of the present disclosure, which could be also understood in combination with FIG. 1.
  • a feedback (FB) signal is received from the FB microphone and/or sensor of the headphone.
  • the FB signal may include the sound captured by the FB microphone and/or the vibration captured by the sensor.
  • a FB VAD flag may be determined based on the received FB signal, for example, based on the SNR of the received FB signal.
  • the FB VAD flag may be a variable with a value greater than or equal to 0 and less than or equal to 1, and represents the possibility that the user who wears the headphone is speaking. If the FB VAD flag is determined as 0, then the detection result indicates the absence of the user’s voice, i.e., the user who wears the headphone is not speaking. If the FB VAD flag is determined as 1, then the detection result indicates the presence of the user’s voice, i.e., the user who wears the headphone is speaking. If the FB VAD flag is determined as a value between 0 and 1, the detection result indicates the possibility of presence of the user’s voice.
  • a corresponding control signal is generated based on the value of the determined FB VAD flag.
  • the corresponding adjustment may be performed. For example, if the FB VAD flag indicates the presence of the user’s voice, then a transition from the ANC mode to the transparency mode is performed. The ANC may be completely or partially turned off so that the user can hear the ambient sound. After this adjustment, if the FB VAD flag becomes 0, which indicates the absence of the user’s voice, then a transition from the transparency mode to the ANC mode is performed, i.e., the ANC is turned on.
  • FIG. 5 illustrates a flowchart of a method for controlling the headphone with ANC in accordance with another one or more embodiments of the present disclosure, which could be also understood in combination with FIGs. 2-3.
  • a feedforward (FF) signal is received from the FF microphone of the headphone.
  • the FF signal may include the sound captured by the FF microphone.
  • a feedback (FB) signal is received from the FB microphone or sensor of the headphone.
  • the FB signal may include the sound captured by the FB microphone and/or the vibration captured by the sensor.
  • a FF VAD flag may be determined based on the received FF signal, for example, based on the SNR of the received FF signal.
  • the FF VAD flag indicates a detection result, which may be a variable with a value greater than or equal to 0 and less than or equal to 1, and represents the possibility that the user who wears the headphone is speaking. If the FF VAD flag is determined as 0, then the detection result indicates the absence of the user’s voice, i.e., the user who wears the headphone is not speaking. If the FF VAD flag is determined as 1, then the detection result indicates the presence of the user’s voice, i.e., the user who wears the headphone is speaking. If the FF VAD flag is determined as a value between 0 and 1, then the detection result indicates the possibility of the presence of the user’s voice.
  • a FB VAD flag may be determined based on the received FB signal, for example, based on the SNR of the received FB signal.
  • the FB VAD flag may be a variable with a value greater than or equal to 0 and less than or equal to 1, and represents the possibility that the user who wears the headphone is speaking. If the FB VAD flag is determined as 0, then the detection result indicates the absence of the user’s voice, i.e., the user who wears the headphone is not speaking. If the FB VAD flag is determined as 1, then the detection result indicates the presence of the user’s voice, i.e., the user who wears the headphone is speaking. If the FB VAD flag is determined as a value between 0 and 1, the detection result indicates the possibility of presence of the user’s voice.
  • the SNR is only one example of a metric for characterizing the microphone signal, without specific limitation. Any other metric that can characterize the microphone signal could be used in the method or system disclosed herein, such as the magnitude of the signal, the energy of the signal, the frequency response of the signal, and so on.
  • a combination VAD flag may be determined based on the FB VAD flag and the FF VAD flag.
  • the combination VAD flag represents the final VAD detection result.
  • the combination VAD flag may be a variable with a value greater than or equal to 0 and less than or equal to 1, and represents the possibility that the user who wears the headphone is speaking. If the combination VAD flag is determined as 0, then the final detection result indicates the absence of the user’s voice, i.e., the user who wears the headphone is not speaking. If the combination VAD flag is determined as 1, then the final detection result indicates the presence of the user’s voice, i.e., the user who wears the headphone is speaking. If the combination VAD flag is determined as a value between 0 and 1, the final detection result indicates the possibility of presence of the user’s voice.
  • a corresponding control signal is generated based on the value of the determined combination VAD flag.
  • the corresponding adjustment may be performed. For example, if the combination VAD flag indicates the presence of the user’s voice, then a transition from the ANC mode to the transparency mode is performed. The ANC may be completely or partially turned off so that the user can hear the ambient sound. After this adjustment, if the combination VAD flag becomes 0, which indicates the absence of the user’s voice, then a transition from the transparency mode to the ANC mode is performed, i.e., the ANC is turned on.
  • FIG. 6 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure, which illustrates how to determine the FF VAD flag.
  • the SNR of the FF signal from the FF microphone is calculated.
  • the FF microphone may capture the user’s voice and the ambient sound, wherein the ambient sound may be treated as noise relative to the user’s voice.
  • the calculated SNR of the FF signal may be a metric for determining whether the user who wears the headphone is speaking and/or for determining the possibility of the presence of the user’s voice.
  • a predetermined threshold interval for the SNR of the FF signal may be set, which is defined by a high threshold TH_H and a low threshold TH_L. In comparison with a single threshold, using a threshold interval may improve the fault tolerance rate and reduce misjudgment.
  • the SNR of the FF signal is compared to the high threshold TH_H. If the SNR is greater than or equal to the high threshold TH_H, then the method goes to S603, in which the FF VAD flag is set to 1. If not, the method goes to S604.
  • the SNR of the FF signal is compared to the low threshold TH_L. If the SNR is less than or equal to the low threshold TH_L, then the method goes to S605, in which the FF VAD flag is set to 0. If not, the method goes to S606, in which the FF VAD flag is set to a value between 0 and 1. The closer the value of the flag is to 1, the higher the probability that the user is speaking, and the closer the value of the flag is to 0, the lower the probability that the user is speaking.
  • FIG. 7 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure, which illustrates how to determine the FB VAD flag.
  • the SNR of the FB signal from the FB microphone is calculated.
  • the FB microphone may capture the user’s voice and some of the ambient sound, wherein the ambient sound may be treated as noise relative to the user’s voice.
  • the calculated SNR of the FB signal may be a metric for determining whether the user who wears the headphone is speaking and/or for determining the possibility of the presence of the user’s voice.
  • a predetermined threshold interval for the SNR of the FB signal may be set, which is defined by a high threshold TH_H and a low threshold TH_L. In comparison with a single threshold, using a threshold interval may improve the fault tolerance rate and reduce misjudgment.
  • the SNR of the FB signal is compared to the high threshold. If the SNR is greater than or equal to the high threshold, then the method goes to S703, in which the FB VAD flag is set to 1. If not, the method goes to S704.
  • the SNR of the FB signal is compared to the low threshold. If the SNR is less than or equal to the low threshold, then the method goes to S705, in which the FB VAD flag is set to 0. If not, the method goes to S706, in which the FB VAD flag is set to a value between 0 and 1. The closer the value of the flag is to 1, the higher the probability that the user is speaking, and the closer the value of the flag is to 0, the lower the probability that the user is speaking.
  • the threshold interval for the SNR of the FF signal may be the same as the threshold interval for the SNR of the FB signal. According to another embodiment, the threshold interval for the SNR of the FF signal may be different from the threshold interval for the SNR of the FB signal. Considering that the FB signal is relatively less affected by noise in comparison to the FF signal, the low threshold and the high threshold for the FB signal can be set higher than the low threshold and the high threshold for the FF signal, respectively. The advantage of this is that it can further reduce the misjudgment caused by the user’s non-voice sounds or facial movements.
  • FIG. 8 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure, which illustrates how to determine the combination VAD flag based on the FF VAD flag and the FB VAD flag.
  • the FF VAD flag and the FB VAD flag may be obtained as described in reference to FIG. 6 and FIG. 7.
  • the method determines whether the values of the FF VAD flag and the FB VAD flag are both 0. If so, the method goes to S803, in which the combination VAD flag is set to 0. If not, the method goes to S804.
  • the method determines whether the values of the FF VAD flag and the FB VAD flag are both 1. If so, the method goes to S805, in which the combination VAD flag is set to 1. If not, the method goes to S806.
  • the combination VAD flag is calculated using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, such as using the following equation:
  • the alpha is a weight parameter related to the SNR of the FB signal and is a value between 0 and 1.
  • the alpha value increases as the SNR of the FB signal increases.
  • the alpha value may be dependent on the noise ratio that represents the noise level of the environment. The alpha value increases as the noise ratio increases. For example, if the environment is noisier, the VAD detection result based on the FB signal is more reliable. The alpha value should then be set to a larger value, i.e., a value closer to 1.
  • the alpha value may further be dependent on the combination of the SNR of the FB signal and the noise ratio.
  • FIG. 9 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure, which illustrates how to adjust the ANC based on the combination VAD flag.
  • the method determines whether the combination VAD flag is equal to 1. If it is equal to 1, then a corresponding control signal is generated to turn off the ANC, at S903.
  • the adjustment of turning off the ANC may be performed for both the FF microphone and the FB microphone. In another example, the adjustment of turning off the ANC may be performed only for the FF microphone, and the ANC state for the FB microphone is maintained.
  • the current state of the ANC may be determined first. If the current state of the ANC is OFF, then the control signal can be ignored since the ANC is already in an OFF state.
  • a delay time dt_on-off may be set for the transition from the ANC mode (ANC is on) to the transparency mode (ANC is off) according to practical requirements.
  • the delay time dt_on-off may be set to a very small value, for example, on the order of microseconds or milliseconds.
  • the method determines whether the combination VAD flag is equal to 0. If the combination VAD flag is equal to 0, a corresponding control signal is generated to turn on the ANC, at S905.
  • the adjustment of turning on the ANC may be performed for both the FF microphone and the FB microphone.
  • the current state of the ANC may be determined first. If the current state of the ANC is ON, then the control signal can be ignored since the ANC is already in an ON state. If the current state of the ANC is OFF, then the transition from the transparency mode to the ANC mode may be started.
  • a delay time dt_off-on may be set for the transition from the transparency mode (ANC is off) to the ANC mode (ANC is on). That is, the transition is started after a while, for example after the delay time dt_off-on has elapsed.
  • the use of the delay time dt_off-on can avoid a false transition caused by a short pause in a conversation.
  • the delay time dt_off-on may be greater than the above-mentioned delay time dt_on-off.
  • the delay time dt_off-on may be on the order of seconds.
  • a corresponding control signal is generated to turn on or off ANC with a certain ratio according to the value of the combination VAD flag.
  • the control signal may include an adjusting factor which indicates a degree/level of transition, such as percentage of the transition.
  • the adjusting factor may be a value between 0 and 1 and may vary with the value of the combination VAD flag. Accordingly, the ANC is turned on or off with a certain ratio according to the adjusting factor included in the control signal.
  • a system for a headphone with active noise cancellation (ANC) may be provided according to one or more embodiments, wherein the headphone comprises at least one feedforward microphone and at least one feedback microphone or sensor.
  • the system may comprise a processor, which may be configured to: receive a feedback signal from the feedback microphone or sensor of the headphone; determine a feedback (FB) voice activity detection (VAD) flag based on the received feedback signal; generate a control signal based on the value of the determined FB VAD flag; and automatically adjust a transition of the headphone between an ANC mode and a transparency mode based on the control signal.
  • the processor is further configured to: receive a feedforward signal from the feedforward microphone of the headphone; determine a feedforward (FF) voice activity detection (VAD) flag based on the received feedforward signal; and determine a combination VAD flag based on the FF VAD flag and the FB VAD flag.
  • the processor may be further configured to calculate a first SNR of the received feedforward signal; obtain the FF VAD flag for the received feedforward signal based on the first SNR; calculate a second SNR of the received feedback signal; obtain a FB VAD flag for the received feedback signal based on the second SNR; and determine a combination VAD flag based on the FF VAD flag and the FB VAD flag.
  • the processor may be further configured to: compare the first SNR to a first high threshold and a first low threshold respectively; set the FF VAD flag to 1, if the first SNR is greater than or equal to the first high threshold; set the FF VAD flag to 0, if the first SNR is less than or equal to the first low threshold; and set the FF VAD flag to a value between 0 and 1, if the first SNR is greater than the first low threshold and less than the first high threshold.
  • the processor may be further configured to: compare the second SNR to a second high threshold and a second low threshold; set the FB VAD flag to 1, if the second SNR is greater than or equal to the second high threshold; set the FB VAD flag to 0, if the second SNR is less than or equal to the second low threshold; and set the FB VAD flag to a value between 0 and 1, if the second SNR is greater than the second low threshold and less than the second high threshold.
  • the processor may be further configured to: set the combination VAD flag to 0, if both the values of the FF VAD flag and the FB VAD flag are 0; set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.
  • the processor may be any technically feasible hardware unit configured to process data and execute software applications, including without limitation, a central processing unit (CPU) , a microcontroller unit (MCU) , an application specific integrated circuit (ASIC) , a digital signal processor (DSP) chip and so forth.
  • the processor may be integrated in the headphone.
  • the disclosure further includes a non-transitory computer-readable medium storing program instructions that, when executed by a processor, cause the processor to perform the steps of: receiving a feedback signal from the feedback microphone or sensor of the headphone; determining a feedback (FB) voice activity detection (VAD) flag based on the received feedback signal; generating a control signal based on the value of the determined FB VAD flag; and automatically adjusting a transition of the headphone between an ANC mode and a transparency mode based on the control signal.
  • the system and method described in the present disclosure may automatically adjust the ANC module/function of the headphone based on an estimate of the presence or absence of the user’s voice, which makes the headphone more convenient to use.
  • the VAD detection mechanism provided by the method and system of the disclosure can improve the accuracy with which the user’s voice is detected in a noisy environment.
  • the method and system of the present disclosure can make the headphone’s switching between different operation modes relatively smooth, and improve the user’s hearing experience.
  • aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system”.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
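
The flag logic described for FIGs. 6 to 9 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation. The names vad_flag_from_snr, combine_flags and AncController, the threshold and delay values, the linear interpolation between the two thresholds, and the weighted-sum form of the combination (alpha times the FB flag plus one minus alpha times the FF flag) are all hypothetical. The text only states that the in-between flag is "a value between 0 and 1" and that alpha is a weight between 0 and 1 that grows as the FB-based result becomes more reliable.

```python
def vad_flag_from_snr(snr_db, th_low, th_high):
    """Map an SNR to a VAD flag in [0, 1] using a threshold interval (FIG. 6/7)."""
    if th_low >= th_high:
        raise ValueError("th_low must be below th_high")
    if snr_db >= th_high:
        return 1.0
    if snr_db <= th_low:
        return 0.0
    # Assumption: linear interpolation inside the interval.
    return (snr_db - th_low) / (th_high - th_low)


def combine_flags(ff_flag, fb_flag, alpha):
    """Combine the FF and FB flags into one flag in [0, 1] (FIG. 8)."""
    if ff_flag == 0.0 and fb_flag == 0.0:
        return 0.0
    if ff_flag == 1.0 and fb_flag == 1.0:
        return 1.0
    # Assumption: weighted sum in which alpha weights the FB flag.
    return alpha * fb_flag + (1.0 - alpha) * ff_flag


class AncController:
    """Switch ANC on or off with asymmetric delays (FIG. 9)."""

    def __init__(self, dt_on_off=0.005, dt_off_on=2.0):
        self.dt_on_off = dt_on_off  # seconds, ANC on -> off (hypothetical value)
        self.dt_off_on = dt_off_on  # seconds, ANC off -> on (hypothetical value)
        self.anc_on = True
        self._pending_since = None  # time at which a state change was first requested

    def update(self, comb_flag, now):
        """Feed one combination flag at time `now` (seconds); return the ANC state."""
        if comb_flag == 1.0:
            target = False  # user is speaking: go to transparency mode
        elif comb_flag == 0.0:
            target = True   # user is silent: go back to ANC mode
        else:
            # Simplification: in-between flags keep the current state here;
            # the text instead describes a partial (ratio-based) transition.
            target = self.anc_on
        if target == self.anc_on:
            self._pending_since = None  # request matches state: nothing pending
            return self.anc_on
        if self._pending_since is None:
            self._pending_since = now
        delay = self.dt_off_on if target else self.dt_on_off
        if now - self._pending_since >= delay:
            self.anc_on = target
            self._pending_since = None
        return self.anc_on
```

In this sketch a short pause in speech does not re-enable ANC: a flag of 0 has to persist for dt_off_on seconds, and any flag of 1 in between cancels the pending switch. That is the role the text gives to the longer off-to-on delay.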

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Headphones And Earphones (AREA)

Abstract

The disclosure provides a method and a system for a headphone with active noise cancellation (ANC). The headphone may comprise at least one feedforward microphone and at least one feedback microphone or sensor. The method comprises receiving a feedback signal from the feedback microphone or sensor of the headphone; determining a feedback (FB) voice activity detection (VAD) flag based on the received feedback signal; generating a control signal based on the value of the determined FB VAD flag; and automatically adjusting a transition of the headphone between an ANC mode and a transparency mode based on the control signal.

Description

METHOD AND SYSTEM FOR HEADPHONE WITH ANC
TECHNICAL FIELD
The present disclosure relates to a method and system for a headphone with Active Noise Cancellation (ANC), and specifically relates to a method and system for Hybrid-VAD-based intelligent ambient sound control for a headphone with ANC.
BACKGROUND
Headphone products, the most popular types of which are in-ear, on-ear and over-the-ear, are widely used. As concerns about the impact of noise on personal health arise, headphones with ANC have become popular in the consumer market. A headphone equipped with an ANC system can significantly reduce low-frequency noise, which creates a quiet environment for the user but also causes inconvenience, especially in scenarios in which the user wearing the headphone, earphones or earbuds needs to notice the ambient sound. For example, in a conversation scenario, the user has to take off the headphone before talking to others, because the ANC system cancels the human speech in the ambient sound and reduces speech intelligibility in the conversation.
Therefore, it is necessary to provide improved techniques to enable users to obtain good auditory effect as well as convenient experience.
SUMMARY
According to one or more embodiments of the disclosure, a method for a headphone with ANC is provided. The headphone may comprise at least one feedforward microphone and at least one feedback microphone or sensor. The method may comprise: receiving a feedback signal from the feedback microphone or sensor of the headphone; determining a feedback (FB) voice activity detection (VAD) flag based on the received feedback signal; generating a control signal based on the value of the determined FB VAD flag; and automatically adjusting a transition of the headphone between an ANC mode and a transparency mode based on the control signal. According to one or more embodiments, the method may further comprise receiving a feedforward signal from the feedforward microphone of the headphone; determining a feedforward (FF) voice activity detection (VAD) flag based on the received feedforward signal; and determining a combination VAD flag based on the FF VAD flag and the FB VAD flag.
According to one or more embodiments of the disclosure, a system for a headphone with ANC is provided. The headphone may comprise at least one feedforward microphone and at least one feedback microphone or sensor. The system may comprise a processor. The processor may be configured to receive a feedback signal from the feedback microphone or sensor of the headphone; determine a feedback (FB) voice activity detection (VAD) flag based on the received feedback signal; generate a control signal based on the value of the determined FB VAD flag; and automatically adjust a transition of the headphone between an ANC mode and a transparency mode based on the control signal. According to one or more embodiments, the processor may be further configured to receive a feedforward signal from the feedforward microphone of the headphone; determine a feedforward (FF) voice activity detection (VAD) flag based on the received feedforward signal; and determine a combination VAD flag based on the FF VAD flag and the FB VAD flag.
The system and method described in the present disclosure may automatically adjust the ANC module, function or operation of the headphone based on an estimate of the presence or absence of a user’s voice, which gives the user a more convenient experience. Moreover, the VAD detection mechanism provided by the method and system of the disclosure can improve the accuracy with which the user’s voice is detected in a noisy environment. In addition, the method and system of the present disclosure can make the headphone switch between different operation modes, such as between the ANC mode and the transparency mode, relatively smoothly, and improve the user's hearing experience.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic diagram in accordance with one or more embodiments of the present disclosure.
FIG. 2 illustrates a schematic diagram in accordance with one or more embodiments of the present disclosure.
FIG. 3 illustrates a schematic diagram of VAD mechanism corresponding to FIG. 2 in accordance with one or more embodiments of the present disclosure.
FIG. 4 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
FIG. 5 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
FIG. 6 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
FIG. 7 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
FIG. 8 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
FIG. 9 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure.
To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures. It is contemplated that elements disclosed in one embodiment may be beneficially utilized on other embodiments without specific recitation. The drawings referred to here should not be understood as being drawn to scale unless specifically noted. Also, the drawings are often simplified and details or components omitted for clarity of presentation and explanation. The drawings and discussion serve to explain principles discussed below, where like designations denote like elements.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Examples will be provided below for illustration. The descriptions of the various examples will be presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.
FIG. 1 illustrates a schematic diagram of the general operation concept in accordance with one or more embodiments of the present disclosure. Usually, in a normal environment, ambient sound including human speech and background noise is present around a user. The ANC module, function or operation in a headphone can invert the phase of the ambient sound signal and then use the phase-inverted signal to cancel the noise. The term “headphone” in the present disclosure may include in-ear, on-ear and over-the-ear headphones and may be equivalently replaced by the word “earphone”, “earbud” or “headset”. The headphone may be a wireless or wired headphone.
In general, the present disclosure provides a method and a system which may adjust the ANC module, function or operation of the headphone based on an estimate of the presence or absence of a user’s voice, where the user’s voice is generated by the person who is wearing the headphone and speaking. For simplicity of explanation, FIG. 1 shows an operation scenario in which a Voice Activity Detection (VAD) module, function or technique may be adopted to detect the user’s voice, and the detection result may be used to adjust the ANC of the headphone. A conventional VAD system works in a quiet environment but struggles in a noisy environment where the SNR (signal-to-noise ratio) drops. The method and system provided by the present disclosure use additional microphones and/or sensors (FIG. 1 shows only a sensor for simplicity) in the headphone to capture the user’s voice through one or more extra secondary transfer paths. The additional microphones and/or sensors may be arranged on the side of the headphone facing and close to the ear canal. The additional microphone may be, but is not limited to, a feedback microphone that is part of the ANC system. The additional sensor may be, but is not limited to, an accelerometer, a bone conduction sensor or another general vibration sensor. Using the method and system of the disclosure, the SNR of the signal may be increased, since the ear canal may couple to an extra secondary transfer path of the user’s voice in addition to the acoustic path, while the noise in the ear canal has been attenuated by the ANC. As the SNR increases, the accuracy of the detection results of the VAD system can be improved even in a noisy environment.
Then, for example, if the detection result indicates that the user’s voice is active, a control signal may be generated to switch the headphone from the ANC mode to a transparency mode, which completely or partially allows the ambient sound to reach the user’s ear. That is, the operation mode of the headphone may transition from the ANC mode to the transparency mode. Furthermore, for example, if the user stops speaking for a while and the detection result indicates that the user’s voice is not active, a control signal may be generated to switch the headphone from the transparency mode to the ANC mode, in which the ambient sound may be cancelled or reduced.
FIG. 2 illustrates a schematic diagram of a further general operation concept in accordance with another one or more embodiments of the present disclosure. As can be noted, FIG. 2 differs from FIG. 1 in that the VAD detection is based on both the input of the additional microphones and/or sensors and an input from the at least one microphone (such as feedforward microphone) . In FIG. 2, the arrow from the user to VAD may be understood as the input from the feedforward microphone to the VAD. The details regarding the VAD detection mechanism thereof will be described below in reference to FIG. 3.
FIG. 3 illustrates a schematic diagram of the VAD detection mechanism corresponding to FIG. 2 in accordance with one or more embodiments of the present disclosure. As shown in FIG. 3, the headphone may comprise a feedforward microphone (FF mic) and a feedback microphone (FB mic). FIG. 3 shows only one example to illustrate the principle of the VAD detection mechanism. Those of ordinary skill in the art would understand that more than one FF microphone and more than one FB microphone could be arranged as required in practice, and that the FB microphone(s) could be replaced by or combined with sensors. The sensors may include, without limitation, an accelerometer, a bone conduction sensor, another general vibration sensor or a combination thereof. The FF microphone is arranged on the side of the headphone facing the outside environment, and the FB microphone is arranged on the side of the headphone facing and close to the ear canal.
When the user is speaking, both the FF microphone and the FB microphone may capture the user’s voice, as a FF signal and a FB signal respectively. Then, a FF VAD detection and a FB VAD detection may be performed separately, based on the FF signal from the FF microphone and the FB signal from the FB microphone. A FF VAD flag and a FB VAD flag are obtained as the results of the FF VAD detection and the FB VAD detection. The FF VAD flag and the FB VAD flag indicate the possibility that the user who wears the headphone is speaking, based on the FF signal and the FB signal respectively. Based on the FF VAD flag and the FB VAD flag, a combination VAD flag can be determined as a final detection result which indicates the possibility that the user wearing the headphone is speaking. According to the final detection result indicated by the combination VAD flag, a control signal is generated to automatically adjust a transition between an ANC mode and a transparency mode. The adjustment could be performed for the FF microphone and the FB microphone together or separately. For example, according to the control signal, the ANC module/function/operation (hereinafter simply referred to as ANC) may be turned off for both the FF microphone and the FB microphone, i.e., a transition from the ANC mode to the transparency mode. Thus, the ambient sound may be delivered to the user. As another example, according to another control signal, the ANC may be turned off only for the FF microphone, and the ANC for the FB microphone may be kept on. As a further example, according to another control signal, the ANC may be turned off for the FF microphone and turned off by a certain ratio for the FB microphone. In the opposite transition, i.e., the transition from the transparency mode to the ANC mode, the adjustment could also be performed for the FF microphone and the FB microphone separately.
FIG. 4 illustrates a flowchart of a method for controlling the headphone with ANC in accordance with one or more embodiments of the present disclosure, which could be also understood in combination with FIG. 1.
In the method shown in FIG. 4, at S401, a feedback (FB) signal is received from the FB microphone and/or sensor of the headphone. For example, the FB signal may include the sound captured by the FB microphone and/or the vibration captured by the sensor.
At S402, a FB VAD flag may be determined based on the received FB signal, for example, based on the SNR of the received FB signal. The FB VAD flag may be a variable with a value greater than or equal to 0 and less than or equal to 1, and represents the possibility that the user who wears the headphone is speaking. If the FB VAD flag is determined as 0, then the detection result indicates the absence of the user’s voice, i.e., the user who wears the headphone is not speaking. If the FB VAD flag is determined as 1, then the detection result indicates the presence of the user’s voice, i.e., the user who wears the headphone is speaking. If the FB VAD flag is determined as a value between 0 and 1, the detection result indicates the possibility of presence of the user’s voice.
At S403, a corresponding control signal is generated based on the value of the determined FB VAD flag. At S404, according to the control signal, the corresponding adjustment may be performed. For example, if the FB VAD flag indicates the presence of the user’s voice, then a transition from an ANC mode to a transparency mode is performed. The ANC may be completely or partially turned off so that the user can hear the ambient sound. After this adjustment, if the FB VAD flag becomes 0, which indicates the absence of the user’s voice, then a transition from the transparency mode to the ANC mode is performed, i.e., the ANC is turned on.
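The S402 to S404 decision can be illustrated with a minimal Python sketch. The names `Mode` and `next_mode` are hypothetical and do not come from the disclosure, and holding the current mode for flag values strictly between 0 and 1 is an assumption made here only to keep the example short.

```python
from enum import Enum


class Mode(Enum):
    ANC = "anc"
    TRANSPARENCY = "transparency"


def next_mode(current: Mode, fb_vad_flag: float) -> Mode:
    """Map the FB VAD flag to the next operation mode (S403-S404)."""
    if not 0.0 <= fb_vad_flag <= 1.0:
        raise ValueError("FB VAD flag must lie in [0, 1]")
    if fb_vad_flag == 1.0:
        return Mode.TRANSPARENCY  # user's voice present: let ambient sound in
    if fb_vad_flag == 0.0:
        return Mode.ANC  # user's voice absent: cancel ambient sound
    # Assumption: intermediate values leave the mode unchanged in this sketch.
    return current
```

A caller would evaluate `next_mode` once per analysis frame and only act when the returned mode differs from the current one.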
FIG. 5 illustrates a flowchart of a method for controlling the headphone with ANC in accordance with another one or more embodiments of the present disclosure, which could be also understood in combination with FIGs. 2-3.
In the method shown in FIG. 5, at S501, a feedforward (FF) signal is received from the FF microphone of the headphone. For example, the FF signal may include the sound captured by the FF microphone. At S502, a feedback (FB) signal is received from the FB microphone or sensor of the headphone. For example, the FB signal may include the sound captured by the FB microphone and/or the vibration captured by the sensor.
At S503, a FF VAD flag may be determined based on the received FF signal, for example, based on the SNR of the received FF signal. The FF VAD flag indicates a detection result, which may be a variable with a value greater than or equal to 0 and less than or equal to 1, and represents the possibility that the user who wears the headphone is speaking. If the FF VAD flag is determined as 0, then the detection result indicates the absence of the user’s voice, i.e., the user who wears the headphone is not speaking. If the FF VAD flag is determined as 1, then the detection result indicates the presence of the user’s voice, i.e., the user who wears the headphone is speaking. If the FF VAD flag is determined as a value between 0 and 1, then the detection result indicates the possibility of presence of the user’s voice.
At S504, a FB VAD flag may be determined based on the received FB signal, for example, based on the SNR of the received FB signal. The FB VAD flag may be a variable with a value greater than or equal to 0 and less than or equal to 1, and represents the possibility that the user who wears the headphone is speaking. If the FB VAD flag is determined as 0, then the detection result indicates the absence of the user’s voice, i.e., the user who wears the headphone is not speaking. If the FB VAD flag is determined as 1, then the detection result indicates the presence of the user’s voice, i.e., the user who wears the headphone is speaking. If the FB VAD flag is determined as a value between 0 and 1, the detection result indicates the possibility of presence of the user’s voice.
It can be understood that the SNR is only one example of a metric for characterizing the microphone signal, without specific limitation. Any other metric that can characterize the microphone signal could be used in the method or system disclosed herein, such as the magnitude of the signal, the energy of the signal, the frequency response of the signal, and so on.
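As an illustration of such metrics, the following Python sketch computes the mean energy of one signal frame and an SNR estimate in decibels. The disclosure does not specify how the SNR is calculated; the energy-subtraction estimate, the externally supplied `noise_energy` value and the function names are all assumptions made for this example.

```python
import math
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame of samples."""
    if len(frame) == 0:
        raise ValueError("frame must contain at least one sample")
    return sum(x * x for x in frame) / len(frame)


def snr_db(frame: Sequence[float], noise_energy: float, eps: float = 1e-12) -> float:
    """Estimate the SNR of a frame in dB.

    Assumption: noise_energy is a noise-floor estimate obtained elsewhere,
    for example by averaging frames in which no voice was detected.
    """
    # Subtract the noise floor from the total energy; clamp to avoid log(0).
    signal_energy = max(frame_energy(frame) - noise_energy, eps)
    return 10.0 * math.log10(signal_energy / max(noise_energy, eps))
```

The same two functions could be applied to the FF signal and to the FB signal, each with its own noise-floor estimate.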
At S505, a combination VAD flag may be determined based on the FB VAD flag and the FF VAD flag. The combination VAD flag represents the final VAD detection result. Also, the combination VAD flag may be a variable with a value greater than or equal to 0 and less than or equal to 1, and represents the possibility that the user who wears the headphone is speaking. If the combination VAD flag is determined as 0, then the final detection result indicates the absence of the user’s voice, i.e., the user who wears the headphone is not speaking. If the combination VAD flag is determined as 1, then the final detection result indicates the presence of the user’s voice, i.e., the user who wears the headphone is speaking. If the combination VAD flag is determined as a value between 0 and 1, the final detection result indicates the possibility of presence of the user’s voice.
At S506, a corresponding control signal is generated based on the value of the determined combination VAD flag. At S507, according to the control signal, the corresponding adjustment may be performed. For example, if the combination VAD flag indicates the presence of the user’s voice, then a transition from an ANC mode to a transparency mode is performed. The ANC may be completely or partially turned off so that the user can hear the ambient sound. After this adjustment, if the combination VAD flag becomes 0, which indicates the absence of the user’s voice, then a transition from the transparency mode to the ANC mode is performed, i.e., the ANC is turned on.
FIG. 6 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure, which illustrates how to determine the FF VAD flag. At S601, the SNR of the FF signal from the FF microphone is calculated. As described above, the FF microphone may capture the user’s voice and the ambient sound, wherein the ambient sound may be treated as noise relative to the user’s voice. Thus, the calculated SNR of the FF signal may be a metric for determining whether the user who wears the headphone is speaking and/or for determining the possibility of the presence of the user’s voice.
A predetermined threshold interval for the SNR of the FF signal may be set, which is defined by a high threshold TH_H and a low threshold TH_L. In comparison with a single threshold, using a threshold interval may improve fault tolerance and reduce misjudgment. At S602, the SNR of the FF signal is compared to the high threshold TH_H. If the SNR is greater than or equal to the high threshold TH_H, then the method goes to S603, in which the FF VAD flag is set to 1. If not, the method goes to S604.
At S604, the SNR of the FF signal is compared to the low threshold TH_L. If the SNR is less than or equal to the low threshold, then the method goes to S605, in which the FF VAD flag is set to 0. If not, the method goes to S606, in which the FF VAD flag is set to a value between 0 and 1. The closer the value of the flag is to 1, the higher the probability that the user is speaking, and the closer the value of the flag is to 0, the lower the probability that the user is speaking.
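The threshold-interval comparison of S602 to S606 can be sketched in Python as follows. The disclosure only states that the flag takes a value between 0 and 1 when the SNR lies inside the interval; the linear interpolation used here, and the names `vad_flag_from_snr`, `th_low` and `th_high`, are assumptions for illustration.

```python
def vad_flag_from_snr(snr: float, th_low: float, th_high: float) -> float:
    """Map an SNR value to a VAD flag in [0, 1] using a threshold interval."""
    if th_low >= th_high:
        raise ValueError("th_low must be smaller than th_high")
    if snr >= th_high:  # S602 -> S603
        return 1.0
    if snr <= th_low:  # S604 -> S605
        return 0.0
    # S606. Assumption: interpolate linearly between the two thresholds.
    return (snr - th_low) / (th_high - th_low)
```

With thresholds of 5 dB and 15 dB, for instance, an SNR of 10 dB falls halfway through the interval, so this sketch yields a flag of 0.5.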
FIG. 7 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure, which illustrates how to determine the FB VAD flag. At S701, the SNR of the FB signal from the FB microphone is calculated. As described above, the FB microphone may capture the user’s voice and some of the ambient sound, wherein the ambient sound may be treated as noise relative to the user’s voice. Thus, the calculated SNR of the FB signal may be a metric for determining whether the user who wears the headphone is speaking and/or for determining the possibility of the presence of the user’s voice.
A predetermined threshold interval for the SNR of the FB signal may be set, which is defined by a high threshold TH_H and a low threshold TH_L. In comparison with a single threshold, using a threshold interval may improve fault tolerance and reduce misjudgment. At S702, the SNR of the FB signal is compared to the high threshold. If the SNR is greater than or equal to the high threshold, then the method goes to S703, in which the FB VAD flag is set to 1. If not, the method goes to S704.
At S704, the SNR of the FB signal is compared to the low threshold. If the SNR is less than or equal to the low threshold, then the method goes to S705, in which the FB VAD flag is set to 0. If not, the method goes to S706, in which the FB VAD flag is set to a value between 0 and 1. The closer the value of the flag is to 1, the higher the probability that the user is speaking, and the closer the value of the flag is to 0, the lower the probability that the user is speaking.
According to one embodiment, the threshold interval for the SNR of the FF signal may be the same as the threshold interval for the SNR of the FB signal. According to another embodiment, the threshold interval for the SNR of the FF signal may be different from the threshold interval for the SNR of the FB signal. Considering that the FB signal is relatively less affected by noise in comparison to the FF signal, the low threshold and the high threshold for the FB signal can be set higher than the low threshold and the high threshold for the FF signal, respectively. The advantage of this is that it can further reduce the misjudgment caused by the user's non-voice sounds or facial movements.
FIG. 8 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure, which illustrates how to determine the combination VAD flag based on the FF VAD flag and the FB VAD flag.
At S801, the FF VAD flag and the FB VAD flag may be obtained as described in reference to FIG. 6 and FIG. 7. At S802, the method determines whether both the values of the FF VAD flag and the FB VAD flag are 0. If so, the method goes to S803. At S803, the combination VAD flag is set to 0. If not, the method goes to S804. At S804, the method determines whether both the values of the FF VAD flag and the FB VAD flag are 1. If so, the method goes to S805. At S805, the combination VAD flag is set to 1. If not, the method goes to S806.
At S806, the combination VAD flag is calculated using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, such as using the following equation:
combination VAD flag = alpha * (FB VAD flag) + (1 - alpha) * (FF VAD flag)
wherein alpha is a weight parameter related to the SNR of the FB signal and is a value between 0 and 1. For example, the alpha value increases as the SNR of the FB signal increases. Alternatively, the alpha value may depend on the noise ratio that represents the noise level of the environment, in which case the alpha value increases as the noise ratio increases. For example, the noisier the environment, the more reliable the VAD detection result based on the FB signal is relative to the result based on the FF signal, so a larger alpha value, i.e., a value closer to 1, should be selected. Alternatively, the alpha value may depend on a combination of the SNR of the FB signal and the noise ratio.
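The combination logic of S802 to S806 can be sketched in Python as follows. The function `combine_vad_flags` follows the weighted sum given above. The function `alpha_from_fb_snr` is purely hypothetical: the disclosure only says that alpha increases with the SNR of the FB signal, and the clamped linear mapping and its 0 dB and 20 dB end points are assumptions.

```python
def combine_vad_flags(ff_flag: float, fb_flag: float, alpha: float) -> float:
    """Combine the FF and FB VAD flags into one flag in [0, 1]."""
    for name, value in (("ff_flag", ff_flag), ("fb_flag", fb_flag), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    if ff_flag == 0.0 and fb_flag == 0.0:  # S802 -> S803
        return 0.0
    if ff_flag == 1.0 and fb_flag == 1.0:  # S804 -> S805
        return 1.0
    # S806: weighted sum, with alpha weighting the FB VAD flag.
    return alpha * fb_flag + (1.0 - alpha) * ff_flag


def alpha_from_fb_snr(fb_snr_db: float, snr_min_db: float = 0.0,
                      snr_max_db: float = 20.0) -> float:
    """Hypothetical mapping: alpha grows linearly with the FB SNR, clamped to [0, 1]."""
    t = (fb_snr_db - snr_min_db) / (snr_max_db - snr_min_db)
    return min(1.0, max(0.0, t))
```

The two explicit branches mirror the flowchart; the weighted sum alone would give the same values for those inputs up to floating-point rounding, so the branches mainly guarantee exact results of 0 and 1.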
Once the combination VAD flag is determined, a corresponding control signal is generated to adjust the ANC of the headphone. FIG. 9 illustrates a flowchart of a method in accordance with one or more embodiments of the present disclosure, which illustrates how to adjust the ANC based on the combination VAD flag.
If the combination VAD flag is determined as shown at S901, then the method goes to S902. At S902, the method determines whether the combination VAD flag is equal to 1. If it is equal to 1, then a corresponding control signal is generated to turn off the ANC, at S903. In one example, the adjustment of turning off the ANC may be performed for both the FF microphone and the FB microphone. In another example, the adjustment of turning off the ANC may be performed only for the FF microphone, and the ANC state for the FB microphone is maintained. According to one or more embodiments, before the adjustment of turning off the ANC, the current state of the ANC may be determined first. If the current state of the ANC is OFF, then the control signal can be ignored since the ANC is already in an OFF state. If the current state of the ANC is ON, then the transition from the ANC mode to the transparency mode may be started immediately. In one example, a delay time dt_on-off may be set for the transition from the ANC mode (ANC is on) to the transparency mode (ANC is off) according to practical requirements. Usually, the delay time dt_on-off may be set to a very small value, for example, on the order of microseconds or milliseconds.
If the combination VAD flag is not equal to 1, then the method goes to S904. At S904, the method determines whether the combination VAD flag is equal to 0. If the combination VAD flag is equal to 0, a corresponding control signal is generated to turn on the ANC, at S905. In one example, the adjustment of turning on the ANC may be performed for both the FF microphone and the FB microphone. According to one or more embodiments, before the adjustment of turning on the ANC, the current state of the ANC may be determined first. If the current state of the ANC is ON, then the control signal can be ignored since the ANC is already in an ON state. If the current state of the ANC is OFF, then the transition from the transparency mode to the ANC mode may be started. In one example, a delay time dt_off-on may be set for the transition from the transparency mode (ANC is off) to the ANC mode (ANC is on). That is, the transition is started only after the delay time dt_off-on has elapsed. The use of the delay time dt_off-on can avoid a false transition caused by a short pause in a conversation. The delay time dt_off-on may be greater than the above-mentioned delay time dt_on-off. For example, the delay time dt_off-on may be on the order of seconds.
If the combination VAD flag is not equal to 0, the method goes to S906. At S906, a corresponding control signal is generated to turn on or off ANC with a certain ratio according to the value of the combination VAD flag. For example, the control signal may include an adjusting factor which indicates a degree/level of transition, such as percentage of the transition. The adjusting factor may be a value between 0 and 1 and may vary with the value of the combination VAD flag. Accordingly, the ANC is turned on or off with a certain ratio according to the adjusting factor included in the control signal.
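The FIG. 9 flow, including the two delay times, can be sketched as a small Python state machine. Several choices here are assumptions rather than statements of the disclosure: the class name `AncController`, the default delay values, the rule that the adjusting factor equals the combination VAD flag (so the ANC level is 1 minus the flag), and the reuse of the same delay rule for partial transitions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AncController:
    dt_on_off: float = 0.005  # seconds; assumed value in the millisecond range
    dt_off_on: float = 2.0    # seconds; assumed value in the seconds range
    anc_level: float = 1.0    # 1.0 = ANC fully on, 0.0 = ANC fully off
    _pending_target: Optional[float] = None
    _pending_since: float = 0.0

    def update(self, combination_flag: float, now: float) -> float:
        """Process one combination VAD flag observed at time `now` (seconds)."""
        # Assumption: the adjusting factor equals the flag (S903/S905/S906).
        target = 1.0 - combination_flag
        if target == self.anc_level:
            # Already in the requested state: ignore and cancel any pending change.
            self._pending_target = None
            return self.anc_level
        if target != self._pending_target:
            # New request: start its delay timer.
            self._pending_target = target
            self._pending_since = now
        # Reducing ANC uses the short delay; restoring ANC uses the long one.
        delay = self.dt_on_off if target < self.anc_level else self.dt_off_on
        if now - self._pending_since >= delay:
            self.anc_level = target
            self._pending_target = None
        return self.anc_level
```

In this sketch a short pause in speech does not restore the ANC, because a flag of 0 must persist for `dt_off_on` seconds, while a flag of 1 takes effect after only `dt_on_off` seconds.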
A system for a headphone with an active noise cancellation (ANC) may be provided according to one or more embodiments, wherein the headphone comprises at least one feedforward microphone and at least one feedback microphone or sensor. The system may comprise a processor, which may be configured to receive a feedback signal from the feedback microphone or sensor of the headphone; determine, a feedback (FB) voice activity detection (VAD) flag, based on the received feedback signal; generate a control signal based on the value of the determined FB VAD flag; and automatically adjust a transition of the headphone between an ANC mode and a transparency mode, based on the control signal.
Further, the processor may be further configured to receive a feedforward signal from the feedforward microphone of the headphone; determine a feedforward (FF) voice activity detection (VAD) flag based on the received feedforward signal; and determine a combination VAD flag based on the FF VAD flag and the FB VAD flag.
Further, the processor may be further configured to calculate a first SNR of the received feedforward signal; obtain the FF VAD flag for the received feedforward signal based on the first SNR; calculate a second SNR of the received feedback signal; obtain a FB VAD flag for the received feedback signal based on the second SNR; and determine a combination VAD flag based on the FF VAD flag and the FB VAD flag.
Further, the processor may be further configured to: compare the first SNR to a first high threshold and a first low threshold respectively; set the FF VAD flag to 1, if the first SNR is greater than or equal to the first high threshold; set the FF VAD flag to 0, if the first SNR is less than or equal to the first low threshold; and set the FF VAD flag to a value between 0 and 1, if the first SNR is greater than the first low threshold and less than the first high threshold.
The processor may be further configured to: compare the second SNR to a second high threshold and a second low threshold; set the FB VAD flag to 1, if the second SNR is greater than or equal to the second high threshold; set the FB VAD flag to 0, if the second SNR is less than or equal to the second low threshold; and set the FB VAD flag to a value between 0 and 1, if the second SNR is greater than the second low threshold and less than the second high threshold.
The processor may be further configured to: set the combination VAD flag to 0, if both the values of the FF VAD flag and the FB VAD flag are 0; set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.
The processor may be any technically feasible hardware unit configured to process data and execute software applications, including without limitation, a central processing unit (CPU) , a microcontroller unit (MCU) , an application specific integrated circuit (ASIC) , a digital signal processor (DSP) chip and so forth. The processor may be integrated in the headphone.
The disclosure further includes a non-transitory computer-readable medium storing program instructions that, when executed by a processor, cause the processor to perform the steps of: receiving a feedback signal from the feedback microphone or sensor of the headphone; determining, a feedback (FB) voice activity detection (VAD) flag, based on the received feedback signal; generating a control signal based on the value of the determined FB VAD flag; and automatically adjusting a transition of the headphone between an ANC mode and a transparency mode, based on the control signal.
The system and method described in the present disclosure may automatically adjust the ANC module/function of the headphone based on an estimate of the presence or absence of the user's voice, which makes the headphone more convenient to use. Moreover, the VAD detection mechanism provided by the method and system of the disclosure can improve the accuracy of user voice detection in a noisy environment. In addition, the method and system of the present disclosure can make the headphone's switching between different operation modes relatively smooth, and improve the user's hearing experience.
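One way to picture the smooth, automatic switching described above is a small controller that delays the start of each transition and then ramps an adjusting factor. This is a sketch under stated assumptions, not the disclosed implementation: the class name, the frame-based time thresholds, the ramp step and the 0.5 speaking level are all hypothetical. The only property taken from the claims is that the time threshold for starting the ANC-to-transparency transition is less than the one for the reverse direction.

```python
from dataclasses import dataclass


@dataclass
class TransitionController:
    enter_frames: int = 3     # first time threshold: ANC -> transparency
    exit_frames: int = 20     # second time threshold: transparency -> ANC
    ramp_step: float = 0.25   # per-frame change of the adjusting factor
    speak_level: float = 0.5  # flag value treated as "user is speaking"
    factor: float = 0.0       # 0 = full ANC, 1 = full transparency
    target: float = 0.0       # value the factor is currently ramping to
    _run: int = 0             # consecutive frames disagreeing with target

    def update(self, vad_flag: float) -> float:
        """Consume one VAD flag and return the new adjusting factor."""
        wanted = 1.0 if vad_flag >= self.speak_level else 0.0
        if wanted == self.target:
            self._run = 0
        else:
            self._run += 1
            needed = self.enter_frames if wanted == 1.0 else self.exit_frames
            if self._run >= needed:
                # The start time for this direction has been reached.
                self.target = wanted
                self._run = 0
        # Move the factor toward the target by at most ramp_step.
        if self.factor < self.target:
            self.factor = min(self.target, self.factor + self.ramp_step)
        elif self.factor > self.target:
            self.factor = max(self.target, self.factor - self.ramp_step)
        return self.factor
```

Keeping `enter_frames` smaller than `exit_frames` lets the headphone open up quickly when the user starts talking while avoiding a snap back to ANC during short pauses in speech.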
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
In the preceding, reference is made to embodiments presented in this disclosure. However, the scope of the present disclosure is not limited to specific described embodiments. Instead, any combination of the preceding features and elements, whether related to different embodiments or not, is contemplated to implement and practice contemplated embodiments. Furthermore, although embodiments disclosed herein may achieve advantages over other possible solutions or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the scope of the present disclosure. Thus, the preceding aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s).
Aspects of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system”.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, enable the implementation of the functions/acts specified in the flowchart and/or block diagram block or blocks. Such processors may be, without limitation, general purpose processors, special-purpose processors, application-specific processors, or field-programmable processors.
While the foregoing is directed to embodiments of the present disclosure, other and further embodiments of the disclosure may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (20)

  1. A method for a headphone with an active noise cancellation (ANC), the headphone comprising at least one feedforward microphone and at least one feedback microphone or sensor, the method comprising:
    receiving a feedback signal from the feedback microphone or sensor of the headphone;
    determining, a feedback (FB) voice activity detection (VAD) flag, based on the received feedback signal;
    generating a control signal based on the value of the determined FB VAD flag; and
    automatically adjusting a transition of the headphone between an ANC mode and a transparency mode, based on the control signal.
  2. The method according to claim 1, further comprising:
    receiving a feedforward signal from the feedforward microphone of the headphone;
    determining, a feedforward (FF) voice activity detection (VAD) flag, based on the received feedforward signal; and
    determining, a combination VAD flag, based on the FF VAD flag and the FB VAD flag.
  3. The method according to claim 1 or claim 2, further comprising:
    calculating a first SNR of the received feedforward signal;
    obtaining the FF VAD flag for the received feedforward signal based on the first SNR;
    calculating a second SNR of the received feedback signal;
    obtaining a FB VAD flag for the received feedback signal based on the second SNR; and
    determining a combination VAD flag based on the FF VAD flag and the FB VAD flag.
  4. The method according to any one of claims 1-3, wherein
    each of the FF VAD flag, the FB VAD flag and the combination VAD flag is a value greater than or equal to 0 and less than or equal to 1, and
    each of the FF VAD flag, the FB VAD flag and the combination VAD flag represents the possibility that the user wearing the headphone is speaking.
  5. The method according to claim 3 or 4, the obtaining the FF VAD flag based on the first SNR further comprising:
    comparing the first SNR to a first high threshold and a first low threshold respectively;
    setting the FF VAD flag to 1, if the first SNR is greater than or equal to the first high threshold;
    setting the FF VAD flag to 0, if the first SNR is less than or equal to the first low threshold; and
    setting the FF VAD flag to a value between 0 and 1, if the first SNR is greater than the first low threshold and less than the first high threshold.
  6. The method according to any one of claims 3-5, the obtaining the FB VAD flag based on the second SNR further comprising:
    comparing the second SNR to a second high threshold and a second low threshold;
    setting the FB VAD flag to 1, if the second SNR is greater than or equal to the second high threshold;
    setting the FB VAD flag to 0, if the second SNR is less than or equal to the second low threshold; and
    setting the FB VAD flag to a value between 0 and 1, if the second SNR is greater than the second low threshold and less than the second high threshold.
  7. The method according to any one of claims 1-6, the determining a combination VAD flag based on the FF VAD flag and the FB VAD flag further comprising:
    setting the combination VAD flag to 0, if both the values of the FF VAD flag and the FB VAD flag are 0;
    setting the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and
    calculating the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.
  8. The method according to claim 7, wherein the weight parameter is dependent on the second SNR of the feedback signal.
  9. The method according to claim 7, wherein the weight parameter is dependent on a noise ratio, which is calculated using the feedforward signal and represents a noise level of the environment where the user wearing the headphone is located.
  10. The method according to any one of claims 1-9, wherein the automatically adjusting the transition of the headphone between the ANC mode and a transparency mode comprises: performing the same ANC adjustment or different ANC adjustments for the feedforward signal and the feedback signal; wherein the ANC of the headphone is turned on in the ANC mode, and the ANC of the headphone is turned on in the transparency mode.
  11. The method according to any one of claims 1-10, further comprising:
    determining, a corresponding adjusting factor based on the combination VAD flag; and
    automatically adjusting the transition of the headphone between the ANC mode and the transparency mode, using the adjusting factor.
  12. The method according to claim 11, wherein the adjusting factor represents the level of the transition from the ANC mode to the transparency mode or from the transparency mode to the ANC mode.
  13. The method according to claim 11, further comprising adjusting an intensity of the ambient sound passing through the headphone in the transparency mode, based on the adjusting factor.
  14. The method according to any one of claims 1-13, further comprising:
    predetermining a first time threshold for the transition from the ANC mode to the transparency mode, the first time threshold represents a start time for the transition from the ANC mode to the transparency mode; and
    predetermining a second time threshold for the transition from the transparency mode to the ANC mode, the second time threshold represents a start time for the transition from the transparency mode to the ANC mode;
    wherein the first time threshold is less than the second time threshold.
  15. A system for a headphone with an active noise cancellation (ANC), the headphone comprising at least one feedforward microphone and at least one feedback microphone or sensor, wherein the system comprises a processor configured to:
    receive a feedback signal from the feedback microphone or sensor of the headphone;
    determine, a feedback (FB) voice activity detection (VAD) flag, based on the received feedback signal;
    generate a control signal based on the value of the determined FB VAD flag; and
    automatically adjust a transition of the headphone between an ANC mode and a transparency mode, based on the control signal.
  16. The system according to claim 15, wherein the processor is further configured to:
    receive a feedforward signal from the feedforward microphone of the headphone;
    determine, a feedforward (FF) voice activity detection (VAD) flag, based on the received feedforward signal; and
    determine, a combination VAD flag, based on the FF VAD flag and the FB VAD flag.
  17. The system according to claim 15 or claim 16, wherein the processor is further configured to:
    calculate a first SNR of the received feedforward signal;
    obtain the FF VAD flag for the received feedforward signal based on the first SNR;
    calculate a second SNR of the received feedback signal;
    obtain a FB VAD flag for the received feedback signal based on the second SNR; and
    determine a combination VAD flag based on the FF VAD flag and the FB VAD flag.
  18. The system according to any one of claims 15-17, wherein the processor is further configured to:
    compare the first SNR to a first high threshold and a first low threshold respectively;
    set the FF VAD flag to 1, if the first SNR is greater than or equal to the first high threshold;
    set the FF VAD flag to 0, if the first SNR is less than or equal to the first low threshold; and
    set the FF VAD flag to a value between 0 and 1, if the first SNR is greater than the first low threshold and less than the first high threshold.
  19. The system according to any one of claims 15-17, wherein the processor is further configured to:
    compare the second SNR to a second high threshold and a second low threshold;
    set the FB VAD flag to 1, if the second SNR is greater than or equal to the second high threshold;
    set the FB VAD flag to 0, if the second SNR is less than or equal to the second low threshold; and
    set the FB VAD flag to a value between 0 and 1, if the second SNR is greater than the second low threshold and less than the second high threshold.
  20. The system according to any one of claims 15-19, wherein the processor is further configured to:
    set the combination VAD flag to 0, if both the values of the FF VAD flag and the FB VAD flag are 0;
    set the combination VAD flag to 1, if both the values of the FF VAD flag and the FB VAD flag are 1; and
    calculate the combination VAD flag using a weight parameter based on the value of the FF VAD flag and the value of the FB VAD flag, if one of the value of the FF VAD flag and the value of the FB VAD flag is not equal to 1.
PCT/CN2021/071768 2021-01-14 2021-01-14 Method and system for headphone with anc WO2022151156A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180090307.3A CN116803100A (en) 2021-01-14 2021-01-14 Method and system for headphones with ANC
PCT/CN2021/071768 WO2022151156A1 (en) 2021-01-14 2021-01-14 Method and system for headphone with anc

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/071768 WO2022151156A1 (en) 2021-01-14 2021-01-14 Method and system for headphone with anc

Publications (1)

Publication Number Publication Date
WO2022151156A1 true WO2022151156A1 (en) 2022-07-21

Family

ID=82447790

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071768 WO2022151156A1 (en) 2021-01-14 2021-01-14 Method and system for headphone with anc

Country Status (2)

Country Link
CN (1) CN116803100A (en)
WO (1) WO2022151156A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4322548A1 (en) * 2022-08-09 2024-02-14 Beijing Xiaomi Mobile Software Co., Ltd. Earphone controlling method and apparatus, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180225082A1 (en) * 2017-02-07 2018-08-09 Avnera Corporation User Voice Activity Detection Methods, Devices, Assemblies, and Components
CN108882092A (en) * 2018-07-03 2018-11-23 歌尔智能科技有限公司 A kind of earphone noise-reduction method and feedback noise reduction system
US10714073B1 (en) * 2019-04-30 2020-07-14 Synaptics Incorporated Wind noise suppression for active noise cancelling systems and methods
CN111800690A (en) * 2019-04-03 2020-10-20 Gn 奥迪欧有限公司 Headset with active noise reduction

Also Published As

Publication number Publication date
CN116803100A (en) 2023-09-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21918365

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180090307.3

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21918365

Country of ref document: EP

Kind code of ref document: A1