WO2011149969A2 - Separating voice from noise using a network of proximity filters - Google Patents

Separating voice from noise using a network of proximity filters

Info

Publication number
WO2011149969A2
Authority
WO
WIPO (PCT)
Prior art keywords
noise
proximity
estimate
filter
voice
Prior art date
Application number
PCT/US2011/037781
Other languages
French (fr)
Other versions
WO2011149969A3 (en
Inventor
Shridhar Mukund
Vivek Nigam
Original Assignee
Ikoa Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ikoa Corporation
Publication of WO2011149969A2
Publication of WO2011149969A3

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2420/00 Details of connection covered by H04R, not provided for in its groups
    • H04R2420/07 Applications of wireless loudspeakers or wireless microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2499/00 Aspects covered by H04R or H04S not otherwise provided for in their subgroups
    • H04R2499/10 General applications
    • H04R2499/11 Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDA's, camera's

Definitions

  • the present invention relates generally to audio signal enhancement and in particular to systems, devices and methods for separating voice from noise in an audio signal.
  • a speech signal is received by one of these devices, in the presence of ambient noise, and is either transmitted to a user on the other side (in the case of cell phones, headsets, etc.) or translated to a set of actions (command consoles).
  • the noise-corrupted speech signal is captured by either a single microphone (cell phones) or multiple microphones (car command console).
  • ANC Adaptive noise cancellation
  • One of the microphones called the primary microphone
  • the remaining microphones provide noise references, relatively free of primary speech, which are assumed to be correlated with noise sources corrupting the primary microphone.
  • This method gives good noise suppression as long as good noise references are available.
  • when the noise reference is not available, the method fails to perform satisfactorily.
  • providing a clean noise reference is usually a problem in devices that have a small form factor.
  • Another method proposed to suppress noise in primary speech utilizes an array of microphones.
  • the array forms a beam towards the target of primary speech thus capturing most of the speech energy and rejecting any energy that comes from outside the beam.
  • satisfactory performance is obtained only when the array is large in dimension and operates in an essentially reverberation-less environment.
  • the noise energy that falls in the speech beam is difficult to suppress.
  • the method is difficult to implement in communication devices due to their small form factor that limits the placement of microphones on the devices.
  • SS spectral subtraction
  • VAD voice activity detector
  • SS is mostly successful when the speech is corrupted by stationary noise.
  • SS performance is poor in the presence of rapidly changing non-stationary noise that defines the majority of practical noise scenarios.
  • BSS blind source separation
  • the present invention generally describes a device that assists in speech communication. Particularly, it describes a unique placement of sensors and a set of techniques that suppress noise in an audio signal and hence can be readily used with a multitude of devices including mobile phones, laptops, video game consoles, headsets, and automobile command consoles.
  • This invention provides an audio signal enhancement device.
  • the device includes a first and a second microphone, placed as close together as possible in one embodiment.
  • the first and second microphones have receiving surfaces facing in opposing or different directions.
  • the first and second microphones receive a desired target audio signal originating in the proximity of the microphones and undesired noise signals not originating in the proximity of the microphones.
  • the audio signal enhancement device is incorporated into a small form factor device, such as a cell phone.
  • the acoustic pressure gradient is captured and utilized to enhance an audio signal referred to as a target signal.
  • the acoustic pressure gradient from the desired target signal between the first and the second microphones is greater than that from the noise signals.
  • Signal processing logic is included and is configured to generate a proximity-indicator signal and a pre-target-estimate signal by combining the output from the first microphone and the output of the second microphone.
  • the signal processing logic is further configured to generate a noise-estimate signal by combining the output from the first microphone with the proximity-indicator and the pre-target-estimate.
  • the signal processing logic is further configured to generate a target-estimate signal by combining the output from the first microphone with the proximity-indicator and the noise-estimate.
  • the signal processing logic is further configured to provide a target signal substantially free from noise by combining the target-estimate, noise-estimate and the proximity-indicator.
  • VT video telephony
  • An embodiment of the proposed invention is capable of suppressing noise in speech in VT applications.
  • the device proposed in this invention utilizes two microphones in the back to back configuration and hence has a small form factor. This facilitates the usage of the signal enhancement circuitry in mobile phones, laptops and video game consoles.
  • an effective method to perform echo cancellation is provided. Echo is generated when speech emanating from the speakers of the cell phone is coupled with audio captured by the microphones and propagated back to the user on the other end. Echo is a problem in VT mode when the cell-phone speakers are operating at a relatively high volume. Echo not only is annoying, but also degrades the intelligibility of speech.
  • Figure 1A is a simplified schematic diagram illustrating a possible placement of microphones where the receiving surfaces of the two microphones make an angle that is other than zero in accordance with one embodiment of the invention.
  • Figure 1B is a simplified schematic diagram illustrating the placement of the first and the second microphones of the proximity filter in accordance with one embodiment of the invention.
  • Figures 1C and 1D illustrate the concepts of near-field, far-field, and proximity-field in accordance with one embodiment of the invention.
  • Figure 2A is a simplified schematic diagram illustrating a mobile phone having microphones in back to back configuration for enhancing audio signals of a phone conversation in the proximity-field of the target speaker in accordance with one embodiment of the invention.
  • Figure 2B is a simplified schematic diagram of a laptop having microphones in back to back configuration in accordance with one embodiment of the invention.
  • Figure 2C is simplified schematic diagram of a wireless headset having microphones in back to back configuration in the proximity-field of the target speaker in accordance with one embodiment of the invention.
  • Figure 2D is a simplified schematic diagram illustrating a side view of the wireless headset of Figure 2C.
  • Figure 3A is a block diagram of the components of the proximity filter capable of suppressing noise from an audio signal of interest in accordance with one embodiment of the invention.
  • Figure 3B is a flow chart diagram illustrating the method operations for proximity filtering to provide a relatively noise-free and enhanced signal from an audio source within a noisy environment in accordance with one embodiment of the invention.
  • Figure 3C is a flow chart diagram illustrating further details of the balanced differential subtraction in accordance with one embodiment of the invention.
  • Figure 4A is a simplified schematic diagram illustrating the noise-estimating adaptive filter in accordance with one embodiment of the invention.
  • Figure 4B is a simplified schematic diagram illustrating the target-estimating adaptive filter in accordance with one embodiment of the invention.
  • Figure 4C is a simplified schematic diagram of the post-processing block in Figure 3A in accordance with one embodiment of the invention.
  • Figure 5A is a simplified schematic diagram of a proximity filter having a cylindrical shape in accordance with one embodiment of the invention.
  • Figure 5B is a simplified schematic diagram of multiple proximity filters where pairs of microphones are diametrically opposed to each other in accordance with one embodiment of the invention.
  • Figures 6A-6C illustrate proximity filter configurations including equidistant loud speaker placement in accordance with one embodiment of the invention.
  • Figure 7 is a simplified schematic diagram illustrating the data flow path that uses a plurality of pairs of the first and the second microphones in accordance with one embodiment of the invention.
  • Figure 8 is a conceptual diagram illustrating one use of a network of proximity filters according to certain aspects of the invention.
  • Figure 9 is a flowchart illustrating a process used according to certain aspects of the invention.
  • Figure 10 depicts an embodiment in which three microphones can be used to suppress the ambient noise generated by plural noise sources according to certain aspects of the invention.
  • Figure 11 describes a solution applicable to the embodiment of Figure 10 using three microphones and a network of two proximity filters.
  • Figure 12 is a simplified schematic diagram illustrating one exemplary application for the network of proximity filters according to certain aspects of the invention.
  • FIG. 8 is a conceptual diagram for using a network of proximity filters that is fed by n microphones that capture signal of interest (primary voice) and ambient noise sources.
  • the network may also be provided with information about excitation sources of ambient noise sources (for example the reference signal that excites the loud-speakers in a room).
  • Each proximity filter in the network generates its voice and noise estimates by selectively utilizing information from the microphones, the reference signal and voice and noise estimates generated by other proximity filters in the network.
  • each proximity filter creates a bubble and an antibubble.
  • the voice estimate consists of all the voice and noise sources that fall within the bubble and the noise estimate consists of all the voice and noise sources that fall outside the bubble (inside the antibubble).
  • the differential processor receives the voice estimate and the noise estimate from each proximity filter in the network or in a sub-network and processes them to produce the final clean voice. In one
  • the differential processor decomposes the voice and the noise estimates into different frequency bands and then suppresses those components of the noise estimate that are present in the voice estimate by applying a certain time varying adaptive filtering to the voice estimate.
  • the time varying filter can be a time domain adaptive Wiener filter.
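
The band-wise suppression performed by the differential processor can be illustrated with a minimal sketch. It assumes the voice and noise estimates have already been split into frequency bands and reduced to one power value per band. The function name `band_gains` and the `floor` parameter are hypothetical and do not come from the application. The sketch computes a Wiener-style gain per band, which is a simplification of the time-domain adaptive Wiener filter mentioned above.

```python
def band_gains(voice_power, noise_power, floor=0.1):
    """Per-band suppression gains from voice-estimate and noise-estimate band powers.

    Uses the Wiener-style rule gain = (V - N) / V, clamped to [floor, 1].
    `floor` is a hypothetical lower limit that avoids fully muting a band.
    """
    gains = []
    for v, n in zip(voice_power, noise_power):
        if v <= 0.0:
            # No energy in the voice estimate for this band: apply the floor.
            gains.append(floor)
            continue
        gains.append(min(1.0, max(floor, (v - n) / v)))
    return gains


# Three illustrative bands: mostly voice, mostly noise, and noise-free.
gains = band_gains([4.0, 1.0, 2.0], [1.0, 1.0, 0.0])
print(gains)
```

Each gain would then scale the corresponding band of the voice estimate. Bands in which the noise estimate accounts for most of the power are attenuated toward the floor.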
  • Figure 8a shows a flowchart corresponding to the functioning of the differential processor.
  • Figure 9 shows a scenario where three microphones can be used to suppress the ambient noise generated by noise sources N1, N2 and N3 from the speech of the primary speaker.
  • Figure 10 explains how the use case of Figure 9 can be solved by using three microphones and a network of two proximity filters.
  • Proximity filter 1 processes the primary voice and the noise in mic 3 to produce cleaner primary voice (voice estimate 1) and a purer estimate of the ambient noise being sampled by mic 3 (noise estimate 1).
  • Proximity filter 2 takes as its input the output of mic 2 along with the voice estimate 1 and produces an even cleaner version of the primary speech (voice estimate 2) and an estimate of the ambient noise being sampled by mic 2 (noise estimate 2).
  • the differential processor accepts as its inputs voice estimate 1, noise estimate 1 and noise estimate 2 and processes them to generate the final clean primary speech.
  • Figure 11 is a simplified schematic diagram illustrating one exemplary application for the network of proximity filters described herein. In the exemplary illustration of Figure 11, a videoconference session is taking place. The
  • a remote control unit includes microphone 1 attached thereto.
  • the remote control unit may control the television and a participant of the videoconference session is proximate to the remote control unit.
  • Microphones 2 and 3 are placed proximate to a computing device and the television, respectively.
  • microphones 1-3, and corresponding proximity filters, provide output which is processed by a differential processor to provide a clean version of the audio from the participant for the videoconference session.
  • the proximity filter logic may be integrated with each microphone or located remotely from the corresponding microphone. Details for the proximity filter logic and the signal processing are provided in the attached supplement to this application.
  • the embodiments provide for enhanced audio clarity for many applications in addition to the video conferencing application described above.
  • the embodiments described herein may be integrated with those applications to enhance the audio clarity.
  • the television may be a web enabled television, such as the GOOGLE TV application and the remote may be a smart phone.
  • any of the operations described herein that form part of the invention are useful machine operations.
  • the invention also relates to a device or an apparatus for performing these operations.
  • the apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated, implemented, or configured by a computer program stored in the computer.
  • various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • any sound originating from a point source in space radiates from the point in a spherical pattern.
  • the wave of acoustic energy originating at this point moves outward in a spherical wavefront, whose size increases with time.
  • the intensity of sound decreases as the wavefront moves farther from the point source. The intensity is inversely proportional to the square of the radius of the sphere.
  • the region very close to the sound source is called the "near-field" of the sound source and in this region a spherical propagating wavefront appears spherical to the sound capturing microphone.
  • the wavefront becomes larger in radius and appears planar to a sound capturing microphone. This region is called the "far-field" of the sound source.
  • This region extends in space beyond a radius of
  • the acoustic pressure gradient, which is the pressure level difference between two points in space, is largest if these points are located in the "proximity-field" and decreases as one moves from the "near-field" to the "far-field".
  • Noise canceling microphones make use of a large pressure gradient when placed in the "proximity-field" of a sound source.
  • the pressure difference due to the speaker between the front and the rear ports of a noise canceling microphone is large, giving rise to a significant resultant target signal.
  • noise sources that are located in the "far-field" of the noise canceling microphones have very small pressure gradients across their ports, giving rise to a very weak resultant noise signal and hence, a weaker impact on the signal of interest being captured by the microphones.
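
The difference in pressure gradient between a nearby talker and a distant noise source can be illustrated with a small calculation. This is a sketch under simplifying assumptions: an ideal point source in free field, pressure amplitude falling as 1/r, and both microphones on a line with the source. The distances and the function name `level_difference_db` are hypothetical. The sketch does not model the turbulent proximity-field of an extended source.

```python
import math


def level_difference_db(source_distance_m, mic_spacing_m):
    """Level difference in dB between two microphones in line with a point source.

    Assumes free-field spherical spreading, where pressure amplitude falls as 1/r.
    """
    near = source_distance_m
    far = source_distance_m + mic_spacing_m
    return 20.0 * math.log10(far / near)


# Hypothetical geometry: microphones 10 mm apart.
talker_db = level_difference_db(0.05, 0.010)  # talker 5 cm from the front microphone
noise_db = level_difference_db(2.0, 0.010)    # noise source 2 m away
print(f"talker: {talker_db:.2f} dB, distant noise: {noise_db:.3f} dB")
```

With these illustrative numbers the talker produces a level difference of roughly 1.6 dB across the pair, while the distant source produces about 0.04 dB, which is the asymmetry the back to back microphones exploit.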
  • the embodiments described below describe a method and apparatus for providing a clean audio signal generated from a relatively close by signal source in a noisy environment.
  • Microphone pairs, either in a single configuration or in an array, are placed back to back, or facing in different directions, on a suitable device to be operated in the proximity-field of a target speaker.
  • the microphone pairs receive a noise corrupted target signal, and the proximity filter amplifies one of the outputs of the microphones and subtracts this result from the output of the second microphone to yield a pre-target estimate.
  • a proximity indicator is then created to control further signal enhancement.
  • the pre-target estimate signal and the output from the second microphone of the microphone pair, along with the proximity indicator, are combined to generate a noise estimate. This noise estimate is then combined with the output of the first microphone and the proximity indicator to obtain a target-estimate
  • Figure 1A is a simplified schematic diagram illustrating a possible placement of microphones where the receiving surfaces of the two microphones make an angle that is other than zero in accordance with one embodiment of the invention.
  • Figure 1B is a simplified schematic diagram illustrating the placement of the first and the second microphones of the proximity filter in accordance with one embodiment of the invention.
  • the first microphone's receiving surface 200a-1 faces the most preferred direction of incoming speech signal of interest.
  • First microphone 200a and second microphone 200b are placed back-to-back as close together as possible, with their receiving surfaces 200a-1 and 200b-1 facing in opposing directions, in
  • the receiving surfaces are placed in a manner relative to each other where angle 201, which represents an angle between an axis of receiving surfaces 200a-1 and 200b-1, can be any angle between 0 degrees and 180 degrees.
  • the spacing between the first microphone and the second microphone is governed by the thickness of the device on which the microphones are mounted and may be as small as tens of microns and as large as tens of millimeters.
  • Figure 1C shows the concept of near-field and far-field for a point source where the point source does not generate turbulence and hence does not generate a proximity-field in accordance with one embodiment of the invention.
  • Point source 204 has associated with it near field 206 and far field 208.
  • Microphones 200a and 200b are illustrated as being placed within far field 208 and near field 206 for exemplary purposes. It should be noted that real world sound sources, such as the human head which may be referred to as
  • Figure 1D is a simplified schematic diagram illustrating a near-field, far-field, and a proximity-field for an extended source in accordance with one embodiment of the invention.
  • An extended source generates turbulence and hence exhibits proximity-field 210 in proximity to extended source 212.
  • the pressure variation is turbulent.
  • Near-field 206 shrinks for extended source 212 and might be altogether absent in one embodiment.
  • the acoustic pressure gradient of the primary sound source between the receiving surfaces of 200a and 200b is much greater in proximity-field 210 than in near-field 206 or far-field 208. It should be appreciated that the acoustic pressure gradient of a noise source, not in the proximity-field of the microphones, between the receiving surfaces of 200a and 200b is relatively small compared to the acoustic pressure gradient within proximity field 210.
  • FIG. 2A is a simplified schematic diagram illustrating a mobile phone 100a having microphones 200a and 200b in back to back configuration for enhancing audio signals of a phone conversation in the proximity-field of the target speaker in accordance with one embodiment of the invention.
  • Mobile phone 100a includes loudspeakers 300a and 300b that are positioned so that their transmitting surfaces are maximally orthogonal to the receiving surfaces of microphones 200a and 200b. Such placement of loudspeakers 300a and 300b enables cancellation of echo by the proposed proximity filter as discussed in further detail below.
  • FIG. 2B is a simplified schematic diagram of a laptop having microphones 200a and 200b in back to back configuration in accordance with one embodiment of the invention.
  • Laptop 100b includes loudspeakers 300a and 300b that are placed in such a way so that their transmitting surfaces are maximally orthogonal to the receiving surfaces of microphones 200a and 200b. Such placement of loudspeakers 300a and 300b guarantees cancellation of echo by the proposed proximity filter.
  • Microphones 200a and 200b provide a noise corrupted audio signal to the proximity filter that generates a clean audio signal.
  • Another embodiment of the proposed proximity filter makes use of multiple pairs (200a, 200b, and 200c, 200d) of back to back microphones as shown in Figure 2B. Each of these pairs captures a noise corrupted target signal that is processed by an embodiment of the proximity filter shown in Figure 7, which can accept inputs from multiple pairs of back to back microphones and outputs the final clear target estimate.
  • FIG. 2C is a simplified schematic diagram of a wireless headset 100c having microphones 200a and 200b in back to back configuration in the proximity-field of the target speaker in accordance with one embodiment of the invention.
  • Wireless headset 100c also has loudspeaker 300a placed in such a way so that a transmitting surface is maximally orthogonal to the receiving surfaces of microphones 200a and 200b. Such placement of 300a enables cancellation of echo by the proposed proximity filter.
  • Microphones 200a and 200b provide a noise corrupted audio signal to the proximity filter that generates a clean audio signal.
  • Figure 2D is a simplified schematic diagram illustrating a side view of the wireless headset of Figure 2C. In one embodiment, the wireless headset is hooked to the collar or pocket of the user in the proximity field of his mouth and performs noise and echo suppression in similar fashion as the device shown in 100c.
  • FIG. 3A is a block diagram of the components of the proximity filter capable of suppressing noise from an audio signal of interest in accordance with one embodiment of the invention.
  • the audio signals captured from the first microphone 200a, and the second microphone 200b, are provided to differential amplification and proximity indicator block 400a.
  • the differential amplification portion of block 400a applies differential amplification techniques to balance the gains of microphones 200a and 200b, as well as balanced differential subtraction between the outputs of microphones 200a and 200b, to provide a pre-target estimate.
  • the balanced differential subtraction is further described in more detail with reference to flowchart in Figure 3C.
  • the proximity indicator portion of block 400a is configured to detect an audio signal of interest that is in proximity of 200a and 200b.
  • the proximity indicator detects non-diffused proximity speech, i.e., the audio signal of interest, and separates the audio signal of interest from diffused noise sources that are not in proximity of the microphones.
  • the proximity indicator provides an indication of speech presence in order to facilitate speech processing, as well as possibly providing the limiters for the beginning and end of speech segment.
  • the proximity indicator provides the percentage of the signal that is voice, i.e., proximity voice, which enables some of the adaptation techniques described herein.
  • the proximity indicator extracts some measured features or quantities from the input signal and compares these values with thresholds, usually extracted from the characteristics of the noise and speech signals.
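
One way to picture a threshold-based proximity indicator is the sketch below. It is an assumption-laden illustration, not the method of the application. The chosen feature (level difference in dB between the balanced front and rear frames), the thresholds `low_db` and `high_db`, and the function name `proximity_indicator` are all hypothetical.

```python
import math


def proximity_indicator(front_frame, rear_frame, low_db=1.0, high_db=6.0):
    """Return a value in [0, 1] estimating how much of the frame is proximity voice.

    Feature: level difference (dB) between the balanced front and rear frames.
    low_db and high_db are hypothetical thresholds, not values from the source.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    e_front = sum(x * x for x in front_frame) + eps
    e_rear = sum(x * x for x in rear_frame) + eps
    diff_db = 10.0 * math.log10(e_front / e_rear)
    # Map the feature linearly between the two thresholds and clamp.
    score = (diff_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, score))


# Equal levels on both microphones suggest diffuse, distant noise.
diffuse = proximity_indicator([0.1, -0.1, 0.1], [0.1, -0.1, 0.1])
# A front microphone about 12 dB louder suggests a source in proximity.
close = proximity_indicator([0.4, -0.4, 0.4], [0.1, -0.1, 0.1])
print(diffuse, close)
```

A soft value between 0 and 1, rather than a hard decision, matches the description of the indicator as the percentage of the signal that is proximity voice.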
  • the output from differential amplification and proximity indicator block 400a is then provided to noise estimating adaptive filter 400b and target estimating adaptive filter 400c. More specifically, the balanced rear microphone signal, which is the balanced output of microphone 200b, is inverted in block 500a and this inverted signal is added to the output of microphone 200a, the first microphone output, in block 500b. The output of first microphone 200a is also provided to adaptive filters 400b and 400c along with the proximity indicator signal.
  • adaptive filters 400b and 400c are used to remove background noise from the target signal which in one embodiment is speech. In another embodiment, the adaptive filters perform adaptive noise cancellation in the time domain.
  • adaptive noise cancellation algorithms pass a corrupted signal through a filter that tends to suppress the noise, while leaving the signal unchanged.
  • two inputs into each of adaptive filter 400b and 400c are provided.
  • One input into each of adaptive filter 400b and 400c is the signal corrupted by noise, and the other input contains noise correlated to the noise in the first input, but not correlated to the audio signal of interest.
  • the filter readjusts itself continuously to minimize error, thus, the adaptive label.
  • This adjustment is assisted by providing a third input, the proximity indicator signal, to each of the adaptive filters 400b and 400c. Accordingly, based on a certain percentage of the proximity voice in the signal, as indicated by the proximity indicator signal, the processing is adjusted.
  • one aspect of the adaptive nature of the filters is related to the proximity indicator signal.
  • the time interval over which the adaptive filters are adapted, as well as the speed of adaptation, are governed by the proximity indicator signal.
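
The interaction between an adaptive noise canceller and the proximity indicator can be sketched as follows. The application does not name a specific adaptation algorithm, so this sketch swaps in a normalized LMS (NLMS) update as a stand-in. The policy of scaling the step size by `1 - proximity`, the tap count, and the function name `nlms_cancel` are assumptions made for illustration.

```python
import random


def nlms_cancel(primary, reference, proximity, taps=4, mu=0.5):
    """Subtract the part of `primary` that is predictable from `reference`.

    proximity[n] lies in [0, 1]; adaptation slows as it approaches 1
    (assumed policy, so the filter mainly adapts while only noise is present).
    Returns the error signal, which serves as the cleaned output.
    """
    w = [0.0] * taps        # adaptive filter weights
    history = [0.0] * taps  # most recent reference samples, newest first
    out = []
    for d, x, p in zip(primary, reference, proximity):
        history = [x] + history[:-1]
        y = sum(wi * hi for wi, hi in zip(w, history))  # noise predicted from reference
        e = d - y                                       # cleaned sample
        norm = sum(h * h for h in history) + 1e-9
        step = mu * (1.0 - p) / norm                    # proximity-governed step size
        w = [wi + step * e * hi for wi, hi in zip(w, history)]
        out.append(e)
    return out


random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Noise reaches the primary input through a hypothetical 2-tap acoustic path.
primary = [0.8 * noise[n] + 0.3 * (noise[n - 1] if n else 0.0) for n in range(2000)]
cleaned = nlms_cancel(primary, noise, proximity=[0.0] * 2000)
residual = sum(e * e for e in cleaned[-200:]) / 200
print(f"residual noise power after adaptation: {residual:.3e}")
```

When the indicator reports that proximity voice is present (values near 1), the weights are effectively frozen, which keeps the filter from adapting to, and cancelling, the target speech.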
  • the output of the adaptive noise cancellation block 400c is provided to post-processing block 400d.
  • Post processing block 400d processes the noise estimate input and the target estimate to provide a clean speech signal for output.
  • the output of post-processing block 400d is the final clear target estimate provided through the proximity filtering described herein.
  • the embodiments described herein operate optimally when the audio signal of interest has more differential impact on the front and the rear microphones as compared to the interfering noise. This condition more or less holds as long as the user is within the proximity field of the microphones.
  • Exemplary devices that the microphones may be attached to include a cell phone, a pocket personal computer, a web tablet, a laptop, a video game console, a digital voice recorder, and any other hand-held device in which voice related applications may be integrated.
  • FIG. 3B is a flow chart diagram illustrating the method operations for proximity filtering to provide a relatively noise-free and enhanced signal from an audio source within a noisy environment in accordance with one embodiment of the invention.
  • the method initiates with operation 600 where an audio signal of interest is captured along with interfering noise using the first and the second microphones in a back to back configuration.
  • the back to back configuration includes the receiving surfaces being angled relative to each other rather than directly opposing each other.
  • Exemplary configurations for the first and the second microphones are provided in Figures 1A, 1B, and 2A-2D.
  • the user is in proximity to a device having the microphone configuration described herein, and the user's voice, i.e., the source, is captured by the receiving surfaces of the microphones.
  • the microphones may be any commercially available microphones, such as, micro electro-mechanical system (MEMS) type microphones, electret microphones, etc.
  • MEMS micro electro-mechanical system
  • differential amplification and balanced differential subtraction are utilized between the outputs of the first and the second microphones to produce a pre-target estimate, which may be referred to as a good audio estimate. It should be noted that the differential amplification and balanced differential subtraction take place in the differential amplification and proximity indicator block 400a of Figure 3A.
  • the method of Figure 3B then advances to operation 604 where the balanced first and balanced second microphone outputs are used to create a proximity indicator signal to detect the audio signal of interest.
  • the proximity indicator signal provides a measure of the proximity of the target speaker from the first and the second microphones.
  • the processing takes place in block 400a of Figure 3A.
  • the method of Figure 3B then moves to operation 606, where the pre-target estimate provided from operation 602 and the output of the first microphone, as well as the output of the proximity indicator, are processed by an adaptive filter, e.g., the adaptive filter in block 400b of Figure 3A, and combined to obtain a noise estimate.
  • the proximity indicator signal assists the adaptive filter to adapt to the correct solution in an efficient way.
  • the method then advances to operation 608 where the noise estimate from operation 606 and the output of the first microphone, along with the output of the proximity indicator, are combined to obtain a target estimate, which is the source signal of interest substantially free from any noise.
  • the target estimate and the noise estimate are processed by the post processing block 400d in Figure 3A to yield the final clear target estimate.
  • FIG. 3C is a flow chart diagram illustrating further details of the balanced differential subtraction in accordance with one embodiment of the invention.
  • the method starts with operation 612 where an audio signal of interest is captured, along with the interfering noise, using a back to back configuration of the first and second microphones in accordance with one embodiment of the invention.
  • the method then moves to operation 614 where the energy in the outputs of the first and the second microphones are calculated.
  • the energy output may be characterized as a function of the amplitude of the outputs of the first and the second microphones in one embodiment.
  • the method advances to operation 616 where the time indices when only noise is present in the outputs of the first and the second microphones are determined.
  • suitable thresholds, which are a function of energy statistics, are used to determine time indices when only noise from outside the proximity field exists in the output of each of the microphones. The method then proceeds to operation 618 where, for the time indices found above in operation 616, the ratio of energy between the first and the second microphones is determined. That is, at the time indices when noise is predominantly present, the corresponding ratio of energy between the first and the second microphones is calculated.
  • the method then advances to operation 620 where the ratio calculated in operation 618 is analyzed to determine the value of the ratio assumed the greatest number of times, i.e., the maximally assumed ratio, and the maximally assumed ratio is used to calculate the amplification factor.
  • the value of the amplification factor from operation 620 is used to amplify the output of the second microphone which is then subtracted from the output of the first microphone to obtain a pre-target estimate.
  • FIG. 4A is a simplified schematic diagram illustrating the noise-estimating adaptive filter in accordance with one embodiment of the invention.
  • Causality delay 700 functions to delay the first microphone output to enable adaptive filter 701 to converge faster to the optimum solution by utilizing information ahead in time.
  • the signal component in the output of the first microphone that is correlated with the pre-target-estimate is adaptively subtracted by the filter 701 to yield the noise-estimate.
  • Figure 4B is a simplified schematic diagram illustrating the target-estimating adaptive filter in accordance with one embodiment of the invention.
  • Causality delay 702 delays the first microphone output to enable adaptive filter 703 to converge faster to the optimum solution by utilizing information ahead in time.
  • the noise component in the output of the first microphone that is correlated with the noise-estimate is adaptively subtracted by the filter 703 to yield the target-estimate.
  • Figure 4C is a simplified schematic diagram of the post-processing block in Figure 3A in accordance with one embodiment of the invention. Blocks 704a and 704b calculate the Fast Fourier Transform of the target-estimate and the noise-estimate, respectively.
  • The outputs of blocks 704a and 704b are fed to block 705, which adaptively removes the remaining noise from the target-estimate, in the frequency domain, to yield the final clean target-estimate.
  • Block 707 transforms the final clean target-estimate into the time domain.
  • Block 706 takes the outputs of blocks 704a, 704b and 707 to adaptively select a smoothing parameter that helps the adaptive filtering in block 705.
  • FIG. 5A is a simplified schematic diagram of a proximity filter having a cylindrical shape in accordance with one embodiment of the invention.
  • Proximity filter 500 has a plurality of microphones 502 disposed over the cylindrical surface. As illustrated, microphones 502 are spatially arranged as columns of five microphones disposed along the cylindrical surface. It should be noted that the embodiments are not limited to this configuration. That is, any configuration may be utilized where pairs of microphones are diametrically opposed to each other to achieve the processing through the proximity filter described above.
  • Figure 5B is a simplified schematic diagram of multiple proximity filters where pairs of microphones are diametrically opposed to each other in accordance with one embodiment of the invention. In this embodiment, microphones are disposed at top surface 504 and bottom surface 506 in the back to back manner.
  • proximity filter 500 can obtain multiple noise estimates to be used to further enhance a voice signal.
  • a signal is captured through the microphones of column 502a and corresponding opposing column (not shown). This signal may be enhanced through the processing described above with respect to Figure 3A.
  • signals that are captured through the microphones of columns 502b and 502c may be used to provide noise estimates to further enhance the processing and achieve a better voice signal.
  • FIGs 6A-6C illustrate proximity filter configurations including equidistant loud speaker placement in accordance with one embodiment of the invention.
  • attachment device 600 has a front microphone attached to a top surface of the attachment device.
  • Speakers 602a and 602b are attached to opposing side surfaces. The placement of speakers 602a and 602b is such that the speakers are equidistant from each microphone of the microphone pair in the back-to-back configuration.
  • the attachment device is a cell phone
  • the structure of Figures 6A-C provides for acoustic echo cancellation for operating the cell phone in full duplex mode.
  • the attachment device described herein may be any of the above mentioned portable devices shown in Figures 2A through 2D.
  • Figure 6B illustrates a side view of the microphone and speaker
  • speakers 602a and 602b can be placed anywhere along the corresponding side surface of attachment device 600 as long as speakers 602a and 602b are equidistantly placed and symmetrically located relative to microphones 200a and 200b.
  • Figure 6C illustrates an alternative embodiment to the speaker configuration of Figures 6A and 6B.
  • a single speaker is symmetrically placed relative to microphones 200a and 200b.
  • although speaker 602c is disposed on a different side of device 600 than the speakers of Figures 6A and 6B, the speaker is equidistant to each of the microphones.
  • an axis of speaker 602c is orthogonal to an axis shared by microphones 200a and 200b.
  • Figure 7 is a simplified schematic diagram illustrating the data flow path that uses a plurality of pairs of the first 200a and the second 200b microphones in accordance with one embodiment of the invention.
  • One of the first microphones from the plurality of pairs, e.g., the one which is closest to the target signal, is designated as the primary sensor of the constituent device.
  • the outputs of microphones 200a and 200b in each of the multiple pairs are processed by a differential amplification and proximity indicator block to generate a localized pre-target estimate 71 for each pair.
  • Each of the localized pre-target estimates is array processed to generate a pre-target estimate 73.
  • the array processor 74 may be a broadside beamformer, an endfire beamformer or an independent component analysis unit in exemplary embodiments.
  • the pre-target estimate 73 and the balanced output of the first microphone 200a from each pair are passed through an adaptive filter 400b to generate the localized noise estimate 75 as perceived by each pair of microphones.
  • the plurality of localized noise estimates are passed as reference to the adaptive filter 76 whose primary signal is the output of the primary sensor.
  • the output of adaptive filter 76 is a target estimate.
  • the plurality of noise estimates are also array processed by an array processor 77 to yield a noise estimate.
  • the target estimate and the noise estimate are processed by a frequency domain adaptive filter 78 to yield a clear target estimate.
  • any of the operations described herein that form part of the invention are useful machine operations.
  • the invention also relates to a device or an apparatus for performing these operations.
  • the apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated, implemented, or configured by a computer program stored in the computer.
  • various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
  • Certain embodiments of the invention provide systems and methods for separating a signal of interest from noise. Certain embodiments comprise a first proximity filter adapted to receive an audio signal representative of sound from a microphone and outputs from one or more other proximity filters. In certain embodiments, each of the first proximity filter and the other proximity filters
  • Certain embodiments comprise a differential processor configured to receive the estimates of the voice portions and the estimates of the noise from each proximity filter and to produce an output representative of the sound with noise extracted.
  • the first proximity filter receives audio signals representative of the sound from a plurality of microphones.
  • each of the other proximity filters receives audio signals representative of the sound from a plurality of microphones.
  • the first proximity filter and the other proximity filters are interconnected in a network of proximity filters.
  • the network of proximity filters receives information related to an excitation source of ambient noise in the sound.
  • the excitation source comprises a reference signal that excites a loudspeaker.
  • each filter of the network of proximity filters effectively creates a bubble wherein the estimate of the voice portion comprises all voice and noise sources that fall within the bubble and the noise estimate comprises all voice and noise sources that fall outside the bubble.
  • the differential processor decomposes the voice estimate and the noise estimate from each proximity filter in the network into different frequency bands and suppresses certain components of the noise estimate found in the voice estimate using a time-varying adaptive filter.
  • the time-varying adaptive filter comprises a time domain adaptive Wiener filter.
  • Certain embodiments of the invention provide non-transitory computer-readable media having instructions and data encoded thereon, the instructions and data causing a processing system to perform a method.
  • the method comprises receiving, at each filter of a network of proximity filters, audio signals representative of sound detected by one or more microphones associated with that filter.
  • the method comprises receiving an output from one or more of the other proximity filters at each filter of the network of proximity filters.
  • the method comprises generating an estimate of a voice portion of the sound and an estimate of noise in the sound at each filter of the network of proximity filters.
  • the method comprises producing an output representative of the voice portion with the noise suppressed using a differential processor configured to receive the estimates of the voice portions and the estimates of the noise from the network of proximity filters.
  • the network of proximity filters receives information related to a reference signal that excites a loudspeaker.
  • the method further comprises causing each proximity filter to create an effective bubble.
  • the estimate of the voice portion comprises all voice and noise sources that fall within the bubble and the noise estimate comprises all voice and noise sources that fall outside the bubble.
  • producing an output using the differential processor includes decomposing the voice estimate and the noise estimate from each proximity filter in the network into different frequency bands.
  • producing an output using the differential processor includes decomposing the voice estimate and the noise estimate from each proximity filter in the network into different frequency bands and suppressing certain components of the noise estimate found in the voice estimate.
  • suppressing certain components of the noise estimate includes using a time-varying adaptive filter.
  • the time-varying adaptive filter comprises a time domain adaptive Wiener filter.
  • the differential processor performs at least a portion of the process shown in Figure 9.
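
The balanced differential subtraction steps of Figure 3C listed above (operations 612 to 620) can be illustrated with a minimal Python sketch. It is not the patented implementation: the frame length, the median-energy threshold for picking noise-only frames, the histogram bin width, and the use of the square root of the modal energy ratio as the amplitude gain are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def frame_energies(x, frame_len):
    """Mean squared amplitude of each non-overlapping frame of x."""
    n_frames = len(x) // frame_len
    return [
        sum(v * v for v in x[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def amplification_factor(mic1, mic2, frame_len=4, bin_width=0.05):
    """Estimate the gain that balances mic2 against mic1 on noise-only frames."""
    e1 = frame_energies(mic1, frame_len)
    e2 = frame_energies(mic2, frame_len)
    # Assumed threshold: frames at or below the median mic1 energy hold only noise.
    threshold = sorted(e1)[len(e1) // 2]
    ratios = [a / b for a, b in zip(e1, e2) if a <= threshold and b > 0.0]
    if not ratios:
        return 1.0  # no usable noise-only frame, leave mic2 unscaled
    # Quantize the ratios and keep the most frequently assumed value.
    bins = Counter(round(r / bin_width) for r in ratios)
    modal_bin, _ = bins.most_common(1)[0]
    # Energy ratio to amplitude gain (assumption: energy is squared amplitude).
    return math.sqrt(modal_bin * bin_width)


def pre_target_estimate(mic1, mic2, gain):
    """Subtract the amplified second microphone from the first."""
    return [a - gain * b for a, b in zip(mic1, mic2)]
```

In this sketch, far-field noise that reaches both microphones with a fixed level ratio cancels in the subtraction, while a nearby source, which produces a larger level difference between the microphones, survives in the pre-target estimate.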

Abstract

Systems and methods and a device are described that assist in speech communication using a network of proximity filters to suppress the ambient noise contaminating the signal of interest. A unique placement of sensors and a set of techniques suppress noise in speech and hence can be readily used with plural devices including mobile phones, laptops, video game consoles, headsets, automobile command consoles, internet-connected televisions, and so on.

Description

SEPARATING VOICE FROM NOISE USING A NETWORK OF PROXIMITY FILTERS
Cross-Reference to Related Applications
[0001] The present Application claims priority from U.S. Provisional Patent Application No. 61/349,164 filed May 27, 2010, which is expressly incorporated by reference herein. This application is also related to copending U.S. Patent Application No. 11/757,110, which was filed on June 1, 2007.
BACKGROUND OF THE INVENTION
Field of the Invention
[0002] The present invention relates generally to audio signal enhancement and in particular to systems, devices and methods for separating voice from noise in an audio signal.
Description of Related Art
[0003] There is a constant drive to improve the operational performance of devices designed for mobility, including mobile phones, laptops, video game consoles, headsets and automobile command consoles, etc. In many applications, a speech signal is received by one of these devices, in the presence of ambient noise, and is either transmitted to a user on the other side (in the case of cell phones, headsets, etc.) or translated to a set of actions (command consoles). The noise-corrupted speech signal is captured by either a single microphone (cell phones) or multiple microphones (car command console).
[0004] The presence of noise in the primary speech degrades its intelligibility, with the degradation being proportional to the noise energy. In cell phones, a person conversing in a noisy environment, like a crowded cafe or a busy train station, might not be able to converse properly as the noise corrupted speech perceived by the user on the other side is less intelligible. Similarly, a set of commands, delivered to a voice command console in an automobile, might not translate into proper actions, due to the presence of strong wind noise, or other environmental noises. In all such cases of speech corruption, a way of improving the quality of transmitted speech, by suppressing the interrupting noise, is desirable.
[0005] The problem of noise suppression has been addressed in a variety of manners, although these techniques do not provide a generic satisfactory solution for the small form consumer devices. Adaptive noise cancellation (ANC), which utilizes multiple microphones, was one attempt to improve capturing a signal in a noisy environment. One of the microphones, called the primary microphone, receives the primary speech signal that is corrupted by several noise sources. The remaining microphones provide noise references, relatively free of primary speech, which are assumed to be correlated with noise sources corrupting the primary microphone. This method gives good noise suppression as long as good noise references are available. However, in applications where the noise reference is not available, the method fails to perform satisfactorily. Furthermore, under ANC, providing a clean noise reference is usually a problem in devices that have a small form factor.
[0006] Another method proposed to suppress noise in primary speech utilizes an array of microphones. The array forms a beam towards the target of primary speech thus capturing most of the speech energy and rejecting any energy that comes from outside the beam. However, satisfactory performance is obtained only when the array is large in dimension and operates in an essentially reverberation-less environment. Also, the noise energy that falls in the speech beam is difficult to suppress. The method is difficult to implement in communication devices due to their small form factor that limits the placement of microphones on the devices.
[0007] Another widely used method to suppress noise in primary speech utilizes the method of spectral subtraction (SS). SS utilizes a voice activity detector (VAD) that identifies voice segments in speech and subtracts from it the spectrum of noise estimated from the non-voice (quiet) segments of the microphone output. However, VAD might not identify primary speech in the presence of strong speech-like noise sources, like the restaurant babble of people talking in the background. Moreover, SS is mostly successful when the speech is corrupted by stationary noise. SS performance is poor in the presence of rapidly changing non-stationary noise that defines the majority of practical noise scenarios.
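
As a point of reference for the spectral subtraction approach described in paragraph [0007], the following Python sketch shows the basic idea on magnitude spectra. It assumes the framing, the Fourier transform and the voice activity decisions are produced elsewhere; the function names and the spectral floor value are hypothetical and are not taken from this application.

```python
def estimate_noise_spectrum(frames, is_voice):
    """Average the magnitude spectra of frames the VAD marked as non-voice."""
    quiet = [f for f, v in zip(frames, is_voice) if not v]
    if not quiet:
        raise ValueError("no non-voice frames available for the noise estimate")
    n_bins = len(quiet[0])
    return [sum(f[k] for f in quiet) / len(quiet) for k in range(n_bins)]


def spectral_subtract(frame, noise, floor=0.05):
    """Subtract the noise magnitude per bin, keeping a small spectral floor."""
    return [max(m - n, floor * m) for m, n in zip(frame, noise)]
```

Because the noise spectrum is an average over earlier quiet frames, the sketch also makes the limitation noted above visible: if the noise changes between the quiet frames and the voiced frame, the subtracted spectrum no longer matches.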
[0008] Recently, methods utilizing statistical independence of speech and noise sources have been proposed to separate noise from speech. These methods, commonly called blind source separation (BSS) techniques, require as many sensors as the number of sound sources involved (sensor constraint). However, BSS algorithms perform poorly in realistic environments, where sensor constraint is not satisfied and where reverberations are dominant, which are conditions encountered in almost all noisy environments. Thus, BSS techniques are not an optimal solution for small form factor devices. Based on these observations, there is a need for suppressing noise in an audio signal that is captured in a noisy environment.
BRIEF SUMMARY OF THE INVENTION
[0009] The present invention generally describes a device that assists in speech communication. Particularly, it describes a unique placement of sensors and a set of techniques that suppress noise in an audio signal and hence can be readily used with a multitude of devices including mobile phones, laptops, video games console, headsets and automobile command console, etc.
[0010] This invention provides an audio signal enhancement device. The device includes a first and a second microphone, placed as close together as possible in one embodiment. The first and second microphones have receiving surfaces facing in opposing or different directions. The first and second microphones receive a desired target audio signal originating in the proximity of the microphones and undesired noise signals not originating in the proximity of the microphones. In one embodiment, the audio signal enhancement device is incorporated into a small form factor device, such as a cell phone.
[0011] In the embodiments described below, the acoustic pressure gradient is captured and utilized to enhance an audio signal referred to as a target signal. The acoustic pressure gradient from the desired target signal between the first and the second microphones is greater than that from the noise signals. Signal processing logic is included and is configured to generate a proximity-indicator signal and a pre-target-estimate signal by combining output from the first microphone and output of the second microphone. The signal processing logic is further configured to generate a noise-estimate signal by combining the output from the first microphone with the proximity-indicator and the pre-target-estimate. The signal processing logic is further configured to generate a target-estimate signal by combining the output from the first microphone with the proximity-indicator and the noise-estimate. The signal processing logic is further configured to provide a target signal substantially free from noise by combining the target-estimate, noise-estimate and the proximity-indicator.
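
The two adaptive stages in paragraph [0011] (noise-estimate from the first microphone and the pre-target-estimate, then target-estimate from the first microphone and the noise-estimate) can be sketched in Python as follows. This is only an illustration under stated assumptions: a normalized LMS (NLMS) update is used because the text does not name the adaptation rule, the role of the proximity-indicator in steering adaptation is omitted, and the function names, tap count, step size and delay are hypothetical.

```python
def nlms_cancel(primary, reference, taps=4, mu=0.5, delay=0, eps=1e-8):
    """Remove from `primary` the component linearly predictable from `reference`.

    `delay` is a causality delay applied to the primary input, so the filter
    sees reference samples that lie ahead of the delayed primary sample.
    The returned error signal is therefore delayed by `delay` samples.
    """
    w = [0.0] * taps      # adaptive filter coefficients
    buf = [0.0] * taps    # most recent reference samples, newest first
    out = []
    for n in range(len(primary)):
        buf = [reference[n]] + buf[:-1]
        d = primary[n - delay] if n >= delay else 0.0
        y = sum(wi * xi for wi, xi in zip(w, buf))
        e = d - y                                  # residual after cancellation
        norm = eps + sum(xi * xi for xi in buf)
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        out.append(e)
    return out


def proximity_cascade(mic1, pre_target, taps=4, mu=0.5, delay=2):
    """Two-stage sketch: noise estimate first, then target estimate."""
    # Stage 1: subtract what correlates with the pre-target estimate.
    noise_est = nlms_cancel(mic1, pre_target, taps, mu, delay)
    # noise_est is aligned with mic1 delayed by `delay`, so realign mic1.
    mic1_aligned = [0.0] * delay + mic1[:len(mic1) - delay]
    # Stage 2: subtract what correlates with the noise estimate.
    target_est = nlms_cancel(mic1_aligned, noise_est, taps, mu, delay)
    return noise_est, target_est
```

With these assumptions the target estimate lags the microphone signal by twice the causality delay, which is the price paid for letting each filter look ahead in its reference.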
[0012] With more and more cell phones providing web services, cell phone users are taking to browsing the Internet, reading text messages and watching videos on their cell phones, besides giving speech commands to them to perform specific actions (like dialing a friend by calling his name or requesting a song by humming the song). These applications require the cell phone to be away from the human speaker while still capable of receiving the speech. This mode may be referred to as the video telephony (VT) mode. An embodiment of the proposed invention is capable of suppressing noise in speech in VT applications. In one embodiment, the device proposed in this invention utilizes two microphones in the back to back configuration and hence has a small form factor. This facilitates the usage of the signal enhancement circuitry in mobile phones, laptops and video game consoles.
[0013] In one embodiment of the invention, an effective method to perform echo cancellation is provided. Echo is generated when speech emanating from the speakers of the cell phone is coupled with audio captured by the microphones and propagated back to the user on the other end. Echo is a problem in VT mode when the cell-phone speakers are operating at a relatively high volume. Echo not only is annoying, but also degrades the intelligibility of speech.
[0014] Other aspects and advantages of the invention will become apparent from the
[0015] following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
[0016] Figure 1A is a simplified schematic diagram illustrating a possible placement of microphones where the receiving surfaces of the two microphones make an angle that is other than zero in accordance with one embodiment of the invention.
[0017] Figure 1B is a simplified schematic diagram illustrating the placement of the first and the second microphones of the proximity filter in accordance with one embodiment of the invention.
[0018] Figures 1C and 1D illustrate the concepts of near-field, far-field, and proximity-field in accordance with one embodiment of the invention.
[0019] Figure 2A is a simplified schematic diagram illustrating a mobile phone having microphones in back to back configuration for enhancing audio signals of a phone conversation in the proximity-field of the target speaker in accordance with one embodiment of the invention.
[0020] Figure 2B is a simplified schematic diagram of a laptop having microphones in back to back configuration in accordance with one embodiment of the invention.
[0021] Figure 2C is a simplified schematic diagram of a wireless headset having microphones in back to back configuration in the proximity-field of the target speaker in accordance with one embodiment of the invention.
[0022] Figure 2D is a simplified schematic diagram illustrating a side view of the wireless headset of Figure 2C.
[0023] Figure 3A is a block diagram of the components of the proximity filter capable of suppressing noise from an audio signal of interest in accordance with one embodiment of the invention.
[0024] Figure 3B is a flow chart diagram illustrating the method operations for proximity filtering to provide a relatively noise-free and enhanced signal from an audio source within a noisy environment in accordance with one embodiment of the invention.
[0025] Figure 3C is a flow chart diagram illustrating further details of the balanced differential subtraction in accordance with one embodiment of the invention.
[0026] Figure 4A is a simplified schematic diagram illustrating the noise-estimating adaptive filter in accordance with one embodiment of the invention.
[0027] Figure 4B is a simplified schematic diagram illustrating the target-estimating adaptive filter in accordance with one embodiment of the invention.
[0028] Figure 4C is a simplified schematic diagram of the post-processing block in Figure 3A in accordance with one embodiment of the invention.
[0029] Figure 5A is a simplified schematic diagram of a proximity filter having a cylindrical shape in accordance with one embodiment of the invention.
[0030] Figure 5B is a simplified schematic diagram of multiple proximity filters where pairs of microphones are diametrically opposed to each other in accordance with one embodiment of the invention.
[0031] Figures 6A-6C illustrate proximity filter configurations including equidistant loud speaker placement in accordance with one embodiment of the invention.
[0032] Figure 7 is a simplified schematic diagram illustrating the data flow path that uses a plurality of pairs of the first and the second microphones in accordance with one embodiment of the invention.
[0033] Figure 8 is a conceptual diagram illustrating one use of a network of proximity filters according to certain aspects of the invention.
[0034] Figure 9 is a flowchart illustrating a process used according to certain aspects of the invention.
[0035] Figure 10 depicts an embodiment in which three microphones can be used to suppress the ambient noise generated by plural noise sources according to certain aspects of the invention.
[0036] Figure 11 describes a solution applicable to the embodiment of Figure 10 using three microphones and a network of two proximity filters.
[0037] Figure 12 is a simplified schematic diagram illustrating one exemplary application for the network of proximity filters according to certain aspects of the invention.
DETAILED DESCRIPTION OF THE INVENTION
[0038] Embodiments of the present invention will now be described in detail with reference to the drawings, which are provided as illustrative examples so as to enable those skilled in the art to practice the invention. Notably, the figures and examples below are not meant to limit the scope of the present invention to a single embodiment, but other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Wherever convenient, the same reference numbers will be used throughout the drawings to refer to same or like parts. Where certain elements of these embodiments can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present invention will be described, and detailed descriptions of other portions of such known components will be omitted so as not to obscure the invention. In the present specification, an embodiment showing a singular component should not be considered limiting; rather, the invention is intended to encompass other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present invention encompasses present and future known equivalents to the components referred to herein by way of illustration.
[0039] Aspects of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
[0040] Figure 8 is a conceptual diagram for using a network of proximity filters that is fed by n microphones that capture the signal of interest (primary voice) and ambient noise sources. The network may also be provided with information about excitation sources of ambient noise sources (for example, the reference signal that excites the loud-speakers in a room). Each proximity filter in the network generates its voice and noise estimates by selectively utilizing information from the microphones, the reference signal, and the voice and noise estimates generated by other proximity filters in the network. Conceptually, each proximity filter creates a bubble and an antibubble. The voice estimate consists of all the voice and noise sources that fall within the bubble and the noise estimate consists of all the voice and noise sources that fall outside the bubble (inside the antibubble). The differential processor receives the voice estimate and the noise estimate from each proximity filter in the network or in a sub-network and processes them to produce the final clean voice. In one embodiment of the invention, the differential processor decomposes the voice and the noise estimates into different frequency bands and then suppresses those components of the noise estimate that are present in the voice estimate by applying a certain time-varying adaptive filtering to the voice estimate. In one embodiment of the invention, the time-varying filter can be a time domain adaptive Wiener filter. Figure 8a shows a flowchart corresponding to the functioning of the differential processor.
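
The band-wise suppression performed by the differential processor can be illustrated with a short Python sketch. It assumes the voice and noise estimates have already been decomposed into per-band power values for the current frame, and it uses a simple Wiener-style gain with recursive smoothing as a stand-in for the time-varying adaptive filtering; it is not necessarily the time domain adaptive Wiener filter mentioned above. The function name, the smoothing constant and the gain floor are hypothetical.

```python
def wiener_gains(voice_power, noise_power, prev_gains=None, smooth=0.7, floor=0.1):
    """Per-band suppression gains for one frame.

    voice_power: band powers of the voice estimate (still contains some noise)
    noise_power: band powers of the noise estimate
    prev_gains:  gains from the previous frame, used to smooth over time
    """
    gains = []
    for k, (v, n) in enumerate(zip(voice_power, noise_power)):
        clean = max(v - n, 0.0)                    # assumed clean-voice power
        g = clean / (clean + n) if clean + n > 0.0 else 1.0
        g = max(g, floor)                          # limit maximum attenuation
        if prev_gains is not None:
            g = smooth * prev_gains[k] + (1.0 - smooth) * g
        gains.append(g)
    return gains
```

Bands where the noise estimate accounts for most of the power in the voice estimate receive a gain near the floor, while bands dominated by the voice pass nearly unchanged; feeding the previous frame's gains back in makes the gain vary smoothly over time.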
[0041] Figure 9 shows a scenario where three microphones can be used to suppress the ambient noise generated by noise sources N1, N2 and N3 from the speech of the primary speaker.
[0042] Figure 10 explains how the use case of Figure 9 can be solved by using three microphones and a network of two proximity filters. Proximity filter 1 processes the primary voice and the noise in mic 3 to produce a cleaner primary voice (voice estimate 1) and a purer estimate of the ambient noise being sampled by mic 3 (noise estimate 1). Proximity filter 2 takes as its input the output of mic 2 along with the voice estimate 1 and produces an even cleaner version of the primary speech (voice estimate 2) and an estimate of the ambient noise being sampled by mic 2 (noise estimate 2). The differential processor accepts as its inputs voice estimate 1, noise estimate 1 and noise estimate 2 and processes them to generate the final clean primary speech.
[0043] Figure 11 is a simplified schematic diagram illustrating one exemplary application for the network of proximity filters described herein. In the exemplary illustration of Figure 11, a videoconference session is taking place. The videoconference session is being presented through a television or other suitable audio/video (A/V) equipment. A remote control unit includes microphone 1 attached thereto. The remote control unit may control the television and a participant of the videoconference session is proximate to the remote control unit. Microphones 2 and 3 are placed proximate to a computing device and the television, respectively. Thus, microphones 1-3 and the corresponding proximity filters provide output that is processed by a differential processor to provide a clean version of the audio from the participant for the videoconference session. It should be appreciated that the proximity filter logic may be integrated with each microphone or located remotely from the corresponding microphone. Details for the proximity filter logic and the signal processing are provided in the attached supplement to this application. One skilled in the art will appreciate that the embodiments provide for enhanced audio clarity for many applications in addition to the video conferencing application described above. In addition, as smart phones become capable of more applications, including videoconferencing, the embodiments described herein may be integrated with those applications to enhance the audio clarity. In one embodiment, the television may be a web enabled television, such as the GOOGLE TV application, and the remote may be a smart phone.
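
The wiring of the three-microphone, two-filter network described above can be sketched in Python. The sketch only illustrates the data flow between the filters. Each proximity filter is replaced by a toy stand-in that removes, by a single-coefficient least-squares projection, the part of its primary input that correlates with its reference input. The stand-in, the function names, and the assumption that mic 1 carries the primary voice are illustrative and are not taken from this application; the differential processor stage is omitted.

```python
def project_out(primary, reference):
    """Toy stand-in for one proximity filter.

    Returns (voice_estimate, noise_estimate), where the noise estimate is the
    least-squares projection of `primary` onto `reference`.
    """
    rr = sum(r * r for r in reference)
    a = sum(p * r for p, r in zip(primary, reference)) / rr if rr > 0.0 else 0.0
    noise = [a * r for r in reference]
    voice = [p - n for p, n in zip(primary, noise)]
    return voice, noise


def two_filter_network(mic1, mic2, mic3):
    """Chain two filters: filter 2 refines the voice estimate of filter 1."""
    # Filter 1: primary voice (assumed to be on mic 1) against mic 3.
    voice1, noise1 = project_out(mic1, mic3)
    # Filter 2: voice estimate 1 against mic 2.
    voice2, noise2 = project_out(voice1, mic2)
    # A differential processor would combine these estimates further.
    return voice2, noise1, noise2
```

The point of the sketch is the chaining: the second filter works on the already-cleaned output of the first, so each filter only has to deal with the noise sampled by its own reference microphone.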
[0044] The embodiments described herein may make use of the Flow Logic Array semiconductor technology described in commonly owned US Patent Serial Applications 11/426,887, 11/426,882, and 11/426,880. That is, the processing techniques defined in these references may be used to generate the processing logic described herein, in one embodiment.
[0045] Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated, implemented, or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
[0046] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims. It should be appreciated that exemplary claims are provided below and these claims are not meant to be limiting for future applications claiming priority from this application. The exemplary claims are meant to be illustrative and not restrictive.
[0047] Certain embodiments described herein employ a proximity filter that functions to suppress noise in an audio signal. It will be obvious, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
[0048] Any sound originating from a point source in space radiates from the point in a spherical pattern. The wave of acoustic energy originating at this point moves outward in a spherical wavefront, whose size increases with time. The intensity of sound decreases as the wavefront moves farther from the point source, in inverse proportion to the square of the radius of the sphere. The region very close to the sound source is called the "near-field" of the sound source and in this region a spherical propagating wavefront appears spherical to the sound capturing microphone. However, as one moves away from the sound source, the wavefront becomes larger in radius and appears planar to a sound capturing microphone. This region is called the "far-field" of the sound source. This region extends in space beyond a radius of $|R| > 2D^2/\lambda$, where $D$ is the diameter of the smallest sphere that can enclose all the sound sources and $\lambda$ is the wavelength of the sound. For a sound wave of frequency 1 kHz this radius is approximately 54 cm beyond the sound source in space, where the value of $D$ is assumed to be 30 cm.
[0049] For $|R| < 2D^2/\lambda$, the near-field of the source is experienced. For extended sound sources, like the mouth of a speaker, there is a region relatively close to the sound source that experiences a turbulent pressure behavior. This region is analogous to that in immediate proximity of a pebble hitting still water where water movement is turbulent, but at a farther distance gives rise to more regular spherical energy waves. This region is referred to as the "proximity-field" of the source. The size of the proximity-field is generally a function of the size of the extended sound source and, for human speakers, extends to a distance of several tens of centimeters from the mouth. An increase in the size of the proximity-field leads to the shrinkage of the near-field and for very large sound sources, the near-field might disappear by virtue of the sound capturing device being far off from the emitting source.
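The far-field boundary quoted above can be checked numerically. The following is a minimal sketch, not part of the described embodiments; the function name and the speed of sound of 343 m/s are assumptions made for illustration. With that value the boundary for D = 30 cm at 1 kHz comes out slightly above 52 cm, and a somewhat lower assumed speed of sound brings it to the approximately 54 cm stated in the text.

```python
# Far-field boundary |R| = 2 * D^2 / wavelength for a source region of diameter D.
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees C


def far_field_radius(diameter_m: float, frequency_hz: float,
                     speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Return the radius (in metres) beyond which the far-field begins."""
    wavelength_m = speed_m_s / frequency_hz
    return 2.0 * diameter_m ** 2 / wavelength_m


# Values taken from the text: D = 30 cm, frequency = 1 kHz.
radius = far_field_radius(diameter_m=0.30, frequency_hz=1000.0)
print(f"far-field begins at about {radius * 100:.1f} cm")
```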
[0050] The acoustic pressure gradient, which is the pressure level difference between two points in space, is largest if these points are located in the "proximity-field" and decreases as one moves from the "near-field" to the "far-field". Noise canceling microphones make use of a large pressure gradient when placed in the "proximity-field" of a sound source. The pressure difference due to the speaker between the front and the rear ports of a noise canceling microphone is large, giving rise to a significant resultant target signal. However, noise sources that are located in the "far-field" of the noise canceling microphones have very small pressure gradients across their ports, giving rise to a very weak resultant noise signal and hence, a weaker impact on the signal of interest being captured by the microphones.
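The contrast between a nearby talker and a distant noise source can be illustrated with a rough calculation. This sketch assumes an idealized point source whose pressure amplitude falls off as 1/r; the text describes the proximity-field as turbulent, so the model is only indicative. The function name, the 1 cm port spacing, and the two distances are hypothetical.

```python
def port_pressure_difference(distance_m: float, port_spacing_m: float) -> float:
    """Relative pressure drop between front and rear ports for a 1/r source.

    The front port is at `distance_m` from the source and the rear port is
    `port_spacing_m` farther away. The result is normalised by the pressure
    at the front port, which reduces to spacing / (distance + spacing).
    """
    front = 1.0 / distance_m
    rear = 1.0 / (distance_m + port_spacing_m)
    return (front - rear) / front


# Hypothetical geometry: ports 1 cm apart, talker at 5 cm, noise source at 2 m.
near_talker = port_pressure_difference(distance_m=0.05, port_spacing_m=0.01)
far_noise = port_pressure_difference(distance_m=2.0, port_spacing_m=0.01)
print(f"talker: {near_talker:.3f}, distant noise: {far_noise:.4f}")
```

Under these assumptions the talker produces a relative difference across the ports that is more than thirty times that of the distant source, which is the effect the differential subtraction relies on.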
[0051] The embodiments described below describe a method and apparatus for providing a clean audio signal generated from a relatively close-by signal source in a noisy environment. Microphone pairs, either in a single configuration or in an array, are placed back to back, or facing in different directions, on a suitable device to be operated in the proximity-field of a target speaker. The microphone pairs receive a noise corrupted target signal, and the proximity filter amplifies one of the outputs of the microphones and subtracts this result from the output of the second microphone to yield a pre-target estimate. A proximity indicator is then created to control further signal enhancement. The pre-target estimate signal and the output from the second microphone of the microphone pair, along with the proximity indicator, are combined to generate a noise estimate. This noise estimate is then combined with the output of the first microphone and the proximity indicator to obtain a target-estimate substantially free from noise. The target-estimate is further processed along with the noise-estimate to yield a clear target estimate as described in more detail below.
[0052] Figure 1A is a simplified schematic diagram illustrating a possible placement of microphones where the receiving surfaces of the two microphones make an angle that is other than zero in accordance with one embodiment of the invention. Figure 1B is a simplified schematic diagram illustrating the placement of the first and the second microphones of the proximity filter in accordance with one embodiment of the invention. The first microphone's receiving surface 200a-1 faces the most preferred direction of incoming speech signal of interest. First microphone 200a and second microphone 200b are placed back-to-back as close together as possible, with their receiving surfaces 200a-1 and 200b-1 facing in opposing directions, in accordance with one embodiment of the invention. In another embodiment, the receiving surfaces are placed in a manner relative to each other where angle 201, which represents an angle between an axis of receiving surfaces 200a-1 and 200b-1, can be any angle between 0 degrees and 180 degrees. It should be appreciated that the spacing between the first microphone and the second microphone is governed by the thickness of the device on which the microphones are mounted and may be as small as tens of microns and as large as tens of millimeters. Figure 1C shows the concept of near-field and far-field for a point source where the point source does not generate turbulence and hence does not generate a proximity-field in accordance with one embodiment of the invention. Point source 204 has associated with it near field 206 and far field 208. Microphones 200a and 200b are illustrated as being placed within far field 208 and near field 206 for exemplary purposes. It should be noted that real world sound sources, such as the human head which may be referred to as an extended source, are not point sources.
[0053] Figure 1D is a simplified schematic diagram illustrating a near-field, far-field, and a proximity-field for an extended source in accordance with one embodiment of the invention. An extended source generates turbulence and hence exhibits proximity-field 210 in proximity to extended source 212. For example, in close proximity of the mouth of a speaker the pressure variation is turbulent. Near-field 206 shrinks for extended source 212 and might be altogether absent in one embodiment. In Figure 1D the acoustic pressure gradient of the primary sound source between the receiving surfaces of 200a and 200b is much greater in proximity-field 210 than in near-field 206 or far-field 208. It should be appreciated that the acoustic pressure gradient of a noise source, not in the proximity-field of the microphones, between the receiving surfaces of 200a and 200b is relatively small compared to the acoustic pressure gradient within proximity-field 210.
[0054] Figure 2A is a simplified schematic diagram illustrating a mobile phone 100a having microphones 200a and 200b in back to back configuration for enhancing audio signals of a phone conversation in the proximity-field of the target speaker in accordance with one embodiment of the invention. Mobile phone 100a includes loudspeakers 300a and 300b that are positioned so that their transmitting surfaces are maximally orthogonal to the receiving surfaces of microphones 200a and 200b. Such placement of loudspeakers 300a and 300b enables cancellation of echo by the proposed proximity filter as discussed in further detail below.
[0055] Figure 2B is a simplified schematic diagram of a laptop having microphones 200a and 200b in back to back configuration in accordance with one embodiment of the invention. Laptop 100b includes loudspeakers 300a and 300b that are placed in such a way so that their transmitting surfaces are maximally orthogonal to the receiving surfaces of microphones 200a and 200b. Such placement of loudspeakers 300a and 300b guarantees cancellation of echo by the proposed proximity filter. Microphones 200a and 200b provide a noise corrupted audio signal to the proximity filter that generates a clean audio signal. Another embodiment of the proposed proximity filter makes use of multiple pairs (200a, 200b, and 200c, 200d) of back to back microphones as shown in Figure 2B. Each of these pairs captures noise corrupted target signal that is processed by an embodiment of the proximity filter shown in Figure 7 that can accept inputs from multiple pairs of back to back microphones and outputs final clear target estimate.
[0056] Figure 2C is a simplified schematic diagram of a wireless headset 100c having microphones 200a and 200b in back to back configuration in the proximity-field of the target speaker in accordance with one embodiment of the invention. Wireless headset 100c also has loudspeaker 300a placed in such a way that its transmitting surface is maximally orthogonal to the receiving surfaces of microphones 200a and 200b. Such placement of 300a enables cancellation of echo by the proposed proximity filter. Microphones 200a and 200b provide a noise corrupted audio signal to the proximity filter that generates a clean audio signal. Figure 2D is a simplified schematic diagram illustrating a side view of the wireless headset of Figure 2C. In one embodiment, the wireless headset is hooked to the collar or pocket of the user in the proximity-field of his mouth and performs noise and echo suppression in similar fashion as the device shown in 100c.
[0057] Figure 3A is a block diagram of the components of the proximity filter capable of suppressing noise from an audio signal of interest in accordance with one embodiment of the invention. The audio signals captured from the first microphone 200a, and the second microphone 200b, are provided to differential amplification and proximity indicator block 400a. It should be appreciated that the differential amplification portion of block 400a applies differential amplification techniques to balance gains of microphones 200a and 200b as well as balanced differential subtraction between the outputs of the 200a and 200b to provide a pre-target estimate. The balanced differential subtraction is further described in more detail with reference to flowchart in Figure 3C. The proximity indicator portion of block 400a is configured to detect an audio signal of interest that is in proximity of 200a and 200b. One skilled in the art will appreciate that the proximity indicator detects non-diffused proximity speech, i.e., the audio signal of interest, and separates the audio signal of interest from diffused noise sources that are not in proximity of the microphones. In one embodiment, the proximity indicator provides an indication of speech presence in order to facilitate speech processing, as well as possibly providing the limiters for the beginning and end of speech segment. The proximity indicator provides the percentage of the signal that is voice, i.e., proximity voice, which enables some of the adaptation techniques described herein. In another embodiment, the proximity indicator extracts some measured features or quantities from the input signal and compares these values with thresholds, usually extracted from the characteristics of the noise and speech signals.
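The text does not specify which features the proximity indicator measures, so the following is only a hedged sketch of the threshold-based variant mentioned above. It assumes the feature is the frame-wise level difference between the balanced front and rear microphone signals, mapped to a value between 0 and 1; the function names and the two threshold values are hypothetical.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def proximity_indicator(front_frame, rear_frame, low_db=1.0, high_db=6.0):
    """Map the front/rear level difference of one frame to a value in [0, 1].

    Assumption: diffuse noise reaches both balanced microphones at a similar
    level (difference near 0 dB), while proximity speech is markedly louder
    at the front microphone. `low_db` and `high_db` are illustrative
    thresholds, not values from the source.
    """
    eps = 1e-12  # avoids division by zero and log of zero on silent frames
    diff_db = 10.0 * math.log10((frame_energy(front_frame) + eps) /
                                (frame_energy(rear_frame) + eps))
    score = (diff_db - low_db) / (high_db - low_db)
    return min(1.0, max(0.0, score))
```

A value near 0 would then mark a frame as noise only, and a value near 1 as dominated by proximity voice, which is one way to obtain the "percentage of the signal that is voice" used to steer adaptation.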
[0058] The output from differential amplification and proximity indicator block 400a is then provided to noise estimating adaptive filter 400b and target estimating adaptive filter 400c. More specifically, the balanced rear microphone signal, which is the balanced output of microphone 200b, is inverted in block 500a and this inverted signal is added to the output of microphone 200a, the first microphone output, in block 500b. The output of first microphone 200a is also provided to adaptive filters 400b and 400c along with the proximity indicator signal. One skilled in the art will appreciate that adaptive filters 400b and 400c are used to remove background noise from the target signal which in one embodiment is speech. In another embodiment, the adaptive filters perform adaptive noise cancellation in the time domain. Typically, adaptive noise cancellation algorithms pass a corrupted signal through a filter that tends to suppress the noise, while leaving the signal unchanged. Thus, two inputs into each of adaptive filter 400b and 400c are provided. One input into each of adaptive filter 400b and 400c is the signal corrupted by noise, and the other input contains noise correlated to the noise in the first input, but not correlated to the audio signal of interest. It should be appreciated that the filter readjusts itself continuously to minimize error, thus, the adaptive label. This adjustment is assisted by providing a third input, the proximity indicator signal, to each of the adaptive filters 400b and 400c. Accordingly, based on a certain percentage of the proximity voice in the signal, as indicated by the proximity indicator signal, the processing is adjusted. For example, one aspect of the adaptive nature of the filters is related to the proximity indicator signal. The time interval over which the adaptive filters are adapted, as well as the speed of adaptation, is governed by the proximity indicator signal.
The output of the adaptive noise cancellation block 400c is provided to post-processing block 400d. Post processing block 400d processes the noise estimate input and the target estimate to provide a clean speech signal for output. The output of post-processing block 400d is the final clear target estimate provided through the proximity filtering described herein. Thus, having a first and a second microphone in a back to back configuration provides a final clear target speech signal from a source that is relatively close to the proximity filter. The embodiments described herein operate optimally when the audio signal of interest has more differential impact on the front and the rear microphones as compared to the interfering noise. This condition more or less holds as long as the user is within the proximity field of the microphones. Exemplary devices that the microphones may be attached to include a cell phone, a pocket personal computer, a web tablet, a laptop, a video game console, a digital voice recorder, and any other hand-held device in which voice related applications may be integrated therein.
[0059] Figure 3B is a flow chart diagram illustrating the method operations for proximity filtering to provide a relatively noise-free and enhanced signal from an audio source within a noisy environment in accordance with one embodiment of the invention. The method initiates with operation 600 where an audio signal of interest is captured along with interfering noise using the first and the second microphones in a back to back configuration. In one embodiment, the back to back configuration includes the receiving surfaces being angled relative to each other rather than directly opposing each other. Exemplary configurations for the first and the second microphones are provided in Figures 1A, 1B, and 2A-2D. The user is in proximity to a device having the microphone configuration described herein, and the user's voice, i.e., the source, is captured by the receiving surfaces of the microphones. One skilled in the art will appreciate that the microphones may be any commercially available microphones, such as micro electro-mechanical system (MEMS) type microphones, electret microphones, etc. In one embodiment, the MEMS microphones are disposed on the same substrate or package. The method then proceeds to operation 602 where differential amplification and balanced differential subtraction are utilized between the outputs of the first and the second microphones to produce a pre-target estimate, which may be referred to as a good audio estimate. It should be noted that the differential amplification and balanced differential subtraction take place in the differential amplification and proximity indicator block 400a of Figure 3A.
[0060] The method of Figure 3B then advances to operation 604 where the balanced first and balanced second microphone outputs are used to create a proximity indicator signal to detect the audio signal of interest. The proximity indicator signal provides a measure of the proximity of the target speaker from the first and the second microphones. Here again, the processing takes place in block 400a of Figure 3A. The method of Figure 3B then moves to operation 606, where the pre-target estimate provided from operation 602 and the output of the first microphone, as well as the output of the proximity indicator, are processed by an adaptive filter, e.g., the adaptive filter in block 400b of Figure 3A, and combined to obtain a noise estimate. The proximity indicator signal assists the adaptive filter to adapt to the correct solution in an efficient way. The method then advances to operation 608 where the noise estimate from operation 606 and the output of the first microphone, along with the output of the proximity indicator, are combined to obtain a target estimate, which is the source signal of interest substantially free from any noise. Finally, in operation 610, the target estimate and the noise estimate are processed by the post processing block 400d in Figure 3A to yield the final clear target estimate.
[0061] Figure 3C is a flow chart diagram illustrating further details of the balanced differential subtraction in accordance with one embodiment of the invention. The method starts with operation 612 where an audio signal of interest is captured, along with the interfering noise, using a back to back configuration of the first and second microphones in accordance with one embodiment of the invention. The method then moves to operation 614 where the energy in the outputs of the first and the second microphones is calculated. The energy output may be characterized as a function of the amplitude of the outputs of the first and the second microphones in one embodiment. From operation 614, the method advances to operation 616 where the time indices when only noise is present in the outputs of the first and the second microphones are determined. In one embodiment, suitable thresholds, which are a function of energy statistics, are used to determine time indices when only noise from outside the proximity field exists in the output of each of the microphones. The method then proceeds to operation 618 where, for the time indices found above in operation 616, the ratio of energy between the first and the second microphones is determined.
[0062] That is, at the time indices when noise is predominantly present, the corresponding ratio of energy between the first and the second microphones is calculated. The method then advances to operation 620 where the ratio calculated in operation 618 is analyzed to determine the value of the ratio assumed the most number of times, i.e., the maximally assumed ratio, and the maximally assumed ratio is used to calculate the amplification factor. In operation 622, the value of the amplification factor from operation 620 is used to amplify the output of the second microphone which is then subtracted from the output of the first microphone to obtain a pre-target estimate.
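Operations 614 through 622 can be sketched roughly as follows. This is an illustrative reading of the flow chart rather than the implementation of the embodiments: the median-based noise threshold, the histogram bin width, the square root that converts the energy ratio into an amplitude gain, and all function names are assumptions.

```python
import math
from collections import Counter


def frame_energies(signal, frame_len):
    """Operation 614: energy of each non-overlapping frame."""
    return [sum(x * x for x in signal[i:i + frame_len])
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def amplification_factor(front, rear, frame_len=4, bin_width=0.1):
    """Operations 616 to 620: gain that balances the rear microphone."""
    e_front = frame_energies(front, frame_len)
    e_rear = frame_energies(rear, frame_len)
    # Operation 616 (assumed threshold): frames whose front energy does not
    # exceed the median are treated as containing noise only.
    threshold = sorted(e_front)[len(e_front) // 2]
    # Operation 618: front/rear energy ratio on the noise-only frames.
    ratios = [ef / er for ef, er in zip(e_front, e_rear)
              if ef <= threshold and er > 0.0]
    if not ratios:
        return 1.0  # no usable noise-only frame: leave the rear output as is
    # Operation 620: the ratio assumed most often, found with a histogram.
    bins = Counter(round(r / bin_width) for r in ratios)
    mode_bin, _ = bins.most_common(1)[0]
    # Assumption: an energy ratio maps to an amplitude gain by a square root.
    return math.sqrt(mode_bin * bin_width)


def pre_target_estimate(front, rear, gain):
    """Operation 622: amplify the rear output and subtract it from the front."""
    return [f - gain * r for f, r in zip(front, rear)]
```

If far-field noise reaches the rear microphone at half the amplitude seen by the front microphone, the most frequent energy ratio is 4, the gain is 2, and the subtraction removes the noise while leaving the proximity speech that only the front microphone picked up.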
[0063] Figure 4A is a simplified schematic diagram illustrating the noise-estimating adaptive filter in accordance with one embodiment of the invention. Causality delay 700 functions to delay the first microphone output to enable adaptive filter 701 to converge faster to the optimum solution by utilizing information ahead in time. The signal component in the output of the first microphone that is correlated with the pre-target-estimate is adaptively subtracted by the filter 701 to yield the noise-estimate.
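The text does not name the adaptation algorithm used by filter 701. As one possible illustration, the sketch below uses a normalized least-mean-squares (NLMS) update, which is a substitution chosen here and not confirmed by the source. The function name, tap count, delay, step size, and the way the proximity indicator scales the step are all assumptions.

```python
def nlms_cancel(primary, reference, taps=8, delay=4, mu=0.5, proximity=None):
    """Remove from `primary` the component correlated with `reference`.

    `primary` plays the role of the first microphone output and `reference`
    the pre-target estimate; the returned residual is the noise estimate.
    `delay` is the causality delay applied to the primary input, so the
    filter also sees reference samples that are later in time than the
    primary sample being cancelled. `proximity` is an optional per-sample
    value in [0, 1] that scales the adaptation step (assumed behaviour).
    """
    eps = 1e-8
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for n in range(len(primary)):
        history = [reference[n]] + history[:-1]
        delayed = primary[n - delay] if n >= delay else 0.0
        predicted = sum(w * x for w, x in zip(weights, history))
        error = delayed - predicted
        step = mu if proximity is None else mu * proximity[n]
        norm = sum(x * x for x in history) + eps
        weights = [w + step * error * x / norm
                   for w, x in zip(weights, history)]
        residual.append(error)
    return residual
```

The same routine could serve as a sketch of the target-estimating filter by passing the noise estimate as `reference`, in which case the residual is the target estimate.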
[0064] Figure 4B is a simplified schematic diagram illustrating the target-estimating adaptive filter in accordance with one embodiment of the invention. Causality delay 702 delays the first microphone output to enable adaptive filter 703 to converge faster to the optimum solution by utilizing information ahead in time. The noise component in the output of the first microphone that is correlated with the noise-estimate is adaptively subtracted by the filter 703 to yield the target-estimate.
[0065] Figure 4C is a simplified schematic diagram of the post-processing block in Figure 3A in accordance with one embodiment of the invention. Blocks 704a and 704b calculate the Fast Fourier Transform of the target-estimate and the noise-estimate, respectively. The outputs of 704a and 704b are fed to block 705, which adaptively removes the remaining noise from the target-estimate, in the frequency domain, to yield the final clean target-estimate. Block 707 transforms the final clean target-estimate into the time domain. Block 706 takes the outputs of blocks 704a, 704b and 707 to adaptively select a smoothing parameter that helps the adaptive filtering in block 705.
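The frequency-domain step in block 705 is not spelled out in the text. A minimal sketch of one common approach is given below, assuming the per-bin power spectra of the target-estimate and the noise-estimate have already been computed by the transform blocks. The spectral-subtraction style gain, the gain floor, the function name, and the fixed smoothing parameter `alpha` (which block 706 would select adaptively) are assumptions.

```python
def smoothed_gain(target_power, noise_power, previous_gain,
                  alpha=0.8, floor=0.05):
    """Per-bin suppression gain with recursive smoothing across frames.

    Each argument is a list with one entry per frequency bin. The raw gain
    1 - noise/target is limited from below by `floor`, then blended with the
    gain of the previous frame using the smoothing parameter `alpha`.
    """
    gains = []
    for p_t, p_n, g_prev in zip(target_power, noise_power, previous_gain):
        raw = max(floor, 1.0 - p_n / p_t) if p_t > 0.0 else floor
        gains.append(alpha * g_prev + (1.0 - alpha) * raw)
    return gains
```

Multiplying each bin of the target-estimate spectrum by its gain and transforming back to the time domain would then correspond to the role of block 707.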
[0066] Figure 5A is a simplified schematic diagram of a proximity filter having a cylindrical shape in accordance with one embodiment of the invention. Proximity filter 500 has a plurality of microphones 502 disposed over the cylindrical surface. As illustrated, microphones 502 are spatially arranged as columns of five microphones disposed along the cylindrical surface. It should be noted that the embodiments are not limited to this configuration. That is, any configuration may be utilized where pairs of microphones are diametrically opposed to each other to achieve the processing through the proximity filter described above. Figure 5B is a simplified schematic diagram of multiple proximity filters where pairs of microphones are diametrically opposed to each other in accordance with one embodiment of the invention. In this embodiment, microphones are disposed at top surface 504 and bottom surface 506 in the back to back manner. It should be appreciated that the arrangement of Figure 5A enables efficient capture of a source within a range of the perimeter of the cylindrical surface of proximity filter 500. Thus, the cylindrical configuration allows for improved spatial resolution as the microphones are disposed on a cylindrical surface rather than a planar surface. In addition, proximity filter 500 can obtain multiple noise estimates to be used to further enhance a voice signal. For example, a signal is captured through the microphones of column 502a and corresponding opposing column (not shown). This signal may be enhanced through the processing described above with respect to Figure 3A. In addition, signals that are captured through the microphones of columns 502b and 502c may be used to provide noise estimates to further enhance the processing and achieve a better voice signal.
[0067] Figures 6A-6C illustrate proximity filter configurations including equidistant loud speaker placement in accordance with one embodiment of the invention. In Figure 6A, attachment device 600 has a front microphone attached to a top surface of the attachment device. Speakers 602a and 602b are attached to opposing side surfaces. The placement of speakers 602a and 602b is such that the speakers are equidistant from each microphone of the microphone pair in the back-to-back configuration. By placing speakers 602a and 602b in an equidistant/symmetrical manner, acoustic echo cancellation is provided through this placement configuration. For example, if the attachment device is a cell phone, the structure of Figures 6A-C provides for acoustic echo cancellation for operating the cell phone in full duplex mode. It should be noted that the attachment device described herein may be any of the above mentioned portable devices shown in Figures 2A through 2D.
[0068] Figure 6B illustrates a side view of the microphone and speaker arrangement of Figure 6A. It should be appreciated that speakers 602a and 602b can be placed anywhere along the corresponding side surface of attachment device 600 as long as speakers 602a and 602b are equidistantly placed and symmetrically located relative to microphones 200a and 200b. One way of describing the configuration of Figures 6A-6C is that microphones 200a and 200b share a planar axis and speakers 602a and 602b share a planar axis that is orthogonal to the planar axis of microphones 200a and 200b. Figure 6C illustrates an alternative embodiment to the speaker configuration of Figures 6A and 6B. In Figure 6C, a single speaker is symmetrically placed relative to microphones 200a and 200b. While speaker 602c is disposed on a different side of device 600 than the speakers of Figures 6A and 6B, the speaker is equidistant to each of the microphones. In addition, an axis of speaker 602c is orthogonal to an axis shared by microphones 200a and 200b. Thus, the output of the speakers of Figures 6A-C, which have equal impact on each of the corresponding microphones, delivers noise to the microphones. This noise can then be filtered out or cancelled through the processing described above with reference to Figure 3A.
[0069] Figure 7 is a simplified schematic diagram illustrating the data flow path that uses a plurality of pairs of the first 200a and the second 200b microphones in accordance with one embodiment of the invention. One of the first microphones from the plurality of pairs, e.g., one which is closest to the target signal, is designated as the primary sensor of the constituent device. The outputs of microphones 200a and 200b in each of the multiple pairs are processed by a differential amplification and proximity indicator block to generate a localized pre-target estimate 71 for each pair. Each of the localized pre-target estimates is array processed to generate a pre-target estimate 73. The array processor 74 may be a broadside beamformer, an endfire beamformer or an independent component analysis unit in exemplary embodiments. The pre-target estimate 73 and the balanced output of the first microphone 200a from each pair are passed through an adaptive filter 400b to generate the localized noise estimate 75 as perceived by each pair of microphones. The plurality of localized noise estimates are passed as reference to the adaptive filter 76 whose primary signal is the output of the primary sensor. The output of adaptive filter 76 is a target estimate. The plurality of noise estimates are also array processed by an array processor 77 to yield a noise estimate. Finally, the target estimate and the noise estimate are processed by a frequency domain adaptive filter 78 to yield a clear target estimate.
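Of the array processing options listed above, the broadside beamformer is the simplest to illustrate. The sketch below assumes the localized pre-target estimates are already time-aligned, so the beamformer reduces to a sample-by-sample average; the function name is hypothetical and no steering delays or channel weights are modelled.

```python
def broadside_beamform(localized_estimates):
    """Average time-aligned channels sample by sample.

    `localized_estimates` is a list of equally sampled signals, one per
    microphone pair. Components common to all channels are preserved, while
    components that differ between channels are attenuated by the averaging.
    """
    count = len(localized_estimates)
    length = min(len(channel) for channel in localized_estimates)
    return [sum(channel[n] for channel in localized_estimates) / count
            for n in range(length)]


# Two hypothetical localized pre-target estimates from two microphone pairs.
combined = broadside_beamform([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```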
[0070] The embodiments described herein may make use of the Flow Logic Array semiconductor technology described in commonly owned US Patent Serial Applications 11/426,887, 11/426,882, and 11/426,880, which are hereby incorporated by reference for all purposes. That is, the processing techniques defined in these references may be used to generate the processing logic described herein, in one embodiment.
[0071] Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus can be specially constructed for the required purpose, or the apparatus can be a general-purpose computer selectively activated, implemented, or configured by a computer program stored in the computer. In particular, various general-purpose machines can be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
Additional Descriptions of Certain Aspects of the Invention
[0072] The foregoing descriptions of the invention are intended to be illustrative and not limiting. For example, those skilled in the art will appreciate that the invention can be practiced with various combinations of the functionalities and capabilities described above, and can include fewer or additional components than described above. Certain additional aspects and features of the invention are further set forth below, and can be obtained using the functionalities and components described in more detail above, as will be appreciated by those skilled in the art after being taught by the present disclosure.
[0073] Certain embodiments of the invention provide systems and methods for separating a signal of interest from noise. Certain embodiments comprise a first proximity filter adapted to receive an audio signal representative of sound from a microphone and outputs from one or more other proximity filters. In certain embodiments, each of the first proximity filter and the other proximity filters generates an estimate of a voice portion in the sound and an estimate of noise in the sound. Certain embodiments comprise a differential processor configured to receive the estimates of the voice portions and the estimates of the noise from each proximity filter and to produce an output representative of the sound with noise extracted.
[0074] In certain embodiments, the first proximity filter receives audio signals representative of the sound from a plurality of microphones. In certain embodiments, each of the other proximity filters receives audio signals representative of the sound from a plurality of microphones. In certain embodiments, the first proximity filter and the other proximity filters are interconnected in a network of proximity filters. In certain embodiments, the network of proximity filters receives information related to an excitation source of ambient noise in the sound. In certain embodiments, the excitation source comprises a reference signal that excites a loudspeaker. In certain embodiments, each filter of the network of proximity filters effectively creates a bubble wherein the estimate of the voice portion comprises all voice and noise sources that fall within the bubble and the noise estimate comprises all voice and noise sources that fall outside the bubble.
[0075] In certain embodiments, the differential processor decomposes the voice estimate and the noise estimate from each proximity filter in the network into different frequency bands and suppresses certain components of the noise estimate found in the voice estimate using a time-varying adaptive filter. In certain embodiments, the time-varying adaptive filter comprises a time domain adaptive Wiener filter.
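As a rough illustration of suppressing noise-estimate components found in a voice estimate, the following sketch applies a normalized LMS (NLMS) adaptive filter to a single band. NLMS is a stand-in chosen here because it is a standard time-varying adaptive filter whose weights tend toward the Wiener solution for stationary signals; the disclosure does not specify the update rule. The function name `nlms_cancel`, the tap count, the step size, and the synthetic signals are all assumptions.

```python
import math
import random


def nlms_cancel(voice_est, noise_est, taps=4, mu=0.1, eps=1e-8):
    """Remove the part of voice_est that is linearly predictable from noise_est."""
    weights = [0.0] * taps
    history = [0.0] * taps           # most recent noise-estimate samples
    cleaned = []
    for v, n in zip(voice_est, noise_est):
        history = [n] + history[:-1]
        predicted = sum(w * h for w, h in zip(weights, history))
        error = v - predicted        # voice estimate with predicted noise removed
        energy = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / energy
                   for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


def power(samples):
    return sum(s * s for s in samples) / len(samples)


# Synthetic single-band test signals (assumed, not from the disclosure).
random.seed(0)
count = 4000
voice = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(count)]
noise = [random.gauss(0.0, 1.0) for _ in range(count)]
# Noise leaking into the voice estimate through a short, unknown path.
leak = [0.6 * noise[k] + (0.2 * noise[k - 1] if k > 0 else 0.0)
        for k in range(count)]
voice_est = [v + l for v, l in zip(voice, leak)]

cleaned, weights = nlms_cancel(voice_est, noise)

# Compare residual noise power over the last 1000 samples, after adaptation.
residual_before = power([a - b for a, b in zip(voice_est[-1000:], voice[-1000:])])
residual_after = power([a - b for a, b in zip(cleaned[-1000:], voice[-1000:])])
```

In a sub-band arrangement, one such filter would run per frequency band on the band-limited voice and noise estimates; the band decomposition itself is omitted from this sketch.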
[0076] Certain embodiments of the invention provide a non-transitory computer-readable media having instructions and data encoded thereon, the instructions and data causing a processing system to perform a method. In certain embodiments, the method comprises receiving, at each filter of a network of proximity filters, audio signals representative of sound detected by one or more microphones associated with that filter. In certain embodiments, the method comprises receiving, at each filter of the network of proximity filters, an output from one or more of the other proximity filters. In certain embodiments, the method comprises generating, at each filter of the network of proximity filters, an estimate of a voice portion of the sound and an estimate of noise in the sound. In certain embodiments, the method comprises producing an output representative of the voice portion with the noise suppressed using a differential processor configured to receive the estimates of the voice portions and the estimates of the noise from the network of proximity filters.
[0077] In certain embodiments, the network of proximity filters receives information related to a reference signal that excites a loudspeaker. In certain embodiments, the method further comprises causing each proximity filter to create an effective bubble. In certain embodiments, the estimate of the voice portion comprises all voice and noise sources that fall within the bubble and the noise estimate comprises all voice and noise sources that fall outside the bubble. In certain embodiments, producing an output using the differential processor includes decomposing the voice estimate and the noise estimate from each proximity filter in the network into different frequency bands. In certain embodiments, producing an output using the differential processor includes decomposing the voice estimate and the noise estimate from each proximity filter in the network into different frequency bands and suppressing certain components of the noise estimate found in the voice estimate. In certain embodiments, suppressing certain components of the noise estimate includes using a time-varying adaptive filter. In certain embodiments, the time-varying adaptive filter comprises a time domain adaptive Wiener filter. In certain embodiments, the differential processor performs at least a portion of the process shown in Figure 9.
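For reference, the textbook time-domain Wiener formulation that such a filter could follow in one frequency band is sketched below. The symbols are assumptions introduced here, not notation from the disclosure: $v(n)$ is a voice-estimate sample, $\mathbf{x}(n)$ collects the $M$ most recent noise-estimate samples, and $\mathbf{w}$ is the filter weight vector.

```latex
\begin{aligned}
\mathbf{x}(n) &= [\,x(n),\; x(n-1),\; \dots,\; x(n-M+1)\,]^{T} \\
e(n) &= v(n) - \mathbf{w}^{T}\mathbf{x}(n) \\
\mathbf{w}_{\mathrm{opt}} &= \arg\min_{\mathbf{w}} \; E\!\left[e^{2}(n)\right]
  \quad\Longrightarrow\quad \mathbf{R}\,\mathbf{w}_{\mathrm{opt}} = \mathbf{p} \\
\mathbf{R} &= E\!\left[\mathbf{x}(n)\,\mathbf{x}^{T}(n)\right], \qquad
\mathbf{p} = E\!\left[\mathbf{x}(n)\,v(n)\right]
\end{aligned}
```

Here $e(n)$ is the output with the linearly predictable noise removed. A time-varying version would re-estimate $\mathbf{R}$ and $\mathbf{p}$ over a sliding window or update $\mathbf{w}$ recursively; which of these the described embodiments use is not stated.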
[0078] Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications can be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims. It should be appreciated that exemplary claims are provided below and these claims are not meant to be limiting for future applications claiming priority from this application. The exemplary claims are meant to be illustrative and not restrictive.

Claims

WHAT IS CLAIMED IS:
1. A system for separating a signal of interest from noise comprising:
a first proximity filter adapted to receive an audio signal representative of sound from a microphone and an output from one or more other proximity filters, wherein each of the first proximity filter and the other proximity filters generates an estimate of a voice portion in the sound and an estimate of noise in the sound;
a differential processor configured to receive the estimates of the voice portions and the estimates of the noise from each proximity filter and to produce an output representative of the voice portion in the sound, wherein noise in the output is suppressed.
2. The system of claim 1, wherein the first proximity filter receives audio signals representative of the sound from a plurality of microphones.
3. The system of claim 2, wherein each of the other proximity filters receives audio signals representative of the sound from a plurality of microphones.
4. The system of claim 2, wherein each of the first proximity filter and the other proximity filters are interconnected in a network of proximity filters.
5. The system of claim 4, wherein the network of proximity filters receives information related to a source of ambient noise in the sound.
6. The system of claim 5, wherein the source of ambient noise comprises a reference signal that excites a loudspeaker.
7. The system of claim 4, wherein each filter of the network of proximity filters effectively creates a bubble, wherein the estimate of the voice portion comprises all voice and noise sources that fall within the bubble and the noise estimate comprises all voice and noise sources that fall outside the bubble.
8. The system of claim 4, wherein the differential processor decomposes the voice estimate and the noise estimate from each proximity filter in the network into different frequency bands and suppresses certain components of the noise estimate found in the voice estimate using a time-varying adaptive filter.
9. The system of claim 8, wherein the time-varying adaptive filter comprises a time domain adaptive Wiener filter.
10. A non-transitory computer-readable media having instructions and data encoded thereon, the instructions and data causing a processing system to perform a method comprising:
receiving, at each filter of a network of proximity filters, audio signals representative of sound detected by one or more microphones associated with each filter;
receiving, at each filter of the network of proximity filters, an output from one or more of the other proximity filters;
generating at each filter of the network of proximity filters, an estimate of a voice portion of the sound and an estimate of noise in the sound;
suppressing the noise; and
producing an output representative of the voice portion using a differential processor configured to receive the estimates of the voice portions and the estimates of the noise from the network of proximity filters.
11. The non-transitory computer-readable media of claim 10, wherein the network of proximity filters receives information related to a reference signal that excites a loudspeaker.
12. The non-transitory computer-readable media of claim 10, wherein the method further comprises causing each proximity filter to create an effective bubble, wherein the estimate of the voice portion comprises all voice and noise sources that fall within the bubble and the noise estimate comprises all voice and noise sources that fall outside the bubble.
13. The non-transitory computer-readable media of claim 10, wherein suppressing the noise includes suppressing certain components of the noise estimate found in the voice estimate, and wherein producing an output using the differential processor includes decomposing the voice estimate and the noise estimate from each proximity filter in the network into different frequency bands.
14. The non-transitory computer-readable media of claim 13, wherein suppressing certain components of the noise estimate includes using a time-varying adaptive filter.
15. The non-transitory computer-readable media of claim 14, wherein the time-varying adaptive filter comprises a time domain adaptive Wiener filter.
PCT/US2011/037781 2010-05-27 2011-05-24 Separating voice from noise using a network of proximity filters WO2011149969A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US34916410P 2010-05-27 2010-05-27
US61/349,164 2010-05-27

Publications (2)

Publication Number Publication Date
WO2011149969A2 (en) 2011-12-01
WO2011149969A3 (en) 2012-04-05

Family

ID=45004723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/037781 WO2011149969A2 (en) 2010-05-27 2011-05-24 Separating voice from noise using a network of proximity filters

Country Status (1)

Country Link
WO (1) WO2011149969A2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100002892A1 (en) * 2007-03-30 2010-01-07 Fujitsu Limited Active noise reduction system and active noise reduction method
US20100024630A1 (en) * 2008-07-29 2010-02-04 Teie David Ernest Process of and apparatus for music arrangements adapted from animal noises to form species-specific music
US20100061564A1 (en) * 2007-02-07 2010-03-11 Richard Clemow Ambient noise reduction system
US20100103776A1 (en) * 2008-10-24 2010-04-29 Qualcomm Incorporated Audio source proximity estimation using sensor array for noise reduction

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2840571A3 (en) * 2013-08-23 2015-03-25 Samsung Electronics Co., Ltd Display apparatus and control method thereof
US9402094B2 (en) 2013-08-23 2016-07-26 Samsung Electronics Co., Ltd. Display apparatus and control method thereof, based on voice commands
US9378753B2 (en) 2014-10-31 2016-06-28 At&T Intellectual Property I, L.P Self-organized acoustic signal cancellation over a network
US9842582B2 (en) 2014-10-31 2017-12-12 At&T Intellectual Property I, L.P. Self-organized acoustic signal cancellation over a network
US10242658B2 (en) 2014-10-31 2019-03-26 At&T Intellectual Property I, L.P. Self-organized acoustic signal cancellation over a network
US11218796B2 (en) 2015-11-13 2022-01-04 Dolby Laboratories Licensing Corporation Annoyance noise suppression

Also Published As

Publication number Publication date
WO2011149969A3 (en) 2012-04-05

Similar Documents

Publication Publication Date Title
US20080175408A1 (en) Proximity filter
US20100098266A1 (en) Multi-channel audio device
US10535362B2 (en) Speech enhancement for an electronic device
US10339952B2 (en) Apparatuses and systems for acoustic channel auto-balancing during multi-channel signal extraction
CN110741654B (en) Earplug voice estimation
KR101470262B1 (en) Systems, methods, apparatus, and computer-readable media for multi-microphone location-selective processing
KR101340215B1 (en) Systems, methods, apparatus, and computer-readable media for dereverberation of multichannel signal
US10269369B2 (en) System and method of noise reduction for a mobile device
US9313572B2 (en) System and method of detecting a user's voice activity using an accelerometer
US9438985B2 (en) System and method of detecting a user's voice activity using an accelerometer
US9031256B2 (en) Systems, methods, apparatus, and computer-readable media for orientation-sensitive recording control
US9094749B2 (en) Head-mounted sound capture device
US9197974B1 (en) Directional audio capture adaptation based on alternative sensory input
US8488803B2 (en) Wind suppression/replacement component for use with electronic systems
US8942383B2 (en) Wind suppression/replacement component for use with electronic systems
US9100734B2 (en) Systems, methods, apparatus, and computer-readable media for far-field multi-source tracking and separation
US8542843B2 (en) Headset with integrated stereo array microphone
US20150245129A1 (en) System and method of improving voice quality in a wireless headset with untethered earbuds of a mobile device
CN109195042B (en) Low-power-consumption efficient noise reduction earphone and noise reduction system
JP2008512888A (en) Telephone device with improved noise suppression
JP2009522942A (en) System and method using level differences between microphones for speech improvement
US20140341386A1 (en) Noise reduction
CN113544775B (en) Audio signal enhancement for head-mounted audio devices
CN116569564A (en) Bone conduction headset speech enhancement system and method
WO2011149969A2 (en) Separating voice from noise using a network of proximity filters

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 11787269; Country of ref document: EP; Kind code of ref document: A2)
NENP Non-entry into the national phase in: (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 11787269; Country of ref document: EP; Kind code of ref document: A2)