US8447595B2 - Echo-related decisions on automatic gain control of uplink speech signal in a communications device - Google Patents


Info

Publication number
US8447595B2
US8447595B2 (application US12/793,360)
Authority
US
United States
Prior art keywords
signal
speech
gain
end user
uplink
Prior art date
Legal status
Active, expires
Application number
US12/793,360
Other versions
US20110301948A1
Inventor
Shaohai Chen
Current Assignee
Apple Inc
Original Assignee
Apple Inc
Priority date
Filing date
Publication date
Application filed by Apple Inc filed Critical Apple Inc
Priority to US12/793,360
Assigned to Apple Inc. (Assignors: CHEN, SHAOHAI)
Publication of US20110301948A1
Application granted
Publication of US8447595B2
Status: Active; adjusted expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Definitions

  • the signal processor 21 also includes a voice activity detector (VAD) 27 .
  • the VAD 27 has an input through which it obtains the downlink speech data sequence and then analyzes it, looking for time intervals or frames that contain speech (that of the far-end user during the call). For instance, the VAD 27 may classify each frame of the downlink sequence that it has analyzed as either containing speech or not, i.e. as a speech frame or as a silence or pause segment of the far-end user's speech.
  • the VAD 27 may provide, at its output, an identification of this time interval or frame, together with its classification as speech or non-speech.
  • the AGC process will even out large amplitude variations in the uplink speech signal, by automatically reducing a gain that it applies to the speech signal if the signal is strong, and raising the gain when the signal is weak.
  • AGC block 32 continuously adapts its gain to the strength of its input signal, during a call.
  • the signal processor 21 freezes the updating of the gain that is being applied to the uplink signal (by the AGC block 32 ), during one or more incoming frames of the uplink data sequence that have been determined to be likely to contain some amount of echo of the far-end user's speech. For example, the last gain update computed by the AGC block 32 is applied but kept unchanged during the selected frames.
  • the decision to freeze (and then unfreeze) is made by a gain update controller 28 .
  • the controller 28 may receive from the VAD 27 an indication that a frame has just been classified as a downlink speech frame. Next, following a predetermined time delay or frame delay in the uplink signal (in response to the indication from the VAD 27 ), the controller causes the gain updating of the AGC 32 to be frozen during the next incoming frame to the AGC 32 . This is depicted in the diagram of FIG. 3 . In that example, the delay is two frames; in general, it may be smaller or larger.
  • the predetermined delay may be estimated or set in advance, by determining the elapsed time or equivalent number of frames, for sending a given downlink frame through the following path: starting with the VAD 27 , then through the downlink signal processing block 29 , then through the analog codec 9 and out of a speaker (e.g., earpiece speaker 12 or loudspeaker 14 ), then reverberating or leaking into the microphone 16 , then through the uplink processing block 24 , then through the echo canceller 23 , and then arriving at the AGC block 32 .
  • when the VAD 27 subsequently identifies a downlink frame that contains no speech, the gain updating is unfrozen for the next incoming frame to the AGC block 32 .
  • the sequence in FIG. 3 depicts the following example: downlink frames 1-2 are speech frames, which result in corresponding uplink frames 1-2 having the gain applied by AGC block 32 frozen; downlink frames 3-9 are non-speech frames, resulting in corresponding uplink frames 3-9 having the applied gain updated; and so on.
  • the “correspondence” between the downlink and uplink frames in this example is a two-frame delay (from the point in the downlink signal at which the speech or non-speech was detected).
  • the depictions therein may also be used to refer to certain operations of an algorithm or process for performing a call between a near-end user and a far-end user.
  • the process would include the following digital audio operations performed during the call by the near-end user's communications device: receiving a downlink speech signal from the far-end user's communications device (e.g., in downlink signal processing block 26 or in VAD 27 ); performing automatic gain control (AGC) to update a gain applied to an uplink speech signal (in AGC block 32 ) and then transmitting the uplink signal to the far-end user's device (e.g., by a cellular network transceiver associated with the baseband processor 20 , or by the WLAN/Bluetooth transceiver 8 ); and detecting a frame in the downlink signal that contains speech (e.g., by the VAD 27 ) and, in response, freezing the updating of the gain during a frame in the uplink signal.
  • the freezing may occur after a predetermined delay following the detection (the gain update controller 28 may be programmed at the factory with this delay, or the delay may be dynamically updated during in-the-field use of the device 2 ).
  • the process may continue with detecting a subsequent frame in the downlink signal that contains no speech; the VAD 27 indicates the detection to the gain update controller 28 , which then responds by allowing gain updates to be applied to the subsequent frame (the gain update controller 28 may use the same delay as it used before it froze the gain updating).
  • an embodiment of the invention may be a machine-readable medium (such as microelectronic memory) having stored thereon instructions, which program one or more data processing components (generically referred to here as a “processor”) to perform the digital domain operations described above including filtering, mixing, adding, subtracting, comparisons, and decision making.
  • some of these operations might be performed in the analog domain, or by specific hardware components that contain hardwired logic (e.g., dedicated digital filter blocks). Those operations might alternatively be performed by any combination of programmed data processing components and fixed hardwired circuit components.
  • while the block diagram of FIG. 2 is for a mobile communications device in which a wireless call is performed, the network connection for a call may alternatively be made using a wired Ethernet port (e.g., using an Internet telephony application that does not use the baseband processor 20 and its associated cellular radio transceiver and antenna).
  • the downlink and uplink signal processors depicted in FIG. 2 may therefore be implemented in a desktop personal computer or in a landline-based Internet telephony station having a high-speed landline-based Internet connection. The description is thus to be regarded as illustrative instead of limiting.
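The freeze/unfreeze decisions described above, including the predetermined frame delay between a downlink VAD decision and the affected uplink frame, can be sketched as follows. This is an illustrative Python sketch with hypothetical names; the patent does not prescribe a particular implementation:

```python
from collections import deque

class GainUpdateController:
    """Sketch of the gain update controller: a downlink VAD decision takes
    effect on the uplink AGC after a fixed frame delay (two frames in the
    FIG. 3 example), modeling the latency of the echo path."""

    def __init__(self, delay_frames=2):
        # Pipeline of pending VAD decisions, pre-filled with "no speech".
        self.pending = deque([False] * delay_frames, maxlen=delay_frames)

    def on_downlink_frame(self, is_speech):
        """Feed one downlink VAD decision; return whether the AGC gain
        update for the current uplink frame should be frozen."""
        frozen = self.pending[0]        # decision made delay_frames ago
        self.pending.append(is_speech)  # maxlen drops the oldest entry
        return frozen

ctrl = GainUpdateController(delay_frames=2)
# Downlink frames: two speech frames followed by three silence frames.
freeze = [ctrl.on_downlink_frame(s) for s in [True, True, False, False, False]]
# The freeze takes effect two uplink frames after each detection.
assert freeze == [False, False, True, True, False]
```

The deque acts as the fixed-length delay line; reading the oldest entry before appending the newest gives exactly `delay_frames` of latency between detection and effect.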

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Telephone Function (AREA)

Abstract

A method for performing a call between a near-end user and a far-end user includes the following operations, performed during the call by the near-end user's communications device. Automatic gain control (AGC) is performed to update a gain applied to an uplink speech signal. A frame that contains speech is detected in a downlink signal; in response, the updating of the gain is frozen. Other embodiments are also described and claimed.

Description

An embodiment of the invention relates to automatic gain control techniques applied to an uplink speech signal within a communications device such as a smart phone or a cellular phone. Other embodiments are also described.
BACKGROUND
In the field of mobile communications using devices such as smart phones and cellular phones, there are many audio signal processing operations that can impact how well a far-end user hears a conversation with a mobile phone user. For instance, there is active noise cancellation, an operation that estimates or detects the near-end user's background noise and then adds an appropriate anti-noise signal to an “uplink” speech signal of the near-end user, before transmitting the uplink signal to the far-end user's device during a call. This helps reduce the amount of the near-end user's background noise that might be heard by the far-end user.
Another problem that often appears during a call is that of acoustic echo. A downlink speech signal contains the far-end user's speech. This may be playing through either a loudspeaker (speakerphone mode) or an earpiece speaker of the near-end user's device, and is inadvertently picked up by the primary microphone. This may be due to acoustic leakage within the near-end user's device or, especially in speakerphone mode, it may be due to reverberations from external objects that are near the loudspeaker. An echo cancellation process takes samples of the far-end user's speech from the downlink signal and uses them to reduce the amount of the far-end user's speech that has been inadvertently picked up by the near-end user's microphone, thus reducing the likelihood that the far-end user will hear an echo of his own voice during the call.
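As a rough illustration of echo reduction (not the patent's method; practical echo cancellers use adaptive filters such as NLMS), one can subtract a delayed, scaled copy of the downlink reference from the microphone signal. The path delay and coupling factor below are hypothetical example values:

```python
# Naive echo-reduction sketch: the downlink reference, delayed by an assumed
# echo-path delay and scaled by an assumed acoustic coupling factor, is
# subtracted from the microphone (uplink) signal.

def cancel_echo(mic, downlink_ref, path_delay=2, coupling=0.5):
    """Subtract an estimate of the echoed far-end speech from the uplink."""
    out = []
    for n, x in enumerate(mic):
        ref = downlink_ref[n - path_delay] if n >= path_delay else 0.0
        out.append(x - coupling * ref)
    return out

# Microphone picks up only the echo (no near-end speech): residual is zero.
ref = [1.0, -1.0, 0.5, -0.5]
mic = [0.0, 0.0, 0.5, -0.5]          # echo = 0.5 * ref, delayed by 2 samples
assert cancel_echo(mic, ref) == [0.0, 0.0, 0.0, 0.0]
```

In practice the delay and coupling are unknown and time-varying, which is why real echo cancellers estimate them adaptively.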
Some users of a mobile phone tend to speak softly, whether intentionally or not, while others speak loudly. The dynamic range of the speech signal in a mobile device, however, is limited (for practical reasons). In addition, it is generally accepted that one would prefer a fairly steady volume during a conversation with another person. A process known as automatic gain control (AGC) will even out large amplitude variations in the uplink speech signal, by automatically reducing a gain that is applied to the speech signal if the signal is strong, and raising the gain when the signal is weak. In other words, AGC continuously adapts its gain to the strength of its input signal during a call. It may be used separately for both uplink and downlink signals.
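The AGC behavior just described can be sketched in a few lines. This is an illustrative Python sketch, not taken from the patent; the target level, step size, and gain limits are hypothetical values chosen for the example:

```python
# Illustrative frame-based AGC sketch: the gain is nudged toward a target
# output level, reduced when a frame comes out strong and raised when weak.

def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return (sum(x * x for x in frame) / len(frame)) ** 0.5

def agc_step(frame, gain, target=0.1, step=1.05, gain_min=0.1, gain_max=10.0):
    """Apply the current gain to one frame, then update the gain for the next."""
    out = [x * gain for x in frame]
    level = rms(out)
    if level > target:        # signal strong: reduce the gain
        gain /= step
    elif level > 0:           # signal weak: raise the gain
        gain *= step
    return out, max(gain_min, min(gain_max, gain))

# A loud frame drives the gain down; a quiet frame drives it back up.
_, g = agc_step([0.5, -0.5, 0.5, -0.5], 1.0)
assert g < 1.0
_, g = agc_step([0.01, 0.01, -0.01, -0.01], 1.0)
assert g > 1.0
```

The multiplicative step gives a smooth, gradual adaptation, which matches the stated goal of a fairly steady volume rather than abrupt gain jumps.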
To further enhance the acoustic experience for the far-end user, AGC of an uplink signal in the near-end user's device is controlled so that its gain is “frozen” during time intervals (also referred to as frames) where the near-end user is not speaking and there is apparent silence at the near-end user side of the conversation. Once speech resumes, a decision is made to unfreeze the AGC, thereby allowing it to resume its adaptation of the gain during a speech frame. This is done in order to avoid undesired gain changes or noise amplification during silence frames, which the far-end user might find strange as he hears strongly varying background noise levels during silence frames. A voice activity detector (VAD) circuit or algorithm is used to determine whether a given frame of the uplink signal is a speech frame or a non-speech (silence) frame, and on that basis a decision is made as to whether the AGC gain updating for the uplink signal should be frozen or not.
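A VAD can be illustrated with a simple short-term-energy rule. This sketch is hypothetical and far simpler than production VADs, which typically use spectral and statistical features; the threshold is an arbitrary example value:

```python
# Minimal energy-threshold VAD sketch: each frame is classified as speech
# (True) or non-speech/silence (False) by its short-term energy.

def classify_frames(frames, threshold=0.05):
    """Return one speech/silence decision per frame."""
    decisions = []
    for frame in frames:
        energy = sum(x * x for x in frame) / len(frame)
        decisions.append(energy > threshold)
    return decisions

# A strong frame is classified as speech, a near-silent one as non-speech.
assert classify_frames([[0.5, -0.4, 0.5], [0.01, 0.0, -0.01]]) == [True, False]
```

Each per-frame decision is then what drives the freeze/unfreeze choice for the corresponding AGC gain update.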
SUMMARY
In accordance with an embodiment of the invention, decisions on whether or not to freeze the AGC gain updating for the uplink signal are made based on the possibility of far-end user speech echo being present in the uplink signal. Thus, a method for performing a call between a near-end user and a far-end user may include the following operations (performed during the call by the near-end user's communications device). A downlink speech signal is received from the far-end user's communications device. An AGC process is performed to update a gain applied to an uplink speech signal, and the gain-updated uplink signal is transmitted to the far-end user's device. A frame in the downlink signal that contains speech is detected, and in response the updating of the gain during a frame in the uplink signal is frozen.
In a further aspect of the invention, the method continues with detecting a subsequent frame in the downlink signal that contains no speech; in response, the updating of the gain is unfrozen during a subsequent frame in the uplink signal.
The above summary does not include an exhaustive list of all aspects of the present invention. It is contemplated that the invention includes all systems and methods that can be practiced from all suitable combinations of the various aspects summarized above, as well as those disclosed in the Detailed Description below and particularly pointed out in the claims filed with the application. Such combinations have particular advantages not specifically recited in the above summary.
BRIEF DESCRIPTION OF THE DRAWINGS
The embodiments of the invention are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that references to “an” or “one” embodiment of the invention in this disclosure are not necessarily to the same embodiment, and they mean at least one.
FIG. 1 shows a human user holding different types of a multi-function communications device, namely handheld or mobile devices such as a smart phone and a laptop or notebook computer, during a call.
FIG. 2 is a block diagram of some of the functional unit blocks and hardware components in an example communications device.
FIG. 3 depicts an example downlink and uplink frame sequence in which gain updating by AGC in the uplink sequence is frozen and unfrozen.
DETAILED DESCRIPTION
Several embodiments of the invention with reference to the appended drawings are now explained. While numerous details are set forth, it is understood that some embodiments of the invention may be practiced without these details. In other instances, well-known circuits, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
FIG. 1 shows a human user holding different types of a communications device, in this example a multi-function handheld mobile device referred to here as a personal mobile device 2. In one instance, the mobile device is a smart phone or a multi-function cellular phone, shown in this example as being used in its speakerphone mode (as opposed to against the ear or handset mode). A near-end user is in the process of a call with a far-end user (depicted in this case as using a tablet-like computer also in speakerphone mode). The terms “call” and “telephony” are used here generically to refer to any two-way real-time or live communications session with a far-end user. The call is being conducted through one or more communication networks 3, e.g. a wireless cellular network, a wireless local area network, a wide area network such as the Internet, and a public switched telephone network such as the plain old telephone system (POTS). The far-end user need not be using a mobile device 2, but instead may be using a landline-based POTS or Internet telephony station.
Turning now to FIG. 2, a functional unit block diagram and some constituent hardware components of the mobile device 2, such as found in, for instance, an iPhone™ device by Apple Inc., are shown. Although not shown, the device 2 has a housing in which the primary mechanism for visual and tactile interaction with its user is a touch sensitive display screen (referred to here as a touch screen 6). As an alternative, a physical keyboard may be provided together with a display-only screen. The housing may be essentially a solid volume, often referred to as a candy bar or chocolate bar type, as in the iPhone™ device. An alternative is one that has a moveable, multi-piece housing such as a clamshell design, or one with a sliding physical keypad as used by other cellular and mobile handset or smart phone manufacturers. The touch screen 6 displays typical features of visual voicemail, web browser, email, and digital camera viewfinder, as well as telephony features such as a virtual telephone number keypad (which may receive input from the user via virtual buttons and touch commands, as opposed to the physical keyboard option).
The user-level functions of the device are implemented under control of an applications processor 4 that has been programmed in accordance with instructions (code and data) stored in memory 5, e.g. microelectronic, non-volatile random access memory. The processor and memory are generically used here to refer to any suitable combination of programmable data processing components and data storage that can implement the operations needed for the various functions of the device described here. An operating system may be stored in the memory 5, along with application programs to perform specific functions of the device (when they are being run or executed by the processor 4). In particular, there is a telephony application that (when launched, unsuspended, or brought to foreground) enables the near-end user to “dial” a telephone number or address of a communications device of the far-end user to initiate a call using, for instance, a cellular protocol, and then to “hang-up” the call when finished.
For wireless telephony, several options are available in the device depicted in FIG. 2. For instance, a cellular phone protocol may be implemented, using a cellular radio portion that includes a baseband processor 20 together with a cellular transceiver (not shown) and its associated antenna. The baseband processor 20 may be designed to perform various communication functions needed for conducting a call. Such functions may include speech coding and decoding and channel coding and decoding (e.g., in accordance with cellular GSM, and cellular CDMA). As an alternative to a cellular protocol, the device 2 offers the capability of conducting a call over a wireless local area network (WLAN) connection. A WLAN/Bluetooth transceiver 8 may be used for this purpose, with the added convenience of an optional wireless Bluetooth headset link. Packetizing of the uplink signal, and depacketizing of the downlink signal, may be performed by the applications processor 4.
The applications processor 4, while running the telephony application program, may conduct the call by enabling the transfer of uplink and downlink digital audio signals (also referred to here as voice or speech signals) between the applications processor 4 or the baseband processor 20 on the network side, and any user-selected combination of acoustic transducers on the acoustic side. The downlink signal carries speech of the far-end user during a call, while the uplink signal contains speech of the near-end user that has been picked up by the primary microphone. The acoustic transducers include an earpiece speaker 12, a loudspeaker (speakerphone) 14, one or more microphones 16 including a primary microphone that is intended to pick-up the near-end user's speech primarily, and a wired headset 18 with a built-in microphone. The analog-digital conversion interface between these acoustic transducers and the digital downlink and uplink signals is accomplished by an analog codec 9. The latter may also provide coding and decoding functions for preparing any data that is to be transmitted out of the device 2 through a connector 10, and data that is received into the device 2 through the connector 10. This may be a conventional docking connector, used to perform a docking function that synchronizes the user's personal data stored in the memory 5 with the user's personal data stored in memory of an external computing system, such as a desktop computer or a laptop computer.
Still referring to FIG. 2, an uplink and downlink digital signal processor 21 is provided to perform a number of signal enhancement and noise reduction operations upon the digital audio uplink and downlink signals, to improve the experience of both near-end and far-end users during the call. The processor 21 may be a separate integrated circuit die or package, and may have at least two digital audio bus interfaces (DABIs) 30, 31. These are used for transferring digital audio sequences to and from the baseband processor 20, applications processor 4, and analog codec 9. The digital audio bus interfaces may be in accordance with the I2S electrical serial bus interface specification, which is currently popular for connecting digital audio components and carrying pulse code modulated audio. Various types of audio processing functions may be implemented in the downlink and uplink signal paths of the processor 21.
The downlink signal path receives a downlink digital signal from either the baseband processor 20 or the applications processor 4 (originating as either a cellular network signal or a WLAN packet sequence) through the digital audio bus interface 30. The signal is buffered and is then subjected to various functions (also referred to here as a chain or sequence of functions), including some in downlink processing block 26 and perhaps others in downlink processing block 29. Each of these may be viewed as an audio signal processor. For instance, processing blocks 26, 29 may include one or more of the following: a side tone mixer, a noise suppressor, a voice equalizer, an automatic gain control unit, and a compressor or limiter. The downlink signal as a data stream or sequence is modified by each of these blocks, as it progresses through the signal path shown, until arriving at the digital audio bus interface 31, which transfers the data stream to the analog codec 9 (for playback through the speaker 12, 14, or headset 18).
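The chain-of-blocks arrangement just described can be sketched, purely for illustration, as a sequence of per-frame transforms. The block behaviors and frame representation below are simplified assumptions, not the actual implementation of processing blocks 26 and 29:

```python
# Hypothetical sketch of a downlink processing chain: each block
# transforms one frame (a list of samples) and passes it to the next.

def noise_suppressor(frame):
    # Placeholder behavior: zero out samples below a crude noise floor.
    return [s if abs(s) > 0.01 else 0.0 for s in frame]

def voice_equalizer(frame):
    # Placeholder behavior: unity (pass-through) equalizer.
    return list(frame)

def limiter(frame, ceiling=0.9):
    # Clamp samples so playback never exceeds the ceiling.
    return [max(-ceiling, min(ceiling, s)) for s in frame]

DOWNLINK_CHAIN = [noise_suppressor, voice_equalizer, limiter]

def process_downlink(frame):
    # The frame is modified by each block in turn as it progresses
    # through the signal path.
    for block in DOWNLINK_CHAIN:
        frame = block(frame)
    return frame
```
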
The uplink signal path of the processor 21 passes through a chain of several audio signal processors, including uplink processing block 24, acoustic echo canceller (EC) 23, and automatic gain control (AGC) block 32. The uplink processing block 24 may include at least one of the following: an equalizer, a compander or expander, and another uplink signal enhancement or noise reduction function. After passing through the AGC block 32, the uplink data sequence is passed to the digital audio bus interface 30, which in turn transfers the data sequence to the baseband processor 20 for speech coding and channel coding, or to the applications processor 4 for Internet packetization, prior to being transmitted to the far-end user's device.
The signal processor 21 also includes a voice activity detector (VAD) 27. The VAD 27 has an input through which it obtains the downlink speech data sequence and then analyzes it, looking for time intervals or frames that contain speech (which is that of the far-end user during the call). For instance, the VAD 27 may classify each frame of the downlink sequence that it has analyzed as one that either contains speech or does not contain speech, i.e. a silence or pause segment of the far-end user's speech. The VAD 27 may provide, at its output, an identification of this time interval or frame together with its classification as speech or non-speech.
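As one hypothetical illustration of such frame classification, a VAD can be as simple as an energy threshold. The threshold value and frame format here are assumptions for illustration, not the VAD's actual detection method:

```python
# Hypothetical energy-based VAD sketch: classify each downlink frame
# as speech or non-speech by comparing its average energy to a
# fixed threshold (an assumed value, for illustration only).

def frame_energy(frame):
    # Mean squared amplitude of the samples in the frame.
    return sum(s * s for s in frame) / len(frame)

def classify_frame(frame, threshold=1e-4):
    """Return True for a speech frame, False for a silence/pause frame."""
    return frame_energy(frame) > threshold
```
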
Echo-Related Decisions on AGC Gain Updating
Still referring to FIG. 2, as explained above, the AGC process will even out large amplitude variations in the uplink speech signal, by automatically reducing a gain that it applies to the speech signal if the signal is strong, and raising the gain when the signal is weak. In other words, AGC block 32 continuously adapts its gain to the strength of its input signal, during a call. Now, assuming the AGC block 32 is active during the call, the signal processor 21 freezes the updating of the gain that is being applied to the uplink signal (by the AGC block 32), during one or more incoming frames of the uplink data sequence that have been determined to be likely to contain some amount of echo of the far-end user's speech. For example, the last gain update computed by the AGC block 32 is applied but kept unchanged during the selected frames.
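The freeze mechanism can be sketched as follows. This is a simplified illustration with an assumed adaptation rule and assumed parameter values, not the actual AGC block 32:

```python
# Sketch of an AGC whose gain adaptation can be frozen. While frozen,
# the last computed gain is still applied to every frame, but it is
# not updated. The target level and step size are assumptions.

class AGC:
    def __init__(self, target_level=0.25, step=0.1):
        self.gain = 1.0
        self.target = target_level
        self.step = step
        self.frozen = False

    def process(self, frame):
        if not self.frozen:
            # Adapt toward the target: reduce the gain when the input
            # is strong, raise it when the input is weak.
            level = max(abs(s) for s in frame)
            if level * self.gain > self.target:
                self.gain *= (1.0 - self.step)
            elif level * self.gain < self.target:
                self.gain *= (1.0 + self.step)
        # The (possibly frozen) gain is applied on every frame.
        return [s * self.gain for s in frame]
```
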
In one embodiment, the decision to freeze (and then unfreeze) is made by a gain update controller 28. The controller 28 may receive from the VAD 27 an identification of a frame that has just been identified as a downlink speech frame. Next, following a predetermined time delay or frame delay in the uplink signal (in response to the indication from the VAD 27), the controller causes the gain updating of the AGC block 32 to be frozen during the next incoming frame to the AGC block 32. This is depicted in the diagram of FIG. 3. In that example, the delay is two frames; in general, however, it may be shorter or longer.
In one embodiment, the predetermined delay may be estimated or set in advance, by determining the elapsed time or equivalent number of frames, for sending a given downlink frame through the following path: starting with the VAD 27, then through the downlink signal processing block 29, then through the analog codec 9 and out of a speaker (e.g., earpiece speaker 12 or loudspeaker 14), then reverberating or leaking into the microphone 16, then through the uplink processing block 24, then through the echo canceller 23, and then arriving at the AGC block 32.
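For illustration only, such a predetermined delay might be computed by summing the latencies along that path and rounding up to whole frames. Every number below is a made-up placeholder, not a measured value:

```python
# Hypothetical computation of the predetermined frame delay: sum the
# latencies along the echo path (downlink processing, codec and
# speaker, acoustic path, uplink processing, echo canceller) and
# convert the total to a whole number of frames.

def delay_in_frames(path_latencies_ms, frame_ms=20):
    total_ms = sum(path_latencies_ms)
    # Round up: a partial frame of latency still shifts the echo
    # into the next uplink frame.
    return -(-total_ms // frame_ms)

# Placeholder per-stage latencies in milliseconds:
frames = delay_in_frames([10, 5, 3, 10, 8])  # 36 ms -> 2 frames
```
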
If the VAD 27 indicates that it has detected a non-speech (NS) frame, then in response, and optionally after waiting out the predetermined time interval or frame delay in the uplink signal, the gain updating is unfrozen for the next incoming frame to the AGC block 32. The sequence in FIG. 3 depicts the following example: downlink frames 1-2 are Speech frames, so the gain applied by the AGC block 32 is frozen during the corresponding uplink frames 1-2; downlink frames 3-9 are Non-Speech frames, so the gain updating resumes during the corresponding uplink frames 3-9; and so on. The “correspondence” between the downlink and uplink frames in this example is a two-frame delay (from the point in the downlink signal at which the speech or non-speech was detected).
While the block diagram of FIG. 2 refers to circuit or hardware components and/or specially programmed processors, the depictions therein may also be used to refer to certain operations of an algorithm or process for performing a call between a near-end user and a far-end user. In one embodiment, the process would include the following digital audio operations performed during the call by the near-end user's communications device: receiving a downlink speech signal from the far-end user's communications device (e.g., in downlink signal processing block 26 or in VAD 27); performing automatic gain control (AGC) to update a gain applied to an uplink speech signal (in AGC block 32) and then transmitting the uplink signal to the far-end user's device (e.g., by a cellular network transceiver associated with the baseband processor 20, or by the WLAN/Bluetooth transceiver 8); and detecting a frame in the downlink signal that contains speech (e.g., by the VAD 27) and in response freezing the updating of the gain during a frame in the uplink signal (by gain update controller 28).
The following additional process operations may be performed during the call:
waiting a predetermined delay (a given time interval or a given number of one or more frames) in response to detecting the frame in the downlink signal, before freezing the updating of the gain (the gain update controller 28 may be programmed at the factory with this delay or it may be dynamically updated during in-the-field use of the device 2);
detecting a subsequent frame in the downlink signal that contains no speech (e.g., by the VAD 27) and in response unfreezing the updating of the gain during a subsequent frame in the uplink signal (VAD 27 indicates the detection to the gain update controller 28 which then responds by allowing gain updates to be applied to the subsequent frame); and
waiting a predetermined delay in response to detecting the subsequent frame in the downlink signal, before unfreezing the updating of the gain (the gain update controller 28 may use the same delay as it used before it froze the gain updating).
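Taken together, the per-frame operations above might be sketched as the following simulation, which combines a delayed VAD decision with a simplified AGC adaptation rule. The helper name, the two-frame delay (taken from the FIG. 3 example), and the numeric parameters are illustrative assumptions:

```python
# Composite sketch of the call-time loop: a per-frame VAD decision on
# the downlink, a predetermined frame delay, and an AGC whose gain
# update is skipped on uplink frames flagged as likely echo.

from collections import deque

def run_agc_with_echo_freeze(downlink_is_speech, uplink_levels,
                             delay_frames=2, target=0.25, step=0.1):
    """Return the AGC gain after each uplink frame.

    downlink_is_speech: per-frame VAD output (True = speech detected).
    uplink_levels: per-frame peak level of the uplink signal.
    """
    gain = 1.0
    pending = deque([False] * delay_frames)  # delayed VAD decisions
    gains = []
    for is_speech, level in zip(downlink_is_speech, uplink_levels):
        pending.append(is_speech)
        frozen = pending.popleft()  # decision made delay_frames ago
        if not frozen:
            # Normal AGC adaptation: reduce gain when the signal is
            # strong, raise it when the signal is weak.
            if level * gain > target:
                gain *= (1.0 - step)
            elif level * gain < target:
                gain *= (1.0 + step)
        gains.append(gain)
    return gains
```

With speech in the first two downlink frames and a two-frame delay, the gain adapts during the first two uplink frames and then holds steady while the delayed speech decisions flag likely echo.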
As explained above, an embodiment of the invention may be a machine-readable medium (such as microelectronic memory) having stored thereon instructions, which program one or more data processing components (generically referred to here as a “processor”) to perform the digital domain operations described above including filtering, mixing, adding, subtracting, comparisons, and decision making. In other embodiments, some of these operations might be performed in the analog domain, or by specific hardware components that contain hardwired logic (e.g., dedicated digital filter blocks). Those operations might alternatively be performed by any combination of programmed data processing components and fixed hardwired circuit components.
While certain embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that the invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those of ordinary skill in the art. For example, although the block diagram of FIG. 2 is for a mobile communications device in which a wireless call is performed, the network connection for a call may alternatively be made using a wired Ethernet port (e.g., using an Internet telephony application that does not use the baseband processor 20 and its associated cellular radio transceiver and antenna). The downlink and uplink signal processors depicted in FIG. 2 may thus be implemented in a desktop personal computer or in a land-line based Internet telephony station having a high-speed land-line based Internet connection. The description is thus to be regarded as illustrative instead of limiting.

Claims (14)

What is claimed is:
1. A method for performing a call in a near-end user's communications device, comprising the following operations during the call:
receiving a downlink signal containing speech of a far-end user;
activating automatic gain control (AGC) for an uplink signal containing speech of the near-end user, the AGC to update a gain applied to the uplink signal by automatically reducing the gain where the uplink signal is strong, and raising the gain where the uplink signal is weak;
detecting when the uplink signal contains echo of far-end user speech and in response freezing the updating of the gain; and then
unfreezing the updating of the gain in response to detecting the uplink signal contains no far-end user speech echo.
2. The method of claim 1 wherein detecting when the uplink signal contains echo of far-end user speech comprises:
detecting speech in the downlink signal.
3. The method of claim 2 wherein detecting the uplink signal contains no far-end user speech echo comprises:
detecting silence in the downlink signal.
4. The method of claim 1 wherein detecting when the uplink signal contains echo of far-end user speech comprises identifying a frame in the downlink signal that contains speech, wherein the gain updating is frozen during a corresponding frame in the uplink signal.
5. The method of claim 4 wherein the corresponding frame in the uplink signal is one that contains echo of the speech in the identified frame in the downlink signal.
6. The method of claim 4 further comprising:
in response to the frame in the downlink signal being identified, waiting a predetermined delay until the start of the corresponding frame in the uplink signal.
7. A communications device comprising:
a downlink signal processor to process a downlink audio signal received from a far-end user's communications device, the downlink signal causes speech of the far-end user to be heard by the near-end user from a speaker;
an uplink signal processor to process an uplink audio signal picked up by a microphone and to be transmitted to the far-end user's device, the uplink signal processor having an automatic gain control (AGC) block that is to even out large amplitude variations in the uplink audio signal;
a voice activity detector (VAD) to detect a speech frame in the downlink audio signal; and
a gain update controller having an input coupled to an output of the VAD to receive indication of a detected downlink speech frame and in response make a decision to freeze gain updating by the AGC block.
8. The device of claim 7 wherein the VAD is to detect a non-speech frame in the downlink audio signal, and the gain update controller is to receive indication of the detected downlink non-speech frame and in response make a decision to un-freeze the gain updating by the AGC block.
9. The device of claim 7 wherein the downlink processor comprises a chain of audio signal processors including at least one of the group consisting of a noise suppresser, a voice equalizer, and an automatic gain control unit.
10. The device of claim 7 wherein the uplink processor comprises a chain of audio signal processors including at least one of the group consisting of an equalizer, an acoustic echo canceller, and a compander or expander.
11. A method for performing a call between a near-end user and a far-end user, the method comprising the following operations performed during the call by the near-end user's communications device:
receiving a downlink speech signal from the far-end user's communications device;
performing automatic gain control (AGC) to update a gain applied to an uplink speech signal by automatically reducing the gain where the uplink speech signal is strong, and raising the gain where the uplink speech signal is weak and then transmitting the uplink signal to the far-end user's device; and
detecting a frame in the downlink signal that contains speech and in response freezing the updating of the gain during a frame in the uplink signal.
12. The method of claim 11 further comprising:
waiting a predetermined delay in response to detecting the frame in the downlink signal, before freezing the updating of the gain.
13. The method of claim 11 further comprising:
detecting a subsequent frame in the downlink signal that contains no speech and in response unfreezing the updating of the gain during a subsequent frame in the uplink signal.
14. The method of claim 13 further comprising:
waiting a predetermined delay in response to detecting the subsequent frame in the downlink signal, before unfreezing the updating of the gain.
US12/793,360 2010-06-03 2010-06-03 Echo-related decisions on automatic gain control of uplink speech signal in a communications device Active 2031-06-16 US8447595B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/793,360 US8447595B2 (en) 2010-06-03 2010-06-03 Echo-related decisions on automatic gain control of uplink speech signal in a communications device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/793,360 US8447595B2 (en) 2010-06-03 2010-06-03 Echo-related decisions on automatic gain control of uplink speech signal in a communications device

Publications (2)

Publication Number Publication Date
US20110301948A1 (en) 2011-12-08
US8447595B2 (en) 2013-05-21

Family

ID=45065169

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/793,360 Active 2031-06-16 US8447595B2 (en) 2010-06-03 2010-06-03 Echo-related decisions on automatic gain control of uplink speech signal in a communications device

Country Status (1)

Country Link
US (1) US8447595B2 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110307251A1 (en) * 2010-06-15 2011-12-15 Microsoft Corporation Sound Source Separation Using Spatial Filtering and Regularization Phases
US20140066134A1 (en) * 2011-05-19 2014-03-06 Nec Corporation Audio processing device, audio processing method, and recording medium recording audio processing program
US20220329940A1 (en) * 2019-08-06 2022-10-13 Nippon Telegraph And Telephone Corporation Echo cancellation device, echo cancellation method, and program
US20230178090A1 (en) * 2021-12-08 2023-06-08 Nokia Technologies Oy Conversational Service

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8538035B2 (en) 2010-04-29 2013-09-17 Audience, Inc. Multi-microphone robust noise suppression
US8473287B2 (en) 2010-04-19 2013-06-25 Audience, Inc. Method for jointly optimizing noise reduction and voice quality in a mono or multi-microphone system
US8781137B1 (en) 2010-04-27 2014-07-15 Audience, Inc. Wind noise detection and suppression
US8447596B2 (en) 2010-07-12 2013-05-21 Audience, Inc. Monaural noise suppression based on computational auditory scene analysis
US9521263B2 (en) 2012-09-17 2016-12-13 Dolby Laboratories Licensing Corporation Long term monitoring of transmission and voice activity patterns for regulating gain control
CN108476555B (en) * 2016-10-31 2021-05-11 华为技术有限公司 Audio processing method and terminal equipment
EP3324406A1 (en) 2016-11-17 2018-05-23 Fraunhofer Gesellschaft zur Förderung der Angewand Apparatus and method for decomposing an audio signal using a variable threshold
EP3324407A1 (en) * 2016-11-17 2018-05-23 Fraunhofer Gesellschaft zur Förderung der Angewand Apparatus and method for decomposing an audio signal using a ratio as a separation characteristic
EP3823315B1 (en) * 2019-11-18 2024-01-10 Panasonic Intellectual Property Corporation of America Sound pickup device, sound pickup method, and sound pickup program

Citations (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4514703A (en) * 1982-12-20 1985-04-30 Motrola, Inc. Automatic level control system
US5016271A (en) * 1989-05-30 1991-05-14 At&T Bell Laboratories Echo canceler-suppressor speakerphone
US5099472A (en) * 1989-10-24 1992-03-24 Northern Telecom Limited Hands free telecommunication apparatus and method
US5548616A (en) * 1994-09-09 1996-08-20 Nokia Mobile Phones Ltd. Spread spectrum radiotelephone having adaptive transmitter gain control
US5566201A (en) * 1994-09-27 1996-10-15 Nokia Mobile Phones Ltd. Digital AGC for a CDMA radiotelephone
US5809463A (en) * 1995-09-15 1998-09-15 Hughes Electronics Method of detecting double talk in an echo canceller
US5901234A (en) * 1995-02-14 1999-05-04 Sony Corporation Gain control method and gain control apparatus for digital audio signals
US5907823A (en) * 1995-09-13 1999-05-25 Nokia Mobile Phones Ltd. Method and circuit arrangement for adjusting the level or dynamic range of an audio signal
US6148078A (en) * 1998-01-09 2000-11-14 Ericsson Inc. Methods and apparatus for controlling echo suppression in communications systems
US6169971B1 (en) 1997-12-03 2001-01-02 Glenayre Electronics, Inc. Method to suppress noise in digital voice processing
US6212273B1 (en) * 1998-03-20 2001-04-03 Crystal Semiconductor Corporation Full-duplex speakerphone circuit including a control interface
US6363343B1 (en) 1997-11-04 2002-03-26 Nokia Mobile Phones Limited Automatic gain control
US20020044666A1 (en) * 1999-03-15 2002-04-18 Vocaltec Communications Ltd. Echo suppression device and method for performing the same
US6453289B1 (en) * 1998-07-24 2002-09-17 Hughes Electronics Corporation Method of noise reduction for speech codecs
US6487178B1 (en) * 1999-05-12 2002-11-26 Ericsson Inc. Methods and apparatus for providing volume control in communicating systems including a linear echo canceler
US6526139B1 (en) * 1999-11-03 2003-02-25 Tellabs Operations, Inc. Consolidated noise injection in a voice processing system
US6563803B1 (en) * 1997-11-26 2003-05-13 Qualcomm Incorporated Acoustic echo canceller
US6618701B2 (en) 1999-04-19 2003-09-09 Motorola, Inc. Method and system for noise suppression using external voice activity detection
US20030228023A1 (en) * 2002-03-27 2003-12-11 Burnett Gregory C. Microphone and Voice Activity Detection (VAD) configurations for use with communication systems
US20030235312A1 (en) * 2002-06-24 2003-12-25 Pessoa Lucio F. C. Method and apparatus for tone indication
US6771701B1 (en) * 1999-05-12 2004-08-03 Infineon Technologies North America Corporation Adaptive filter divergence control in echo cancelers by means of amplitude distribution evaluation with configurable hysteresis
US6804203B1 (en) * 2000-09-15 2004-10-12 Mindspeed Technologies, Inc. Double talk detector for echo cancellation in a speech communication system
US20050004796A1 (en) * 2003-02-27 2005-01-06 Telefonaktiebolaget Lm Ericsson (Publ), Audibility enhancement
US6912209B1 (en) * 1999-04-13 2005-06-28 Broadcom Corporation Voice gateway with echo cancellation
US20060018460A1 (en) * 2004-06-25 2006-01-26 Mccree Alan V Acoustic echo devices and methods
US20060018457A1 (en) * 2004-06-25 2006-01-26 Takahiro Unno Voice activity detectors and methods
US20060217974A1 (en) * 2005-03-28 2006-09-28 Tellabs Operations, Inc. Method and apparatus for adaptive gain control
US20060247927A1 (en) * 2005-04-29 2006-11-02 Robbins Kenneth L Controlling an output while receiving a user input
US7155385B2 (en) * 2002-05-16 2006-12-26 Comerica Bank, As Administrative Agent Automatic gain control for adjusting gain during non-speech portions
US20070121021A1 (en) * 2005-11-29 2007-05-31 Kaehler John W Display device with acoustic noise suppression
US7231234B2 (en) * 2003-11-21 2007-06-12 Octasic Inc. Method and apparatus for reducing echo in a communication system
US7379866B2 (en) * 2003-03-15 2008-05-27 Mindspeed Technologies, Inc. Simple noise suppression model
US20080161064A1 (en) * 2006-12-29 2008-07-03 Motorola, Inc. Methods and devices for adaptive ringtone generation
US7433462B2 (en) * 2002-10-31 2008-10-07 Plantronics, Inc Techniques for improving telephone audio quality
US7440891B1 (en) * 1997-03-06 2008-10-21 Asahi Kasei Kabushiki Kaisha Speech processing method and apparatus for improving speech quality and speech recognition performance
US7464029B2 (en) 2005-07-22 2008-12-09 Qualcomm Incorporated Robust separation of speech signals in a noisy environment
US20090010453A1 (en) * 2007-07-02 2009-01-08 Motorola, Inc. Intelligent gradient noise reduction system
US20090010452A1 (en) * 2007-07-06 2009-01-08 Texas Instruments Incorporated Adaptive noise gate and method
US20090070106A1 (en) * 2006-03-20 2009-03-12 Mindspeed Technologies, Inc. Method and system for reducing effects of noise producing artifacts in a speech signal
US7555117B2 (en) * 2005-07-12 2009-06-30 Acoustic Technologies, Inc. Path change detector for echo cancellation
US7558729B1 (en) * 2004-07-16 2009-07-07 Mindspeed Technologies, Inc. Music detection for enhancing echo cancellation and speech coding
US20090281803A1 (en) * 2008-05-12 2009-11-12 Broadcom Corporation Dispersion filtering for speech intelligibility enhancement
US7630887B2 (en) * 2000-05-30 2009-12-08 Marvell World Trade Ltd. Enhancing the intelligibility of received speech in a noisy environment
US20100017205A1 (en) * 2008-07-18 2010-01-21 Qualcomm Incorporated Systems, methods, apparatus, and computer program products for enhanced intelligibility
US20100086122A1 (en) * 2008-10-02 2010-04-08 Oki Electric Industry Co., Ltd. Echo canceller and echo cancelling method and program
US7773691B2 (en) * 2005-04-25 2010-08-10 Rf Micro Devices, Inc. Power control system for a continuous time mobile transmitter
US20110066428A1 (en) * 2009-09-14 2011-03-17 Srs Labs, Inc. System for adaptive voice intelligibility processing
US20120065967A1 (en) * 2009-05-27 2012-03-15 Panasonic Corporation Communication device and signal processing method
US8306215B2 (en) * 2009-12-17 2012-11-06 Oki Electric Industry Co., Ltd. Echo canceller for eliminating echo without being affected by noise

Patent Citations (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4514703A (en) * 1982-12-20 1985-04-30 Motrola, Inc. Automatic level control system
US5016271A (en) * 1989-05-30 1991-05-14 At&T Bell Laboratories Echo canceler-suppressor speakerphone
US5099472A (en) * 1989-10-24 1992-03-24 Northern Telecom Limited Hands free telecommunication apparatus and method
US5548616A (en) * 1994-09-09 1996-08-20 Nokia Mobile Phones Ltd. Spread spectrum radiotelephone having adaptive transmitter gain control
US5566201A (en) * 1994-09-27 1996-10-15 Nokia Mobile Phones Ltd. Digital AGC for a CDMA radiotelephone
US5901234A (en) * 1995-02-14 1999-05-04 Sony Corporation Gain control method and gain control apparatus for digital audio signals
US5907823A (en) * 1995-09-13 1999-05-25 Nokia Mobile Phones Ltd. Method and circuit arrangement for adjusting the level or dynamic range of an audio signal
US5809463A (en) * 1995-09-15 1998-09-15 Hughes Electronics Method of detecting double talk in an echo canceller
US7440891B1 (en) * 1997-03-06 2008-10-21 Asahi Kasei Kabushiki Kaisha Speech processing method and apparatus for improving speech quality and speech recognition performance
US6363343B1 (en) 1997-11-04 2002-03-26 Nokia Mobile Phones Limited Automatic gain control
US6563803B1 (en) * 1997-11-26 2003-05-13 Qualcomm Incorporated Acoustic echo canceller
US6169971B1 (en) 1997-12-03 2001-01-02 Glenayre Electronics, Inc. Method to suppress noise in digital voice processing
US6148078A (en) * 1998-01-09 2000-11-14 Ericsson Inc. Methods and apparatus for controlling echo suppression in communications systems
US6212273B1 (en) * 1998-03-20 2001-04-03 Crystal Semiconductor Corporation Full-duplex speakerphone circuit including a control interface
US6453289B1 (en) * 1998-07-24 2002-09-17 Hughes Electronics Corporation Method of noise reduction for speech codecs
US20020044666A1 (en) * 1999-03-15 2002-04-18 Vocaltec Communications Ltd. Echo suppression device and method for performing the same
US6912209B1 (en) * 1999-04-13 2005-06-28 Broadcom Corporation Voice gateway with echo cancellation
US6618701B2 (en) 1999-04-19 2003-09-09 Motorola, Inc. Method and system for noise suppression using external voice activity detection
US6487178B1 (en) * 1999-05-12 2002-11-26 Ericsson Inc. Methods and apparatus for providing volume control in communicating systems including a linear echo canceler
US6771701B1 (en) * 1999-05-12 2004-08-03 Infineon Technologies North America Corporation Adaptive filter divergence control in echo cancelers by means of amplitude distribution evaluation with configurable hysteresis
US6526139B1 (en) * 1999-11-03 2003-02-25 Tellabs Operations, Inc. Consolidated noise injection in a voice processing system
US20120101816A1 (en) * 2000-05-30 2012-04-26 Adoram Erell Enhancing the intelligibility of received speech in a noisy environment
US7630887B2 (en) * 2000-05-30 2009-12-08 Marvell World Trade Ltd. Enhancing the intelligibility of received speech in a noisy environment
US6804203B1 (en) * 2000-09-15 2004-10-12 Mindspeed Technologies, Inc. Double talk detector for echo cancellation in a speech communication system
US20030228023A1 (en) * 2002-03-27 2003-12-11 Burnett Gregory C. Microphone and Voice Activity Detection (VAD) configurations for use with communication systems
US7155385B2 (en) * 2002-05-16 2006-12-26 Comerica Bank, As Administrative Agent Automatic gain control for adjusting gain during non-speech portions
US20030235312A1 (en) * 2002-06-24 2003-12-25 Pessoa Lucio F. C. Method and apparatus for tone indication
US7433462B2 (en) * 2002-10-31 2008-10-07 Plantronics, Inc Techniques for improving telephone audio quality
US20050004796A1 (en) * 2003-02-27 2005-01-06 Telefonaktiebolaget Lm Ericsson (Publ), Audibility enhancement
US7379866B2 (en) * 2003-03-15 2008-05-27 Mindspeed Technologies, Inc. Simple noise suppression model
US7231234B2 (en) * 2003-11-21 2007-06-12 Octasic Inc. Method and apparatus for reducing echo in a communication system
US20060018460A1 (en) * 2004-06-25 2006-01-26 Mccree Alan V Acoustic echo devices and methods
US20060018457A1 (en) * 2004-06-25 2006-01-26 Takahiro Unno Voice activity detectors and methods
US7558729B1 (en) * 2004-07-16 2009-07-07 Mindspeed Technologies, Inc. Music detection for enhancing echo cancellation and speech coding
US20060217974A1 (en) * 2005-03-28 2006-09-28 Tellabs Operations, Inc. Method and apparatus for adaptive gain control
US7773691B2 (en) * 2005-04-25 2010-08-10 Rf Micro Devices, Inc. Power control system for a continuous time mobile transmitter
US20060247927A1 (en) * 2005-04-29 2006-11-02 Robbins Kenneth L Controlling an output while receiving a user input
US7555117B2 (en) * 2005-07-12 2009-06-30 Acoustic Technologies, Inc. Path change detector for echo cancellation
US7464029B2 (en) 2005-07-22 2008-12-09 Qualcomm Incorporated Robust separation of speech signals in a noisy environment
US20070121021A1 (en) * 2005-11-29 2007-05-31 Kaehler John W Display device with acoustic noise suppression
US20090070106A1 (en) * 2006-03-20 2009-03-12 Mindspeed Technologies, Inc. Method and system for reducing effects of noise producing artifacts in a speech signal
US20080161064A1 (en) * 2006-12-29 2008-07-03 Motorola, Inc. Methods and devices for adaptive ringtone generation
US20090010453A1 (en) * 2007-07-02 2009-01-08 Motorola, Inc. Intelligent gradient noise reduction system
US20090010452A1 (en) * 2007-07-06 2009-01-08 Texas Instruments Incorporated Adaptive noise gate and method
US20090281803A1 (en) * 2008-05-12 2009-11-12 Broadcom Corporation Dispersion filtering for speech intelligibility enhancement
US20100017205A1 (en) * 2008-07-18 2010-01-21 Qualcomm Incorporated Systems, methods, apparatus, and computer program products for enhanced intelligibility
US20100086122A1 (en) * 2008-10-02 2010-04-08 Oki Electric Industry Co., Ltd. Echo canceller and echo cancelling method and program
US20120065967A1 (en) * 2009-05-27 2012-03-15 Panasonic Corporation Communication device and signal processing method
US20110066428A1 (en) * 2009-09-14 2011-03-17 Srs Labs, Inc. System for adaptive voice intelligibility processing
US8204742B2 (en) * 2009-09-14 2012-06-19 Srs Labs, Inc. System for processing an audio signal to enhance speech intelligibility
US8306215B2 (en) * 2009-12-17 2012-11-06 Oki Electric Industry Co., Ltd. Echo canceller for eliminating echo without being affected by noise

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Archibald, Fitzgerald J., "Software Implementation of Automatic Gain Controller for Speech Signal", Texas Instruments, White Paper, SPRAAL1-Jul. 2008, (pp. 1-16).
Pérez, J.P. Alegre, et al., "Automatic Gain Control", Analog Circuits and Signal Processing, Techniques and Architectures for RF Receivers, Chapter 2 AGC Fundamentals, 2011, DOI 10.1007/978-1-4614-0167-4-2, ISBN: 978-1-4614-0166-7, pp. 13-28.

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110307251A1 (en) * 2010-06-15 2011-12-15 Microsoft Corporation Sound Source Separation Using Spatial Filtering and Regularization Phases
US8583428B2 (en) * 2010-06-15 2013-11-12 Microsoft Corporation Sound source separation using spatial filtering and regularization phases
US20140066134A1 (en) * 2011-05-19 2014-03-06 Nec Corporation Audio processing device, audio processing method, and recording medium recording audio processing program
US20220329940A1 (en) * 2019-08-06 2022-10-13 Nippon Telegraph And Telephone Corporation Echo cancellation device, echo cancellation method, and program
US12015902B2 (en) * 2019-08-06 2024-06-18 Nippon Telegraph And Telephone Corporation Echo cancellation device, echo cancellation method, and program
US20230178090A1 (en) * 2021-12-08 2023-06-08 Nokia Technologies Oy Conversational Service

Also Published As

Publication number Publication date
US20110301948A1 (en) 2011-12-08

Similar Documents

Publication Publication Date Title
US8447595B2 (en) Echo-related decisions on automatic gain control of uplink speech signal in a communications device
US8600454B2 (en) Decisions on ambient noise suppression in a mobile communications handset device
US9467779B2 (en) Microphone partial occlusion detector
US9100756B2 (en) Microphone occlusion detector
US9966067B2 (en) Audio noise estimation and audio noise reduction using multiple microphones
US9756422B2 (en) Noise estimation in a mobile device using an external acoustic microphone signal
US10074380B2 (en) System and method for performing speech enhancement using a deep neural network-based signal
US9058801B2 (en) Robust process for managing filter coefficients in adaptive noise canceling systems
US8744091B2 (en) Intelligibility control using ambient noise detection
US8775172B2 (en) Machine for enabling and disabling noise reduction (MEDNR) based on a threshold
US8861713B2 (en) Clipping based on cepstral distance for acoustic echo canceller
US9491545B2 (en) Methods and devices for reverberation suppression
US20090253457A1 (en) Audio signal processing for certification enhancement in a handheld wireless communications device
JP4241831B2 (en) Method and apparatus for adaptive control of echo and noise
US20070237339A1 (en) Environmental noise reduction and cancellation for a voice over internet packets (VOIP) communication device
US8744524B2 (en) User interface tone echo cancellation
JP2008543194A (en) Audio signal gain control apparatus and method
US20110300874A1 (en) System and method for removing tdma audio noise
JPH10322441A (en) Hand-free telephone set
CN106713685A (en) Hands-free communication control method
JP2005051629A (en) Voice transmitter-receiver

Legal Events

Date Code Title Description
AS Assignment

Owner name: APPLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:CHEN, SHAOHAI;REEL/FRAME:024496/0952

Effective date: 20100601

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8