US8369251B2 - Timestamp quality assessment for assuring acoustic echo canceller operability - Google Patents

Info

Publication number
US8369251B2
Authority
US
United States
Prior art keywords
timestamp
adjustments
queue
drift rate
end device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US12/143,625
Other versions
US20090316881A1 (en
Inventor
Vinod Prakash
Chao He
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US12/143,625
Assigned to MICROSOFT CORPORATION (assignment of assignors interest; see document for details). Assignors: HE, CHAO; PRAKASH, VINOD
Publication of US20090316881A1
Application granted
Publication of US8369251B2
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC (assignment of assignors interest; see document for details). Assignors: MICROSOFT CORPORATION
Legal status: Active
Adjusted expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M9/00: Arrangements for interconnection not involving centralised switching
    • H04M9/08: Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082: Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • FIG. 1 illustrates one end 100 of a typical two-way communication system. The other end is exactly the same.
  • the far-end voice is played through a loud speaker 160 and captured by the microphone 110 in the system and sent back to the far end.
  • the far-end user then hears his or her own voice with a certain delay.
  • AEC: Acoustic Echo Cancellation
  • an Acoustic Echo Cancellation is used to remove echo of the far end user's voice.
  • the example near end 100 includes a capture stream path and a render stream path for the audio data in the two directions.
  • the far end of the two-way communication system is exactly the same.
  • an analog to digital (A/D) converter 120 converts the analog sound mic(t) captured by microphone 110 to digital audio samples continuously at a sampling rate (fs_mic).
  • the digital audio samples are saved in capture buffer 130 sample by sample.
  • the samples are retrieved from the capture buffer in frame increments (herein denoted as “mic[n]”).
  • Frame here means a number (N) of digital audio samples.
  • the index ‘n’ is used to indicate relative sampling instants for the frames.
  • samples in mic[n] are processed, including encoding via a voice encoder 170 and sent to the other end.
  • the system receives the encoded voice signal from the other end, decodes audio samples via voice decoder 180 and places the audio samples into a render buffer 140 in periodic frame increments (labeled “spk[n]” in the figure). Then the digital to analog (D/A) converter 150 reads audio samples from the render buffer sample by sample and converts them to an analog signal continuously at a sampling rate, fs_spk. Finally, the analog signal is played by speaker 160.
  • D/A: digital to analog
  • the near end user's voice is captured by the microphone 110 and sent to the other end.
  • the far end user's voice is transmitted through the network to the near end, and played through the speaker 160 or headphone.
  • both users can hear each other and two-way communication is established.
  • a problem occurs if a speaker is used instead of a headphone to play the other end's voice. For example, if the near end user uses a speaker as shown in FIG. 1 , his microphone captures not only his voice but also an echo of the sound played from the speaker (labeled as “echo(t)”).
  • the mic[n] signal that is sent to the far end user includes an echo of the far end user's voice.
  • the far end user would hear a delayed echo of his or her voice, which is likely to cause annoyance and provide a poor user experience to that user.
  • the echo echo(t) can be represented by speaker signal spk(t) convolved by a linear response g(t) (assuming the room can be approximately modeled as a finite duration linear plant) as per the following equation: $\mathrm{echo}(t) = \mathrm{spk}(t) * g(t)$, where $*$ denotes convolution.
  • AEC 210 is added in the system as shown in FIG. 2 .
  • when a frame of samples in the mic[n] signal is retrieved from the capture buffer 130, the samples are sent to the AEC 210.
  • when a frame of samples in the spk[n] signal is sent to the render buffer 140, the samples are also sent to the AEC 210.
  • the AEC 210 uses the spk[n] signal from the far end to predict the echo in the captured mic[n] signal. Then, the AEC 210 subtracts the predicted echo from the mic[n] signal. This difference or residual is the clear voice signal (voice[n]), which is theoretically echo free and very close to the near end user's voice (voice(t)).
  • FIG. 3 depicts an implementation of the AEC 210 based on an adaptive filter 310 .
  • the AEC 210 takes two inputs, the mic[n] and spk[n] signals. It uses the spk[n] signal to predict the echo in the mic[n] signal.
  • the prediction residual (difference of the mic[n] signal from the prediction based on spk[n]) is the voice[n] signal, which will be output as echo free voice and sent to the far end.
  • the actual room response (that is represented as g(t) in the above convolution equation) usually varies with time, such as due to change in position of the microphone 110 or speaker 160 , body movement of the near end user, and even room temperature.
  • the room response therefore cannot be pre-determined, and must be calculated adaptively at running time.
  • the AEC 210 commonly is based on adaptive filters such as Least Mean Square (LMS) adaptive filters 310 , which can adaptively model the varying room response.
  • LMS: Least Mean Square
  • the AEC mode has to determine which samples in the speaker signal (spk[n]) are needed to predict the echo at a given sample in the microphone signal (mic[n]).
  • the AEC operates on two streams (the microphone and speaker samples), which generally are sampled by two different sampling clocks and may each be subject to delays. Accordingly, the same indices in the two stream may not be necessarily aligned in physical time.
  • timestamps are typically used to align the microphone and speaker signals, since the timestamp represents the physical time of when a sample is rendered (in the speaker stream) or captured (in the microphone stream).
  • Frames of speaker spk[n] and microphone mic[n] signals are stored in separate data queues and the timestamps are used to make adjustments to the speaker (or microphone) data queues in order to align the speaker and microphone signals.
  • a difference in render and capture sampling (clock) rates is called drift, and to compensate for this, periodic single sample adjustments commensurate with the drift rate are made to the speaker data queue.
  • when a glitch occurs (i.e., data loss of one or multiple samples in the speaker or microphone streams), an adjustment of many samples of data may be made at once in the speaker data queue.
  • Voice switching is a method used for half-duplex two-way communication.
  • a typical example of such communication system has two signal channels: an incoming channel that receives the voice signal coming from the far-end, and an outgoing channel that sends the near end voice signal to the far-end.
  • the far-end may be another end user device.
  • the far end may be a server that hosts the multiple user conference. Based on voice activity being present at the two ends, the channels are selectively turned on or off. In other words, whenever there is voice activity in one channel, the other channel is turned off.
  • By selectively switching off either the incoming or the outgoing channel based on voice activity in this way, the echo path is broken, which effectively removes acoustic echoes.
  • A drawback of voice switching is that it provides only a half-duplex mode of communication, resulting in loss of easy interruptability in conversations.
  • Voice switching is commonly used on low-end desktop phones in speaker phone mode.
  • a basic voice switching algorithm simply compares the strength of near-end and far-end voices and turns on the communication channel for the end with the stronger voice. It is relatively simple to compare voice activity on a standalone or dedicated phone device, because the microphone and speaker gains are known. During double talk scenarios (i.e., in which both ends are talking simultaneously), it is easy to estimate echo strength and thus easy to compare which voice is stronger.
  • any microphone or speaker may be connected to the computer, and the gains could be adjusted by the users at any time. This complicates the ability to estimate the echo strength, and therefore to compare the voice strength on the channels to accurately determine which channel should be switched on.
  • the system utilizes acoustic echo cancellation for full duplex voice communications between two communication end devices under normal operating conditions. Additionally, the system includes a voice switching mode as a fall back for situations where acoustic echo cancellation fails or would likely fail to function properly.
  • the technique concerns ways to appropriately decide between use of the normal operation mode utilizing acoustic echo cancellation, and use of the voice switching mode.
  • One way of making the decision between these modes of operation is to enable the voice switching mode based independently on timestamp-based factors that can lead to speaker (or microphone) data queue adjustments during acoustic echo cancellation operation.
  • the decision between modes relies on an overall measurement of timestamp quality. More particularly, the combined effect of all the timestamp parameter variations on the acoustic echo cancellation process occurs through adjustments to the input data queue (e.g., adjusting the relative offset between speaker and microphone queues or buffers). Hence, the overall impact of timestamp quality on the acoustic echo cancellation can be investigated by examining the rate at which adjustments are being made to the queue. In general summary, the technique therefore chooses or remains in the acoustic echo cancellation mode (i.e., does not enable voice switching) if adjustments to the queue are being made in a consistent/periodic manner, and the frequency of adjustments is within tolerance for drift of the acoustic echo cancellation process.
  • the median and median absolute deviation of the rate at which adjustments are made to the queue is used as a measure of queue update consistency.
  • This implementation of the mode decision technique decides to enable voice switching if the median drift rate, the median absolute deviation of the estimated drift rate (both based on queue adjustments), or the rate at which glitches occur exceeds its pre-determined threshold.
  • FIG. 1 is a block diagram illustrating one end of a typical two-way communication system in the prior art.
  • FIG. 2 is a block diagram of the two-way communication system of FIG. 1 with audio echo cancellation.
  • FIG. 3 is a block diagram of an implementation of audio echo cancellation based on an adaptive filter.
  • FIG. 4 is a flow diagram illustrating an end device of a two-way voice communication system including selection of acoustic echo cancellation or voice switching mode operation based on timestamp quality.
  • FIG. 5 is a flow diagram illustrating a decision to operate in acoustic echo cancellation mode or to enable voice switching mode based on timestamp quality in the two-way voice communication end device of FIG. 4 .
  • FIG. 6 is a flow diagram illustrating a process to evaluate timestamp quality for the acoustic echo cancellation or voice switching mode decision of FIG. 5 using measurements of consistency of the rate at which queue adjustments are made.
  • FIG. 7 is a block diagram of a generalized operating environment in conjunction with which various described embodiments may be implemented.
  • the following detailed description concerns various techniques and systems for providing acoustic echo cancellation with voice switching as a fall back mode in two-way communication systems.
  • the described techniques provide a mode decision that reliably and accurately assesses whether acoustic echo cancellation is feasible based on timestamp quality, by measuring the consistency of the frequency at which adjustments are made to input data queues due to timestamp drift rate.
  • the mode decision techniques are described with particular application in personal computer based telephony and voice chat applications, where the voice switching technique may be employed as a fall back measure in the case that acoustic echo cancellation fails to work properly (such as, due to inaccurate or “noisy” timestamps preventing alignment of microphone and speaker signals).
  • the techniques to decide whether to fall back to voice switching can be applied more broadly to other two-way voice communication systems and scenarios.
  • FIG. 4 illustrates one end of a two-way communication system that includes the improved AEC or Voice Switching Mode selection, as described more fully below.
  • the other end is typically, but not necessarily, identical.
  • Each end may be a communication device, such as a phone device or a personal computer with a telephony or voice chat application, or a game console, among other examples.
  • the far end can be a communication server, such as a voice conferencing host server.
  • the illustrated near end 100 includes a capture stream path and a render stream path for the audio data in the two directions.
  • an analog to digital (A/D) converter 120 converts the analog sound captured by microphone 110 to digital audio samples continuously at a sampling rate (fs_mic).
  • the digital audio samples are saved in capture buffer 130 sample by sample.
  • the samples are retrieved from the capture buffer in frame increments (herein denoted as “mic[n]”).
  • Frame here means a number (N) of digital audio samples.
  • samples in mic[n] are processed, including encoding via a voice encoder 170 and sent to the other end.
  • the system receives the encoded voice signal from the other end, decodes audio samples via voice decoder 180 and places the audio samples into a render buffer 140 in periodic frame increments (labeled “spk[n]” in the figure). Then the digital to analog (D/A) converter 150 reads audio samples from the render buffer sample by sample and converts them to an analog signal continuously at a sampling rate, fs_spk. Finally, the analog signal is played by speaker 160.
  • D/A: digital to analog
  • the capture and render buffers (with associated histories) 130 , 140 are also referred to herein as microphone and speaker data queues, respectively.
  • the illustrated communication system end device includes selective AEC or voice switching mode operation 410 to avoid or reduce acoustic echo (echo(t)) of the far end speaker's voice picked up by the microphone 110 to be sent back to the far end.
  • acoustic echo: echo(t)
  • the selective AEC/voice switching mode operation 410 of the two-way communication system provides full duplex two-way communication using acoustic echo cancellation 210 , which is implemented as illustrated in FIG. 2 and discussed above.
  • the selective mode operation instead enables voice switching mode as a fall back mechanism under operating conditions where the acoustic echo cancellation 210 would fail to work properly to cancel echo or would introduce unacceptable noise or distortion effects.
  • the two-way communication end 400 analyzes the AEC mode operation via one or more quality checks to determine if the communication quality is sufficient for acoustic echo cancellation to work properly, so as to determine which operation mode to use.
  • adjustments to the speaker data queue may be required due to:
  • Glitches: A glitch occurs due to data loss of one or multiple samples in the speaker or microphone streams. When a glitch occurs, an adjustment of many samples of data may be made at once in the speaker queue.
  • Drift: A difference in render and capture sampling (clock) rates is called drift, and results in periodic single sample adjustments commensurate with the drift rate.
  • Timestamp Noise: Additionally, the timestamp data is not always reliable or noise free. This can lead to irrelevant queue adjustments that can severely impact AEC performance.
  • a timestamp marks the time when the first sample of a data frame is captured or rendered, such as at the A/D converter 120 ( FIG. 4 ) and D/A converter 150 , respectively.
  • the timestamp should match the device's stated sampling rate perfectly. For example, assuming the two-way communication end device 400 has a sampling rate of 16000 Hz and a 10 millisecond data frame is used by the device for capturing and rendering audio signals, then an audio data frame has 160 samples. This means that for each captured or rendered frame containing 160 samples, the timestamp of the first sample of consecutive frames should increase by exactly 10 milliseconds. In other words, the frame length calculated from the timestamps of consecutive frames should be exactly 10 milliseconds.
  • The drift divided by the nominal frame length is called the timestamp drift rate.
  • The difference of the respective frame's length from the long term average frame length is termed the “timestamp noise.”
  • Audio glitches mean discontinuities in the audio stream.
  • audio glitches may occur when the application is not able to retrieve samples in the capture buffer in time so that the capture buffer becomes over full, which can result in lost audio capture samples.
  • audio glitches can mean the application does not fill the render buffer quickly enough, so the audio rendering device has no data to play.
  • the discontinuity caused by an adjustment has a negative impact on AEC performance. Beyond a point at which the periodic or transient adjustments are made at too high of a rate, the adaptive filters used for AEC stop working entirely. In such situations, the two-way communication system should instead enable voice switching mode, which is far less sensitive to timing mismatches.
  • FIG. 5 illustrates a top level decision 500 made by the two-way communication system 400 to select between AEC or voice switching mode operation 410 so as to provide an echo free experience.
  • the two-way communication system evaluates 510 the quality of the microphone and speaker stream timestamps 520 , 530 . If the timestamp quality is found to be poor, the two-way communication system falls back to operate in the voice-switching (half duplex) mode 550 . Else, the two-way communication system continues to operate in full duplex with AEC mode 560 .
  • the evaluation 530 of the timestamp quality for the AEC or voice switching mode decision 500 is based on the rate of adjustments made to the queue during AEC operation.
  • the evaluation considers the overall impact of timestamp quality on the acoustic echo cancellation by examining the rate at which adjustments are being made to the queue.
  • the evaluation assesses whether adjustments to the queue are being made in a consistent/periodic manner, and the frequency of adjustments is within tolerance for drift of the acoustic echo cancellation process.
  • the two-way communication end device 400 calculates an estimate of the drift rate based on queue adjustments (action 620 ).
  • the timestamp drift generally results in periodic single sample adjustments of the queue.
  • the frequency at which these queue adjustments are made therefore relates to the timestamp drift rate, and can be used as an estimate of the drift rate.
  • the timestamp quality evaluation then calculates consistency statistics of the periodic queue adjustments made by the AEC process to compensate for timestamp drift.
  • the statistical median and median absolute deviation of the estimated drift rate (action 630 ) are used as measures of how periodic and consistent the adjustments are made by the acoustic echo cancellation process to the queue. Because under normal operating conditions, the number of adjustments made in a second is quite low, the median and median absolute deviation of the queue adjustment rate provide a robust estimate of consistency.
  • alternative implementations of the timestamp quality evaluation can use other statistical calculations of the consistency of drift rate queue adjustments.
  • the evaluation 530 first requires that a minimum number of adjustments to the queue occur (action 610 ) before the timestamp quality evaluation performs any timestamp quality checks. In one implementation of the evaluation 530 , the evaluation first requires monitoring for a minimum number of queue adjustments over a 10 second window of time before checks on the calculated statistics are performed.
  • the two-way communication end device compares the median and median absolute deviation statistics to threshold values (action 640 ), which reflect the acoustic echo cancellation's tolerance for timestamp drift.
  • For one example AEC implementation, thresholds of 0.1% for the median drift rate and 0.05 per 1000 for the median absolute deviation are used. However, other AEC implementations may have a lower or higher tolerance for timestamp drift. Accordingly, alternative implementations of the timestamp quality evaluation 530 may apply other threshold values of median estimated drift rate and median absolute deviation of the drift rate. If the calculated median and/or median absolute deviation of the estimated drift rate exceed the threshold values, then the queue adjustments are considered too frequent and/or inconsistent such as to exceed the AEC's tolerance for queue adjustments.
  • In addition to the periodic consistency check (action 640), the two-way communication end device also applies a glitch frequency check (action 650). As discussed previously, audio glitches occur when there is a loss of multiple samples of the microphone and/or speaker queues. This requires a large (multiple samples) adjustment of the queue. For the glitch check (action 650), the two-way communication end device checks whether a glitch of greater than 4 milliseconds has occurred more frequently than once per second. If this glitch frequency is exceeded, then the glitch frequency check is failed.
  • if either check fails, the timestamp quality evaluation determines that the timestamp quality is poor (action 660). If the queue adjustments pass both the drift rate consistency check (640) and the glitch frequency check (650), then the timestamp quality is considered adequate.
  • the two-way communication end device can perform the timestamp quality check at periodic intervals, or simply one or more times at the start of the communication system.
  • each end of the two-way communication system performs the quality check at preset intervals after the communication session (e.g., voice call or conference) is initiated.
  • the initial quality check is done at about 4 seconds after the communication session starts, and is then repeated at 10 second intervals. If all quality checks produce the result that a sufficiently high quality for acoustic echo cancellation exists, then the two-way communication system end device may stop quality checks after 100 seconds. Initially, the two-way communication system end device operates in full duplex mode using acoustic echo cancellation, and continues with that operation so long as the quality checks have the sufficiently high quality result.
  • the two-way communication system may begin in half-duplex mode, continue quality checks throughout the communication session and switch to full duplex communication with acoustic echo cancellation when sufficiently high quality is detected, and otherwise remain in the voice switching mode.
  • the quality checking is performed independently for each end device, which may result in one end device having sufficiently high quality to operate in full duplex with acoustic echo cancellation while the other device has insufficient quality and falls back to the voice switching mode.
  • the two-way communication end device 400 shown in FIG. 4 can be implemented as a dedicated or special purpose communication device (e.g., a desktop phone), in which the selective AEC/voice switching mode operation 410 is implemented using a digital signal processor programmed by firmware or software to operate as illustrated in FIGS. 5 and 6.
  • the two-way communication system can be implemented using a general purpose computer with suitable programming to perform the selective AEC/voice switching mode operation using a digital signal processor on a sound card, or even the central processing unit of the computer to perform the digital audio signal processing.
  • the two-way communication system can be a laptop or desktop computer with voice communication software (e.g., a telephony, voice conferencing or voice chat application software).
  • the two-way communication system can be a mobile computing device that provides voice communication.
  • FIG. 7 illustrates a generalized example of a suitable computing environment 700 in which the two-way communication system 400 with selective AEC/voice switching mode operation 410 may be implemented on such general purpose computers.
  • the computing environment 700 is not intended to suggest any limitation as to scope of use or functionality, as described embodiments may be implemented in diverse general-purpose or special-purpose computing environments, as well as dedicated audio processing equipment.
  • the computing environment 700 includes at least one processing unit 710 and memory 720 .
  • the processing unit 710 executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power.
  • the processing unit also can comprise a central processing unit and co-processors, and/or dedicated or special purpose processing units (e.g., an audio processor or digital signal processor, such as on a sound card).
  • the memory 720 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory), or some combination of the two.
  • the memory 720 stores software 780 implementing one or more audio processing techniques and/or systems according to one or more of the described embodiments.
  • a computing environment may have additional features.
  • the computing environment 700 includes storage 740 , one or more input devices 750 , one or more output devices 760 , and one or more communication connections 770 .
  • An interconnection mechanism such as a bus, controller, or network interconnects the components of the computing environment 700 .
  • operating system software provides an operating environment for software executing in the computing environment 700 and coordinates activities of the components of the computing environment 700 .
  • the storage 740 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CDs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 700 .
  • the storage 740 stores instructions for the software 780 .
  • the input device(s) 750 may be a touch input device such as a keyboard, mouse, pen, touchscreen or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 700 .
  • the input device(s) 750 may be a microphone, sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD or DVD that reads audio or video samples into the computing environment.
  • the output device(s) 760 may be a display, printer, speaker, CD/DVD-writer, network adapter, or another device that provides output from the computing environment 700 .
  • the communication connection(s) 770 enable communication over a communication medium to one or more other computing entities.
  • the communication medium conveys information such as computer-executable instructions, audio or video information, or other data in a data signal.
  • a modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
  • Computer-readable media are any available media that can be accessed within a computing environment.
  • Computer-readable media include memory 720 , storage 740 , and combinations of any of the above.
  • Embodiments can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor.
  • program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular data types.
  • the functionality of the program modules may be combined or split between program modules as desired in various embodiments.
  • Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
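The adaptive-filter echo canceller described for FIG. 3 can be illustrated with a short sketch. The text names Least Mean Square (LMS) filters only as a common choice; the normalized step size, the tap count, the function names `simulate_echo` and `lms_echo_cancel`, and the toy room response are all assumptions made for this illustration, not details from the patent.

```python
import random


def simulate_echo(spk, g):
    """Discrete convolution of the speaker signal with a finite room response g."""
    return [
        sum(g[k] * spk[n - k] for k in range(len(g)) if n - k >= 0)
        for n in range(len(spk))
    ]


def lms_echo_cancel(mic, spk, taps=8, mu=0.5, eps=1e-8):
    """Predict the echo in mic[n] from spk[n] and return the residual.

    Uses a normalized LMS update (an assumed variant). Returns the residual
    signal (the voice[n] estimate) and the learned filter weights.
    """
    w = [0.0] * taps          # adaptive estimate of the room response
    history = [0.0] * taps    # most recent speaker sample first
    residual = []
    for m, s in zip(mic, spk):
        history = [s] + history[:-1]
        predicted = sum(wi * xi for wi, xi in zip(w, history))
        err = m - predicted   # prediction residual = echo-reduced output
        norm = sum(x * x for x in history) + eps
        step = mu * err / norm
        w = [wi + step * xi for wi, xi in zip(w, history)]
        residual.append(err)
    return residual, w


if __name__ == "__main__":
    rng = random.Random(0)
    spk = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
    room = [0.5, 0.3, -0.2]             # hypothetical room response
    mic = simulate_echo(spk, room)      # microphone hears only the echo here
    residual, w = lms_echo_cancel(mic, spk)
    print([round(x, 3) for x in w[:3]])
```

The sketch assumes the two streams are already aligned sample for sample, which is exactly the condition the timestamp-based queue adjustments are meant to maintain.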
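The basic voice switching algorithm described earlier compares the strength of the near-end and far-end voices and keeps only the stronger channel on. A minimal sketch follows; using mean frame energy as the strength measure, and the names `frame_energy` and `voice_switch`, are assumptions for illustration. A practical implementation would also need an echo strength estimate and some hysteresis, neither of which is shown.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def voice_switch(near_frame, far_frame):
    """Keep the channel with the stronger voice and mute the other.

    Returns (outgoing_frame, incoming_frame); the muted channel is all zeros.
    """
    if frame_energy(near_frame) > frame_energy(far_frame):
        # Near end is talking: send it, silence the incoming channel.
        return list(near_frame), [0.0] * len(far_frame)
    # Far end is talking (or tie): play it, silence the outgoing channel.
    return [0.0] * len(near_frame), list(far_frame)
```

Because one channel is always muted, the echo path is broken, at the cost of half-duplex operation.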
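The timestamp drift rate and timestamp noise defined earlier can be computed from the first-sample timestamps of consecutive frames. The sketch below assumes that drift is measured as the average frame length minus the nominal frame length; the function name `timestamp_stats` and the millisecond units are likewise illustrative choices.

```python
def timestamp_stats(timestamps_ms, nominal_ms=10.0):
    """Return (drift_rate, noise) from per-frame first-sample timestamps.

    drift_rate: (average frame length - nominal length) / nominal length,
                where the drift definition is an assumption of this sketch.
    noise:      each frame's length minus the long-term average frame length.
    """
    lengths = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    average = sum(lengths) / len(lengths)
    drift_rate = (average - nominal_ms) / nominal_ms
    noise = [length - average for length in lengths]
    return drift_rate, noise
```

For example, with 160-sample frames at 16000 Hz, frames stamped 10.001 ms apart give a drift rate of about 0.0001 (0.01%) and essentially zero noise.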
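The timestamp quality evaluation of FIG. 6 (actions 610 through 660) can be sketched as follows. The thresholds of 0.1%, 0.05 per 1000, 4 milliseconds, once per second, and the 10 second window come from the text. Everything else is an assumption made for illustration: the event format, the minimum adjustment count, the way the drift rate is estimated from the spacing of single-sample adjustments, and the name `evaluate_timestamp_quality`.

```python
from statistics import median

MEDIAN_DRIFT_LIMIT = 0.001    # 0.1% median drift rate (from the text)
MAD_LIMIT = 0.05 / 1000       # 0.05 per 1000 (from the text)
GLITCH_MS = 4.0               # glitches larger than 4 ms are counted
MAX_GLITCHES_PER_S = 1.0      # more than once per second fails the check
MIN_ADJUSTMENTS = 3           # hypothetical minimum; the text gives no number


def evaluate_timestamp_quality(adjustments, sample_rate=16000, window_s=10.0):
    """adjustments: list of (time_s, samples_adjusted) seen in the window.

    Returns "poor", "adequate", or None when too few adjustments occurred
    for any check to be performed (action 610).
    """
    if len(adjustments) < MIN_ADJUSTMENTS:
        return None

    # Action 620: estimate the drift rate from single-sample adjustments.
    # Assumed estimator: one sample per interval between adjustments.
    drift_times = sorted(t for t, n in adjustments if abs(n) == 1)
    rates = [
        1.0 / ((b - a) * sample_rate)
        for a, b in zip(drift_times, drift_times[1:])
        if b > a
    ]

    # Actions 630 and 640: median and median absolute deviation checks.
    if rates:
        med = median(rates)
        mad = median([abs(r - med) for r in rates])
        if med > MEDIAN_DRIFT_LIMIT or mad > MAD_LIMIT:
            return "poor"

    # Action 650: glitch frequency check on large adjustments.
    glitch_samples = GLITCH_MS / 1000.0 * sample_rate
    glitches = [t for t, n in adjustments if abs(n) > glitch_samples]
    if len(glitches) / window_s > MAX_GLITCHES_PER_S:
        return "poor"

    return "adequate"
```

Adjustments between two samples and the glitch size are ignored by this sketch; how a real implementation classifies them is not stated in the text.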
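The check schedule described earlier (a first check at about 4 seconds, repeats at 10 second intervals, stopping after 100 seconds if every check passes) combined with the top-level decision of FIG. 5 might look like the sketch below. The function name `run_quality_checks`, the callback interface, and the returned mode strings are hypothetical.

```python
def run_quality_checks(check, first_s=4, interval_s=10, stop_after_s=100):
    """Start in AEC mode and fall back to voice switching on a failed check.

    check(t) is a caller-supplied function returning True when the timestamp
    quality at session time t (seconds) is sufficient for AEC.
    Returns (mode, time_of_fallback_or_None).
    """
    t = first_s
    while t <= stop_after_s:
        if not check(t):
            return "voice_switching", t   # half duplex fall back
        t += interval_s
    return "aec", None                    # all checks passed; stop checking
```

With the default arguments the checks run at 4, 14, 24, and so on up to 94 seconds into the session.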

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)

Abstract

A voice communication end device provides echo reduction when operating in a full duplex mode using acoustic echo cancellation, which includes periodic audio queue adjustments to account for drift. The end device performs a quality assessment by calculating consistency statistics for the queue adjustments to determine whether acoustic echo cancellation would be ineffective, and if so falls back to a half duplex mode using voice switching.

Description

BACKGROUND
Acoustic echo is a common phenomenon occurring in two-way voice communication when open speakers are used. For example, FIG. 1 illustrates one end 100 of a typical two-way communication system. The other end is exactly the same. In such a system, the far-end voice is played through a loud speaker 160 and captured by the microphone 110 in the system and sent back to the far end. The far-end user then hears his or her own voice with a certain delay.
There are a number of known approaches to reducing acoustic echo in two-way communication systems. However, these known approaches face particular problems when applied to voice communication systems using personal computers, such as internet telephony and voice chat applications on personal computers.
1. Acoustic Echo Cancellation
Acoustic Echo Cancellation (AEC) is a digital signal processing technology which is used to remove the acoustic echo from a speaker phone in two-way (full duplex) or multi-way communication systems, such as traditional telephone or modern internet audio conversation applications.
With reference again to the example near end 100 of a typical two-way communication system illustrated in FIG. 1, acoustic echo cancellation is used to remove the echo of the far end user's voice. The example near end 100 includes a capture stream path and a render stream path for the audio data in the two directions. The far end of the two-way communication system is exactly the same. In the capture stream path in the figure, an analog to digital (A/D) converter 120 converts the analog sound mic(t) captured by microphone 110 to digital audio samples continuously at a sampling rate (fsmic). The digital audio samples are saved in capture buffer 130 sample by sample. The samples are retrieved from the capture buffer in frame increments (herein denoted as “mic[n]”). Frame here means a number (N) of digital audio samples. The index ‘n’ is used to indicate relative sampling instants for the frames. Finally, the samples in mic[n] are processed, including encoding via a voice encoder 170, and sent to the other end.
In the render stream path, the system receives the encoded voice signal from the other end, decodes audio samples via voice decoder 180 and places the audio samples into a render buffer 140 in periodic frame increments (labeled “spk[n]” in the figure). Then the digital to analog (D/A) converter 150 reads audio samples from the render buffer sample by sample and converts them to an analog signal continuously at a sampling rate, fsspk. Finally, the analog signal is played by speaker 160.
In systems such as that depicted by FIG. 1, the near end user's voice is captured by the microphone 110 and sent to the other end. At the same time, the far end user's voice is transmitted through the network to the near end, and played through the speaker 160 or headphone. In this way, both users can hear each other and two-way communication is established. However, a problem occurs if a speaker is used instead of a headphone to play the other end's voice. For example, if the near end user uses a speaker as shown in FIG. 1, his microphone captures not only his voice but also an echo of the sound played from the speaker (labeled as “echo(t)”). In this case, the mic[n] signal that is sent to the far end user includes an echo of the far end user's voice. As a result, the far end user would hear a delayed echo of his or her voice, which is likely to cause annoyance and provide a poor user experience to that user.
Practically, the echo echo(t) can be represented by speaker signal spk(t) convolved by a linear response g(t) (assuming the room can be approximately modeled as a finite duration linear plant) as per the following equation:
$$\mathrm{echo}(t) = \mathrm{spk}(t) * g(t) = \int_{0}^{T_e} g(\tau) \cdot \mathrm{spk}(t - \tau) \, d\tau$$
where * means convolution, Te is the echo length or filter length of the room response.
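As an illustration of the convolution model above, the following is a minimal discrete-time sketch. The function name `simulate_echo`, the sampled room response `g`, and the sample values are hypothetical and chosen only for the example; a real room response would be far longer and would be estimated rather than known in advance.

```python
def simulate_echo(spk, g):
    """Discrete analogue of echo(t) = spk(t) * g(t).

    echo[n] = sum over k of g[k] * spk[n - k], with spk taken as zero
    before the first sample.
    """
    echo = []
    for n in range(len(spk)):
        acc = 0.0
        for k, gk in enumerate(g):
            if n - k >= 0:
                acc += gk * spk[n - k]
        echo.append(acc)
    return echo


# Hypothetical room response: an attenuated direct path plus one
# weaker reflection arriving two samples later.
g = [0.5, 0.0, 0.25]
spk = [1.0, 0.0, 0.0, 0.0, 2.0]
echo = simulate_echo(spk, g)
print(echo)  # [0.5, 0.0, 0.25, 0.0, 1.0]
```

The length of `g` plays the role of the echo length Te in the equation: speaker samples older than that no longer contribute to the echo.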
In order to remove the echo for the remote user, AEC 210 is added in the system as shown in FIG. 2. When a frame of samples in the mic[n] signal is retrieved from the capture buffer 130, it is sent to the AEC 210. At the same time, when a frame of samples in the spk[n] signal is sent to the render buffer 140, it is also sent to the AEC 210. The AEC 210 uses the spk[n] signal from the far end to predict the echo in the captured mic[n] signal. Then, the AEC 210 subtracts the predicted echo from the mic[n] signal. This difference or residual is the clear voice signal (voice[n]), which is theoretically echo free and very close to the near end user's voice (voice(t)).
FIG. 3 depicts an implementation of the AEC 210 based on an adaptive filter 310. The AEC 210 takes two inputs, the mic[n] and spk[n] signals. It uses the spk[n] signal to predict the echo in the mic[n] signal. The prediction residual (difference of the mic[n] signal from the prediction based on spk[n]) is the voice[n] signal, which will be output as echo free voice and sent to the far end.
The actual room response (that is represented as g(t) in the above convolution equation) usually varies with time, such as due to change in position of the microphone 110 or speaker 160, body movement of the near end user, and even room temperature. The room response therefore cannot be pre-determined, and must be calculated adaptively at running time. The AEC 210 commonly is based on adaptive filters such as Least Mean Square (LMS) adaptive filters 310, which can adaptively model the varying room response.
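The adaptive prediction described above can be sketched with a basic LMS update. This is a simplified illustration under stated assumptions, not the implementation of AEC 210: the function name `lms_echo_canceller`, the tap count, the step size `mu`, and the synthetic signals are all hypothetical, the microphone signal contains only echo (no near end voice), and the two streams are assumed to be perfectly aligned.

```python
import random


def lms_echo_canceller(mic, spk, num_taps, mu):
    """Predict the echo in mic from spk with an LMS adaptive filter.

    Returns the prediction residual (the echo-reduced signal) and the
    final filter weights, which approximate the room response.
    """
    w = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent speaker sample first
    residual = []
    for m, s in zip(mic, spk):
        history = [s] + history[:-1]
        predicted = sum(wk * xk for wk, xk in zip(w, history))
        e = m - predicted  # residual after subtracting predicted echo
        # LMS update: move each weight along the error gradient.
        w = [wk + mu * e * xk for wk, xk in zip(w, history)]
        residual.append(e)
    return residual, w


# Synthetic test signal: white noise played through a hypothetical room.
rng = random.Random(0)
true_g = [0.6, 0.3, -0.2, 0.1]
spk = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [
    sum(true_g[k] * spk[n - k] for k in range(len(true_g)) if n - k >= 0)
    for n in range(len(spk))
]
residual, w = lms_echo_canceller(mic, spk, num_taps=4, mu=0.1)
```

In this noiseless setup the weights `w` converge toward `true_g` and the residual decays toward zero. If the `spk` samples were shifted relative to `mic` by more than the filter length, the filter could not model the echo, which is why stream alignment matters.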
The nature of adaptive filtering requires that the microphone signal and the reference or speaker signal be accurately aligned. In basic terms, the AEC has to determine which samples in the speaker signal (spk[n]) are needed to predict the echo at a given sample in the microphone signal (mic[n]). In practical terms, the AEC operates on two streams (the microphone and speaker samples), which generally are sampled by two different sampling clocks and may each be subject to delays. Accordingly, the same indices in the two streams may not necessarily be aligned in physical time. On personal computers, timestamps are typically used to align the microphone and speaker signals, since the timestamp represents the physical time of when a sample is rendered (in the speaker stream) or captured (in the microphone stream). Frames of speaker spk[n] and microphone mic[n] signals are stored in separate data queues and the timestamps are used to make adjustments to the speaker (or microphone) data queues in order to align the speaker and microphone signals. A difference in render and capture sampling (clock) rates is called drift, and to compensate for this, periodic single sample adjustments commensurate with the drift rate are made to the speaker data queue. Also, when a glitch occurs (i.e., data loss of one or multiple samples in the speaker or microphone streams), an adjustment of many samples of data may be made at once in the speaker data queue.
However, in practice, these timestamps are noisy and sometimes can be very wrong. One reason for this is that major operating systems, such as the Microsoft Windows XP operating system, support numerous different audio devices. It is quite common that an audio device and its driver cannot provide accurate timestamps. In such cases, the signals are often out of alignment, and the AEC fails to properly cancel echoes.
2. Voice Switching
Voice switching is a method used for half-duplex two-way communication. A typical example of such communication system has two signal channels: an incoming channel that receives the voice signal coming from the far-end, and an outgoing channel that sends the near end voice signal to the far-end. In a person-to-person scenario, the far-end may be another end user device. Alternatively, in a conference or multi-user scenario, the far end may be a server that hosts the multiple user conference. Based on voice activity being present at the two ends, the channels are selectively turned on or off. In other words, whenever there is voice activity in one channel, the other channel is turned off. By selectively switching off either incoming or outgoing channels based on voice activity in this way, the echo path is broken, which effectively removes acoustic echoes. The drawback of voice switching, however, is that it provides only half-duplex mode of communication, resulting in loss of easy interruptability in conversations.
Voice switching is commonly used on low-end desktop phones in speaker phone mode. A basic voice switching algorithm simply compares the strength of near-end and far-end voices and turns on the communication channel for the end with the stronger voice. It is relatively simple to compare voice activity on a standalone or dedicated phone device, because the microphone and speaker gains are known. During double talk scenarios (i.e., in which both ends are talking simultaneously), it is easy to estimate echo strength and thus easy to compare which voice is stronger. However, for voice communication applications on personal computers, any microphone or speaker may be connected to the computer, and the gains could be adjusted by the users at any time. This complicates the ability to estimate the echo strength, and therefore to compare the voice strength on the channels to accurately determine which channel should be switched on.
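The basic voice switching comparison described above can be sketched as follows. This is a deliberately naive illustration: the names `frame_energy` and `select_open_channel` are hypothetical, mean squared amplitude is assumed as the measure of voice strength, and the sketch ignores the echo strength estimation that the text identifies as the difficult part on personal computers.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(x * x for x in frame) / len(frame)


def select_open_channel(near_frame, far_frame):
    """Open the channel of whichever end currently has the stronger voice.

    Returns "outgoing" when the near end is stronger (the incoming
    channel is switched off) and "incoming" otherwise.
    """
    if frame_energy(near_frame) > frame_energy(far_frame):
        return "outgoing"
    return "incoming"


near = [0.5, -0.5, 0.5, -0.5]
far = [0.1, -0.1, 0.1, -0.1]
print(select_open_channel(near, far))  # outgoing
print(select_open_channel(far, near))  # incoming
```

Because the comparison uses raw sample energy, a change in microphone or speaker gain would shift the outcome, which illustrates why unknown and user-adjustable gains complicate the decision.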
SUMMARY
The following Detailed Description concerns techniques (implemented via methods, devices and systems) to reduce acoustic echo in a two way voice communications system. According to the described techniques, the system utilizes acoustic echo cancellation for full duplex voice communications between two communication end devices under normal operating conditions. Additionally, the system includes a voice switching mode as a fall back for situations where acoustic echo cancellation fails or would likely fail to function properly. The technique concerns ways to appropriately decide between use of the normal operation mode utilizing acoustic echo cancellation, and use of the voice switching mode.
One way of making the decision between these modes of operation is to enable the voice switching mode based independently on timestamp-based factors that can lead to speaker (or microphone) data queue adjustments during acoustic echo cancellation operation.
According to the technique described herein, the decision between modes relies on an overall measurement of timestamp quality. More particularly, the combined effect of all the timestamp parameter variations on the acoustic echo cancellation process occurs through adjustments to the input data queue (e.g., adjusting the relative offset between speaker and microphone queues or buffers). Hence, the overall impact of timestamp quality on the acoustic echo cancellation can be investigated by examining the rate at which adjustments are being made to the queue. In general summary, the technique therefore chooses or remains in the acoustic echo cancellation mode (i.e., does not enable voice switching) if adjustments to the queue are being made in a consistent/periodic manner, and the frequency of adjustments is within tolerance for drift of the acoustic echo cancellation process.
In one example implementation, the median and median absolute deviation of the rate at which adjustments are made to the queue is used as a measure of queue update consistency. This implementation of the mode decision technique then decides to enable voice switching if the median drift rate or median absolute deviation of the estimated drift rate (both based on queue adjustments) or the rate at which glitches occur exceed pre-determined thresholds. These checks are applied only after a minimum number of adjustments have been made, so as to ensure that the derived statistics are reliable.
This Summary is provided to introduce a selection of concepts in a simplified form that is further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Additional features and advantages of the invention will be made apparent from the following detailed description of embodiments that proceeds with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram illustrating one end of a typical two-way communication system in the prior art.
FIG. 2 is a block diagram of the two-way communication system of FIG. 1 with audio echo cancellation.
FIG. 3 is a block diagram of an implementation of audio echo cancellation based on an adaptive filter.
FIG. 4 is a flow diagram illustrating an end device of a two-way voice communication system including selection of acoustic echo cancellation or voice switching mode operation based on timestamp quality.
FIG. 5 is a flow diagram illustrating a decision to operate in acoustic echo cancellation mode or to enable voice switching mode based on timestamp quality in the two-way voice communication end device of FIG. 4.
FIG. 6 is a flow diagram illustrating a process to evaluate timestamp quality for the acoustic echo cancellation or voice switching mode decision of FIG. 5 using measurements of consistency of the rate at which queue adjustments are made.
FIG. 7 is a block diagram of a generalized operating environment in conjunction with which various described embodiments may be implemented.
DETAILED DESCRIPTION
The following detailed description concerns various techniques and systems for providing acoustic echo cancellation with voice switching as a fall back mode in two-way communication systems. The described techniques provide a mode decision that reliably and accurately assesses whether acoustic echo cancellation is feasible based on timestamp quality, by measuring the consistency of the frequency at which adjustments are made to input data queues due to timestamp drift rate. The mode decision techniques are described with particular application in personal computer based telephony and voice chat applications, where the voice switching technique may be employed as a fall back measure in the case that acoustic echo cancellation fails to work properly (such as, due to inaccurate or “noisy” timestamps preventing alignment of microphone and speaker signals). However, the techniques to decide whether to fall back to voice switching can be applied more broadly to other two-way voice communication systems and scenarios.
The various techniques and tools described herein may be used independently. Some of the techniques and tools may be used in combination. Various techniques are described below with reference to flowcharts of processing acts. The various processing acts shown in the flowcharts may be consolidated into fewer acts or separated into more acts. For the sake of simplicity, the relation of acts shown in a particular flowchart to acts described elsewhere is often not shown. In many cases, the acts in a flowchart can be reordered.
I. Overview of Two-Way Communication System with Improved AEC/Voice Switch Mode Selection
FIG. 4 illustrates one end of a two-way communication system that includes the improved AEC or Voice Switching Mode selection, as described more fully below. The other end is typically, but not necessarily identical. Each end may be a communication device, such as a phone device or a personal computer with a telephony or voice chat application, or a game console, among other examples. In some implementations, the far end can be a communication server, such as a voice conferencing host server.
The illustrated near end 100 includes a capture stream path and a render stream path for the audio data in the two directions. In the capture stream path in the figure, an analog to digital (A/D) converter 120 converts the analog sound captured by microphone 110 to digital audio samples continuously at a sampling rate (fsmic). The digital audio samples are saved in capture buffer 130 sample by sample. The samples are retrieved from the capture buffer in frame increments (herein denoted as “mic[n]”). Frame here means a number (N) of digital audio samples. Finally, samples in mic[n] are processed, including encoding via a voice encoder 170 and sent to the other end.
In the render stream path, the system receives the encoded voice signal from the other end, decodes audio samples via voice decoder 180 and places the audio samples into a render buffer 140 in periodic frame increments (labeled “spk[n]” in the figure). Then the digital to analog (D/A) converter 150 reads audio samples from the render buffer sample by sample and converts them to an analog signal continuously at a sampling rate, fsspk. Finally, the analog signal is played by speaker 160.
The capture and render buffers (with associated histories) 130, 140 are also referred to herein as microphone and speaker data queues, respectively.
The illustrated communication system end device includes selective AEC or voice switching mode operation 410 to avoid or reduce acoustic echo (echo(t)) of the far end speaker's voice picked up by the microphone 110 to be sent back to the far end.
II. AEC or Voice Switch Mode Decision
In general use, the selective AEC/voice switching mode operation 410 of the two-way communication system provides full duplex two-way communication using acoustic echo cancellation 210, which is implemented as illustrated in FIG. 2 and discussed above. The selective mode operation instead enables voice switching mode as a fall back mechanism under operating conditions where the acoustic echo cancellation 210 would fail to work properly to cancel echo or would introduce unacceptable noise or distortion effects. Accordingly, the two-way communication end 400 analyzes the AEC mode operation via one or more quality checks to determine if the communication quality is sufficient for acoustic echo cancellation to work properly, so as to determine which operation mode to use.
For the AEC to maintain the speaker and microphone streams in synchrony for proper AEC behavior, information from the timestamps is used to make adjustments to the speaker data queue (or in alternative implementations, the adjustments can be made to the microphone queue). Depending on the physical conditions, adjustments to the speaker data queue may be required due to:
(1) Glitches: A glitch occurs due to data loss of one or multiple samples in the speaker or microphone streams. When a glitch occurs, an adjustment of many samples of data may be made at once in the speaker queue.
(2) Drift: A difference in render and capture sampling (clock) rates is called drift, and results in periodic single sample adjustments commensurate with the drift rate.
(3) Timestamp Noise: Additionally, the timestamp data is not always reliable or noise free. This can lead to irrelevant queue adjustments that can severely impact AEC performance.
In more detail, a timestamp marks the time when the first sample of a data frame is captured or rendered, such as at the A/D converter 120 (FIG. 4) and D/A converter 150, respectively. Ideally, the timestamp should match the device's stated sampling rate perfectly. For example, assuming the two-way communication end device 400 has a sampling rate of 16000 Hz and a 10 millisecond data frame is used by the device for capturing and rendering audio signals, then an audio data frame has 160 samples. This means that for each captured or rendered frame containing 160 samples, the timestamp of the first sample of consecutive frames should increase by exactly 10 milliseconds. In other words, the frame length calculated from the timestamps of consecutive frames should be exactly 10 milliseconds.
In practice, there may be errors in the timestamps, as discussed more fully in the background. This can result in the frame length calculated from timestamps being more or less than expected length (e.g., 10 milliseconds in this example implementation). If the long term average of the calculated frame length varies from the expected frame length, then the difference is called the timestamp drift. The drift divided by the nominal frame length is called the timestamp drift rate. A non-zero timestamp drift rate signifies the communication end device's sampling rate is off from its claimed or nominal rate. Finally, for each audio data frame, the difference of the respective frame's length from the long term average frame length is termed the “timestamp noise.”
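The definitions above can be made concrete with a short calculation. The sketch below assumes timestamps in milliseconds for consecutive frames and uses the mean over the supplied window as a stand-in for the long term average; the function name `timestamp_drift_stats` and the sample timestamps are hypothetical.

```python
def timestamp_drift_stats(timestamps_ms, nominal_frame_ms):
    """Compute timestamp drift, drift rate, and per-frame timestamp noise.

    timestamps_ms holds the timestamp of the first sample of each
    consecutive frame.
    """
    frame_lengths = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    average_length = sum(frame_lengths) / len(frame_lengths)
    drift = average_length - nominal_frame_ms   # timestamp drift
    drift_rate = drift / nominal_frame_ms       # timestamp drift rate
    noise = [length - average_length for length in frame_lengths]
    return drift, drift_rate, noise


# Five frames that should be spaced exactly 10 ms apart.
timestamps = [0.0, 10.0, 20.5, 30.0, 40.5]
drift, drift_rate, noise = timestamp_drift_stats(timestamps, 10.0)
print(drift)  # 0.125
print(noise)  # [-0.125, 0.375, -0.625, 0.375]
```

Here the average frame length is 10.125 milliseconds, so the drift is 0.125 milliseconds per frame and the drift rate is 1.25%, while the noise values show how far each individual frame deviates from that average.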
Further, as already remarked, queue adjustments also may result from audio glitches. Audio glitches are discontinuities in the audio stream. For audio data capture (e.g., the microphone 110 and A/D converter 120), audio glitches may occur when the application is not able to retrieve samples from the capture buffer in time, so that the capture buffer becomes overfull, which can result in lost audio capture samples. For audio data rendering (e.g., the loudspeaker 160 and D/A converter 150), audio glitches can occur when the application does not fill the render buffer quickly enough, so that the audio rendering device has no data to play.
Even when the timestamp data matches the physical situation accurately and adjustments to the queue are necessary to maintain correct AEC operation, the discontinuity caused by an adjustment has a negative impact on AEC performance. Beyond a point at which the periodic or transient adjustments are made at too high of a rate, the adaptive filters used for AEC stop working entirely. In such situations, the two-way communication system should instead enable voice switching mode, which is far less sensitive to timing mismatches.
FIG. 5 illustrates a top level decision 500 made by the two-way communication system 400 to select between AEC or voice switching mode operation 410 so as to provide an echo free experience. For this decision 540, the two-way communication system evaluates 510 the quality of the microphone and speaker stream timestamps 520, 530. If the timestamp quality is found to be poor, the two-way communication system falls back to operate in the voice-switching (half duplex) mode 550. Else, the two-way communication system continues to operate in full duplex with AEC mode 560.
III. Timestamp Quality Evaluation
With reference now to FIG. 6, the evaluation 530 of the timestamp quality for the AEC or voice switching mode decision 500 (FIG. 5) is based on the rate of adjustments made to the queue during AEC operation. The evaluation considers the overall impact of timestamp quality on the acoustic echo cancellation by examining the rate at which adjustments are being made to the queue. In summary, the evaluation assesses whether adjustments to the queue are being made in a consistent/periodic manner, and the frequency of adjustments is within tolerance for drift of the acoustic echo cancellation process.
For the evaluation 530, the two-way communication end device 400 calculates an estimate of the drift rate based on queue adjustments (action 620). As discussed above, the timestamp drift generally results in periodic single sample adjustments of the queue. The frequency at which these queue adjustments are made therefore relates to the timestamp drift rate, and can be used as an estimate of the drift rate.
The timestamp quality evaluation then calculates consistency statistics of the periodic queue adjustments made by the AEC process to compensate for timestamp drift. In one example timestamp quality evaluation implementation, the statistical median and median absolute deviation of the estimated drift rate (action 630) are used as measures of how periodic and consistent the adjustments made by the acoustic echo cancellation process to the queue are. Because the number of adjustments made in a second is quite low under normal operating conditions, the median and median absolute deviation of the queue adjustment rate provide a robust estimate of consistency. However, alternative timestamp quality evaluation implementations can use other statistical calculations of the consistency of drift rate queue adjustments.
In order to ensure that the derived statistics are reliable, the evaluation 530 first requires that a minimum number of adjustments to the queue occur (action 610) before the timestamp quality evaluation performs any timestamp quality checks. In one implementation of the evaluation 530, the evaluation first requires monitoring for a minimum number of queue adjustments over a 10 second window of time before checks on the calculated statistics are performed.
After at least this minimum time window (action 610) has passed, the two-way communication end device compares the median and median absolute deviation statistics to threshold values (action 640), which reflect the acoustic echo cancellation's tolerance for timestamp drift. For one example AEC implementation, thresholds of 0.1% for the median drift rate and 0.05 per 1000 for the median absolute deviation are used. However, other AEC implementations may have a lower or higher tolerance for timestamp drift. Accordingly, alternative implementations of the timestamp quality evaluation 530 may apply other threshold values of median estimated drift rate and median absolute deviation of the drift rate. If the calculated median and/or median absolute deviation of the estimated drift rate exceed the threshold values, then the queue adjustments are considered too frequent and/or inconsistent such as to exceed the AEC's tolerance for queue adjustments.
In addition to the periodic consistency check (action 640), the two-way communication end device also applies a glitch frequency check (action 650). As discussed previously, audio glitches occur when there is a loss of multiple samples of the microphone and/or speaker queues. This requires a large (multiple samples) adjustment of the queue. For the glitch check (action 650), the two-way communication end device checks whether a glitch of greater than 4 milliseconds has occurred more frequently than once per second. If this glitch frequency is exceeded, then the glitch frequency check is failed.
If the queue adjustments made by the AEC process exceed the consistency statistics thresholds (action 640) or fail the glitch frequency check (action 650), then the timestamp quality evaluation determines that the timestamp quality is poor (action 660). If the queue adjustments pass both the drift rate consistency check (640) and the glitch frequency check (650), then the timestamp quality is considered adequate.
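The evaluation of FIG. 6 can be sketched as follows, using the example thresholds given above (0.1% median drift rate, 0.05 per 1000 median absolute deviation, and more than one glitch per second). Several details are assumptions made only for this illustration: the function names, the default `min_adjustments` value (the text does not state the minimum count), and the drift rate estimate itself, which here treats one single sample adjustment every `dt` seconds as a drift rate of `1 / (dt * sample_rate_hz)`. The caller is assumed to count only glitches longer than 4 milliseconds.

```python
from statistics import median


def estimate_drift_rates(adjust_times_s, sample_rate_hz):
    """Estimate a drift rate from each gap between queue adjustments.

    Assumes one single-sample adjustment per gap and strictly
    increasing adjustment times (in seconds).
    """
    return [
        1.0 / ((t1 - t0) * sample_rate_hz)
        for t0, t1 in zip(adjust_times_s, adjust_times_s[1:])
    ]


def timestamp_quality_ok(adjust_times_s, glitch_count, window_s,
                         sample_rate_hz=16000, min_adjustments=5,
                         max_median_rate=0.001, max_mad=0.00005,
                         max_glitches_per_s=1.0):
    """Return True (adequate), False (poor), or None (too few adjustments)."""
    # Action 610: require enough adjustments for reliable statistics.
    if len(adjust_times_s) < max(min_adjustments, 2):
        return None
    # Actions 620 and 630: drift rate estimates and their statistics.
    rates = estimate_drift_rates(adjust_times_s, sample_rate_hz)
    med = median(rates)
    mad = median([abs(r - med) for r in rates])
    # Action 640: consistency check against the drift tolerance.
    consistent = med <= max_median_rate and mad <= max_mad
    # Action 650: glitch frequency check.
    glitches_ok = glitch_count / window_s <= max_glitches_per_s
    return consistent and glitches_ok


steady = [0.5 * i for i in range(1, 21)]            # one adjustment every 0.5 s
erratic = [0.0, 0.05, 0.55, 0.6, 1.1, 1.15, 1.65]   # irregular gaps
print(timestamp_quality_ok(steady, glitch_count=0, window_s=10.0))    # True
print(timestamp_quality_ok(erratic, glitch_count=0, window_s=10.0))   # False
print(timestamp_quality_ok(steady, glitch_count=15, window_s=10.0))   # False
```

The erratic example has a median drift rate below the 0.1% threshold but fails on the median absolute deviation, which shows why both statistics are checked: the adjustments are not too frequent on the whole, but they are not periodic.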
The two-way communication end device can perform the timestamp quality check at periodic intervals, or simply one or more times at the start of the communication system. In one example implementation, each end of the two-way communication system performs the quality check at preset intervals after the communication session (e.g., voice call or conference) is initiated. The initial quality check is done at about 4 seconds after the communication session starts, and is then repeated at 10 second intervals. If all quality checks produce the result that a sufficiently high quality for acoustic echo cancellation exists, then the two-way communication system end device may stop quality checks after 100 seconds. Initially, the two-way communication system end device operates in full duplex mode using acoustic echo cancellation, and continues with that operation so long as the quality checks have the sufficiently high quality result. However, if a quality check fails, then the two-way communication system end switches over to voice switching mode of operation. In alternative implementations, the two-way communication system may begin in half-duplex mode, continue quality checks throughout the communication session and switch to full duplex communication with acoustic echo cancellation when sufficiently high quality is detected, and otherwise remain in the voice switching mode. The quality checking is performed independently for each end device, which may result in one end device having sufficiently high quality to operate in full duplex with acoustic echo cancellation while the other device has insufficient quality and falls back to the voice switching mode.
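The example schedule above (first check at about 4 seconds, repeats every 10 seconds, checks stopped after 100 seconds) and the fall back behavior can be sketched as below. The names `quality_check_times` and `run_session` are hypothetical, and the sketch covers only the variant that starts in full duplex mode and never returns to it after a failed check.

```python
def quality_check_times(first_s=4, interval_s=10, stop_after_s=100):
    """Yield the times, in seconds after session start, of each check."""
    t = first_s
    while t <= stop_after_s:
        yield t
        t += interval_s


def run_session(check_results):
    """Return the final mode and the time of the failed check, if any.

    check_results is an iterable of booleans, one per scheduled check,
    where True means the timestamp quality was adequate.
    """
    for t, quality_ok in zip(quality_check_times(), check_results):
        if not quality_ok:
            return "voice_switching", t  # fall back to half duplex
    return "aec", None  # all checks passed; remain in full duplex


print(list(quality_check_times()))
# [4, 14, 24, 34, 44, 54, 64, 74, 84, 94]
print(run_session([True] * 10))          # ('aec', None)
print(run_session([True, True, False]))  # ('voice_switching', 24)
```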
IV. Computing Environment
The two-way communication end device 400 shown in FIG. 4 can be implemented as a dedicated or special purpose communication device (e.g., a desktop phone), in which the selective AEC/voice switching mode operation 410 is implemented using a digital signal processor programmed by firmware or software to operate as illustrated in FIGS. 5 and 6.
Alternatively, the two-way communication system can be implemented using a general purpose computer with suitable programming to perform the selective AEC/voice switching mode operation using a digital signal processor on a sound card, or even the central processing unit of the computer to perform the digital audio signal processing. For example, the two-way communication system can be a laptop or desktop computer with voice communication software (e.g., a telephony, voice conferencing or voice chat application software). Alternatively, the two-way communication system can be a mobile computing device that provides voice communication. FIG. 7 illustrates a generalized example of a suitable computing environment 700 in which the two-way communication system 400 with selective AEC/voice switching mode operation 410 may be implemented on such general purpose computers. The computing environment 700 is not intended to suggest any limitation as to scope of use or functionality, as described embodiments may be implemented in diverse general-purpose or special-purpose computing environments, as well as dedicated audio processing equipment.
With reference to FIG. 7, the computing environment 700 includes at least one processing unit 710 and memory 720. In FIG. 7, this most basic configuration 730 is included within a dashed line. The processing unit 710 executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The processing unit also can comprise a central processing unit and co-processors, and/or dedicated or special purpose processing units (e.g., an audio processor or digital signal processor, such as on a sound card). The memory 720 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory), or some combination of the two. The memory 720 stores software 780 implementing one or more audio processing techniques and/or systems according to one or more of the described embodiments.
A computing environment may have additional features. For example, the computing environment 700 includes storage 740, one or more input devices 750, one or more output devices 760, and one or more communication connections 770. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 700. Typically, operating system software (not shown) provides an operating environment for software executing in the computing environment 700 and coordinates activities of the components of the computing environment 700.
The storage 740 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CDs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 700. The storage 740 stores instructions for the software 780.
The input device(s) 750 may be a touch input device such as a keyboard, mouse, pen, touchscreen or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment 700. For audio or video, the input device(s) 750 may be a microphone, sound card, video card, TV tuner card, or similar device that accepts audio or video input in analog or digital form, or a CD or DVD that reads audio or video samples into the computing environment. The output device(s) 760 may be a display, printer, speaker, CD/DVD-writer, network adapter, or another device that provides output from the computing environment 700.
The communication connection(s) 770 enable communication over a communication medium to one or more other computing entities. The communication medium conveys information such as computer-executable instructions, audio or video information, or other data in a data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Embodiments can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 700, computer-readable media include memory 720, storage 740, and combinations of any of the above.
Embodiments can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc. that perform particular tasks or implement particular data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
For the sake of presentation, the detailed description uses terms like “determine,” “receive,” and “perform” to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.

Claims (20)

1. A method of preventing acoustic echo in a two-way voice communication end device via queue adjustments, the method comprising:
upon starting a communication session with another communication end device, making adjustments to an audio data queue corresponding to either a render stream or a capture stream of the two-way voice communication end device based at least in part on an offset of respective timestamps of the render stream and the capture stream;
assessing consistency of the adjustments made to the audio data queue;
determining whether a timestamp drift rate is within a tolerance threshold for effective acoustic echo cancellation from the consistency assessment;
in a case that the timestamp drift rate is within the tolerance threshold, operating in a full duplex voice communication with acoustic echo cancellation mode; and
in a case that the timestamp drift rate exceeds the tolerance threshold, operating in a half duplex mode with voice switching.
2. The method of claim 1 wherein said assessing consistency of the queue adjustments comprises estimating the timestamp drift rate based on a frequency at which the queue adjustments are made.
3. The method of claim 1 wherein said assessing consistency of the queue adjustments comprises calculating a median absolute deviation of a drift rate relating to a rate of queue adjustments.
4. The method of claim 3 wherein said determining whether the timestamp drift rate is within the tolerance threshold comprises checking whether the calculated median absolute deviation of the drift rate exceeds the tolerance threshold.
5. The method of claim 1 wherein said assessing consistency of the queue adjustments comprises calculating a median of a drift rate relating to a rate of queue adjustments.
6. The method of claim 5 wherein said determining whether the timestamp drift rate is within the tolerance threshold comprises checking whether the calculated median of the drift rate exceeds the tolerance threshold.
7. The method of claim 1 further comprising:
performing a glitch frequency check; and
in a case that the glitch frequency check fails, operating in the half duplex mode with voice switching.
8. One or more computer-readable media, not consisting of a signal, having computer executable instructions, which, when executed by a computer, cause the computer to perform the method of claim 1.
9. The method of claim 1 wherein said assessing consistency of the queue adjustments comprises assessing periodicity of a frequency at which the queue adjustments are made.
10. The method of claim 1 wherein the timestamp drift rate is indicative of a difference between a calculated frame length and an expected frame length of the audio data queue.
11. A two-way voice communication end device comprising:
an audio render device for playing an incoming voice signal received from another end device to a user, and providing timestamps for a render audio stream of the incoming voice signal;
an outgoing voice channel for capturing an outgoing voice signal from the user for sending to said other end device, and providing timestamps for a capture audio stream of the outgoing voice signal;
an audio processor for processing the capture and render audio streams according to a selective one of at least two operating modes comprising a full duplex mode using acoustic echo cancellation and a half duplex mode using voice switching to reduce acoustic echo via queue adjustments, wherein the audio processor makes adjustments to an audio data queue corresponding to at least one of the render audio stream and the capture audio stream due to a relative offset between the timestamps for the capture audio stream and the timestamps for the render audio stream during operation in the full duplex mode;
the audio processor performing a timestamp quality evaluation to select the operating mode, wherein the timestamp quality evaluation comprises the acts of:
assessing consistency of the adjustments made to the audio data queue;
determining whether a timestamp drift rate is within a tolerance threshold for effective acoustic echo cancellation from the consistency assessment;
in a case that the timestamp drift rate is within the tolerance threshold, operating in a full duplex voice communication with acoustic echo cancellation mode; and
in a case that the timestamp drift rate exceeds the tolerance threshold, operating in the half duplex mode.
12. The two-way voice communication end device of claim 11 wherein said assessing consistency of the queue adjustments comprises estimating the timestamp drift rate based on a frequency at which the queue adjustments are made.
13. The two-way voice communication end device of claim 11 wherein said assessing consistency of the queue adjustments comprises calculating a median absolute deviation of a drift rate relating to a rate of queue adjustments.
14. The two-way voice communication end device of claim 13 wherein said determining whether the timestamp drift rate is within the tolerance threshold comprises checking whether the calculated median absolute deviation of the drift rate exceeds the tolerance threshold.
15. The two-way voice communication end device of claim 11 wherein said assessing consistency of the queue adjustments comprises calculating a median of a drift rate relating to a rate of queue adjustments.
16. The two-way voice communication end device of claim 15 wherein said determining whether the timestamp drift rate is within the tolerance threshold comprises checking whether the calculated median of the drift rate exceeds the tolerance threshold.
17. The two-way voice communication end device of claim 11 wherein the audio processor further performs a glitch frequency check, and if the glitch frequency check fails, operating in the half duplex mode.
18. The two-way voice communication end device of claim 11 wherein the timestamp drift rate is indicative of a difference between a calculated frame length and an expected frame length of the audio data queue.
19. A method comprising:
receiving an audio data queue with timestamps that has been adjusted due to relative offset of render and capture streams;
based at least in part on a rate at which the adjustments were made to the audio data queue, performing a timestamp quality evaluation on the received audio data queue, wherein the performing of the timestamp quality evaluation comprises determining based at least in part on a tolerance threshold whether timestamp quality is poor or good;
if the timestamp quality is determined to be poor, indicating that a two-way communication system operate in half duplex mode with voice switching; and
if the timestamp quality is determined to be good, indicating that the two-way communication system operate in a full duplex mode with acoustic echo cancellation.
20. The method of claim 19 wherein the performing of the timestamp evaluation comprises determining a drift rate using the rate at which the adjustments were made to the audio data queue, and the determining of whether the timestamp quality is poor or good is based at least in part on the drift rate.
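By way of illustration only, and not as a statement of the claimed implementation, the timestamp quality evaluation recited above can be sketched in Python. The sketch assumes that each queue adjustment inserts or drops a fixed number of samples, that a drift rate is estimated from the interval between consecutive adjustments, and that both the median and the median absolute deviation of that drift rate are compared against thresholds. The function names, the fixed adjustment size, the threshold values, the choice to apply both checks together, and the treatment of a session with fewer than two adjustments as good timestamp quality are all hypothetical and are not taken from the specification.

```python
from statistics import median


def drift_rates(adjustment_times_s, samples_per_adjustment, sample_rate_hz):
    """Estimate one drift-rate value per interval between consecutive queue adjustments.

    Assumption: every adjustment moves the queue by the same number of samples,
    so frequent adjustments imply a high drift rate.
    """
    rates = []
    for earlier, later in zip(adjustment_times_s, adjustment_times_s[1:]):
        interval = later - earlier
        if interval > 0:
            # Seconds of audio corrected per second of elapsed time.
            rates.append(samples_per_adjustment / sample_rate_hz / interval)
    return rates


def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])


def select_mode(rates, median_threshold, mad_threshold):
    """Return the operating mode implied by the drift-rate statistics."""
    if not rates:
        # Assumption: no observed adjustments means no evidence of poor timestamps.
        return "full_duplex_aec"
    if abs(median(rates)) > median_threshold:
        return "half_duplex_voice_switching"
    if median_absolute_deviation(rates) > mad_threshold:
        return "half_duplex_voice_switching"
    return "full_duplex_aec"


# Hypothetical example: 160-sample adjustments at 16 kHz, with illustrative thresholds.
steady = drift_rates([0.0, 10.0, 20.0, 30.0, 40.0], 160, 16000)
erratic = drift_rates([0.0, 0.5, 10.0, 10.2, 30.0], 160, 16000)
steady_mode = select_mode(steady, 0.005, 0.0005)
erratic_mode = select_mode(erratic, 0.005, 0.0005)
```

In this sketch, evenly spaced adjustments yield a small, stable drift rate and the full duplex mode is retained, whereas irregular bursts of adjustments raise the median drift rate above the threshold and the half duplex mode with voice switching is selected.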

Priority Applications (1)

Application Number Priority Date Filing Date Title
US12/143,625 US8369251B2 (en) 2008-06-20 2008-06-20 Timestamp quality assessment for assuring acoustic echo canceller operability

Publications (2)

Publication Number Publication Date
US20090316881A1 (en) 2009-12-24
US8369251B2 (en) 2013-02-05

Family

ID=41431304

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/143,625 Active 2031-03-07 US8369251B2 (en) 2008-06-20 2008-06-20 Timestamp quality assessment for assuring acoustic echo canceller operability

Country Status (1)

Country Link
US (1) US8369251B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10796709B1 (en) 2019-05-16 2020-10-06 Microsoft Technology Licensing, Llc Acoustic echo cancellation bypass

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8599704B2 (en) * 2007-01-23 2013-12-03 Microsoft Corporation Assessing gateway quality using audio systems
US20130332155A1 (en) * 2012-06-06 2013-12-12 Microsoft Corporation Double-Talk Detection for Audio Communication
KR20160023089A (en) * 2014-08-21 2016-03-03 엘지전자 주식회사 Digital device and method for controlling the same
CN108877825A (en) * 2018-06-26 2018-11-23 珠海宏桥高科技有限公司 A kind of Network echo cancellation element and method based on voice-activated and logic control
CN112735398B (en) * 2019-10-28 2022-09-06 思必驰科技股份有限公司 Man-machine conversation mode switching method and system
CN112002315B (en) * 2020-07-28 2023-12-29 珠海格力节能环保制冷技术研究中心有限公司 Voice control method and device, electrical equipment, storage medium and processor

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6337679B1 (en) * 1999-10-08 2002-01-08 Shin Jiuh Corp. Instruction input device
US6724170B1 (en) * 2002-11-16 2004-04-20 Motorola, Inc. Interconnect system for rechargeable computer batteries

Patent Citations (72)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4039753A (en) 1974-06-05 1977-08-02 Elektroakusztikai Gyar Singing suppressor device
US4069395A (en) 1977-04-27 1978-01-17 Bell Telephone Laboratories, Incorporated Analog dereverberation system
US4275398A (en) 1979-04-06 1981-06-23 Del Norte Technology, Inc. FM Radio ranging system
US4359606A (en) 1979-05-11 1982-11-16 Toa Electric Co., Ltd. Voice switching control system
US4696015A (en) 1983-10-28 1987-09-22 Etablissement Public De Diffusion Dit Telediffusion De France Echo correction especially for television broadcast systems
US4696032A (en) 1985-02-26 1987-09-22 Siemens Corporate Research & Support, Inc. Voice switched gain system
US4636586A (en) 1985-09-20 1987-01-13 Rca Corporation Speakerphone with adaptive cancellation of room echoes
US5099472A (en) 1989-10-24 1992-03-24 Northern Telecom Limited Hands free telecommunication apparatus and method
US5430796A (en) 1990-11-19 1995-07-04 Nec Corporation Auxiliary attenuator arrangement for use in voice-switched speakerphone and method of selectively inserting communication loss in speakerphone paths
US5263019A (en) 1991-01-04 1993-11-16 Picturetel Corporation Method and apparatus for estimating the level of acoustic feedback between a loudspeaker and microphone
US5305307A (en) 1991-01-04 1994-04-19 Picturetel Corporation Adaptive acoustic echo canceller having means for reducing or eliminating echo in a plurality of signal bandwidths
GB2269968A (en) 1992-08-14 1994-02-23 British Broadcasting Corp Method and apparatus for attenuating an unwanted signal in a mix of signals
US5323459A (en) 1992-11-10 1994-06-21 Nec Corporation Multi-channel echo canceler
US5559793A (en) 1993-01-08 1996-09-24 Multi-Tech Systems, Inc. Echo cancellation system and method
US5542000A (en) 1993-03-19 1996-07-30 Yamaha Corporation Karaoke apparatus having automatic effector control
US5353348A (en) 1993-05-14 1994-10-04 Jrc International, Inc. Double echo cancelling system
US5477534A (en) 1993-07-30 1995-12-19 Kyocera Corporation Acoustic echo canceller
US5454041A (en) 1994-02-25 1995-09-26 Sony Electronics Inc. Mix-minus monitor system
US6081722A (en) 1994-03-16 2000-06-27 U.S. Philips Corporation Mobile radio transmission system with channel allocation, utilizing the extended kohonen networks
US5680450A (en) 1995-02-24 1997-10-21 Ericsson Inc. Apparatus and method for canceling acoustic echoes including non-linear distortions in loudspeaker telephones
US5721730A (en) 1995-07-06 1998-02-24 Coherent Communications Systems Corp. Spectral noise compensation for telecommunication signal processing
US6002776A (en) 1995-09-18 1999-12-14 Interval Research Corporation Directional acoustic signal processor and method therefor
US5646990A (en) 1995-09-21 1997-07-08 Rockwell International Corporation Efficient speakerphone anti-howling system
US6219418B1 (en) 1995-10-18 2001-04-17 Telefonaktiebolaget Lm Ericsson (Publ) Adaptive dual filter echo cancellation method
US5666407A (en) 1995-12-05 1997-09-09 Ncr Corporation Software-based bridging system for full duplex audio telephone conferencing
US5923749A (en) 1995-12-06 1999-07-13 Telefonaktiebolaget Lm Ericsson Method and system for eliminating acoustic echos in a digital telecommunication system
US5619582A (en) 1996-01-16 1997-04-08 Oltman; Randy Enhanced concert audio process utilizing a synchronized headgear system
US6169728B1 (en) 1996-03-29 2001-01-02 Motorola Inc. Apparatus and method for spectrum management in a multipoint communication system
US6215880B1 (en) 1996-04-24 2001-04-10 Nec Corporation Two-wire conference apparatus with echo canceler without training
US6574336B1 (en) 1996-06-19 2003-06-03 Nokia Telecommunications Oy Echo suppressor and non-linear processor of echo canceller
US6377679B1 (en) 1996-12-26 2002-04-23 Kabushiki Kaisha Kobe Seiko Sho Speakerphone
US6205124B1 (en) 1996-12-31 2001-03-20 Compaq Computer Corporation Multipoint digital simultaneous voice and data system
US6078645A (en) * 1997-02-20 2000-06-20 Lucent Technologies Inc. Apparatus and method for monitoring full duplex data communications
US6535609B1 (en) 1997-06-03 2003-03-18 Lear Automotive Dearborn, Inc. Cabin communication system
US6418203B1 (en) 1997-06-06 2002-07-09 Data Race, Inc. System and method for communicating audio information between a computer and a duplex speakerphone modem
US6044150A (en) 1997-08-21 2000-03-28 3Com Corporation Method and apparatus for a host-based personal computer speakerphone
US6707910B1 (en) 1997-09-04 2004-03-16 Nokia Mobile Phones Ltd. Detection of the speech activity of a source
US6868157B1 (en) 1997-09-16 2005-03-15 Sanyo Electric Co., Ltd. Echo canceling method, echo canceller and voice switch
US20020090008A1 (en) 1997-10-22 2002-07-11 Cioffi John M. Method and apparatus for coordinating multi-point to point communications in a multi-tone data transmission system
US7031269B2 (en) 1997-11-26 2006-04-18 Qualcomm Incorporated Acoustic echo canceller
US6212273B1 (en) * 1998-03-20 2001-04-03 Crystal Semiconductor Corporation Full-duplex speakerphone circuit including a control interface
US20030174847A1 (en) 1998-07-31 2003-09-18 Circuit Research Labs, Inc. Multi-state echo suppressor
US6324170B1 (en) 1998-09-10 2001-11-27 Nortel Networks Limited Echo controller with compensation for variable delay networks
US6583821B1 (en) * 1999-07-16 2003-06-24 Thomson Licensing S.A. Synchronizing apparatus for a compressed audio/video signal receiver
US6724736B1 (en) 2000-05-12 2004-04-20 3Com Corporation Remote echo cancellation in a packet based network
US7085370B1 (en) 2000-06-30 2006-08-01 Telefonaktiebolaget Lm Ericsson (Publ) Ringback detection circuit
US6738358B2 (en) 2000-09-09 2004-05-18 Intel Corporation Network echo canceller for integrated telecommunications processing
US6799062B1 (en) 2000-10-19 2004-09-28 Motorola Inc. Full-duplex hands-free transparency circuit and method therefor
US6748086B1 (en) 2000-10-19 2004-06-08 Lear Corporation Cabin communication system without acoustic echo cancellation
US20020101982A1 (en) 2001-01-30 2002-08-01 Hammam Elabd Line echo canceller scalable to multiple voice channels/ports
US20030206625A9 (en) 2001-02-16 2003-11-06 Jamil Ahmad Tone detection and echo cancellation in a communications network
US20030206624A1 (en) 2002-05-03 2003-11-06 Acoustic Technologies, Inc. Full duplex echo cancelling circuit
US7120259B1 (en) 2002-05-31 2006-10-10 Microsoft Corporation Adaptive estimation and compensation of clock drift in acoustic echo cancellers
US20070047738A1 (en) * 2002-05-31 2007-03-01 Microsoft Corporation Adaptive estimation and compensation of clock drift in acoustic echo cancellers
US20040013275A1 (en) 2002-06-05 2004-01-22 Balan Radu Victor Apparatus and method for estimating the direction of arrival of a source signal using a microphone array
US20040001598A1 (en) 2002-06-05 2004-01-01 Balan Radu Victor System and method for adaptive multi-sensor arrays
US20040001597A1 (en) 2002-07-01 2004-01-01 Tandberg Asa Audio communication system and method with improved acoustic characteristics
US20050223113A1 (en) * 2002-07-02 2005-10-06 Klaus Hoffmann Runtime-dependent switching off of the echo compensation in packet networks
US20040081315A1 (en) 2002-10-25 2004-04-29 Boland Simon Daniel Echo detection and monitoring
US20040125942A1 (en) 2002-11-29 2004-07-01 Franck Beaucoup Method of acoustic echo cancellation in full-duplex hands free audio conferencing with spatial directivity
US20040141528A1 (en) 2003-01-21 2004-07-22 Leblanc Wilfrid Using RTCP statistics for media system control
US20060018459A1 (en) 2004-06-25 2006-01-26 Mccree Alan V Acoustic echo devices and methods
US7433463B2 (en) 2004-08-10 2008-10-07 Clarity Technologies, Inc. Echo cancellation and noise reduction method
US20070041324A1 (en) * 2005-06-10 2007-02-22 Kishan Shenoi Adaptive play-out buffers and adaptive clock operation in packet networks
US20070019802A1 (en) * 2005-06-30 2007-01-25 Symbol Technologies, Inc. Audio data stream synchronization
US20070165837A1 (en) * 2005-12-30 2007-07-19 Microsoft Corporation Synchronizing Input Streams for Acoustic Echo Cancellation
US20070165838A1 (en) 2006-01-13 2007-07-19 Microsoft Corporation Selective glitch detection, clock drift compensation, and anti-clipping in audio echo cancellation
US20070263849A1 (en) 2006-04-28 2007-11-15 Microsoft Corporation Integration of a microphone array with acoustic echo cancellation and center clipping
US20070263850A1 (en) 2006-04-28 2007-11-15 Microsoft Corporation Integration of a microphone array with acoustic echo cancellation and residual echo suppression
US20070280472A1 (en) 2006-05-30 2007-12-06 Microsoft Corporation Adaptive acoustic echo cancellation
WO2007147033A2 (en) 2006-06-14 2007-12-21 Divitas Networks, Inc. Code-based echo cancellation
US20090207763A1 (en) * 2008-02-15 2009-08-20 Microsoft Corporation Voice switching for voice communication on computers

Non-Patent Citations (15)

* Cited by examiner, † Cited by third party
Title
"Echo Cancellation," Chapter 4, pp. 1-8 (1990).
Ahgren, "Teleconferencing, System Identification and Array Processing," IT Licentiate Thesis, Uppsala University Department of Information Technology, <http://www.it.uu.se/research/publications/lic/2001-012/2001-012.pdf>, 75 pages, Oct. 2001.
Buchner et al., "Wave-Domain Adaptive Filtering: Acoustic Echo Cancellation for Full-Duplex Systems Based on Wave-Field Synthesis," Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, 2004, pp. IV-117-IV-120.
CISCO, "Configuring Echo Cancellation," http://www.cisco.com/en/US/docs/ios/12-3/vvf-c/voice-port-configuration-guide/ch5-echo.html, © 1992-2008, 17 pages.
Ditech Communications, "Ditech's Voice Quality Assurance™," <http://www.ditechcom.com/platforms/DataSheets/OV—VQA.pdf>, 2 pages, 2005.
Frunze, "Echo Cancellation Demystified," Technical Paper, SPIRIT Corp., undated, 13 pages.
Herbordt et al., "Joint Optimization of Lcmv Beamforming and Acoustic Echo Cancellation," <http://www.slt.atr.jp/~wherbord/archives/mypub/HerbordtJoint-optimization-of-LCMV-beamforming-and-acoustic-echo-cancellation.pdf>, 4 pages.
Hoshuyama, "Nonlinear Echo Suppression Technology Enabling Quality Handsfree Talk for Compact Equipment," NEC Technical Journal, vol. 2, No. 2, 2007, pp. 27-30.
Knowles Acoustics, "Beat the Noise and Echo, Communicate Anywhere via VoIP," <http://www.speechtechmag.com/whitepapers/knowlesa.pdf>, 11 pages, 2004.
W. J. Riley ("Techniques for Frequency Stability Analysis, IEEE International Frequency Control Symposium" and dated May 4, 2003) pp. 10, 30, and 33. *
Yensen et al., "Echo Target Determination Using Acoustic Round Trip Delay for Voice Over IP Conferences," Proceedings of the 1999 IEEE International Symposium on Circuits and Systems, vol. 4, 1999, pp. IV-348-IV-351.

Similar Documents

Publication Publication Date Title
US8295475B2 (en) Selective glitch detection, clock drift compensation, and anti-clipping in audio echo cancellation
US11601554B2 (en) Detection of acoustic echo cancellation
US8934945B2 (en) Voice switching for voice communication on computers
US20080247535A1 (en) Method and apparatus for mitigating impact of nonlinear effects on the quality of audio echo cancellation
US8369251B2 (en) Timestamp quality assessment for assuring acoustic echo canceller operability
CN110225214B (en) Method, attenuation unit, system and medium for attenuating a signal
JP6773403B2 (en) Gain control system and gain control method
US9966067B2 (en) Audio noise estimation and audio noise reduction using multiple microphones
US8842851B2 (en) Audio source localization system and method
CN103748865B (en) Utilize the clock deskew of the acoustic echo arrester of not audible tone
US9443528B2 (en) Method and device for eliminating echoes
JP5332733B2 (en) Echo canceller
US8385558B2 (en) Echo presence determination in voice conversations
EP2982101B1 (en) Noise reduction
KR20180091439A (en) Acoustic echo cancelling apparatus and method
CN110870211B (en) Method and system for detecting and compensating for inaccurate echo prediction
US9934791B1 (en) Noise supressor
US20080259828A1 (en) Method and apparatus for reducing timestamp noise in audio echo cancellation
CN108540680B (en) Switching method and device of speaking state and conversation system
CN110570881A (en) method, apparatus, device and medium for switching back to voice mode in facsimile transparent transmission mode
JP5963077B2 (en) Telephone device
JP5288500B2 (en) Hands-free telephone device, hands-free calling method, and hands-free calling program
JP4395105B2 (en) Acoustic coupling amount estimation method, acoustic coupling amount estimation device, program, and recording medium
JP3756842B2 (en) Echo suppression method, echo suppression device, echo suppression program
CN109361827B (en) Echo secondary suppression method for communication terminal

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PRAKASH, VINOD;HE, CHAO;REEL/FRAME:021148/0617

Effective date: 20080620

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034564/0001

Effective date: 20141014

FPAY Fee payment

Year of fee payment: 4

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY