EP1190495A1 - Coded domain echo control - Google Patents

Coded domain echo control

Info

Publication number
EP1190495A1
Authority
EP
European Patent Office
Prior art keywords
parameter
code
near end
digital signal
echo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00948555A
Other languages
German (de)
French (fr)
Inventor
Ravi Chandran
Daniel J. Marchok
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Coriant Operations Inc
Original Assignee
Tellabs Operations Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tellabs Operations Inc filed Critical Tellabs Operations Inc
Publication of EP1190495A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M9/00Arrangements for interconnection not involving centralised switching
    • H04M9/08Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04BTRANSMISSION
    • H04B3/00Line transmission systems
    • H04B3/02Details
    • H04B3/20Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/0001Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0014Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the source coding

Definitions

  • the present invention relates to coded domain enhancement of compressed speech and in particular to coded domain echo control.
  • GSM 06.10, "Digital cellular telecommunication system (Phase 2); Full rate speech; Part 2: Transcoding", ETS 300 580-2, March 1998, Second Edition.
  • GSM 06.60, "Digital cellular telecommunications system (Phase 2); Enhanced Full Rate (EFR) speech transcoding", June 1998.
  • GSM 08.62, "Digital cellular telecommunications system (Phase 2+); Inband Tandem Free Operation (TFO) of Speech Codecs", ETSI, March 2000.
  • GSM 06.12, "European digital cellular telecommunications system (Phase 2); Comfort noise aspect for full rate speech traffic channels", ETSI, September 1994.
  • Speech transmission between the mobile stations (handsets) and the base station is in compressed or coded form.
  • Speech coding techniques such as the GSM FR [1] and EFR [2] are used to compress the speech.
  • the devices used to compress speech are called vocoders.
  • the coded speech requires less than 2 bits per sample. This situation is depicted in Figure 1. Between the base stations, the speech is transmitted in an uncoded form (using PCM companding which requires 8 bits per sample).
  • coded speech and uncoded speech may be described as follows:
  • Uncoded speech refers to the digital speech signal samples typically used in telephony; these samples are either in linear 13-bits per sample form or companded form such as the 8-bits per sample μ-law or A-law PCM form; the typical bit-rate is 64 kbps.
  • Coded speech refers to the compressed speech signal parameters (also referred to as coded parameters) which use a bit rate typically well below 64kbps such as 13 kbps in the case of the GSM FR and 12.2 kbps in the case of GSM EFR; the compression methods are more extensive than the simple PCM companding scheme; examples of compression methods are linear predictive coding, code-excited linear prediction and multi-band excitation coding [4].
  • TFO Tandem-Free Operation
  • the TFO standard applies to mobile-to- mobile calls.
  • the speech signal is conveyed between mobiles in a compressed form after a brief negotiation period.
  • the elimination of tandem codecs is known to improve speech quality in the case where the original signal is clean.
  • the key point to note is that the speech transmission remains coded between the mobile handsets and is depicted in Figure 2.
  • the echo problem and its traditional solution are shown in Figure 4.
  • echo occurs due to the impedance mismatch at the 4-wire-to-2- wire hybrids.
  • the mismatch results in electrical reflections of a portion of the far-end signal into the near-end signal.
  • the endpath impulse response is estimated using a network echo canceller (EC) and is used to produce an estimate of the echo signal.
  • the estimate is then subtracted from the near-end signal to remove the echo.
  • NLP non-linear processor
  • the echo occurs due to the feedback from the speaker (earpiece) to the microphone (mouthpiece).
  • the acoustic feedback can be significant and the echo can be annoying, particularly in the case of hands-free phones.
  • Figure 5 shows the feedback path from the speaker to the microphone in a digital cellular handset.
  • the depicted handset does not have echo cancellation implemented in the handset.
  • a vocoder tandem i.e. two encoder/decoder pairs placed in series
  • comfort noise generation may be used to mask the echo.
  • Comfort noise generation is used for silence suppression or discontinuous transmission purposes (e.g. [5]). It is possible to use such techniques to completely mask the echo whenever echo is detected. However, such techniques suffer from "choppiness" particularly during double-talk conditions, as well as poor and unnatural background transparency.
  • the proposed techniques are capable of performing echo control (acoustic or linear) directly on the coded speech (i.e. by direct modification of the coded parameters).
  • Low computational complexity and delay are achieved. Tandeming effects are avoided or minimized, resulting in better perceived quality after echo control. Excellent background transparency is also achieved.
  • Speech compression, which falls under the category of lossy source coding, is commonly referred to as speech coding.
  • Speech coding is performed to minimize the bandwidth necessary for speech transmission. This is especially important in wireless telephony where bandwidth is scarce. In the relatively bandwidth abundant packet networks, speech coding is still important to minimize network delay and jitter. This is because speech communication, unlike data, is highly intolerant of delay. Hence a smaller packet size eases the transmission through a packet network.
  • The four ETSI GSM standards of concern are listed in Table 1.
  • a set of consecutive digital speech samples is referred to as a speech frame.
  • the GSM coders operate on a frame size of 20ms (160 samples at 8kHz sampling rate). Given a speech frame, a speech encoder determines a small set of parameters for a speech synthesis model. With these speech parameters and the speech synthesis model, a speech frame can be reconstructed that appears and sounds very similar to the original speech frame. The reconstruction is performed by the speech decoder. In the GSM vocoders listed above, the encoding process is much more computationally intensive than the decoding process.
  • the speech parameters determined by the speech encoder depend on the speech synthesis model used.
  • the GSM coders in Table 1 utilize linear predictive coding (LPC) models.
  • LPC linear predictive coding
  • a block diagram of a simplified view of a generic LPC speech synthesis model is shown in Figure 7. This model can be used to generate speech-like signals by specifying the model parameters appropriately.
  • the parameters include the time-varying filter coefficients, pitch periods, codebook vectors and the gain factors.
  • the synthetic speech is generated as follows.
  • An appropriate codebook vector, c(n) , where n denotes sample time, is first scaled by the codebook gain.
  • The scaled vector is then filtered by a pitch synthesis filter whose parameters include the pitch gain, g p , and the pitch period, T . The result is sometimes referred to as the total excitation vector, u(n) .
  • the pitch synthesis filter provides the harmonic quality of voiced speech.
  • the total excitation vector is then filtered by the LPC synthesis filter which specifies the broad spectral shape of the speech frame and the broad spectral shape of the corresponding audio signal. For each speech frame, the parameters are usually updated more than once.
  • the codebook vector, codebook gain and the pitch synthesis filter parameters are determined every subframe (5ms).
  • LPC synthesis filter parameters are determined twice per frame (every 10ms) in EFR and once per frame in FR.
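The generic decoder model described above can be sketched in code. The following is a simplified illustration only; the filter orders, history lengths, and function name are assumptions for the sketch, not the GSM-standard implementation:

```python
import numpy as np

def synthesize_subframe(c, G_c, g_p, T, lpc_a, u_hist, s_hist):
    """One subframe of the generic LPC synthesis model (Figure 7 sketch).

    c      : codebook (excitation) vector
    G_c    : codebook gain
    g_p, T : pitch gain and pitch period in samples (u_hist must cover T)
    lpc_a  : short-term predictor coefficients a[1..P]
    u_hist : past total-excitation samples (pitch-filter delay line)
    s_hist : past synthesized samples (LPC-filter memory)
    """
    N, H, P, Hs = len(c), len(u_hist), len(lpc_a), len(s_hist)
    u_all = np.concatenate([u_hist, np.zeros(N)])
    for n in range(N):
        # pitch synthesis filter: u(n) = G_c * c(n) + g_p * u(n - T)
        u_all[H + n] = G_c * c[n] + g_p * u_all[H + n - T]
    u = u_all[H:]
    s_all = np.concatenate([s_hist, np.zeros(N)])
    for n in range(N):
        # LPC synthesis filter: s(n) = u(n) + sum_k a_k * s(n - k)
        s_all[Hs + n] = u[n] + sum(lpc_a[k] * s_all[Hs + n - 1 - k]
                                   for k in range(P))
    return s_all[Hs:], u
```

With the pitch gain set to zero and all-zero LPC coefficients, the synthesized subframe reduces to the scaled codebook vector, which makes the role of each stage easy to see.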
  • a typical sequence of steps used in a speech encoder is as follows:
  • a typical sequence of steps used in a speech decoder is as follows:
  • As an example of the arrangement of coded parameters in the bit-stream transmitted by the encoder, the GSM FR vocoder is considered.
  • a frame is defined as 160 samples of speech sampled at 8kHz, i.e. a frame is 20ms long. With A-law PCM companding, 160 samples would require 1280 bits for transmission.
  • the encoder compresses the 160 samples into 260 bits.
  • the arrangement of the various coded parameters in the 260 bits of each frame is shown in Figure 8.
  • the first 36 bits of each coded frame consist of the log-area ratios which correspond to the LPC synthesis filter.
  • the remaining 224 bits can be grouped into 4 subframes of 56 bits each. Within each subframe, the coded parameter bits contain the pitch synthesis filter related parameters followed by the codebook vector and gain related parameters.
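The frame layout just described (36 log-area-ratio bits followed by four 56-bit subframes) can be expressed as a simple slicing routine. The finer per-parameter layout within each subframe is deliberately not modeled; the function name is an illustrative assumption:

```python
def split_fr_frame(bits):
    """Split a 260-bit GSM FR frame into the LPC (log-area-ratio) field
    and four 56-bit subframes, per the layout described for Figure 8."""
    assert len(bits) == 260
    lar_bits = bits[:36]                               # LPC synthesis filter
    subframes = [bits[36 + 56 * i: 36 + 56 * (i + 1)]  # pitch + codebook params
                 for i in range(4)]
    return lar_bits, subframes
```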
  • the preferred embodiment is useful in a communications system for transmitting a near end digital signal using a compression code comprising a plurality of parameters including a first parameter.
  • the parameters represent an audio signal comprising a plurality of audio characteristics.
  • the compression code is decodable by a plurality of decoding steps.
  • the communications system also transmits a far end digital signal using a compression code.
  • the echo in the near end digital signal can be reduced by reading at least the first parameter of the plurality of parameters in response to the near end digital signal.
  • At least one of the plurality of the decoding steps is performed on the near end digital signal and the far end digital signal to generate at least partially decoded near end signals and at least partially decoded far end signals.
  • the first parameter is adjusted in response to the at least partially decoded near end signals and at least partially decoded far end signals to generate an adjusted first parameter.
  • the first parameter is replaced with the adjusted first parameter in the near end digital signal.
  • the reading, generating and adjusting preferably are performed by a processor.
  • Another embodiment of the invention is useful in a communications system for transmitting a near end digital signal comprising code samples further comprising first bits using a compression code and second bits using a linear code.
  • the code samples represent an audio signal having a plurality of audio characteristics.
  • the system also transmits a far end digital signal. In such an environment, any echo in the near end digital signal can be reduced without decoding the compression code by adjusting the first bits and second bits in response to the near end digital signal and the far end digital signal.
  • Figure 1 is a schematic block diagram of a system for speech transmission in a GSM digital cellular network.
  • FIG. 2 is a schematic block diagram of a system for speech transmission in a GSM network under tandem-free operation (TFO).
  • FIG. 3 is a graph illustrating transmission of speech under tandem-free operation (TFO).
  • Figure 4 is a schematic block diagram of a traditional solution to an echo problem in a wireline network.
  • Figure 5 is a schematic block diagram illustrating acoustic feedback from a speaker to a microphone in a digital cellular telephone.
  • Figure 6 is a schematic block diagram of a traditional echo cancellation approach for coded speech.
  • Figure 7 is a schematic block diagram of a generic linear predictive code (LPC) speech synthesis model or speech decoder model.
  • LPC linear predictive code
  • Figure 8 is a diagram illustrating the arrangement of coded parameters in the bit stream for GSM FR.
  • Figure 9 is a schematic block diagram of a preferred form of coded domain echo control system for acoustic echo environments made in accordance with the invention.
  • Figure 10 is a schematic block diagram of another preferred form of coded domain echo control system for echo due to 4-wire-to-2-wire hybrids made in accordance with the invention.
  • Figure 11 is a schematic block diagram of a simplified end path model with flat delay and attenuation.
  • Figure 12 is a graph illustrating a preliminary echo likelihood versus near end to far end subframe power ratio.
  • Figure 13 is a flow diagram illustrating a preferred form of coded domain echo control methodology.
  • Figure 14 is a graph illustrating an exemplary pitch synthesis filter magnitude frequency response.
  • Figure 15 is a graph illustrating exemplary magnitude frequency responses of an original LPC synthesis filter and flattened versions of such a filter.
  • the codebook vector, c(n) , is filtered by H(z) to result in the synthesized speech.
  • LPC-based vocoders use parameters similar to the above set, parameters that may be converted to the above forms, or parameters that are related to the above forms.
  • the LPC coefficients in LPC-based vocoders may be represented using log-area ratios (e.g. the GSM FR) or line spectral frequencies (e.g. GSM EFR); both of these forms can be converted to LPC coefficients.
  • An example of a case where a parameter is related to the above form is the block maximum parameter in the GSM FR vocoder; the block maximum can be considered to be directly proportional to the codebook gain in the model described by equation (1).
  • coded parameter modification methods is mostly limited to the generic speech decoder model, it is relatively straightforward to tailor these methods for any LPC-based vocoder, and possibly even other models.
  • By a linear code, we mean a compression technique that results in one coded parameter or coded sample for each sample of the audio signal.
  • Examples of linear codes are PCM (A-law and μ-law), ADPCM (adaptive differential pulse code modulation), and delta modulation.
  • By a compression code, we mean a technique that results in fewer than one coded parameter for each sample of the audio signal. Typically, compression codes result in a small set of coded parameters for each block or frame of audio signal samples. Examples of compression codes are linear predictive coding based vocoders such as the GSM vocoders (HR, FR, EFR).
  • Figure 9 shows a novel implementation of coded domain echo control (CDEC) for a situation where acoustic echo is present.
  • a communications system 10 transmits near end coded digital signals over a network 24 using a compression code, such as any of the codes used by the Codecs identified in Table 1.
  • the compression code is generated by an encoder 16 from linear audio signals generated by a near end microphone 14 within a near end speaker handset 12.
  • the compression code comprises parameters, such as those shown in Figure 8.
  • the parameters represent an audio signal comprising a plurality of audio characteristics, including audio level and power.
  • the compression code is decodable by various decoding steps.
  • system 10 controls echo in the near end digital signals due to the presence of a far end digital signals transmitted by system 10 over a network 32. The echo is controlled with minimal delay and minimal, if any, decoding of the compression code parameters shown in Figure 8.
  • Near end digital signals using the compression code are received on a near end terminal 20, and digital signals using an adjusted compression code are transmitted by a near end terminal 22 over a network 24 to a far end handset (not shown) which includes a decoder (not shown) of the adjusted compression code.
  • the adjusted compression code is compatible with the original compression code. In other words, when the coded parameters are modified or adjusted, we term it the adjusted compression code, but it still is decodable using a standard decoder corresponding to the original compression code.
  • a linear far end audio signal is encoded by a far end encoder (not shown) to generate far end digital signals using a compression code compatible with decoder 18, and is transmitted over a network 32 to a far end terminal 34.
  • a decoder 18 of near end handset 12 decodes the far end digital signals. As shown in Figure 9, echo signals from the far end signals may find their way to encoder 16 of the near end handset 12 through acoustic feedback.
  • a processor 40 performs various operations on the near end and far end compression code.
  • Processor 40 may be a microprocessor, microcontroller, digital signal processor, or other type of logic unit capable of arithmetic and logical operations.
  • a different coded domain echo control algorithm 44 is executed by processor 40 at all times - under compressed mode and linear mode, during TFO as well as non-TFO.
  • a partial decoder 48 is executed by processor 40 to read at least a first of the parameters received at terminal 20.
  • Another partial decoder 46 is executed by processor 40 to generate at least partially decoded far end signals. Decoder 48 generates at least partially decoded near end signals. (Note that the compression codes used by the near end and far end signals may be different, and hence the partial decoders may also be different.)
  • Based on the partial decoding, algorithm 44 generates an echo likelihood signal at least estimating the amount of echo in the near end digital signal.
  • the echo likelihood signal varies over time since the amount of echo depends on the far end speech signal.
  • the echo likelihood signal is used by algorithm 44 to adjust the parameter(s) read by algorithm 44.
  • the adjusted parameter is written into the near end digital signal to form an adjusted near end digital signal which is transmitted from terminal 22 to network 24. In other words, the adjusted parameter is substituted for the originally read parameter.
  • the partial decoders 46 and 48 shown within the Network ALC Device are algorithms executed by processor 40 and are codec-dependent.
  • the partial decoders operate on signals compressed using compression codes.
  • partial decoder 46 may decode the linear code rather than the compression code. Also, in this case, partial decoder 48 decodes the linear code and only determines the coded parameters from the compression code without actually synthesizing the audio signal from the compression code.
  • Blocks 44, 46 and 48 also may be implemented as hardwired circuits.
  • Figure 10 shows that the Figure 9 embodiment can be useful for a system in which the echo is due to a 4-wire-to-2-wire hybrid.
  • the CDEC device/algorithm removes the effects of echo from the near-end coded speech by directly modifying the coded parameters in the bit-stream received from the near-end. Decoding of the near-end and far-end signals is performed in order to determine the likelihood of echo being present in the near-end. Certain statistics are measured from the decoded signals to determine this likelihood value.
  • the decoding of near-end and far-end signals may be complete or partial depending on the vocoder being used for the encode and decode operations. Some examples of situations where partial decoding suffices are listed below:
  • CELP code-excited linear prediction
  • the CDEC device may be placed between the base station and the switch (known as the A-interface) or between the two switches. Since the 6 MSBs of each 8-bit sample of the speech signal correspond to the PCM code as shown in Figure 3, it is possible to avoid decoding the coded speech altogether in this situation. A simple table-lookup is sufficient to convert the 8-bit companded samples to 13-bit linear speech samples using A-law companding tables. This provides an economical way to obtain a version of the speech signal without invoking the appropriate decoder. Note that the speech signal obtained in this manner is somewhat noisy, but has been found to be adequate for the measurement of the statistics necessary for determining the likelihood of echo.
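The table-lookup idea can be sketched with the standard G.711 A-law expansion. Treating the two embedded LSBs as zero before expansion is an assumed simplification, not something the TFO standard mandates:

```python
def alaw_to_linear(a_val):
    """Expand one 8-bit A-law sample to linear PCM (standard G.711
    expansion; the result is the 13-bit value at 16-bit alignment)."""
    a_val ^= 0x55                        # undo A-law even-bit inversion
    t = (a_val & 0x0F) << 4
    seg = (a_val & 0x70) >> 4            # segment (chord) number
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if (a_val & 0x80) else -t

def tfo_sample_to_linear(octet):
    """Approximate linear sample from a TFO octet: only the 6 MSBs carry
    the PCM code (Figure 3), so the 2 LSBs carrying embedded coded-speech
    bits are masked off before expansion (zeroing them is an assumption)."""
    return alaw_to_linear(octet & 0xFC)
```

The masked conversion yields the "somewhat noisy" version of the signal mentioned above: each sample is off by at most a couple of quantizer steps within its segment.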
  • the far-end and near-end signals are available, certain statistics are measured and used to determine the likelihood of echo being present in the near-end signal.
  • the echo likelihood is estimated for each speech subframe, where the subframe duration is dependent on the vocoder being used. A preferred approach is described in this section.
  • A simplified model of the end-path is assumed as shown in Figure 11.
  • the end-path is assumed to consist of a flat delay of a fixed number of samples and an echo return loss (ERL) attenuation.
  • ERL echo return loss
  • s NE (n) and s FE (n) are the near-end and far-end uncoded signals, respectively.
  • P NE is the power of the current subframe of the near-end signal.
  • P FE (0) is the power of the current subframe of the far-end signal.
  • P FE (m) is the power of the m th subframe before the current subframe of the far-end signal.
  • N is the number of samples in a subframe.
  • R is the near-end to far-end subframe power ratio.
  • p is the echo likelihood obtained by smoothing the preliminary echo likelihood.
  • the echo likelihood is estimated for each subframe using the steps below.
  • the processing may be more appropriately performed frame-by- frame rather than subframe-by- subframe.
  • denominator is essentially the maximum far-end subframe power measured during the expected end-path delay time period.
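The statistics above can be sketched as a per-subframe update. The piecewise-linear mapping from the power ratio to the preliminary likelihood stands in for the curve of Figure 12, and the `erl_db` and `beta` values are assumptions:

```python
import numpy as np

def subframe_power(x):
    # P = (1/N) * sum_n x(n)^2 over one subframe of N samples
    return float(np.mean(np.square(x)))

def echo_likelihood(p_prev, ne_subframe, fe_powers, erl_db=6.0, beta=0.8):
    """One subframe update of the echo likelihood p.

    fe_powers : P_FE(0), P_FE(1), ... for the subframes spanning the
                expected end-path delay; the denominator of the ratio is
                their maximum, as described in the text.
    """
    p_ne = subframe_power(ne_subframe)
    r_db = 10.0 * np.log10(p_ne / (max(fe_powers) + 1e-12) + 1e-12)
    lo, hi = -erl_db - 20.0, -erl_db + 10.0
    # near end much weaker than far end -> likely echo (preliminary q -> 1)
    q = float(np.clip((hi - r_db) / (hi - lo), 0.0, 1.0))
    return beta * p_prev + (1.0 - beta) * q   # smoothed echo likelihood
```

A weak near-end subframe against a strong far-end history drives the likelihood up, while comparable near-end power (double-talk or near-end speech) keeps it low.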
  • the codebook gain parameter, G , for each subframe is reduced by a scale factor depending on the echo likelihood, p , for the subframe.
  • the new gain parameter, G new , is then requantized according to the vocoder standard.
  • the codebook gain controls the overall level of the synthesized signal in the speech decoder model of Figure 7, and therefore controls the overall level of the corresponding audio signal. Attenuating the codebook gain in turn results in the attenuation of the echo.
  • the resulting 6 bit value is reinserted at the appropriate positions in the bit-stream.
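The attenuate-and-requantize step might be sketched as follows. The linear-in-dB attenuation law and the toy uniform 6-bit quantizer in the test are assumptions; the real quantization tables are codec-specific:

```python
def attenuate_codebook_gain(g_index, p, dequant, quant, max_atten_db=18.0):
    """Reduce the codebook gain according to the echo likelihood p and
    requantize it to its index for reinsertion into the bit-stream.
    dequant/quant map between the quantizer index and the gain value."""
    g = dequant(g_index)
    atten_db = p * max_atten_db            # more likely echo -> more attenuation
    g_new = g * 10.0 ** (-atten_db / 20.0)
    return quant(g_new)                    # e.g. a 6-bit index in GSM FR
```

Because only the gain index changes, the adjusted frame remains decodable by a standard decoder, which is the point of coded domain processing.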
  • the codebook vector, c(n) is modified by randomizing the pulse positions
  • Randomizing the codebook vector results in destroying the correlation properties of the echo. This has the effect of destroying much of the "speech-like" nature of the echo.
  • the randomization is performed whenever the likelihood of echo is determined to be high, preferably when p > 0.8 .
  • randomization may be performed using any suitable pseudo-random bit generation technique.
  • the codebook vector for each subframe is determined by the RPE grid position parameter (2 bits) and 13 RPE pulses (3 bits each). These 41 bits are replaced with 41 random bits using a pseudo-random bit generator.
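The 41-bit replacement can be sketched as below. The offset of the codebook bits within the 56-bit subframe, and the assumption that they are contiguous, are illustrative only; the true positions come from the GSM 06.10 bit layout:

```python
import random

def randomize_rpe_bits(subframe_bits, codebook_offset=9, rng=None):
    """Replace the 41 codebook-vector bits of a 56-bit GSM FR subframe
    (2-bit RPE grid position + 13 three-bit RPE pulses) with
    pseudo-random bits, destroying the correlation of the echo."""
    rng = rng or random.Random(0)
    bits = list(subframe_bits)
    for i in range(codebook_offset, codebook_offset + 41):
        bits[i] = rng.randint(0, 1)   # any pseudo-random bit source works
    return bits
```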
  • the pitch synthesis filter implements any period of long-term correlation in the speech signal, and is particularly important for modeling the harmonics of voiced speech.
  • the model of this filter discussed in Figure 7 uses only two parameters: the pitch period, T , and the pitch gain, g p .
  • the pitch period is relatively constant over several subframes or frames.
  • the pitch gain in most vocoders ranges from zero to one or a small value above one (e.g. 1.2 in GSM EFR).
  • the pitch gain is at or near its maximum value.
  • the voiced harmonics of the echo are generally well modeled by the pitch synthesis filter; the likelihood of echo is detected to be high ( p > 0.8 ).
  • the likelihood of echo is at moderate levels ( 0.5 < p ≤ 0.8 ).
  • the encoding process generally results in modeling the stronger of the two signals. It is reasonable to assume that, in most cases, the near-end speech is stronger than the echo. If this is the case, then the encoding process, due to its nature, tends to model mostly the near-end speech harmonics and little or none of the echo harmonics with the pitch synthesis filter.
  • the pitch period is randomized so that long-term correlation in the echo is removed, hence destroying the voiced nature of the echo. Such randomization is performed only when the likelihood of echo is high, preferably when p > 0.8 .
  • the pitch gain is reduced so as to control the strength of the harmonics or the strength of the long-term correlation in the audio signal.
  • Such gain attenuation is preferably performed only when the likelihood of echo is at least moderate ( p > 0.5 ).
  • the pitch period is not randomized during moderate echo likelihood but the pitch gain may be attenuated so that the voicing quality of the signal is not as strong.
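The two-threshold policy described above can be sketched as follows; the gain attenuation factor and the use of a uniformly random lag are assumptions:

```python
import random

def modify_pitch_params(T, g_p, p, rng=None, t_min=40, t_max=120, g_scale=0.5):
    """Adjust pitch synthesis parameters by echo likelihood p:
    randomize the pitch period only when p > 0.8 (high likelihood),
    attenuate the pitch gain whenever p > 0.5 (at least moderate)."""
    rng = rng or random.Random(0)
    if p > 0.8:
        T = rng.randint(t_min, t_max)   # destroys long-term (voiced) correlation
    if p > 0.5:
        g_p = g_p * g_scale             # weakens the harmonic structure
    return T, g_p
```

The default lag range 40–120 matches the GSM FR pitch-period range quoted in the text.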
  • the dotted line is the response for a high pitch gain
  • audio signal can be controlled by modifying this parameter in this manner.
  • N corresponds to the pitch period T of the model of Figure 7. N takes up 7 bits in the bit-stream and can range from 40 to 120, inclusive.
  • the LTP gain parameter of subframe j of the GSM FR vocoder denoted by
  • the magnitude frequency response of this filter may be
  • the modified transfer function is
  • the effect of such spectral morphing on echo is to reduce or remove any formant structure present in the signal.
  • the echo is blended or morphed to sound like background noise.
  • the LPC synthesis filter magnitude frequency response for a voiced speech segment and its flattened versions for several different values of the spectral morphing factor are shown in Figure 15.
  • the spectral morphing factor is determined according to the echo likelihood.
  • a similar spectral morphing method is obtained for other representations of the LPC filter coefficients commonly used in vocoders such as reflection coefficients, log-area ratios, inverse sines functions, and line spectral frequencies.
  • the GSM FR vocoder utilizes log-area ratios for representing the LPC synthesis filter coefficients.
  • the modified log-area ratios are then quantized according to the specifications in the standard. Note that these approaches to modification of the log-area ratios preserve the stability of the LPC synthesis filter.
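A standard way to flatten an LPC spectrum is bandwidth expansion, shown below on direct-form LPC coefficients as a stand-in for the patent's modified transfer function (which, for GSM FR, operates on log-area ratios); the exact morphing rule is an assumption:

```python
def flatten_lpc(a, gamma):
    """Bandwidth expansion: a_k -> gamma**k * a_k turns 1/A(z) into
    1/A(z/gamma). As gamma decreases toward 0 the magnitude response
    flattens, removing formant structure (cf. Figure 15)."""
    return [(gamma ** (k + 1)) * ak for k, ak in enumerate(a)]
```

At `gamma = 1` the filter is unchanged; at `gamma = 0` it degenerates to a flat (all-pass) response, so the echo loses its formant structure and blends into the background, as described above.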
  • Figure 8 shows the order in which the coded parameters from the GSM FR encoder are received.
  • a straightforward approach involves buffering up the entire 260 bits for each frame and then processing these buffered bits for coded domain echo control purposes. However, this introduces a buffering delay of about 20ms plus the processing delay.
  • the entire first subframe can be decoded as soon as bit 92 is received.
  • the first subframe may be processed after about 7.1ms (20ms times 92/260) of buffering delay.
  • the buffering delay is reduced by almost 13ms.
  • the coded LPC synthesis filter parameters are modified based on information available at the end of the first subframe of the frame.
  • the entire frame is affected by the echo likelihood computed based on the first subframe.
  • no noticeable artifacts were found due to this 'early' decision, particularly because the echo likelihood is a smoothed quantity based effectively on several previous subframes as well as the current subframe.
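The buffering-delay arithmetic above can be checked directly:

```python
# GSM FR framing facts from the text: 20 ms frames of 260 bits; the LAR
# field (36 bits) plus the first 56-bit subframe ends at bit 92.
FRAME_MS, FRAME_BITS, FIRST_SUBFRAME_END = 20.0, 260, 92

partial_delay_ms = FRAME_MS * FIRST_SUBFRAME_END / FRAME_BITS
saving_ms = FRAME_MS - partial_delay_ms
print(round(partial_delay_ms, 1), round(saving_ms, 1))  # 7.1 12.9
```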
  • When applying the novel coded domain processing techniques described in this report for removing echo, some or all of the bits corresponding to the coded parameters are modified in the bit-stream. This may affect other error-correction or detection bits that may also be embedded in the bit-stream. For instance, a speech encoder may embed some checksums in the bit-stream for the decoder to verify to ensure that an error-free frame is received. Such checksums as well as any parity check bits, error correction or detection bits, and framing bits are updated in accordance with the appropriate standard, if necessary.
  • additional information is available in addition to the coded parameters.
  • This additional information is the 6 MSBs of the A-law PCM samples of the audio signal.
  • these PCM samples may be used to reconstruct a version of the audio signal for both the far end and near end without using the coded parameters. This results in computational savings.

Abstract

A communications system (10) transmits a near end digital signal using a compression code comprising a plurality of parameters including a first parameter. The parameters represent an audio signal comprising a plurality of audio characteristics. The compression code is decodable by a plurality of decoding steps. The system also transmits a far end digital signal using a compression code. A terminal (20) receives the near end digital signal, and a terminal (36) receives the far end digital signal. A processor (40) is responsive to the near end digital signal to read at least the first parameter. The processor generates at least partially decoded near end signals and at least partially decoded far end signals. Based on such signals, the processor adjusts the first parameter and writes the adjusted first parameter into the near end digital signal. Another terminal (22) transmits the adjusted near end digital signal. As a result, the echo in the near end digital signal is reduced.

Description

TITLE OF THE INVENTION
CODED DOMAIN ECHO CONTROL
CROSS-REFERENCE TO RELATED APPLICATIONS
This is a utility application corresponding to provisional application no. 60/142,136 entitled "CODED DOMAIN ENHANCEMENT OF COMPRESSED SPEECH" filed July 2, 1999.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
Not Applicable.
BACKGROUND OF THE INVENTION The present invention relates to coded domain enhancement of compressed speech and in particular to coded domain echo contol.
This specification will refer to the following references:
[1] GSM 06.10, "Digital cellular telecommunication system (Phase 2); Full rate speech; Part 2: Transcoding", ETS 300 580-2, March 1998, Second Edition. [2] GSM 06.60, "Digital cellular telecommunications system (Phase 2); Enhanced Full Rate (EFR) speech transcoding", June 1998.
[3] GSM 08.62, "Digital cellular telecommunications system (Phase 2+); Inband Tandem Free Operation (TFO) of Speech Codecs", ETSI, March 2000.
[4] J. R. Deller, J. G. Proakis, J. H. L. Hansen, "Discrete-Time Processing of Speech Signals", Chapter 7, Prentice-Hall Inc, 1987.
[5] GSM 06.12, "European digital cellular telecommunications system (Phase 2); Comfort noise aspect for full rate speech traffic channels", ETSI, September 1994.
In the GSM digital cellular network, speech transmission between the mobile stations (handsets) and the base station is in compressed or coded form. Speech coding techniques such as the GSM FR [1] and EFR [2] are used to compress the speech. The devices used to compress speech are called vocoders. The coded speech requires less than 2 bits per sample. This situation is depicted in Figure 1. Between the base stations, the speech is transmitted in an uncoded form (using PCM companding which requires 8 bits per sample).
The terms coded speech and uncoded speech may be described as follows:
Uncoded speech: refers to the digital speech signal samples typically used in telephony; these samples are either in linear 13-bits per sample form or companded form such as the 8-bits per sample μ-law or A-law PCM form; the typical bit-rate is 64 kbps.
Coded speech: refers to the compressed speech signal parameters (also referred to as coded parameters) which use a bit rate typically well below 64kbps such as 13 kbps in the case of the GSM FR and 12.2 kbps in the case of GSM EFR; the compression methods are more extensive than the simple PCM companding scheme; examples of compression methods are linear predictive coding, code-excited linear prediction and multi-band excitation coding [4].
The Tandem-Free Operation (TFO) standard [3] will be deployed in GSM digital cellular networks in the near future. The TFO standard applies to mobile-to-mobile calls. Under TFO, the speech signal is conveyed between mobiles in a compressed form after a brief negotiation period. This eliminates tandem voice codecs during mobile-to-mobile calls. The elimination of tandem codecs is known to improve speech quality in the case where the original signal is clean. The key point to note is that the speech transmission remains coded between the mobile handsets, as depicted in Figure 2.
Under TFO, the transmissions between the handsets and base stations are coded, requiring less than 2 bits per speech sample. However, 8 bits per speech sample are still available for transmission between the base stations. At the base station, the speech is decoded and then A-law companded so that 8 bits per sample are necessary. However, the original coded speech bits are used to replace the 2 least significant bits (LSBs) in each 8-bit A-law companded sample. Once TFO is established between the handsets, the base stations only send the 2 LSBs in each 8-bit sample to their respective handsets and discard the 6 MSBs. Hence vocoder tandeming is avoided. The process is illustrated in Figure 3.
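As a rough illustration of the bit-replacement scheme above, the following sketch embeds coded-speech bits into the 2 LSBs of 8-bit A-law samples and recovers them. This is only a sketch of the principle: the actual TFO frame structure (synchronization, padding, and which coded bits map to which samples) is defined by GSM 08.62 and is not modeled here.

```python
# Sketch of the TFO LSB embedding described above. `coded_bits` is assumed to
# hold 2 bits per A-law sample; real TFO framing per GSM 08.62 is not modeled.

def embed_tfo_bits(alaw_samples, coded_bits):
    """Replace the 2 LSBs of each 8-bit A-law sample with coded-speech bits."""
    out = []
    for i, sample in enumerate(alaw_samples):
        two_bits = (coded_bits[2 * i] << 1) | coded_bits[2 * i + 1]
        out.append((sample & 0xFC) | two_bits)  # keep 6 MSBs, insert 2 LSBs
    return out

def extract_tfo_bits(alaw_samples):
    """Recover the coded-speech bits (what a base station forwards under TFO)."""
    bits = []
    for sample in alaw_samples:
        bits.extend([(sample >> 1) & 1, sample & 1])
    return bits
```

The 6 MSBs are untouched, which is why a (slightly noisy) PCM version of the speech remains available in the bit-stream.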
The echo problem and its traditional solution are shown in Figure 4. In wireline networks, echo occurs due to the impedance mismatch at the 4-wire-to-2-wire hybrids. The mismatch results in electrical reflections of a portion of the far-end signal into the near-end signal. Depending on the channel impulse response of the endpath and network delay, the echo can be annoying to the far end listener. The endpath impulse response is estimated using a network echo canceller (EC) and is used to produce an estimate of the echo signal. The estimate is then subtracted from the near-end signal to remove the echo. After EC processing, any residual echo is removed by the non-linear processor (NLP).
In the case of a digital cellular handset, the echo occurs due to the feedback from the speaker (earpiece) to the microphone (mouthpiece). The acoustic feedback can be significant and the echo can be annoying, particularly in the case of hands-free phones.
Figure 5 shows the feedback path from the speaker to the microphone in a digital cellular handset. The depicted handset does not have echo cancellation implemented in the handset.
Under TFO in GSM networks, if echo cancellation is implemented in the network, a traditional approach requires decoding the coded speech, processing the resulting uncoded speech and then re-encoding it. Such decoding and re-encoding is necessary because traditional echo cancellers can only operate on the uncoded speech signal. This approach is shown in Figure 6. Some of the disadvantages of this approach are as follows.
1. This approach is computationally expensive due to the need for two decoders and an encoder. Typically, encoders are at least an order of magnitude more complex computationally than decoders. Thus, the presence of an encoder, in particular, is a major computational burden.
2. The delay introduced by the decoding and re-encoding processes is undesirable.
3. A vocoder tandem (i.e. two encoder/decoder pairs placed in series) is introduced in this approach, which is known to degrade speech quality due to quantization effects.
In another straightforward approach, comfort noise generation may be used to mask the echo. Comfort noise generation is used for silence suppression or discontinuous transmission purposes (e.g. [5]). It is possible to use such techniques to completely mask the echo whenever echo is detected. However, such techniques suffer from "choppiness" particularly during double-talk conditions, as well as poor and unnatural background transparency.
The proposed techniques are capable of performing echo control (acoustic or linear) directly on the coded speech (i.e. by direct modification of the coded parameters). Low computational complexity and delay are achieved. Tandeming effects are avoided or minimized, resulting in better perceived quality after echo control. Excellent background transparency is also achieved.
Speech compression, which falls under the category of lossy source coding, is commonly referred to as speech coding. Speech coding is performed to minimize the bandwidth necessary for speech transmission. This is especially important in wireless telephony where bandwidth is scarce. In the relatively bandwidth abundant packet networks, speech coding is still important to minimize network delay and jitter. This is because speech communication, unlike data, is highly intolerant of delay. Hence a smaller packet size eases the transmission through a packet network. The four ETSI GSM standards of concern are listed in Table 1.
Table 1: GSM Speech Codecs
In speech coding, a set of consecutive digital speech samples is referred to as a speech frame. The GSM coders operate on a frame size of 20ms (160 samples at 8kHz sampling rate). Given a speech frame, a speech encoder determines a small set of parameters for a speech synthesis model. With these speech parameters and the speech synthesis model, a speech frame can be reconstructed that appears and sounds very similar to the original speech frame. The reconstruction is performed by the speech decoder. In the GSM vocoders listed above, the encoding process is much more computationally intensive than the decoding process.
The speech parameters determined by the speech encoder depend on the speech synthesis model used. The GSM coders in Table 1 utilize linear predictive coding (LPC) models. A block diagram of a simplified view of a generic LPC speech synthesis model is shown in Figure 7. This model can be used to generate speech-like signals by specifying the model parameters appropriately. In this example speech synthesis model, the parameters include the time-varying filter coefficients, pitch periods, codebook vectors and the gain factors. The synthetic speech is generated as follows. An appropriate codebook vector, c(n), is first scaled by the codebook gain factor G. Here n denotes sample time. The scaled codebook vector is then filtered by a pitch synthesis filter whose parameters include the pitch gain, g_p, and the pitch period, T. The result is sometimes referred to as the total excitation vector, u(n). As implied by its name, the pitch synthesis filter provides the harmonic quality of voiced speech. The total excitation vector is then filtered by the LPC synthesis filter which specifies the broad spectral shape of the speech frame and the broad spectral shape of the corresponding audio signal. For each speech frame, the parameters are usually updated more than once.
For instance, in the GSM FR and EFR coders, the codebook vector, codebook gain and the pitch synthesis filter parameters are determined every subframe (5ms). The LPC synthesis filter parameters are determined twice per frame (every 10ms) in EFR and once per frame in FR.
A typical sequence of steps used in a speech encoder is as follows:
1. Obtain a frame of speech samples.
2. Multiply the frame of samples by a window (e.g. Hamming window) and determine the autocorrelation function up to lag M .
3. Determine the reflection coefficients and/or LPC coefficients from the autocorrelation function. (Note that reflection coefficients are an alternative representation of the LPC filter coefficients.)
4. Transform the reflection coefficients or LPC filter coefficients to a different form suitable for quantization (e.g. log-area ratios or line spectral frequencies).
5. Quantize the transformed LPC coefficients using vector quantization techniques.
6. Add any additional error correction/detection, framing bits etc.
7. Transmit the coded parameters.
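Steps 2 and 3 of the per-frame sequence above can be sketched as follows, using the standard Levinson-Durbin recursion to obtain reflection and LPC coefficients from the autocorrelation function. The frame length, Hamming window, and predictor order M here are illustrative assumptions rather than values taken from any particular GSM standard.

```python
# Illustrative sketch of encoder steps 2-3: windowing, autocorrelation up to
# lag M, then Levinson-Durbin to get reflection and LPC coefficients.
import math

def lpc_analysis(frame, M=8):
    N = len(frame)
    # Step 2: apply a Hamming window and compute autocorrelation up to lag M.
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    x = [s * wn for s, wn in zip(frame, w)]
    r = [sum(x[n] * x[n - k] for n in range(k, N)) for k in range(M + 1)]
    # Step 3: Levinson-Durbin recursion.
    a = [0.0] * (M + 1)
    refl = []
    err = r[0]
    for i in range(1, M + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err if err != 0 else 0.0
        refl.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], refl  # LPC coefficients {a_k} and reflection coefficients
```

The reflection coefficients fall out of the recursion for free, which is why (as the text notes) they are simply an alternative representation of the LPC filter coefficients.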
The following sequence of operations is typically performed for each subframe by the speech encoder:
1. Determine the pitch period.
2. Determine the corresponding pitch gain.
3. Quantize the pitch period and pitch gain.
4. Inverse filter the original speech signal through the quantized LPC synthesis filter to obtain the LPC residual signal.
5. Inverse filter the LPC residual signal through the pitch synthesis filter to obtain the pitch residual.
6. Determine the best codebook vector.
7. Determine the best codebook gain.
8. Quantize the codebook gain and codebook vector.
9. Update the filter memories appropriately.
A typical sequence of steps used in a speech decoder is as follows:
First, perform any error correction/detection and framing.
Then, for each subframe:
1. Dequantize all the received coded parameters (LPC coefficients, pitch period, pitch gain, codebook vector, codebook gain).
2. Scale the codebook vector by the codebook gain and filter it using the pitch synthesis filter to obtain the LPC excitation signal.
3. Filter the LPC excitation signal using the LPC synthesis filter to obtain a preliminary speech signal.
4. Construct a post-filter (usually based on the LPC coefficients).
5. Filter the preliminary speech signal to reduce quantization noise to obtain the final synthesized speech.
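The core of decoder steps 2 and 3 can be sketched for the generic model of Figure 7 as follows. Dequantization and post-filtering are omitted, and the filter-memory handling is a simplified assumption; parameter names follow the model (c(n), G, g_p, T, a_k).

```python
# Minimal sketch of the generic LPC decoder synthesis (Figure 7 model):
# u(n) = G*c(n) + g_p*u(n-T), then s(n) = u(n) + sum_k a_k * s(n-k).

def synthesize_subframe(c, G, g_p, T, a, pitch_mem, lpc_mem):
    """Scale the codebook vector, apply pitch synthesis, then LPC synthesis."""
    u = []
    for n, cn in enumerate(c):
        # past total excitation comes from memory until n reaches T
        past = pitch_mem[-T + n] if n < T else u[n - T]
        u.append(G * cn + g_p * past)
    s = []
    M = len(a)
    hist = list(lpc_mem)
    for un in u:
        sn = un + sum(a[k] * hist[-(k + 1)] for k in range(M))
        s.append(sn)
        hist.append(sn)
    return s, u
```

With the pitch gain and LPC coefficients set to zero the output is just the scaled codebook vector, which makes the roles of the two filters easy to see.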
As an example of the arrangement of coded parameters in the bit-stream transmitted by the encoder, the GSM FR vocoder is considered. For the GSM FR vocoder, a frame is defined as 160 samples of speech sampled at 8kHz, i.e. a frame is 20ms long. With A-law PCM companding, 160 samples would require 1280 bits for transmission. The encoder compresses the 160 samples into 260 bits. The arrangement of the various coded parameters in the 260 bits of each frame is shown in Figure 8. The first 36 bits of each coded frame consist of the log-area ratios which correspond to the LPC synthesis filter. The remaining 224 bits can be grouped into 4 subframes of 56 bits each. Within each subframe, the coded parameter bits contain the pitch synthesis filter related parameters followed by the codebook vector and gain related parameters.
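The coarse frame structure just described can be sketched as a simple split of the 260-bit frame. The finer per-subframe field layout (exact widths of the pitch and codebook fields) is defined by GSM 06.10 and Figure 8 and is not parsed here.

```python
# Sketch of splitting a 260-bit GSM FR frame per the structure above:
# 36 LAR bits for the LPC synthesis filter, then 4 subframes of 56 bits.

def split_fr_frame(bits):
    assert len(bits) == 260, "GSM FR frame is 260 bits"
    lar_bits = bits[:36]  # log-area ratios (LPC synthesis filter)
    subframes = [bits[36 + 56 * j: 36 + 56 * (j + 1)] for j in range(4)]
    return lar_bits, subframes
```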
BRIEF SUMMARY OF THE INVENTION
The preferred embodiment is useful in a communications system for transmitting a near end digital signal using a compression code comprising a plurality of parameters including a first parameter. The parameters represent an audio signal comprising a plurality of audio characteristics. The compression code is decodable by a plurality of decoding steps. The communications system also transmits a far end digital signal using a compression code. In such an environment, the echo in the near end digital signal can be reduced by reading at least the first parameter of the plurality of parameters in response to the near end digital signal. At least one of the plurality of the decoding steps is performed on the near end digital signal and the far end digital signal to generate at least partially decoded near end signals and at least partially decoded far end signals. The first parameter is adjusted in response to the at least partially decoded near end signals and at least partially decoded far end signals to generate an adjusted first parameter. The first parameter is replaced with the adjusted first parameter in the near end digital signal. The reading, generating and adjusting preferably are performed by a processor.
Another embodiment of the invention is useful in a communications system for transmitting a near end digital signal comprising code samples further comprising first bits using a compression code and second bits using a linear code. The code samples represent an audio signal having a plurality of audio characteristics. The system also transmits a far end digital signal. In such an environment, any echo in the near end digital signal can be reduced without decoding the compression code by adjusting the first bits and second bits in response to the near end digital signal and the far end digital signal.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic block diagram of a system for speech transmission in a GSM digital cellular network.
Figure 2 is a schematic block diagram of a system for speech transmission in a GSM network under tandem-free operation (TFO).
Figure 3 is a graph illustrating transmission of speech under tandem-free operation (TFO).
Figure 4 is a schematic block diagram of a traditional solution to an echo problem in a wireline network.
Figure 5 is a schematic block diagram illustrating acoustic feedback from a speaker to a microphone in a digital cellular telephone.
Figure 6 is a schematic block diagram of a traditional echo cancellation approach for coded speech.
Figure 7 is a schematic block diagram of a generic linear predictive code (LPC) speech synthesis model or speech decoder model.
Figure 8 is a diagram illustrating the arrangement of coded parameters in the bit stream for GSM FR.
Figure 9 is a schematic block diagram of a preferred form of coded domain echo control system for acoustic echo environments made in accordance with the invention.
Figure 10 is a schematic block diagram of another preferred form of coded domain echo control system for echo due to 4-wire-to-2-wire hybrids made in accordance with the invention.
Figure 11 is a schematic block diagram of a simplified end path model with flat delay and attenuation.
Figure 12 is a graph illustrating a preliminary echo likelihood versus near end to far end subframe power ratio.
Figure 13 is a flow diagram illustrating a preferred form of coded domain echo control methodology.
Figure 14 is a graph illustrating an exemplary pitch synthesis filter magnitude frequency response.
Figure 15 is a graph illustrating exemplary magnitude frequency responses of an original LPC synthesis filter and flattened versions of such a filter.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The preferred embodiments will be described with reference to the following abbreviations:
Speech Synthesis Transfer Function
Although many non-linearities and heuristics are involved in the speech synthesis at the decoder, the following approximate transfer function may be attributed to the synthesis process:
H(z) = G / [(1 − g_p·z^(−T))·(1 − Σ_{k=1}^{M} a_k·z^(−k))] (1)
The codebook vector, c(n), is filtered by H(z) to result in the synthesized speech. The key point to note about this generic LPC speech synthesis or decoder model for speech decoding is that the available coded parameters that can be modified to achieve echo control are:
1. c(n): codebook vector
2. G: codebook gain
3. g_p: pitch gain
4. T: pitch period
5. {a_k, k = 1, ..., M}: LPC coefficients
Most LPC-based vocoders use parameters similar to the above set, parameters that may be converted to the above forms, or parameters that are related to the above forms. For instance, the LPC coefficients in LPC-based vocoders may be represented using log-area ratios (e.g. the GSM FR) or line spectral frequencies (e.g. GSM EFR); both of these forms can be converted to LPC coefficients. An example of a case where a parameter is related to the above form is the block maximum parameter in the GSM FR vocoder; the block maximum can be considered to be directly proportional to the codebook gain in the model described by equation (1).
Thus, although the discussion of coded parameter modification methods is mostly limited to the generic speech decoder model, it is relatively straightforward to tailor these methods for any LPC-based vocoder, and possibly even other models.
It should also be clear that non-linear processing methods such as center-clipping used with uncoded speech for echo control cannot be used on the coded parameters because the coded parameter representation of the speech signal is significantly different. Even the codebook vector signal, c(n), is not amenable to center-clipping due to the significant quantization involved. In many vocoders, the majority of the codebook vector samples are already zero while the non-zero pulses are highly quantized. Hence such non-linear processing approaches are not applicable or effective.
In this specification and claims, the terms linear code and compression code have the following meanings:
Linear code: By a linear code, we mean a compression technique that results in one coded parameter or coded sample for each sample of the audio signal. Examples of linear codes are PCM (A-law and μ-law), ADPCM (adaptive differential pulse code modulation), and delta modulation.
Compression code: By a compression code, we mean a technique that results in fewer than one coded parameter for each sample of the audio signal. Typically, compression codes result in a small set of coded parameters for each block or frame of audio signal samples. Examples of compression codes are linear predictive coding based vocoders such as the GSM vocoders (HR, FR, EFR).
Coded Domain Echo Control
Overview
Figure 9 shows a novel implementation of coded domain echo control (CDEC) for a situation where acoustic echo is present. A communications system 10 transmits near end coded digital signals over a network 24 using a compression code, such as any of the codes used by the codecs identified in Table 1. The compression code is generated by an encoder 16 from linear audio signals generated by a near end microphone 14 within a near end speaker handset 12. The compression code comprises parameters, such as those shown in Figure 8. The parameters represent an audio signal comprising a plurality of audio characteristics, including audio level and power. The compression code is decodable by various decoding steps. As will be explained, system 10 controls echo in the near end digital signals due to the presence of far end digital signals transmitted by system 10 over a network 32. The echo is controlled with minimal delay and minimal, if any, decoding of the compression code parameters shown in Figure 8.
Near end digital signals using the compression code are received on a near end terminal 20, and digital signals using an adjusted compression code are transmitted by a near end terminal 22 over a network 24 to a far end handset (not shown) which includes a decoder (not shown) of the adjusted compression code. Note that the adjusted compression code is compatible with the original compression code. In other words, when the coded parameters are modified or adjusted, we term it the adjusted compression code, but it still is decodable using a standard decoder corresponding to the original compression code. A linear far end audio signal is encoded by a far end encoder (not shown) to generate far end digital signals using a compression code compatible with decoder 18, and is transmitted over a network 32 to a far end terminal 34. A decoder 18 of near end handset 12 decodes the far end digital signals. As shown in Figure 9, echo signals from the far end signals may find their way to encoder 16 of the near end handset 12 through acoustic feedback.
A processor 40 performs various operations on the near end and far end compression code. Processor 40 may be a microprocessor, microcontroller, digital signal processor, or other type of logic unit capable of arithmetic and logical operations.
For each type of codec, a different coded domain echo control algorithm 44 is executed by processor 40 at all times: under compressed mode and linear mode, during TFO as well as non-TFO. A partial decoder 48 is executed by processor 40 to read at least a first of the parameters received at terminal 20. Another partial decoder 46 is executed by processor 40 to generate at least partially decoded far end signals. Decoder 48 generates at least partially decoded near end signals. (Note that the compression codes used by the near end and far end signals may be different, and hence the partial decoders may also be different.) Based on the partial decoding, algorithm 44 generates an echo likelihood signal at least estimating the amount of echo in the near end digital signal. The echo likelihood signal varies over time since the amount of echo depends on the far end speech signal. The echo likelihood signal is used by algorithm 44 to adjust the parameter(s) read by partial decoder 48. The adjusted parameter is written into the near end digital signal to form an adjusted near end digital signal which is transmitted from terminal 22 to network 24. In other words, the adjusted parameter is substituted for the originally read parameter. The partial decoders 46 and 48 shown within the network CDEC device are algorithms executed by processor 40 and are codec-dependent.
The partial decoders operate on signals compressed using compression codes.
In the case where processor 40 is implemented in a TFO environment, partial decoder 46 may decode the linear code rather than the compression code. Also, in this case, partial decoder 48 decodes the linear code and only determines the coded parameters from the compression code without actually synthesizing the audio signal from the compression code.
Blocks 44, 46 and 48 also may be implemented as hardwired circuits.
Figure 10 shows that the Figure 9 embodiment can be useful for a system in which the echo is due to a 4-wire-to-2-wire hybrid.
The CDEC device/algorithm removes the effects of echo from the near-end coded speech by directly modifying the coded parameters in the bit-stream received from the near-end. Decoding of the near-end and far-end signals is performed in order to determine the likelihood of echo being present in the near-end. Certain statistics are measured from the decoded signals to determine this likelihood value.
Partial Decoding
The decoding of near-end and far-end signals may be complete or partial depending on the vocoder being used for the encode and decode operations. Some examples of situations where partial decoding suffices are listed below:
1. In code-excited linear prediction (CELP) vocoders, a post-filtering process is performed on the signal decoded using the LPC-based model. This post-filtering process reduces quantization noise. However, since it does not significantly affect the measurement of the statistics necessary for determining the likelihood of echo, the post- filtering stage can be avoided for economy.
2. Under TFO in GSM networks, the CDEC device may be placed between the base station and the switch (known as the A-interface) or between the two switches. Since the 6 MSBs of each 8-bit sample of the speech signal correspond to the PCM code as shown in Figure 3, it is possible to avoid decoding the coded speech altogether in this situation. A simple table-lookup is sufficient to convert the 8-bit companded samples to 13-bit linear speech samples using A-law companding tables. This provides an economical way to obtain a version of the speech signal without invoking the appropriate decoder. Note that the speech signal obtained in this manner is somewhat noisy, but has been found to be adequate for the measurement of the statistics necessary for determining the likelihood of echo.
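The table-lookup just described can be sketched as follows. Here the 256-entry table is computed with the standard G.711 A-law expansion rule rather than stored as constants; under TFO the 2 LSBs of each sample carry coded-speech bits, so the looked-up values form the slightly noisy speech version mentioned above.

```python
# Sketch of 8-bit A-law -> 13-bit linear conversion (G.711 expansion rule).

def alaw_to_linear13(byte):
    byte ^= 0x55                 # undo the A-law even-bit inversion
    sign = 1 if byte & 0x80 else -1
    seg = (byte & 0x70) >> 4     # segment (exponent)
    mant = byte & 0x0F           # mantissa
    mag = (mant << 1) + 1
    if seg >= 1:
        mag = (mag + 32) << (seg - 1)
    return sign * mag            # 13-bit linear value, about +/-4032

# The simple lookup table referred to in the text:
ALAW_TABLE = [alaw_to_linear13(b) for b in range(256)]
```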
Determination Of Echo Likelihood
Assuming that some uncoded version (either fully or partially decoded) of the far-end and near-end signals are available, certain statistics are measured and used to determine the likelihood of echo being present in the near-end signal. The echo likelihood is estimated for each speech subframe, where the subframe duration is dependent on the vocoder being used. A preferred approach is described in this section.
A simplified model of the end-path is assumed as shown in Figure 11. The end-path is assumed to consist of a flat delay of τ samples and an echo return loss (ERL), λ .
In this model, s_NE(n) and s_FE(n) are the near-end and far-end uncoded signals, respectively. It is assumed that the range of τ is known for a given implementation of CDEC, and is specified as follows:
τ_min ≤ τ ≤ τ_max (2)
This assumption is reasonable since the maximum and minimum end-path delays depend mostly on the speech encoding, speech decoding, channel encoding, channel decoding and other known transmission delays. The ERL range is assumed to be:
0 < λ < 1 (3)
The echo likelihood estimation process uses the following variables:
P_NE is the power of the current subframe of the near-end signal.
P_FE(0) is the power of the current subframe of the far-end signal.
P_FE(m) is the power of the mth subframe before the current subframe of the far-end signal. In other words, a buffer of past values of far-end subframe power values is maintained. The buffer size is B_max = ⌈τ_max / N⌉ so that the subframe power of the far-end signal up to the maximum possible end-path delay is available. Here N is the number of samples in a subframe.
R is the near-end to far-end subframe power ratio.
p₁ is the preliminary echo likelihood.
p is the echo likelihood obtained by smoothing the preliminary echo likelihood.
The echo likelihood is estimated for each subframe using the steps below. For some vocoders, particularly lower bit rate vocoders such as GSM HR, the processing may be more appropriately performed frame-by-frame rather than subframe-by-subframe.
Determine the power of s_NE(n) for the current subframe as
P_NE = (1/N) Σ_{n=0}^{N−1} s_NE²(n).
Determine the power of s_FE(n) for the current subframe as
P_FE(0) = (1/N) Σ_{n=0}^{N−1} s_FE²(n).
Determine the near-end to far-end power ratio as
R = P_NE / max_{B_min ≤ m ≤ B_max} P_FE(m), where B_min = ⌊τ_min / N⌋. The denominator is essentially the maximum far-end subframe power measured during the expected end-path delay time period.
Shift the far-end power values in the buffer, i.e. P_FE(m) = P_FE(m − 1) for m = B_max, B_max − 1, ..., 1.
Determine the preliminary echo likelihood as
p₁ = 0, for R > 63
p₁ = −0.016R + 1.008, for 0.5 < R ≤ 63
p₁ = 1, for R ≤ 0.5
Smooth the preliminary echo likelihood to obtain the echo likelihood using
p = 0.9p + 0.1p₁.
The graph for the preliminary likelihood as a function of near-end to far-end subframe power ratio is shown in Figure 12.
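The estimation steps above can be collected into a small per-subframe routine. The subframe length N and the delay bounds are left as constructor parameters, and the guard against a zero denominator is an added assumption not spelled out in the text.

```python
# Sketch of the per-subframe echo likelihood estimator described above.

class EchoLikelihood:
    def __init__(self, N=40, tau_min=0, tau_max=320):
        self.N = N
        self.b_min = tau_min // N              # B_min = floor(tau_min / N)
        self.b_max = -(-tau_max // N)          # B_max = ceil(tau_max / N)
        self.fe_power = [0.0] * (self.b_max + 1)
        self.p = 0.0                           # smoothed echo likelihood

    def update(self, s_ne, s_fe):
        p_ne = sum(x * x for x in s_ne) / self.N
        p_fe = sum(x * x for x in s_fe) / self.N
        # shift the buffer: fe_power[m] = far-end power m subframes ago
        self.fe_power = [p_fe] + self.fe_power[:-1]
        denom = max(self.fe_power[self.b_min:self.b_max + 1]) or 1e-12
        R = p_ne / denom
        if R > 63:
            p1 = 0.0                           # near-end speech dominates
        elif R <= 0.5:
            p1 = 1.0                           # near-end is likely echo
        else:
            p1 = -0.016 * R + 1.008            # linear transition region
        self.p = 0.9 * self.p + 0.1 * p1       # first-order smoothing
        return self.p
```

Feeding a strong far-end signal with a quiet near-end signal drives the likelihood toward 1, matching the curve of Figure 12.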
Coded Parameter Modification
In this section, the preferred techniques for direct modification of the coded parameters based on the echo likelihood are described. The direct modification of each coded parameter of the generic speech decoder model of Figure 7 is first described. Then the corresponding method for modification of the parameters for a standard-based vocoder is described. As an example of a standard-based vocoder, the GSM FR vocoder is considered. After each parameter is modified and quantized according to the standard, the appropriate parameters in the bit-stream are modified appropriately. The preferred embodiment of the overall process is depicted in Figure 13.
Codebook Gain Modification
The codebook gain parameter, G, for each subframe is reduced by a scale factor depending on the echo likelihood, p, for the subframe. The modified codebook gain parameter, denoted by G_new, is given by:
G_new = (1 − p)·G (4)
This parameter is then requantized according to the vocoder standard. Note that the codebook gain controls the overall level of the synthesized signal in the speech decoder model of Figure 7, and therefore controls the overall level of the corresponding audio signal. Attenuating the codebook gain in turn results in the attenuation of the echo.
For the GSM FR, the block maximum parameter, x_max, is directly proportional to the codebook gain parameter of the generic model of Figure 7. Hence the modified block maximum parameter is computed as
x_max,new = (1 − p)·x_max (5)
x_max,new is then requantized according to the method prescribed in the standard. The resulting 6-bit value is reinserted at the appropriate positions in the bit-stream.
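The gain attenuation of equation (4) is a one-line operation, sketched below. The requantization step that follows it is deliberately omitted: a real implementation would use the quantization tables of the vocoder standard (e.g. the block maximum tables of GSM 06.10).

```python
# Sketch of the codebook gain (or block maximum) attenuation, G_new = (1-p)*G.

def attenuate_gain(G, p):
    """Scale the gain by (1 - echo likelihood); requantization not shown."""
    return (1.0 - p) * G
```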
Codebook Vector Modification
The codebook vector, c(n), is modified by randomizing the pulse positions and amplitudes. Randomizing the codebook vector results in destroying the correlation properties of the echo. This has the effect of destroying much of the "speech-like" nature of the echo. The randomization is performed whenever the likelihood of echo is determined to be high, preferably when p > 0.8. The randomization may be performed using any suitable pseudo-random bit generation technique.
In the case of the GSM FR, the codebook vector for each subframe is determined by the RPE grid position parameter (2 bits) and 13 RPE pulses (3 bits each). These 41 bits are replaced with 41 random bits using a pseudo-random bit generator.
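A sketch of this 41-bit replacement is shown below. The offset of the RPE bits within the subframe is left as a parameter, since the exact bit positions (per Figure 8 and GSM 06.10) are not reproduced here.

```python
# Sketch of the GSM FR codebook randomization above: replace the 41 RPE bits
# (2-bit grid position + 13 three-bit pulses) with pseudo-random bits.
import random

def randomize_rpe_bits(subframe_bits, start, rng=None):
    """Replace 41 bits beginning at `start` (offset is an assumption)."""
    rng = rng or random.Random()
    bits = list(subframe_bits)
    for i in range(start, start + 41):
        bits[i] = rng.getrandbits(1)
    return bits
```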
Pitch Synthesis Filter Modification
The pitch synthesis filter models the long-term correlation in the speech signal, and is particularly important for modeling the harmonics of voiced speech. The model of this filter discussed in Figure 7 uses only two parameters, the pitch period, T, and the pitch gain, g_p. During voiced speech, the pitch period is relatively constant over several subframes or frames. The pitch gain in most vocoders ranges from zero to one or a small value above one (e.g. 1.2 in GSM EFR). During strong voiced speech, the pitch gain is at or near its maximum value.
If only echo is present in the near-end signal, the voiced harmonics of the echo are generally well modeled by the pitch synthesis filter; the likelihood of echo is detected to be high (p > 0.8).
If both echo and near-end speech are present in the near-end signal during a frame period, the likelihood of echo is at moderate levels (0.5 ≤ p ≤ 0.8). In such situations, the encoding process generally results in modeling the stronger of the two signals. It is reasonable to assume that, in most cases, the near-end speech is stronger than the echo. If this is the case, then the encoding process, due to its nature, tends to model mostly the near-end speech harmonics and little or none of the echo harmonics with the pitch synthesis filter.
In order to remove or mask voiced echo, the harmonic nature of the echo is destroyed. This is achieved by modifying the pitch synthesis filter parameters as follows:
The pitch period is randomized so that long-term correlation in the echo is removed, hence destroying the voiced nature of the echo. Such randomization is performed only when the likelihood of echo is high, preferably when p > 0.8.
The pitch gain is reduced so as to control the strength of the harmonics or the strength of the long-term correlation in the audio signal. Such gain attenuation is preferably performed only when the likelihood of echo is at least moderate (p > 0.5). The new pitch gain is obtained as
g_p,new = (1 − p)·g_p for p > 0.5; g_p,new = g_p otherwise. (6)
Note that with this approach, the pitch period is not randomized during moderate echo likelihood but the pitch gain may be attenuated so that the voicing quality of the signal is not as strong.
Figure 14 shows the magnitude frequency responses of a pitch synthesis filter with pitch period T = 41. The dotted line is the response for a high pitch gain (g_p = 0.75) and the solid line illustrates what happens when the pitch gain is attenuated to g_p = 0.3. The strength of the harmonics and long-term correlation of an audio signal can be controlled by modifying this parameter in this manner.
In the GSM FR vocoder, the LTP lag parameter of subframe j, denoted by N_j, corresponds to the pitch period T of the model of Figure 7. N_j takes up 7 bits in the bit-stream and can range from 40 to 120, inclusive. Hence, when randomizing N_j, it must be replaced with a random number that is also in this range.
The LTP gain parameter of subframe j of the GSM FR vocoder, denoted by b_j, corresponds to the pitch gain g_p of Figure 7. The modified LTP gain parameter is obtained in a manner similar to equation (6) as
b_j,new = λ·b_j if p > 0.5; b_j otherwise (7)
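At the bit-stream level, randomizing N_j amounts to overwriting its 7-bit field with a fresh legal value. The sketch below assumes a frame already unpacked into a list of bits with the lag field at a known offset; the MSB-first packing and the offset are illustrative assumptions, not the exact GSM 06.10 layout.

```python
import random

def randomize_ltp_lag(bits, lag_offset, rng=random):
    """Overwrite one 7-bit LTP lag field in an unpacked GSM FR frame
    (a list of 0/1 ints) with a fresh random lag. Although 7 bits can
    encode 0..127, only 40..120 are legal lag values, so the random
    replacement is drawn from that range."""
    new_lag = rng.randint(40, 120)
    # Write the 7 bits MSB-first into the bit list (packing assumed).
    bits[lag_offset:lag_offset + 7] = [(new_lag >> (6 - i)) & 1
                                       for i in range(7)]
    return new_lag
```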
LPC Synthesis Filter Modification
In the generic speech decoder model of Figure 7, the LPC synthesis filter transfer function is 1/(1 − Σ_{k=1..M} a_k z^−k), where M is the filter order. This filter provides the broad spectral shaping for the synthesized signal. The magnitude frequency response of this filter may be flattened by replacing the coefficients {a_k} with {β^k a_k}, with 0 ≤ β ≤ 1. β is termed the spectral morphing factor. In other words, the modified transfer function is 1/(1 − Σ_{k=1..M} β^k a_k z^−k). Note that when β = 0, the original LPC synthesis filter is transformed into an all-pass filter, and when β = 1, the original filter remains unchanged. For all values of β between 0 and 1, the original filter magnitude frequency response experiences some flattening, with greater flattening as β → 0. Since scaling a_k by β^k moves every pole of the filter toward the origin by the factor β, filter stability is maintained in this transformation.
The effect of such spectral morphing on echo is to reduce or remove any formant structure present in the signal. The echo is blended or morphed to sound like background noise. As an example, the LPC synthesis filter magnitude frequency response for a voiced speech segment and its flattened versions for several different values of β are shown in Figure 15.
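The β^k coefficient scaling can be sketched as follows (function name illustrative); it also shows why stability is preserved: every pole radius shrinks by the factor β.

```python
def morph_lpc_coeffs(a, beta):
    """Spectrally flatten the LPC synthesis filter 1/(1 - sum_k a_k z^-k)
    by replacing each a_k with beta**k * a_k (k = 1..M), 0 <= beta <= 1.

    beta = 1 leaves the filter unchanged; beta = 0 removes all feedback,
    giving a flat magnitude response. Scaling a_k by beta**k shrinks every
    pole radius by the factor beta, so a stable filter remains stable.
    """
    # `a` holds a_1..a_M; enumerate is 0-based, hence the (k + 1) exponent.
    return [beta ** (k + 1) * ak for k, ak in enumerate(a)]
```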
In the preferred implementation, the spectral morphing factor β is determined
as
A similar spectral morphing method is obtained for other representations of the LPC filter coefficients commonly used in vocoders, such as reflection coefficients, log-area ratios, inverse sine functions, and line spectral frequencies.
For example, the GSM FR vocoder utilizes log-area ratios for representing the LPC synthesis filter. Given the 8 log-area ratios corresponding to a frame, denoted by LAR(i), i = 1, 2, ..., 8, the spectrally morphed log-area ratios are obtained using
LAR_new(i) = β·LAR(i) (9)
where β is determined according to equation (8). This method spectrally flattens the LPC synthesis filter magnitude frequency response. Alternatively, in order to morph the log-area ratios towards a predetermined spectrum or magnitude frequency response, such as the background noise spectrum represented by a set of log-area ratios denoted by LAR_noise(i), the appropriate morphing equation is
LAR_new(i) = β·LAR(i) + (1 − β)·LAR_noise(i) (10)
The modified log-area ratios are then quantized according to the specifications in the standard. Note that these approaches to modification of the log-area ratios preserve the stability of the LPC synthesis filter.
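Equations (9) and (10) reduce to a per-coefficient linear map; a sketch (names hypothetical):

```python
def morph_lars(lars, beta, lars_noise=None):
    """Spectral morphing of log-area ratios.

    With no target (eq. 9): scale toward zero, flattening the spectrum.
    With a target set such as background noise (eq. 10): interpolate
    toward it. Both are linear maps applied before requantization and
    preserve the stability of the LPC synthesis filter.
    """
    if lars_noise is None:
        return [beta * v for v in lars]
    return [beta * v + (1.0 - beta) * n for v, n in zip(lars, lars_noise)]
```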
An exemplary approach to background noise spectrum estimation, and to representing the corresponding LPC filter coefficients as log-area ratios, is provided in the comfort noise generation standard [5] and the references therein.
When line spectral frequencies are used for representing the LPC synthesis filter (e.g. the GSM EFR), an approach similar to that for log-area ratios is also appropriate. Denote the line spectral frequencies by f_i, i = 1, ..., M, where M is the order of the LPC synthesis filter, which is assumed even (typical). When the line spectral frequencies are evenly spaced apart from 0 to half the sampling frequency, the resulting LPC synthesis filter will be all-pass (i.e. flat magnitude frequency response). Denote the set of line spectral frequencies corresponding to such a spectrally flat LPC filter by f_i,flat, i = 1, ..., M. Then, the spectrally morphed line spectral frequencies are obtained using
f_i,new = β·f_i + (1 − β)·f_i,flat (11)
where β is determined according to equation (8). This method spectrally flattens the LPC synthesis filter magnitude frequency response. Alternatively, in order to morph the line spectral frequencies towards a predetermined spectrum or magnitude frequency response, such as the background noise spectrum represented by a set of line spectral frequencies denoted by f_i,noise, the appropriate morphing equation is
f_i,new = β·f_i + (1 − β)·f_i,noise (12)
The modified line spectral frequencies are then quantized according to the specifications in the standard. Note that these approaches to modification of the line spectral frequencies preserve the stability of the LPC synthesis filter. Appropriate techniques for background noise spectrum estimation and representation of filter coefficients comprising line spectral frequencies may be found in the corresponding vocoder standards on comfort noise generation.
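Equations (11) and (12) have the same shape; the sketch below also constructs the evenly spaced "flat" LSF set described above (the exact spacing convention used here is an assumption):

```python
def flat_lsfs(order, fs):
    """LSFs evenly spaced over (0, fs/2); such a set corresponds to a
    spectrally flat LPC synthesis filter."""
    return [(i + 1) * (fs / 2.0) / (order + 1) for i in range(order)]

def morph_lsfs(lsfs, beta, target=None, fs=8000):
    """Eq. (11)/(12): interpolate each LSF toward the flat set (default)
    or toward a target set such as background noise. Interpolating two
    ascending LSF sets yields an ascending set, preserving stability."""
    tgt = target if target is not None else flat_lsfs(len(lsfs), fs)
    return [beta * f + (1.0 - beta) * t for f, t in zip(lsfs, tgt)]
```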
Minimal Delay Technique
Large buffering, processing and transmission delays are already present in cellular networks without any network voice quality enhancement processing. Further network processing of the coded speech for speech enhancement purposes will add additional delay. Minimizing this delay is important to speech quality. In this section, a novel approach for minimizing the delay is discussed. The example used is the GSM FR vocoder.
Figure 8 shows the order in which the coded parameters from the GSM FR encoder are received. A straightforward approach involves buffering up the entire 260 bits for each frame and then processing these buffered bits for coded domain echo control purposes. However, this introduces a buffering delay of about 20ms plus the processing delay.
It is possible to minimize the buffering delay as follows. First, note that the entire first subframe can be decoded as soon as bit 92 is received. Hence the first subframe may be processed after only about 7.1 ms (20 ms × 92/260) of buffering delay, reducing the buffering delay by almost 13 ms. When using this novel low-delay approach, the coded LPC synthesis filter parameters are modified based on information available at the end of the first subframe of the frame. In other words, the entire frame is affected by the echo likelihood computed based on the first subframe. In experiments conducted, no noticeable artifacts were found due to this 'early' decision, particularly because the echo likelihood is a smoothed quantity based effectively on several previous subframes as well as the current subframe.
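The delay arithmetic above is easy to check, assuming the 260 bits of a frame arrive uniformly over the 20 ms frame interval:

```python
def buffering_delay_ms(bits_needed, bits_per_frame=260, frame_ms=20.0):
    """Delay until the first `bits_needed` bits of a GSM FR frame have
    arrived, assuming bits are spread uniformly over the frame interval."""
    return frame_ms * bits_needed / bits_per_frame

# Whole-frame buffering: buffering_delay_ms(260) -> 20.0 ms.
# First subframe decodable once bit 92 arrives: ~7.08 ms.
```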
Update of Error Correction/Detection Bits and Framing Bits
When applying the novel coded domain processing techniques described in this report for removing echo, some or all of the bits corresponding to the coded parameters are modified in the bit-stream. This may affect other error-correction or detection bits that may also be embedded in the bit-stream. For instance, a speech encoder may embed checksums in the bit-stream for the decoder to verify that an error-free frame has been received. Such checksums, as well as any parity check bits, error correction or detection bits, and framing bits, are updated in accordance with the appropriate standard, if necessary.
Operation under the GSM Tandem Free Operation Standard
If only the coded parameters are available, then partial or full decoding may be performed as explained earlier, whereby the coded parameters are used to reconstruct a version of the audio signal. However, when operating in an environment such as
GSM TFO, additional information is available beyond the coded parameters: the 6 MSBs of the A-law PCM samples of the audio signal. In this case, these PCM samples may be used to reconstruct a version of the audio signal for both the far end and near end without using the coded parameters. This results in computational savings.
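Under TFO, recovering a PCM version of the signal reduces to masking the two least significant bits of each octet, which carry the embedded coded bit-stream. A sketch (function name hypothetical; conversion from A-law to linear samples is omitted):

```python
def tfo_pcm_estimate(alaw_octets):
    """In a GSM TFO stream, the 6 MSBs of each octet are ordinary A-law
    PCM and the 2 LSBs carry embedded coded bits. Masking the 2 LSBs
    yields approximate A-law samples usable for echo-likelihood
    estimation without running the speech decoder."""
    return [b & 0xFC for b in alaw_octets]
```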
Those skilled in the art of communications will recognize that the preferred embodiments can be modified and altered without departing from the true spirit and scope of the invention as defined in the appended claims.

Claims

What is claimed is:
1. In a communications system for transmitting a near end digital signal using a compression code comprising a plurality of parameters including a first parameter, said parameters representing an audio signal comprising a plurality of audio characteristics, said compression code being decodable by a plurality of decoding steps, said communications system also transmitting a far end digital signal using a compression code, apparatus for reducing echo in said near end digital signal comprising: a processor responsive to said near end digital signal to read at least said first parameter of said plurality of parameters, to perform at least one of said plurality of decoding steps on said near end digital signal and said far end digital signal to generate at least partially decoded near end signals and at least partially decoded far end signals, responsive to said at least partially decoded near end signals and at least partially decoded far end signals to adjust said first parameter to generate an adjusted first parameter and to replace said first parameter with said adjusted first parameter in said near end digital signal.
2. Apparatus, as claimed in claim 1, wherein said first parameter is a quantized first parameter and wherein said processor generates said adjusted first parameter in part by quantizing said adjusted first parameter before writing said adjusted first parameter into said near end digital signal.
3. Apparatus, as claimed in claim 1, wherein said processor is responsive to said at least partially decoded near end signals and said at least partially decoded far end signals to generate an echo likelihood signal representing the amount of echo present in said partially decoded near end signals, and wherein said processor is responsive to said echo likelihood signal to adjust said first parameter.
4. Apparatus, as claimed in claim 3, wherein said characteristics comprise spectral shape and wherein said first parameter comprises a representation of filter coefficients, and wherein said processor is responsive to said echo likelihood signal to adjust said representation of filter coefficients towards a magnitude frequency response.
5. Apparatus, as claimed in claim 4, wherein said representation of filter coefficients comprises line spectral frequencies.
6. Apparatus, as claimed in claim 4, wherein said representation of filter coefficients comprises log area ratios.
7. Apparatus, as claimed in claim 4, wherein said magnitude frequency response corresponds to background noise.
8. Apparatus, as claimed in claim 1, wherein said characteristics comprise the overall level of said audio signal and wherein said first parameter comprises codebook gain.
9. Apparatus, as claimed in claim 1, wherein said first parameter comprises a codebook vector parameter.
10. Apparatus, as claimed in claim 1, wherein said characteristics comprise period of long-term correlation and wherein said first parameter comprises a pitch period parameter.
11. Apparatus, as claimed in claim 1, wherein said characteristics comprise strength of long-term correlation and wherein said first parameter comprises a pitch gain parameter.
12. Apparatus, as claimed in claim 1, wherein said characteristics comprise spectral shape and wherein said first parameter comprises a representation of filter coefficients.
13. Apparatus, as claimed in claim 12, wherein said representation of filter coefficients comprises log area ratios.
14. Apparatus, as claimed in claim 12, wherein said representation of filter coefficients comprises line spectral frequencies.
15. Apparatus, as claimed in claim 12, wherein said representation of filter coefficients corresponds to a linear predictive coding synthesis filter.
16. Apparatus, as claimed in claim 1, wherein said first parameter corresponds to a first characteristic of said plurality of audio characteristics, wherein said plurality of decoding steps comprises at least one decoding step avoiding substantial altering of said first characteristic and wherein said processor avoids performing said at least one decoding step.
17. Apparatus, as claimed in claim 16, wherein said audio characteristic comprises power and wherein said first characteristic comprises power.
18. Apparatus, as claimed in claim 16, wherein said at least one decoding step comprises post-filtering.
19. Apparatus, as claimed in claim 1, wherein said compression code comprises a linear predictive code.
20. Apparatus, as claimed in claim 1, wherein said compression code comprises regular pulse excitation - long term prediction code.
21. Apparatus, as claimed in claim 1, wherein said compression code comprises code-excited linear prediction code.
22. Apparatus, as claimed in claim 1, wherein said first parameter comprises a series of first parameters received over time, wherein said processor is responsive to said near end digital signal to read said series of first parameters, and wherein said processor is responsive to said at least partially decoded near end and far end signals and to at least a plurality of said series of first parameters to generate said adjusted first parameter.
23. Apparatus, as claimed in claim 1, wherein said compression code is arranged in frames of said digital signals and wherein said frames comprise a plurality of subframes each comprising said first parameter, wherein said processor is responsive to said compression code to read at least said first parameter from each of said plurality of subframes, and wherein said processor replaces said first parameter with said adjusted first parameter in each of said plurality of subframes.
24. Apparatus, as claimed in claim 23, wherein said processor reads said first parameter from a first of said subframes, begins to perform at least a plurality of said decoding steps on said near end digital signal during said first subframe and replaces said first parameter with said adjusted first parameter before processing a subframe following the first subframe so as to achieve lower delay.
25. Apparatus, as claimed in claim 1, wherein said compression code is arranged in frames of said digital signals and wherein said frames comprise a plurality of subframes each comprising said first parameter, wherein said processor performs at least a plurality of said decoding steps during a first of said subframes to generate said at least partially decoded near end and far end signals, reads said first parameter from a second of said subframes occurring subsequent to said first subframe, generates said adjusted first parameter in response to said at least partially decoded near end and far end signals and said first parameter, and replaces said first parameter of said second subframe with said adjusted first parameter.
26. In a communications system for transmitting a near end digital signal comprising code samples, said code samples comprising first bits using a compression code and second bits using a linear code, said code samples representing an audio signal, said audio signal having a plurality of audio characteristics, said system also transmitting a far end digital signal, apparatus for reducing echo in said near end digital signal without decoding said compression code comprising: a processor responsive to said near end digital signal and said far end digital signal to adjust said first bits and said second bits.
27. Apparatus, as claimed in claim 26, wherein said linear code comprises pulse code modulation (PCM) code.
28. Apparatus, as claimed in claim 26, wherein said compression code samples conform to the tandem-free operation of the global system for mobile communications standard.
29. Apparatus, as claimed in claim 26, wherein said first bits comprise the two least significant bits of said samples and wherein said second bits comprise the 6 most significant bits of said samples.
30. Apparatus, as claimed in claim 29, wherein said 6 most significant bits comprise PCM code.
31. In a communications system for transmitting a near end digital signal using a compression code comprising a plurality of parameters including a first parameter, said parameters representing an audio signal comprising a plurality of audio characteristics, said compression code being decodable by a plurality of decoding steps, said communications system also transmitting a far end digital signal using a compression code, a method of reducing echo in said near end digital signal comprising: reading at least said first parameter of said plurality of parameters in response to said near end digital signal; performing at least one of said plurality of decoding steps on said near end digital signal and said far end digital signal to generate at least partially decoded near end signals and at least partially decoded far end signals; adjusting said first parameter in response to said at least partially decoded near end signals and at least partially decoded far end signals to generate an adjusted first parameter; and replacing said first parameter with said adjusted first parameter in said near end digital signal.
32. A method, as claimed in claim 31, wherein said first parameter is a quantized first parameter and wherein said adjusting comprises generating said adjusted first parameter in part by quantizing said adjusted first parameter.
33. A method, as claimed in claim 31, wherein said adjusting comprises generating an echo likelihood signal representing the amount of echo present in said partially decoded near end signals in response to said at least partially decoded near end signals and said at least partially decoded far end signals, and wherein said adjusting further comprises adjusting said first parameter in response to said echo likelihood signal.
34. A method, as claimed in claim 33, wherein said characteristics comprise spectral shape and wherein said first parameter comprises a representation of filter coefficients, and wherein said adjusting comprises adjusting said representation of filter coefficients towards a magnitude frequency response in response to said echo likelihood signal.
35. A method, as claimed in claim 34, wherein said representation of filter coefficients comprises line spectral frequencies.
36. A method, as claimed in claim 34, wherein said representation of filter coefficients comprises log area ratios.
37. A method, as claimed in claim 34, wherein said magnitude frequency response corresponds to background noise.
38. A method, as claimed in claim 31, wherein said characteristics comprise the overall level of said audio signal and wherein said first parameter comprises codebook gain.
39. A method, as claimed in claim 31, wherein said first parameter comprises a codebook vector parameter.
40. A method, as claimed in claim 31, wherein said characteristics comprise period of long-term correlation and wherein said first parameter comprises a pitch period parameter.
41. A method, as claimed in claim 31, wherein said characteristics comprise strength of long-term correlation and wherein said first parameter comprises a pitch gain parameter.
42. A method, as claimed in claim 31, wherein said characteristics comprise spectral shape and wherein said first parameter comprises a representation of filter coefficients.
43. A method, as claimed in claim 42, wherein said representation of filter coefficients comprises log area ratios.
44. A method, as claimed in claim 42, wherein said representation of filter coefficients comprises line spectral frequencies.
45. A method, as claimed in claim 42, wherein said representation of filter coefficients corresponds to a linear predictive coding synthesis filter.
46. A method, as claimed in claim 31, wherein said first parameter corresponds to a first characteristic of said plurality of audio characteristics, wherein said plurality of decoding steps comprises at least one decoding step avoiding substantial altering of said first characteristic and wherein said performing at least a plurality of said decoding steps comprises avoiding performing said at least one decoding step.
47. A method, as claimed in claim 46, wherein said audio characteristic comprises power and wherein said first characteristic comprises power.
48. A method, as claimed in claim 46, wherein said at least one decoding step comprises post-filtering.
49. A method, as claimed in claim 31, wherein said compression code comprises a linear predictive code.
50. A method, as claimed in claim 31, wherein said compression code comprises regular pulse excitation - long term prediction code.
51. A method, as claimed in claim 31, wherein said compression code comprises code-excited linear prediction code.
52. A method, as claimed in claim 31, wherein said first parameter comprises a series of first parameters received over time, wherein said reading comprises reading said series of first parameters, and wherein said adjusting comprises generating said adjusted first parameter in response to said at least partially decoded near end and far end signals and to at least a plurality of said series of first parameters.
53. A method, as claimed in claim 31, wherein said compression code is arranged in frames of said digital signals and wherein said frames comprise a plurality of subframes each comprising said first parameter, wherein said reading comprises reading at least said first parameter from each of said plurality of subframes in response to said compression code, and wherein said replacing comprises replacing said first parameter with said adjusted first parameter in each of said plurality of subframes.
54. A method, as claimed in claim 53, wherein said reading comprises reading said first parameter from a first of said subframes, wherein said performing comprises beginning to perform at least a plurality of said decoding steps on said near end digital signal during said first subframe and wherein said replacing comprises replacing said first parameter with said adjusted first parameter before processing a subframe following the first subframe so as to achieve lower delay.
55. A method, as claimed in claim 31, wherein said compression code is arranged in frames of said digital signals and wherein said frames comprise a plurality of subframes each comprising said first parameter, wherein said performing comprises performing at least a plurality of said decoding steps during a first of said subframes to generate said at least partially decoded near end and far end signals, wherein said reading comprises reading said first parameter from a second of said subframes occurring subsequent to said first subframe, wherein said adjusting comprises generating said adjusted first parameter in response to said at least partially decoded near end and far end signals and said first parameter, and wherein said replacing comprises replacing said first parameter of said second subframe with said adjusted first parameter.
56. In a communications system for transmitting a near end digital signal comprising code samples, said code samples comprising first bits using a compression code and second bits using a linear code, said code samples representing an audio signal, said audio signal having a plurality of audio characteristics, said system also transmitting a far end digital signal, a method of reducing echo in said near end digital signal without decoding said compression code comprising: adjusting said first bits and said second bits in response to said near end digital signal and said far end digital signal.
57. A method, as claimed in claim 56, wherein said linear code comprises pulse code modulation (PCM) code.
58. A method, as claimed in claim 56, wherein said compression code samples conform to the tandem-free operation of the global system for mobile communications standard.
59. A method, as claimed in claim 56, wherein said first bits comprise the two least significant bits of said samples and wherein said second bits comprise the 6 most significant bits of said samples.
60. A method, as claimed in claim 59, wherein said 6 most significant bits comprise PCM code.
EP00948555A 1999-07-02 2000-06-30 Coded domain echo control Withdrawn EP1190495A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14213699P 1999-07-02 1999-07-02
US142136P 1999-07-02
PCT/US2000/018104 WO2001003316A1 (en) 1999-07-02 2000-06-30 Coded domain echo control

Publications (1)

Publication Number Publication Date
EP1190495A1 true EP1190495A1 (en) 2002-03-27

Family

ID=22498680

Family Applications (3)

Application Number Title Priority Date Filing Date
EP00948555A Withdrawn EP1190495A1 (en) 1999-07-02 2000-06-30 Coded domain echo control
EP00946994A Withdrawn EP1190494A1 (en) 1999-07-02 2000-06-30 Coded domain adaptive level control of compressed speech
EP00946954A Pending EP1208413A2 (en) 1999-07-02 2000-06-30 Coded domain noise control

Family Applications After (2)

Application Number Title Priority Date Filing Date
EP00946994A Withdrawn EP1190494A1 (en) 1999-07-02 2000-06-30 Coded domain adaptive level control of compressed speech
EP00946954A Pending EP1208413A2 (en) 1999-07-02 2000-06-30 Coded domain noise control

Country Status (5)

Country Link
EP (3) EP1190495A1 (en)
JP (3) JP2003503760A (en)
AU (3) AU6067100A (en)
CA (3) CA2378012A1 (en)
WO (3) WO2001003317A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1301018A1 (en) * 2001-10-02 2003-04-09 Alcatel Apparatus and method for modifying a digital signal in the coded domain
JP3946074B2 (en) * 2002-04-05 2007-07-18 日本電信電話株式会社 Audio processing device
JP3876781B2 (en) 2002-07-16 2007-02-07 ソニー株式会社 Receiving apparatus and receiving method, recording medium, and program
EP1521242A1 (en) * 2003-10-01 2005-04-06 Siemens Aktiengesellschaft Speech coding method applying noise reduction by modifying the codebook gain
US7613607B2 (en) 2003-12-18 2009-11-03 Nokia Corporation Audio enhancement in coded domain
US8874437B2 (en) 2005-03-28 2014-10-28 Tellabs Operations, Inc. Method and apparatus for modifying an encoded signal for voice quality enhancement
JP5312030B2 (en) * 2005-10-31 2013-10-09 テレフオンアクチーボラゲット エル エム エリクソン(パブル) Method and apparatus for reducing delay, echo canceller apparatus, and noise suppression apparatus
US7852792B2 (en) * 2006-09-19 2010-12-14 Alcatel-Lucent Usa Inc. Packet based echo cancellation and suppression
JP4915575B2 (en) * 2007-05-28 2012-04-11 パナソニック株式会社 Audio transmission system
JP4915576B2 (en) * 2007-05-28 2012-04-11 パナソニック株式会社 Audio transmission system
JP4915577B2 (en) * 2007-05-28 2012-04-11 パナソニック株式会社 Audio transmission system
WO2009029076A1 (en) * 2007-08-31 2009-03-05 Tellabs Operations, Inc. Controlling echo in the coded domain
CN102726034B (en) 2011-07-25 2014-01-08 华为技术有限公司 A device and method for controlling echo in parameter domain
TWI469135B (en) * 2011-12-22 2015-01-11 Univ Kun Shan Adaptive differential pulse code modulation (adpcm) encoding and decoding method
JP6011188B2 (en) * 2012-09-18 2016-10-19 沖電気工業株式会社 Echo path delay measuring apparatus, method and program
JP6816277B2 (en) * 2017-07-03 2021-01-20 パイオニア株式会社 Signal processing equipment, control methods, programs and storage media

Family Cites Families (16)

Publication number Priority date Publication date Assignee Title
JPH0683114B2 (en) * 1985-03-08 1994-10-19 松下電器産業株式会社 Eco-Cancer
US4969192A (en) * 1987-04-06 1990-11-06 Voicecraft, Inc. Vector adaptive predictive coder for speech and audio
US5140543A (en) * 1989-04-18 1992-08-18 Victor Company Of Japan, Ltd. Apparatus for digitally processing audio signal
US5097507A (en) * 1989-12-22 1992-03-17 General Electric Company Fading bit error protection for digital cellular multi-pulse speech coder
US5680508A (en) * 1991-05-03 1997-10-21 Itt Corporation Enhancement of speech coding in background noise for low-rate speech coder
JP3353257B2 (en) * 1993-08-30 2002-12-03 日本電信電話株式会社 Echo canceller with speech coding and decoding
US5828995A (en) * 1995-02-28 1998-10-27 Motorola, Inc. Method and apparatus for intelligible fast forward and reverse playback of time-scale compressed voice messages
JPH0954600A (en) * 1995-08-14 1997-02-25 Toshiba Corp Voice-coding communication device
JPH0993132A (en) * 1995-09-27 1997-04-04 Toshiba Corp Device and method for coding decoding
JPH10143197A (en) * 1996-11-06 1998-05-29 Matsushita Electric Ind Co Ltd Reproducing device
JP3283200B2 (en) * 1996-12-19 2002-05-20 ケイディーディーアイ株式会社 Method and apparatus for converting coding rate of coded audio data
US5943645A (en) * 1996-12-19 1999-08-24 Northern Telecom Limited Method and apparatus for computing measures of echo
US6064693A (en) * 1997-02-28 2000-05-16 Data Race, Inc. System and method for handling underrun of compressed speech frames due to unsynchronized receive and transmit clock rates
JP3317181B2 (en) * 1997-03-25 2002-08-26 ヤマハ株式会社 Karaoke equipment
US6112177A (en) * 1997-11-07 2000-08-29 At&T Corp. Coarticulation method for audio-visual text-to-speech synthesis
EP2154679B1 (en) * 1997-12-24 2016-09-14 BlackBerry Limited Method and apparatus for speech coding

Non-Patent Citations (1)

Title
See references of WO0103316A1 *

Also Published As

Publication number Publication date
AU6067100A (en) 2001-01-22
CA2378035A1 (en) 2001-01-11
JP2003503760A (en) 2003-01-28
CA2378062A1 (en) 2001-01-11
WO2001003317A1 (en) 2001-01-11
EP1208413A2 (en) 2002-05-29
WO2001003316A1 (en) 2001-01-11
WO2001002929A3 (en) 2001-07-19
AU6063600A (en) 2001-01-22
JP2003533902A (en) 2003-11-11
CA2378012A1 (en) 2001-01-11
EP1190494A1 (en) 2002-03-27
WO2001002929A2 (en) 2001-01-11
AU6203300A (en) 2001-01-22
JP2003504669A (en) 2003-02-04


Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020111

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20031231