EP1190494A1 - Coded domain adaptive level control of compressed speech - Google Patents

Coded domain adaptive level control of compressed speech

Info

Publication number
EP1190494A1
EP1190494A1 EP00946994A EP00946994A EP1190494A1 EP 1190494 A1 EP1190494 A1 EP 1190494A1 EP 00946994 A EP00946994 A EP 00946994A EP 00946994 A EP00946994 A EP 00946994A EP 1190494 A1 EP1190494 A1 EP 1190494A1
Authority
EP
European Patent Office
Prior art keywords
parameter
adjusted
near end
characteristic
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP00946994A
Other languages
English (en)
French (fr)
Inventor
Ravi Chandran
Bruce E. Dunne
Daniel J. Marchok
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Coriant Operations Inc
Original Assignee
Tellabs Operations Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tellabs Operations Inc

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M9/00 Arrangements for interconnection not involving centralised switching
    • H04M9/08 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082 Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B3/00 Line transmission systems
    • H04B3/02 Details
    • H04B3/20 Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H04L1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L1/0014 Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the source coding

Definitions

  • The present invention relates to coded domain enhancement of compressed speech, and in particular to coded domain adaptive level control and noise reduction.
  • Network enhancement of coded speech would normally require decoding, linear processing and re-encoding of the processed signal. Such a method is illustrated in Figure 1 and is very expensive. Moreover, the encoding process is often an order of magnitude more computationally intensive than the speech enhancement methods. Speech compression is increasingly used in telecommunications, especially in cellular telephony and voice over packet networks.
  • Past network speech enhancement techniques which operate in the linear domain have several shortcomings. For example, they require decoding the compressed speech, performing the necessary enhancements, and re-encoding the speech.
  • PSTN: Public Switched Telephone Network
  • Telephony customers expect a comfortable listening level to maximize comprehension of their conversation.
  • The transmitted speech level from a telephone instrument depends on the speaker's volume and the position of the speaker relative to the microphone. If volume control is available on the telephone instrument, the listener could manually adjust it to a desirable level. However, for historical reasons, most telephone instruments do not have volume controls. Also, direct volume control by the listener does not address the need to maintain appropriate levels for network equipment. Furthermore, as technology progresses towards the era of hands-free telephony, especially in the case of mobile phones in vehicles, manual adjustment is considered cumbersome and potentially hazardous to vehicle operators.
  • Maintaining speech quality has generally been the responsibility of network service providers; telephone instrument manufacturers typically have played a relatively minor role in meeting such responsibility.
  • Network service providers have provided tight specifications for equipment and networks with regard to speech levels.
  • The network service providers have to ensure the proper speech levels with less influence over specifications and equipment used in other networks.
  • Figure 2 shows the network configuration of a linear domain ALC device 202.
  • The ALC device processes the near-end speech signal (at port Sin).
  • The far-end signal (at port Rin) is used for determining double-talk.
  • ALC device 202 processes a digital near end speech signal in a typical transmission network and determines the gain required to attain a target speech level by measuring the current speech level. Numerous algorithms can be devised to determine a suitable gain. For example, the ALC device could use a voice activity detector and apply new gain values only at the beginning of speech bursts. Furthermore, the maximum and minimum gain, and the maximum rate of change of the gain may all be constrained. In general, ALC devices utilize (1) some form of power measurement scheme on the near end signal to determine the current speech level, (2) a voice activity detector on the near end signal to demarcate speech bursts, and possibly (3) a double-talk detector on the far and near end signals to determine whether the near end signal contains echo.
  • The ALC device determines the gain required to attain the target speech level by measuring the current speech level. Each digitized speech sample is multiplied by a gain factor. The double-talk information is used to prevent adjusting the gain factor erroneously based on echo.
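  • The linear domain behavior described above can be illustrated with a minimal sketch. It assumes samples normalized to the range [-1, 1], a block-based RMS power measurement, and voice activity and double-talk flags supplied by external detectors; the function names, the target level and the gain limits are hypothetical and are not taken from any particular ALC product.

```python
import math

def rms_level_db(samples):
    """Return the RMS level of a block of samples in dB relative to 1.0."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(power, 1e-12))

def alc_gain_db(current_db, target_db, prev_gain_db,
                min_db=-12.0, max_db=12.0, max_step_db=1.0):
    """Move the gain toward (target - current), bounding its range and rate of change."""
    desired = max(min_db, min(max_db, target_db - current_db))
    step = max(-max_step_db, min(max_step_db, desired - prev_gain_db))
    return prev_gain_db + step

def apply_alc(block, gain_db, voice_active, double_talk, target_db=-20.0):
    """Update the gain only on speech without double-talk, then scale every sample."""
    if voice_active and not double_talk:
        gain_db = alc_gain_db(rms_level_db(block), target_db, gain_db)
    factor = 10.0 ** (gain_db / 20.0)  # convert dB to a linear gain factor
    return [s * factor for s in block], gain_db
```

  • In this sketch the gain is frozen whenever the double-talk flag is set, so echo of the far end signal cannot drive the gain upward.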
  • Tellabs algorithms/products for level control include
  • TFO: Tandem Free Operation
  • GSM: Global System for Mobile Communications
  • The TFO standard applies to mobile-to-mobile calls.
  • The speech signal is conveyed between mobiles in a compressed form after a brief negotiation period.
  • The compressed speech is contained in TFO frames which bypass the transcoders in the network.
  • The elimination of tandem codecs is known to improve speech quality in the case where the original signal is clean. Even in the case of clean speech, it may still be desirable to adjust the speech level to a suitable loudness level.
  • Traditional methods for such level control would require decoding, processing and re-encoding the speech, which results in tandeming and is computationally intensive.
  • The coded domain approach avoids such tandeming and eliminates the need for full re-encoding. This document describes methods for speech level control in the coded domain.
  • Level control in conjunction with the GSM FR and EFR coders is addressed.
  • One preferred embodiment is useful in a communications system for transmitting digital signals using a compression code comprising a predetermined plurality of parameters including a first parameter, the parameters representing an audio signal comprising a plurality of audio characteristics including a first characteristic, the first parameter being related to the first characteristic, the compression code being decodable by a plurality of decoding steps including a first decoding step for decoding the parameters related to the first characteristic.
  • The first characteristic may be adjusted by reading at least the first parameter in response to the digital signals. At least a first parameter value is derived from the first parameter. An adjusted first parameter value representing an adjustment of the first characteristic is generated in response to the digital signals and the first parameter value.
  • An adjusted first parameter is derived in response to the adjusted first parameter value, and the first parameter of the compression code is replaced with the adjusted first parameter.
  • The preceding steps of reading, deriving, generating and replacing preferably are performed by a processor. As a result of the foregoing technique, the delay required to adjust the first characteristic may be reduced.
  • A second preferred embodiment is useful in a communication system for transmitting digital signals comprising code samples comprising first bits using a compression code and second bits using a linear code.
  • The code samples represent an audio signal having a plurality of audio characteristics, including a first characteristic.
  • The first characteristic may be adjusted without decoding the compression code by adjusting the first bits and the second bits in response to the second bits.
  • The adjusting preferably is performed with a processor.
  • Figure 1 is a schematic block diagram of a system for network enhancement of coded speech in the linear domain.
  • Figure 2 is a schematic block diagram of a system for automatic level control (ALC).
  • ALC: automatic level control
  • Figure 3 is a schematic block diagram of a linear predictive coding (LPC) speech synthesis model.
  • LPC: linear predictive coding
  • Figure 4 is a schematic block diagram distinguishing coded domain digital speech parameters from linear domain digital speech samples.
  • Figure 5 is a schematic block diagram of a coded domain ALC system.
  • Figure 6 is a graph illustrating GSM full rate codec quantization levels for block maxima.
  • Figure 7a is a schematic block diagram of a backward adaptive standard deviation based quantizer.
  • Figure 7b is a schematic block diagram of a backward adaptive differential based quantizer.
  • Figure 8 is a schematic block diagram of an adaptive differential quantizer using a linear predictor.
  • Figure 9 is a schematic block diagram of a GSM enhanced full rate SLRP quantizer.
  • Figure 10 is a graph illustrating GSM enhanced full rate codec quantization levels for a gain correction factor.
  • Figure 11 is a schematic block diagram of one technique for performing ALC.
  • Figure 12 is a schematic block diagram of one technique for coded domain ALC.
  • Figure 13 is a flow diagram illustrating a technique for overflow/underflow prevention.
  • Figure 14 is a schematic block diagram of a preferred form of ALC system using feedback of the realized gain in ALC algorithms requiring past gain values.
  • Figure 15 is a schematic block diagram of one form of a coded domain ALC device.
  • Figure 16 is a schematic block diagram of a system for instantaneous scalar requantization for a GSM FR codec.
  • Figure 17 is a schematic block diagram of a system for differential scalar requantization for a GSM EFR codec.
  • Figure 18a is a graph showing a step in desired gain.
  • Figure 18b is a graph showing actual realized gain superimposed on the desired gain with a quantizer in the feedback loop.
  • Figure 18c is a graph showing actual realized gain superimposed on the desired gain resulting from placing a quantizer outside the feedback loop shown in Figure 19.
  • Figure 19 is a schematic block diagram of an ALC device showing a quantizer placed outside the feedback loop.
  • Figure 20 is a schematic block diagram of a simplified version of the ALC device shown in Figure 19.
  • Figure 21a is a schematic block diagram of a coded domain ALC implementation for ALC algorithms using feedback of past gain values with a quantizer in the feedback loop.
  • Figure 21b is a schematic block diagram of a coded domain ALC implementation for ALC algorithms using feedback of past gain values with a quantizer outside the feedback loop.
  • Figure 22 is a graph showing spacing between adjacent $R_i$ values in an EFR codec, and more specifically showing EFR Codec SLRPs: $(R_{i+1} - R_i)$ against $i$.
  • Figure 23a is a diagram of a compressed speech frame of an EFR encoder illustrating the times at which various bits are received and the earliest possible decoding of samples as a buffer is filled from left to right.
  • Figure 23b is a diagram of a compressed speech frame of an FR encoder illustrating the times at which various bits are received and the earliest possible decoding of samples as a buffer is filled from left to right.
  • Figure 24 is a schematic block diagram of a preferred form of coded domain ALC system made in accordance with the invention.
  • Figure 25 is a schematic block diagram of a preferred form of SLRP quantization in GSM EFR.
  • Figure 26 is a schematic block diagram of an alternative form of SLRP quantization in GSM EFR.
  • Figure 27 is a schematic block diagram of a preferred form of re-encoding the SLRP in GSM EFR.
  • Figure 28 is a graph illustrating an exemplary speech signal.
  • Figure 29 is a graph illustrating exemplary speech level adjustment with CD-ALC for FR.
  • GSM 06.10, "Digital cellular telecommunication system (Phase 2); Full rate speech; Part 2: Transcoding", March 1998.
  • GSM 06.60, "Digital cellular telecommunications system (Phase 2); Enhanced Full Rate (EFR) speech transcoding", June 1998.
  • Speech signals are digitally sampled prior to transmission.
  • Such digital (i.e., discrete-time, discrete-valued) signals are referred to in this specification as being in the linear domain or in linear mode.
  • The adjustment of the speech levels in such linear domain signals is accomplished by multiplying every sample of the signal by an appropriate gain factor to attain the desired target speech level.
  • Linear echo or acoustic echo may be present in the near end signal depending on the type of end path in the network. If such echo has significant power and is not already cancelled by an echo canceller, then a double-talk detector may also be required. This is to ensure that the gain is not inadvertently increased due to the echo of the far end speech signal.
  • Digital speech signals that are typically carried in telephony networks usually undergo a basic form of compression such as pulse code modulation (PCM) before transmission.
  • PCM: pulse code modulation
  • Such compression schemes are very inexpensive in terms of computations and delay. It is a relatively simple matter for the ALC device to convert the compressed digital samples to the linear domain, process the linear samples, and then compress the processed samples before transmission. As such, these signals can effectively be considered to be in the linear domain.
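  • The convert, process and recompress path described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the continuous μ-law companding formula on samples normalized to [-1, 1], not the bit-exact segmented 8-bit encoding used in deployed PCM equipment, and the function names are hypothetical.

```python
import math

MU = 255.0  # companding constant of the continuous mu-law curve

def mulaw_compress(x):
    """Map a linear sample in [-1, 1] to the companded domain [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress: recover the linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def scale_companded(y, gain):
    """Expand to the linear domain, apply the gain, saturate, and compress again."""
    x = mulaw_expand(y) * gain
    x = max(-1.0, min(1.0, x))  # saturate instead of wrapping on overflow
    return mulaw_compress(x)
```

  • Because expansion and compression are each a single closed-form evaluation per sample, the cost of leaving and re-entering the companded domain is small compared with the cost of a speech encoder.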
  • Compressed, or coded, speech will refer to speech that is compressed using advanced compression techniques that require significant computational complexity.
  • The terms linear code and compression code mean the following:
  • Linear code: By a linear code, we mean a compression technique that results in one coded parameter or coded sample for each sample of the audio signal. Examples of linear codes include PCM (A-law and μ-law) and ADPCM (adaptive differential pulse code modulation).
  • Compression code: By a compression code, we mean a technique that results in fewer than one coded parameter for each sample of the audio signal. Typically, compression codes result in a small set of coded parameters for each block or frame of audio signal samples. Examples of compression codes are linear predictive coding based vocoders such as the GSM vocoders (HR, FR, EFR).
  • Speech compression, which falls under the category of lossy source coding, is commonly referred to as speech coding.
  • Speech coding is performed to minimize the bandwidth necessary for speech transmission. This is especially important in wireless telephony where bandwidth is a scarce resource.
  • Speech coding is still important to minimize network delay and jitter. This is because speech communication, unlike data, is highly intolerant of delay. Hence a smaller packet size eases the transmission through a packet network.
  • Several industry standard speech codecs (coder-decoder pairs) are listed in Table 1 for reference.
  • A set of consecutive digital speech samples is referred to as a speech frame.
  • A speech encoder determines a small set of parameters for a speech synthesis model. With these speech parameters and the speech synthesis model, a speech frame can be reconstructed that appears and sounds very similar to the original speech frame. The reconstruction is performed by the speech decoder. It should be noted that, in most speech coders, the encoding process is much more computationally intensive than the decoding process. Furthermore, the MIPS (millions of instructions per second) required to attain good quality speech coding are very high. The processing capabilities of digital signal processing chipsets have advanced sufficiently only in recent years to enable the widespread use of speech coding in applications such as cellular telephone handsets.
  • The speech parameters determined by the speech encoder depend on the speech synthesis model used.
  • The coders in Table 1 utilize linear predictive coding (LPC) models.
  • A block diagram of a simplified view of the LPC speech synthesis model is shown in Figure 3.
  • This model can be used to generate speech-like signals by specifying the model parameters appropriately.
  • The parameters include the time-varying filter coefficients, pitch periods, excitation vectors and gain factors. Basically, the excitation vector, c(n), is first scaled by the gain factor, G. The result is then filtered by a pitch synthesis
  • Other models such as the multiband excitation model are also used in speech coding. In this context, it suffices to note that the speech parameters together with the assumed model provide a means to remove the redundancies in the digital speech signal so as to achieve compression.
  • The overall DC gain is provided by G and ALC would primarily involve modifying G.
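  • Why modifying G is sufficient for level control can be illustrated with a simplified synthesis sketch. It assumes a textbook arrangement in which the scaled excitation passes through a one-tap pitch synthesis filter followed by an all-pole LPC synthesis filter; the filter structure, coefficient values and function names are illustrative assumptions and do not reproduce any specific codec.

```python
def all_pole(x, coeffs, lags):
    """Recursive filter: y[n] = x[n] + sum_k coeffs[k] * y[n - lags[k]]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for a, lag in zip(coeffs, lags):
            if n - lag >= 0:
                acc += a * y[n - lag]
        y.append(acc)
    return y

def synthesize(excitation, gain, pitch_gain, pitch_lag, lpc):
    """Scale the excitation by the gain G, then apply pitch and LPC synthesis filters."""
    scaled = [gain * c for c in excitation]
    pitched = all_pole(scaled, [pitch_gain], [pitch_lag])
    return all_pole(pitched, lpc, range(1, len(lpc) + 1))
```

  • Both filters are linear, so doubling the gain argument doubles every output sample while leaving the spectral shape and pitch untouched.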
  • SLRPs: speech level related parameters
  • The first three GSM codecs in Table 1 will now be discussed. All of the first three coders process speech sampled at 8 kHz and assume that the samples are obtained as 13-bit linear PCM values.
  • The frame length is 160 samples (20 ms).
  • The SLRP may be specified each subframe (e.g. the GSM FR and EFR codecs) or once per frame (e.g. the GSM HR codec).
  • $\gamma_{gc}$ and $\hat{\gamma}_{gc}$ are the unquantized and quantized gain correction factors in the GSM EFR codec.
  • The quantized and corresponding unquantized parameters are related through
  • The quantization function is a many-to-one transformation and is not invertible.
  • Figure 4 distinguishes the coded domain from the linear domain.
  • The coded domain refers to the output of speech encoders or the input of the speech decoders, which should be identical if there are no channel errors.
  • The coded domain includes both the speech parameters and the methods used to quantize or dequantize these parameters.
  • The speech parameters that are determined by the encoder undergo a quantization process prior to transmission. This quantization is critical to achieving bit rates lower than that required by the original digital speech signal. The quantization process often involves the use of look-up tables. Furthermore, different speech parameters may be quantized using different techniques.
  • Processing of speech in the coded domain involves directly modifying the quantized speech parameters to a different set of quantized values allowed by the quantizer for each of the parameters.
  • The parameters being modified are the SLRPs.
  • The coded domain counterpart to the linear domain ALC configuration of Figure 2 is shown in Figure 5. Note that the codecs used for the two directions of transmission shown may not be identical. Furthermore, the codecs used may change over time. Hence the coded domain ALC algorithm preferably operates robustly under such changing conditions.
  • The quantization of a single speech parameter is termed scalar quantization.
  • Vector quantization is usually applied to a set of parameters that are related to each other in some way such as the LPC coefficients.
  • Scalar quantization is generally applied to a parameter that is relatively independent of the other parameters. A mixture of both types of quantization methods is also possible.
  • Since SLRPs are usually scalar quantized, focus is placed on the most commonly used scalar quantization techniques.
  • The quantization process is independent of the past and future values of the parameter. Only the current value of the parameter is used in the quantization process.
  • The parameter to be quantized is compared to a set of permitted quantization levels.
  • The quantization level that best matches the given parameter in terms of some closeness measure is chosen to represent that parameter.
  • The permitted quantization levels are stored in a look-up table at both the encoder and the decoder.
  • The index into the table of the chosen quantization level is transmitted by the encoder to the decoder.
  • The quantization level may be determined using a mathematical formula.
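  • The look-up table form of instantaneous scalar quantization described above can be sketched as follows. The table values are hypothetical, and absolute difference is assumed as the closeness measure; a real codec defines its own table and measure.

```python
# Hypothetical non-uniform quantization table shared by encoder and decoder.
LEVELS = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]

def quantize(value, levels=LEVELS):
    """Encoder side: return the index of the permitted level closest to value."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))

def dequantize(index, levels=LEVELS):
    """Decoder side: look the transmitted index up in the same table."""
    return levels[index]
```

  • Only the index crosses the channel. Distinct inputs such as 0.45 and 0.5 map to the same index, which is why the original parameter value cannot be recovered exactly at the decoder.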
  • The quantization levels are usually spaced non-uniformly in the case of
  • Adaptive quantizers are often used in speech coding to minimize the quantization error at the cost of greater computational complexity.
  • Adaptive quantizers may utilize forward adaptation or backward adaptation.
  • In forward adaptation schemes, extra side information regarding the dynamic range has to be transmitted periodically to the decoder in addition to the quantization table index. Thus, such schemes are usually not used in speech coders.
  • Backward adaptive quantizers are preferred because they do not require transmission of any side information.
  • Two general types of backward adaptive quantizers are commonly used: standard deviation based and differential. These are depicted in Figures 7a and 7b.
  • A quantized version of the normalization factor is used at both the quantizer and dequantizer. In some variations of this scheme, decisions to expand or compress the quantization intervals may be based simply on the previous parameter input only.
  • The differential quantization scheme can also be represented as in Figure 7 when a linear predictor, P(z), is used. Note that if we approximate the transfer function P(z)/[1-P(z)] by the linear predictor,
  • $g_c(n)$ denotes the gain factor that is used to scale the
  • A 32-level non-uniform quantization is performed on $\gamma_{gc}(n)$ to obtain $\hat{\gamma}_{gc}(n)$.
  • The decoder can thus obtain the predicted gain in the same manner as the encoder using equation (3) once the current subframe
  • R(n) denotes the prediction error given by
  • The actual information transmitted from the encoder to the decoder is the set of bits representing the look-up table index of the quantized R(n) parameter,
  • The quantization of the SLRP at the encoder is performed indirectly by using the mean-removed excitation vector energy each subframe.
  • E(n) denotes the mean-
  • second line of equation (5) is the mean excitation vector energy, $E_I(n)$, i.e.
  • The excitation vector $\{c(i)\}$ is decoded at the decoder prior to the
  • $\tilde{E}(n)$ is the predicted
  • The decoder decodes the excitation vector and computes $E_I(n)$ using equation
  • $g_c(n) = \hat{\gamma}_{gc}(n)\, g_c'(n)$.
  • The 32 quantization levels for $\hat{\gamma}_{gc}(n)$ are
  • Note that the vertical axis in Figure 10, which represents the quantization levels, is plotted on a logarithmic scale.
  • Figure 5 illustrates a preferred location of an ALC device operating on coded speech. With reference to this Figure, possible implementations of the ALC device will be discussed.
  • The quantized SLRP is decoded (e.g., read) from the coded domain signal (e.g., compression code signal) and multiplied (e.g., adjusted) by a gain factor determined by the ALC algorithm.
  • The multiplied SLRP may be considered an adjusted SLRP value.
  • The result is then requantized (e.g., to form an adjusted SLRP).
  • The coded domain signal is appropriately modified to reflect the change in the SLRP.
  • The adjusted SLRP may be substituted for the original SLRP.
  • Any form of error protection used on the coded domain signal must be appropriately reinstated.
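  • The read, multiply, requantize and substitute sequence described above can be sketched as follows. The frame is modeled as a dictionary with a hypothetical `slrp_index` field and the SLRP table is invented for illustration; reinstating error protection is outside the scope of the sketch.

```python
# Hypothetical SLRP quantization table: 0.0625, 0.125, ..., 8.0
SLRP_TABLE = [2.0 ** k for k in range(-4, 4)]

def requantize(value, table=SLRP_TABLE):
    """Return the index of the table entry closest to value."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))

def adjust_frame(frame, desired_gain, table=SLRP_TABLE):
    """Adjust the SLRP of one coded frame without touching its other fields."""
    old_index = frame["slrp_index"]                 # read the quantized SLRP
    slrp = table[old_index]                         # derive its value
    adjusted = slrp * desired_gain                  # adjusted SLRP value
    new_index = requantize(adjusted, table)         # adjusted (requantized) SLRP
    new_frame = dict(frame, slrp_index=new_index)   # substitute into the frame
    realized_gain = table[new_index] / slrp         # gain the decoder will see
    return new_frame, realized_gain
```

  • With this table, a desired gain of 1.8 applied to an SLRP of 1.0 is requantized to 2.0, so the realized gain is 2.0 rather than 1.8.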
  • The ALC device may require measures of the speech level, voice activity and double-talk activity to determine the gain that is to be applied to the SLRP. This may require the decoding of the coded domain signal to some extent. For most codecs, only a partial decoding of the coded speech is necessary to perform ALC. The speech is decoded to the extent necessary to extract (e.g., read) the SLRP as well as other parameters essential for obtaining sufficiently accurate speech level, voice activity and double-talk measurements.
  • Some examples of situations where only partial decoding suffices include:
  • A post-filtering process (i.e., a decoding step)
  • This post-filtering helps to reduce quantization noise but does not change the overall power level of the signal.
  • The post-filtering process (i.e., a decoding step)
  • A silence suppression scheme is often used in cellular telephony and voice over packet networks.
  • Coded speech frames are transmitted only during voice activity and very little transmission is performed during silence.
  • The decoders automatically insert some comfort noise during the silence periods to mimic the background noise from the other end.
  • One example of such a scheme used in GSM cellular networks is called discontinuous transmission (DTX).
  • DTX: discontinuous transmission
  • The decoder in the ALC device can completely avoid decoding the signal during silence. In such cases, the determination of voice and double-talk activities can also be simplified in the ALC device.
  • The coded speech bits for each channel will be carried through the wireline network between base stations at 64 kbits/sec. This bitstream can be divided into 8-bit samples. The 2 least significant bits of each sample will contain the coded speech bits while the upper 6 bits will contain the bits corresponding to the appropriate PCM samples.
  • The conversion of the PCM information to linear speech is very inexpensive and provides a somewhat noisy version of the linear speech signal. It is possible to use this noisy linear domain speech signal to perform the necessary voice activity, double-talk and speech level measurements as is usually done in linear domain ALC algorithms. Thus, in this case, only a minimal amount of decoding of the coded domain speech parameters is necessary.
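  • The sample layout described above, with coded speech bits in the 2 least significant bits and PCM bits in the upper 6 bits, can be sketched with simple bit operations. The function names and the ordering of the two coded bits within each sample are assumptions made for illustration.

```python
def split_sample(byte):
    """Return (pcm_bits, coded_bits) taken from one 8-bit sample."""
    return (byte >> 2) & 0x3F, byte & 0x03

def merge_sample(pcm_bits, coded_bits):
    """Rebuild an 8-bit sample from 6 PCM bits and 2 coded speech bits."""
    return ((pcm_bits & 0x3F) << 2) | (coded_bits & 0x03)

def extract_coded_bits(samples):
    """Collect the coded speech bitstream, most significant coded bit first (assumed)."""
    bits = []
    for byte in samples:
        _, coded = split_sample(byte)
        bits.extend([(coded >> 1) & 1, coded & 1])
    return bits
```

  • At 64 kbits/sec and 8 bits per sample there are 8000 samples per second, so the 2 coded bits per sample carry 16 kbits/sec of compressed speech alongside the truncated PCM signal.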
  • The SLRP and any other parameters that are required for the requantization of the SLRP would have to be decoded.
  • The other parameters would be decoded only to the extent necessary for requantization of the SLRP. This will be clear from the examples that will follow in later sections.
  • SLRP that usually differs from the desired value.
  • The desired gain that was applied by the Gain Determination block will differ from the gain that will be realized when the signal is decoded.
  • Overflow or underflow problems may arise due to this difference because the speech signal may be over-amplified or over-suppressed, respectively.
  • Some ALC algorithms may utilize the past desired gain values to determine current and future desired gain values. Since the desired gain values do not reflect the actual realized gain values, such algorithms may perform erroneously when applied as shown in Figure 12.
  • The requantization process can sometimes result in undesirable reverberations in the SLRP. This can cause the speech level to be modulated unintentionally, resulting in a distorted speech signal.
  • Such SLRP reverberations are encountered in feedback quantization schemes such as differential quantization.
  • The iterative scheme of Figure 13 can be incorporated in the Gain Determination block.
  • The realized gain value after requantization of the SLRP may be computed.
  • The realized gain is checked to see if overflow or underflow problems could occur. This could be accomplished, for example, by determining what the new speech level would be by multiplying the realized gain by the original speech level.
  • A speech decoder could be used in the ALC device to see whether overflow/underflow actually occurs. Either way, if the realized gain value is deemed to be too high or too low, the new SLRP is reduced or increased, respectively, until the danger of overflow/underflow is considered to be no longer present.
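  • The iterative check described above can be sketched as follows. The sketch assumes an ascending SLRP table, estimates the new speech level as the realized gain multiplied by the original level, and steps the requantized index in one direction only so that the loop always terminates; the function name and the level limits are hypothetical.

```python
def prevent_overflow(orig_index, new_index, speech_level, table,
                     min_level, max_level):
    """Step the requantized SLRP index until the predicted level is within limits."""
    def predicted_level(i):
        # Realized gain (table[i] / table[orig_index]) times the original level.
        return speech_level * table[i] / table[orig_index]

    if predicted_level(new_index) > max_level:
        # Realized gain too high: reduce the SLRP one table step at a time.
        while new_index > 0 and predicted_level(new_index) > max_level:
            new_index -= 1
    elif predicted_level(new_index) < min_level:
        # Realized gain too low: increase the SLRP one table step at a time.
        while new_index < len(table) - 1 and predicted_level(new_index) < min_level:
            new_index += 1
    return new_index, table[new_index] / table[orig_index]
```

  • Replacing `predicted_level` with an inspection of partially decoded samples would give the decoder-based variant mentioned above, at the cost of additional complexity.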
  • The gain that is fed back should be the realized gain after the SLRP requantization process, not the desired gain.
  • A preferred approach is shown in Figure 14. If the desired gain was used in the feedback loop instead of the realized gain, the controller would not be tracking the actual decoded speech signal level, resulting in erroneous level control.
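  • Feeding back the realized gain can be sketched with a toy controller. It assumes gains expressed in dB, a uniformly spaced set of realizable gains with spacing `step_db`, and a limit on the gain change per frame; these choices and all names are illustrative assumptions rather than the algorithm of any figure.

```python
def run_alc(levels_db, target_db, step_db, max_change_db=2.5):
    """Track a target level, feeding the realized (quantized) gain back each frame."""
    realized = 0.0
    history = []
    for level in levels_db:
        desired = target_db - level
        # The rate limit is applied relative to the gain actually realized so far.
        change = max(-max_change_db, min(max_change_db, desired - realized))
        wanted = realized + change
        # Only multiples of step_db can be realized after requantization.
        realized = step_db * round(wanted / step_db)
        history.append(realized)
    return history
```

  • For a constant input level 10 dB below target and a 2 dB spacing, the realized gain climbs 2, 4, 6, 8, 10 dB and then holds, because each decision starts from the gain the decoder will actually apply.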
  • These methods preferably include the integration of the gain determination and SLRP requantization techniques.
  • Figure 15 illustrates the general configuration of an ALC device that uses joint gain determination and SLRP requantization. The details will depend on the particular ALC device.
  • The requantization of the SLRPs for these particular cases will be described while noting that the approaches may be easily extended to any other quantization scheme.
  • The joint determination of the gain and SLRP requantization in the ALC device configuration of Figure 15 may utilize the requantization techniques described here.
  • the original value of the quantized SLRP will be denoted by ⁇ (n) , where n is
  • The desired gain determined by the ALC device will be denoted by g(n).
  • The realized gain after SLRP requantization will be denoted by
  • If overflow and underflow prevention are desired, then the iterative scheme described in Figure 13 may be used.
  • The partial decoding of the speech samples using the requantized SLRP may be performed to the extent necessary. This, of course, involves additional complexity in the algorithm. The decoded samples can then be directly inspected to ensure that overflow or underflow has not taken place.
  • These desired gain values preferably have the same spacing as the SLRP quantization values, with 0 dB being one of the gains. This ensures that the desired and realized gain values will always be aligned so that equation (8) would not have to be evaluated for each table value. Hence the requantization is greatly simplified.
  • the original quantization index of the SLRP is simply increased or decreased by a value corresponding to the desired gain value divided by the SLRP quantization table spacing. For instance, suppose that the SLRP quantization table spacing is denoted by ⁇ .
  • the discrete set of permitted desired gain values would be l+ ⁇ ..., -2 ⁇ , - ⁇ , 0, ⁇ , 2 ⁇ , ... ⁇ if the SLRP quantization table values are uniformly spaced linearly, and 0+ ⁇ ..., -2 ⁇ , - ⁇ , 0, ⁇ , 2 ⁇ , ... ⁇ if the SLRP quantization table values are uniformly spaced linearly, and 0+ ⁇ ..., -2 ⁇ , - ⁇ , 0, ⁇ , 2 ⁇ , ... ⁇ if the SLRP quantization table values are
  • would be the average spacing between adjacent quantization table values, where the average is performed appropriately using either linear or logarithmic distances between the values.
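The index-shift requantization described above can be sketched as follows. This is a minimal illustration, assuming a table that is uniformly spaced in dB; the function name, its parameters and the clamping behaviour at the table edges are hypothetical and are not taken from the patent.

```python
def shift_slrp_index(index, desired_gain_db, spacing_db, table_size):
    """Requantize an SLRP by moving its table index a whole number of steps.

    Assumes the quantization table is uniformly spaced by spacing_db (in dB)
    and that the desired gain is one of the permitted multiples of that spacing.
    """
    steps = desired_gain_db / spacing_db
    if abs(steps - round(steps)) > 1e-9:
        raise ValueError("desired gain must be a multiple of the table spacing")
    new_index = index + int(round(steps))
    # Clamp to the table; at the edges the realized gain is smaller than desired.
    new_index = max(0, min(table_size - 1, new_index))
    realized_gain_db = (new_index - index) * spacing_db
    return new_index, realized_gain_db
```

Because the desired and realized gains are aligned by construction, no search over the table is needed; only the clamped cases yield a realized gain that differs from the desired gain.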
  • An example of instantaneous scalar requantization is shown for the GSM FR
  • This codec's SLRP is the block maximum, x_max, which is
  • the Q and Q⁻¹ blocks represent the SLRP requantization
  • the index of the block maximum is first
  • the index of the requantized x_max is then substituted for the original value in the
  • FIG. 17 shows a general coded domain ALC technique with only the components relevant to ALC being shown.
  • G(n) denotes the original logarithmic gain value determined by the encoder.
  • G(n) is equal to
  • the SLRP, R(n) is modified by the ALC
  • the device to R_ALC(n) based on the desired gain.
  • the realized gain, ΔR(n) is the
  • ΔR(n) = R_ALC(n) − R(n) (9) Note that this is different from the actual gain realized at the decoder which,
  • the actual realized gain is essentially an amplified version of the SLRP realized gain due to the decoding process, under steady-state conditions.
  • steady-state it is meant that ΔG(n) is kept constant for a period of time that is sufficiently long so
  • ΔR(n) is either steady or oscillates in a regular manner about a particular level.
  • This method for differential scalar requantization basically attempts to mimic the operation of the encoder at the ALC device. If the presence of the quantizers at the encoder and the ALC device is ignored, then both the encoder and the ALC device
  • G_ALC(n) = G(n) + ΔG(n) + quantization error
  • the feedback of the SLRP realized gain, ΔR(n), in the ALC device can cause
  • Figure 18(a) shows the step in the desired gain.
  • Figure 18(b) shows the actual realized gain superimposed on the desired gain.
  • the reverberations in the SLRP realized gain shown in Figure 18(b) cause a modulation of the speech signal and can result in audible distortions. Thus, depending on the ALC specifications, such reverberations may be undesirable.
  • the reverberations can be eliminated by 'moving' the quantizer outside the feedback loop
  • the average errors during steady-state operation of the requantizer with and without the quantizer in the feedback loop are 0.39 dB and 1.03 dB, respectively.
  • the ALC apparatus of Figure 19 can be simplified as shown in Figure 20, resulting in savings in computation. This is done by replacing the linear system
  • Some ALC algorithms may utilize past gain values to determine current and future gain values.
  • the gain that is fed back should be the actual realized gain after the SLRP requantization process, not the desired gain. This was discussed above in conjunction with Figure 14.
  • Differential scalar requantization for such feedback-based ALC algorithms can be implemented as shown in Figure 21.
  • the ALC device is mimicking the actions of the decoder to determine the actual realized gain.
  • any of the methods described earlier that have quantizers within the feedback loop may be used.
  • any of the methods described earlier that have quantizers outside the feedback loop may be used.
  • the earliest point at which the first sample can be decoded is after the reception of bit 91 as shown in Figure 23(a). This represents a buffering delay of approximately 7.46ms. It turns out that sufficient information is received to decode not just the first sample but the entire first subframe at this point. Similarly, the entire first subframe can be decoded after about 7.11ms of buffering delay in the FR decoder.
  • each subframe has an associated SLRP in both the EFR and FR coding schemes. This is generally true for most other codecs where the encoder operates at a subframe level.
  • ALC in the coded domain can be performed subframe-by-subframe rather than frame-by-frame.
  • the new SLRP computed by the ALC device can replace the original SLRP in the received bitstream.
  • the delay incurred before the SLRP can be decoded is determined by the position of the bits corresponding to the SLRP in the received bitstream. In the case of the FR and EFR codecs, the position of the SLRP bits for the first subframe determines this delay.
  • the ALC algorithm must be designed to determine the gain for the current subframe based on previous subframes only. In this way, almost no buffering delay will be necessary to modify the SLRP.
  • the bits corresponding to the SLRP in a given subframe are received, they will first be decoded. Then the new SLRP will be computed based on the original SLRP and information from the previous subframes only. The original SLRP bits will be replaced with the new SLRP bits. There is no need to wait until all the bits necessary to decode the current subframe are received.
  • the buffering delay incurred by the algorithm will depend on the processing delay which is small. Information about the speech level is derived from the current subframe only after replacement of the SLRP for the current subframe.
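The ordering of operations in this near-zero buffering delay method can be sketched as below. This is only an illustration of the causality constraint, assuming the SLRP is a table index and the gain is expressed as an index shift; `choose_shift` is a hypothetical stand-in for whatever ALC rule is in use.

```python
def process_subframes(slrp_indices, choose_shift, table_size):
    """Replace each subframe's SLRP index using past subframes only.

    choose_shift(history) returns an integer index shift and sees only the
    original SLRP indices of subframes that have already been processed.
    """
    history = []   # information derived from earlier subframes
    adjusted = []
    for index in slrp_indices:
        shift = choose_shift(history)  # decided before the current subframe is inspected
        new_index = max(0, min(table_size - 1, index + shift))
        adjusted.append(new_index)     # SLRP bits are replaced right away
        history.append(index)          # level information is updated only after replacement
    return adjusted
```

The point of the sketch is that the replacement for a subframe never waits for that subframe's remaining bits; the current subframe influences the gain only from the next subframe onward.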
  • the SLRP computed for the next subframe can be appropriately set to minimize the likelihood of continued overflows.
  • This near-zero buffering delay method is especially applicable to the FR codec since the decoding of the SLRP for this codec does not involve decoding any other parameters.
  • the subframe excitation vector is also needed to decode the SLRP and the more complex differential requantization techniques have to be used for requantizing the SLRP. Even in this case, significant reduction in the delay is attained by performing the speech level update based on the current subframe after the SLRP is replaced for the current subframe.
  • the received bitstream can be divided into 8-bit samples.
  • the 2 least significant bits of each sample will contain the coded speech bits while the upper 6 bits will contain the bits corresponding to the appropriate PCM samples.
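As a minimal sketch of this sample layout, the following splits each 8-bit sample into its upper 6 PCM bits and its 2 coded speech bits. The function names are hypothetical, and the order in which the two coded bits are appended to the coded bitstream (more significant bit first) is an assumption.

```python
def split_tfo_sample(sample):
    """Split one 8-bit sample into (upper 6 PCM bits, lower 2 coded speech bits)."""
    if not 0 <= sample <= 0xFF:
        raise ValueError("expected an 8-bit value")
    pcm_bits = sample >> 2       # upper 6 bits: noisy linear-domain speech
    coded_bits = sample & 0b11   # 2 least significant bits: coded speech
    return pcm_bits, coded_bits


def collect_coded_bits(samples):
    """Gather the coded speech bits of a run of samples into one bit list."""
    bits = []
    for sample in samples:
        _, coded = split_tfo_sample(sample)
        bits.append((coded >> 1) & 1)  # assumed order: more significant bit first
        bits.append(coded & 1)
    return bits
```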
  • a noisy version of the linear speech samples is available to the ALC device in this case. It is possible to use this noisy linear domain speech signal to perform the necessary voice activity, double-talk and speech level measurements as is usually done in linear domain ALC algorithms.
  • only a minimal amount of decoding of the coded domain speech parameters is necessary. Only parameters that are required for the determination and requantization of the SLRP would have to be decoded. Partial decoding of the speech signal is unnecessary as the noisy linear domain speech samples can be relied upon to measure the speech level as well as perform voice activity and double-talk detection.
  • a processor which may include a microprocessor, a microcontroller or a digital signal processor, as well as other logic units capable of logical and arithmetic operations.
  • Speech compression, which falls under the category of lossy source coding, is commonly referred to as speech coding.
  • Speech coding is performed to minimize the bandwidth necessary for speech transmission. This is especially important in wireless telephony where bandwidth is scarce. In the relatively bandwidth abundant packet networks, speech coding is still important to minimize network delay and jitter. This is because speech communication, unlike data, is highly intolerant of delay. Hence a smaller packet size eases the transmission through a packet network.
  • the four ETSI GSM standards of concern are listed in Table 3. Each of the standards defines a linear predictive code. Table 3 is a subset of the speech codecs identified in Table 1. Table 3: GSM Speech Codecs
  • a set of consecutive digital speech samples is referred to as a speech frame.
  • the GSM coders operate on a frame size of 20ms (160 samples at 8kHz sampling rate). Given a speech frame, a speech encoder determines a small set of parameters for a speech synthesis model. With these speech parameters and the speech synthesis model, a speech frame can be reconstructed that appears and sounds very similar to the original speech frame. The reconstruction is performed by the speech decoder. In the GSM speech coders listed above, the encoding process is much more computationally intensive than the decoding process.
  • the speech parameters determined by the speech encoder depend on the speech synthesis model used.
  • the GSM coders in Table 3 utilize linear predictive coding (LPC) models.
  • LPC linear predictive coding
  • a block diagram of a simplified view of the LPC speech synthesis model is shown in Figure 3.
  • the Figure 3 model can be used to generate speech-like signals by specifying the model parameters appropriately.
  • the parameters include the time-varying filter coefficients, pitch periods, codebook vectors and the gain factors.
  • the synthetic speech is generated as follows.
  • An appropriate codebook vector, c(n) is first scaled by the
  • pitch synthesis filter whose parameters include the pitch gain, g .
  • the pitch synthesis filter provides the harmonic
  • the total excitation vector is then filtered by the LPC synthesis filter which specifies the broad spectral shape of the speech frame.
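The synthesis chain described above (scaled codebook vector, pitch synthesis filter, LPC synthesis filter) can be sketched as follows. This is a simplified illustration and not any codec's actual decoder: a one-tap pitch filter, zero initial filter state, and the sign convention A(z) = 1 + Σ a_i z^-i are all assumptions made here.

```python
def synthesize(codebook_vector, gain, pitch_gain, pitch_period, lpc_coeffs):
    """Generate one block of synthetic speech from LPC model parameters."""
    # Scale the codebook vector by the gain factor.
    scaled = [gain * c for c in codebook_vector]

    # Pitch synthesis filter: u(k) = scaled(k) + pitch_gain * u(k - pitch_period).
    excitation = []
    for k, value in enumerate(scaled):
        past = excitation[k - pitch_period] if k >= pitch_period else 0.0
        excitation.append(value + pitch_gain * past)

    # LPC synthesis filter: s(k) = u(k) - sum_i a_i * s(k - i).
    speech = []
    for k, value in enumerate(excitation):
        acc = value
        for i, a in enumerate(lpc_coeffs, start=1):
            if k - i >= 0:
                acc -= a * speech[k - i]
        speech.append(acc)
    return speech
```

With zero initial state, doubling `gain` doubles every output sample, which illustrates why the gain factor is a natural handle for level control. In a real decoder the filter memories carry over between subframes, so the relationship is only approximately linear.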
  • the parameters are usually updated more than once. For instance, in the GSM FR and EFR coders, the codebook vector, codebook gain and the pitch synthesis filter parameters are determined every subframe (5ms).
  • LPC synthesis filter parameters are determined twice per frame (every 10ms) in EFR and once per frame in FR.
  • a typical speech encoder executes the following sequence of steps:
  • a typical speech decoder executes the following sequence of steps:
  • G specifies the DC gain of the transfer function. This, in turn, implies that G can be modified to adjust the overall speech level in an approximately linear manner. Hence, G is termed the
  • GSM coders use speech level related parameters (SLRPs). These SLRPs correspond to G in the general speech synthesis model of Figure 3.
  • CD-ALC coded domain ALC
  • For each codec, a different coded domain SLRP modification algorithm must be devised.
  • preferred algorithms for the FR and EFR coders are described.
  • the quantization of a single speech parameter is termed scalar quantization.
  • vector quantization is usually applied to a set of parameters that are related to each other in some way such as the LPC coefficients.
  • Scalar quantization is generally applied to a parameter that is relatively independent of the other parameters, such as the codebook gain. For the purposes of implementing CD-ALC, the discussion is limited to scalar quantization only.
  • Both the FR and EFR coders utilize scalar quantization for their respective codebook gains (which we are also referring to as the SLRPs).
  • the FR coder performs
  • the EFR coder performs an adaptive differential scalar quantization
  • ALC is shown in Figure 24.
  • a communications system 10 transmits near end digital signals from a near end handset 12 over a network 14 using a compression code, such as any of the codes used by the Codecs identified in Table 2.
  • the compression code is generated by an encoder 16 from linear audio signals generated by the near end handset 12.
  • the compression code comprises parameters, such as the parameters labeled SLRP in Table 2.
  • the parameters represent an audio signal comprising a plurality of audio characteristics, including audio level.
  • the audio level is related to the parameters labeled SLRP in Table 2.
  • the compression code is decodable by various decoding steps, including one or more steps for decoding the parameters related to audio level.
  • system 10 adjusts the audio level with minimal delay and minimal, if any, decoding of the compression code parameter relating to audio level.
  • Near end digital signals using the compression code are received on a near end terminal 20 and send in port Sin, and an adjusted compression code is transmitted by a near end terminal 22 and send out port Sout over a network 24 to a far end handset 26 which includes a decoder 28 of the compression code.
  • a linear far end audio signal is encoded by a far end encoder 30 to generate far end digital signals using the same compression code as encoder 16, and is transmitted over a network 32 to a far end terminal 34 and receive in port Rin.
  • Network 34 also transmits the far end signals to a terminal 36 and a receive out port Rout.
  • a decoder 18 of near end handset 12 decodes the far end digital signals. As shown in Figure 24, echo signals from the far end signals may find their way to encoder 16 of the near end handset 12.
  • a processor 40 performs various operations on the near end and far end compression code.
  • Processor 40 may be a microprocessor, microcontroller, digital signal processor, or other type of logic unit capable of arithmetic and logical operations.
  • a different coded domain SLRP modification algorithm is executed by processor 40.
  • a linear domain level control algorithm 42 executed by processor 40 is in operation at all times - under native mode and linear mode, during TFO as well as non-TFO.
  • a partial decoder 48 decodes enough of the compression code to form linear code from which the audio level of the audio signal represented by the compression code can be determined. Decoder 48 also reads a compression code parameter related to audio level, such as one of the parameters identified in Table 2. The read parameter is dequantized to form a parameter value.
  • the linear domain level control algorithm determines the gain factor for level adjustment and writes it to a predetermined memory location within processor 40. This gain factor is read by the appropriate codec-dependent coded domain SLRP modification algorithm 44 also executed by processor 40. Algorithm 44 modifies the read SLRP parameter (i.e., the gain factor) to form an adjusted SLRP parameter value
  • the adjusted parameter value is quantized to form an adjusted SLRP parameter which is written into the bit-stream received at terminal 20.
  • the adjusted SLRP parameter is substituted for the original read SLRP parameter.
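The read and substitute steps on the bit-stream can be sketched as follows. The helper names, the field position and the most-significant-bit-first ordering are hypothetical; the actual SLRP bit positions depend on the codec's frame format.

```python
def read_field(bits, start, width):
    """Read an unsigned field of `width` bits starting at `start` (MSB first)."""
    value = 0
    for bit in bits[start:start + width]:
        value = (value << 1) | bit
    return value


def write_field(bits, start, width, value):
    """Return a copy of `bits` with the field replaced by `value` (MSB first)."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the field")
    out = list(bits)
    for i in range(width):
        out[start + i] = (value >> (width - 1 - i)) & 1
    return out
```

Only the SLRP field changes; all other bits of the frame pass through untouched, which is what keeps the coded domain approach cheap compared with decoding and re-encoding.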
  • the partial decoders 46 and 48 shown within the Network ALC Device are algorithms executed by processor 40 and are codec-dependent. In the case of GSM EFR, the decoder post-filtering operations except for upscaling are unnecessary. In the case of GSM FR, the complete decoder is implemented.
  • a modular approach has the advantage that any existing or new linear domain level control algorithm can be incorporated with little or no modification with the coded domain SLRP modification algorithms.
  • a coder-specific level control method might provide more accurate level adjustments. However, it may require a significant re-design of the existing linear domain level control algorithms to ensure smooth transitions when switching from native to linear mode (and vice versa). Note that there is a small risk that some undesirable artifacts may be occasionally introduced when switching between coded and linear modes when using the modular approach.
  • the preferred embodiment includes a minimal delay technique.
  • Large buffering, processing and transmission delays are already present in cellular networks without any network voice quality enhancement processing. Further network processing of the coded speech for speech enhancement purposes will add additional delay. If linear domain processing is performed on coded speech during TFO, more than a frame of delay (20ms) will be added due to buffering and processing requirements for decoding and re-encoding.
  • CD-ALC can be performed with a buffering delay that is much less than one frame for FR and EFR coders.
  • the delay reduction under CD-ALC is achieved for FR and EFR by performing level control a subframe at a time rather than frame-by-frame.
  • the linear domain ALC algorithm can send the gain factor to the coded domain SLRP modification algorithm 44.
  • the first subframe requires more than 5ms of delay before decoding can begin.
  • Table 5 and Table 6 provide the earliest possible points at which decoding of samples can be performed as the bit-stream is received for the FR and EFR coders, respectively, and correspond to the illustration in Figure 23. Note that there are 260 bits/frame for the FR and 244 bits/frame for the EFR. The table assumes that the incoming bits are spread out evenly over 20ms, for the sake of simplicity. With this approximation, the first subframe requires 7.11ms for the FR and 7.46ms for the EFR. All other subframes require less delay.
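Under the stated simplification that the incoming bits are spread evenly over the 20ms frame, the buffering delay follows directly from the bit position. The sketch below shows that arithmetic; the function name is hypothetical.

```python
FRAME_MS = 20.0  # frame duration of the GSM coders


def buffering_delay_ms(bits_received, bits_per_frame, frame_ms=FRAME_MS):
    """Delay until `bits_received` bits have arrived, assuming evenly spread bits."""
    return frame_ms * bits_received / bits_per_frame
```

For example, `buffering_delay_ms(91, 244)` evaluates to roughly 7.46, matching the EFR figure quoted for the first subframe.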
  • table specifies a six bit index for each range of values. The six bit index is re-inserted in the appropriate positions for each subframe.
  • the quantized SLRP values are shown in Figure 6.
  • the range of the quantized values is 31 to 32767. This represents a dynamic range of about 60 dB (20 log10(32767/31)).
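A table search for requantizing the FR block maximum can be sketched as below. The table formula is a reconstruction that matches the 6-bit index and the 31 to 32767 range stated above and should be checked against the GSM FR specification before use. Measuring closeness in dB is also an assumption, since the patent's own selection criterion is not reproduced in this excerpt.

```python
import math


def fr_xmax_table():
    """Reconstructed table of the 64 quantized block-maximum values (assumption)."""
    table = []
    for c in range(64):
        if c < 8:
            table.append(32 * (c + 1) - 1)
        else:
            table.append(((9 + (c & 7)) << ((c >> 3) + 4)) - 1)
    return table


def requantize(index, desired_gain_db, table):
    """Pick the table index whose level is closest, in dB, to the scaled SLRP."""
    target_db = 20.0 * math.log10(table[index]) + desired_gain_db
    new_index = min(
        range(len(table)),
        key=lambda i: abs(20.0 * math.log10(table[i]) - target_db),
    )
    realized_gain_db = 20.0 * math.log10(table[new_index] / table[index])
    return new_index, realized_gain_db
```

The second return value is the realized gain, which generally differs slightly from the desired gain and is the value that feedback-based ALC algorithms should use.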
  • each subframe of the SLRP is as follows: (1) Both the near-end and far-end compression coded speech subframes are fully decoded by decoders 46 and 48. That is, the digital signals transmitted to terminals 20 and 34 are both fully decoded by decoders 46 and 48 to generate near end decoded signals and far end decoded signals indicative of audio level.
  • the x ⁇ ' value is read from the coded near end signal by partial decoder
  • the near end decoded signals and far end decoded signals are processed by the Linear Domain ALC (LD-ALC) algorithm 42 to determine the proper audio level.
  • LD-ALC Linear Domain ALC
  • only the double-talk information based on the far-end signal received at terminal 34 may be actually passed into the LD-ALC algorithm 42.
  • CD-ALC 44 This may be achieved by writing to a predetermined memory location to be read by CD-ALC.
  • CD-ALC 44 extracts the 6-bit table index for the current subframe according to
  • the decoder code may be modified to pass this value to CD-ALC 44.
  • γ_gc is considered the actual compression code SLRP because it is the only
  • E_I(n) depends only on the subframe's fixed codebook vector
  • the quantized gain factor is computed using (12) as
  • the adaptive differential quantization of the SLRP, γ_gc is performed in the
  • R(n) is quantized by the
  • the quantization of the SLRP at the encoder is performed indirectly by using the mean-removed codebook vector energy each subframe.
  • E(n) denotes the mean-
  • the codebook vector ⁇ c(i) ⁇ is required in order to decode the SLRP. Note that
  • the decoding of the codebook vector is independent of the decoding of the SLRP.
  • Ẽ(n) is the predicted energy given by
  • the decoder decodes the excitation vector and computes E_I(n) using (16).
  • Ẽ(n) is computed using previously decoded gain correction factors using
  • the correction factor for the current subframe is used to obtain f from the look-up
  • the quantized SLRP values are shown in Figure 10. Differences between adjacent quantization levels are shown in Figure 22. The range of the quantized values is 159 to 27485. This represents a dynamic range of about 45 dB (20 log10(27485/159)).
  • the table of quantized SLRP values and the logarithms are also provided in Table 9. This table is necessary for re-encoding the SLRP.
  • Table 9 Table of SLRP quantization values for GSM EFR
  • the CD-ALC processing of the SLRP of each subframe is as follows:
  • the near end decoded and far end decoded signals are processed by the Linear Domain ALC (LD-ALC) algorithm to determine the proper audio level.
  • LD-ALC Linear Domain ALC
  • only the double-talk information based on the far-end signal may be actually passed into the LD-ALC algorithm 42.
  • CD-ALC 44 extracts the 5-bit table index for the current subframe according to
  • the decoder code may be modified to pass this value to CD-ALC 44.
  • R new (n) denotes the new or adjusted SLRP value.
  • Gain_actual(n) = 20 log10 g_ALC(n) − Gain_predicted(n) (19)
  • PastDeltaR[3] = PastDeltaR[2]
  • PastDeltaR[2] = PastDeltaR[1]
  • PastDeltaR[1] = PastDeltaR[0]
  • PastDeltaR[0] = Gain_actual(n)
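The history update above can be sketched with a fixed-length buffer. How the predicted gain is formed from the stored values is not shown in this excerpt; the sketch assumes a weighted sum of the four past values, with default weights equal to the moving-average predictor coefficients commonly quoted for the EFR gain quantizer. Both the weighted sum and the default weights are assumptions to verify, and the class name is hypothetical.

```python
import math
from collections import deque

# Assumed predictor weights; verify against the EFR specification before use.
ASSUMED_MA_COEFFS = (0.68, 0.58, 0.34, 0.19)


class GainHistory:
    """Tracks PastDeltaR[0..3] and evaluates equation (19) for each subframe."""

    def __init__(self, coeffs=ASSUMED_MA_COEFFS):
        self.coeffs = tuple(coeffs)
        self.past_delta_r = deque([0.0] * len(self.coeffs), maxlen=len(self.coeffs))

    def predicted_gain_db(self):
        # Assumption: the prediction is a weighted sum of the stored values.
        return sum(b * r for b, r in zip(self.coeffs, self.past_delta_r))

    def update(self, g_alc):
        """Compute Gain_actual(n) for linear gain g_alc, then shift the history."""
        gain_actual = 20.0 * math.log10(g_alc) - self.predicted_gain_db()
        # appendleft on a full deque drops the oldest entry, which performs the
        # PastDeltaR[3] = PastDeltaR[2], ..., PastDeltaR[0] = Gain_actual shift.
        self.past_delta_r.appendleft(gain_actual)
        return gain_actual
```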
  • R new (n) is quantized to obtain an adjusted parameter R new (n) using Table 9.
  • R new (n) is assigned the value that is closest in terms of the absolute difference
  • R new (n) is inserted (e.g., written or substituted) appropriately back into the coded
  • 201og(g c x / ) 201og c + 201og g ⁇ C in the SLRP encoding process. Since our
  • the gain factor changes are generally small and
  • the gain factor adjustment steps should be limited to ±3 dB for operation in conjunction with GSM FR codecs, which is the same as the usual LD-ALC step size. (In some versions of LD-ALC, 6 dB steps were possible; this should be avoided.) Hence the possible dB gain values should be restricted to {-3, -6, 0, 3, 6, 9,
  • the gain factor adjustment steps should be limited to ±3.39 dB for operation in conjunction with GSM EFR codecs. (In some versions of LD-ALC, 6 dB steps were possible; this should be avoided.) This step size is optimized specifically for EFR to minimize the transient effects and maximize accuracy. Hence the possible dB gain values should be restricted to {-6.77, -3.39, 0, 3.39, 6.77, 10.16, 13.55, 16.93}.
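The restriction to a discrete gain set with single-step changes can be sketched as follows, using the EFR values listed above. The function name is hypothetical, and reading the step limit as "move at most one permitted value per update" is an interpretation, not something this excerpt states explicitly.

```python
EFR_GAINS_DB = (-6.77, -3.39, 0.0, 3.39, 6.77, 10.16, 13.55, 16.93)


def next_gain_db(current_db, desired_db, allowed=EFR_GAINS_DB):
    """Move at most one permitted step from the current gain toward the desired gain."""
    def nearest(value):
        return min(range(len(allowed)), key=lambda i: abs(allowed[i] - value))

    current = nearest(current_db)
    target = nearest(desired_db)
    if target > current:
        current += 1
    elif target < current:
        current -= 1
    return allowed[current]
```

Calling the function once per permitted update walks the gain toward the target in single steps, so a request for a large change is spread over several updates rather than applied at once.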
  • Any gain changes should be restricted to occur only at the beginning of a subframe boundary. This ensures that the sample at which a gain change occurs is identical in both the linear (upper 6 PCM bits) and coded signals.
  • a subframe (40 samples) of speech should be processed at a time for efficiency.
  • the CD-ALC algorithm utilizes an LD-ALC algorithm to determine the gain adjustments, the CD-ALC algorithm performance is, in a sense, upper bounded by the LD-ALC performance. Thus, even if the LD-ALC algorithm complies with
  • Figure 29 shows the results for a case when CD-ALC is used in conjunction with FR.
  • the upper plot shows power profiles of the original (dashed line) and processed (solid line) signals. A 40ms time constant was used in the recursive mean- square averaging of the signals to obtain the power profiles.
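A recursive mean-square average of this kind can be sketched as a one-pole smoother. The 40ms time constant comes from the text and the 8kHz rate from the GSM frame description; the mapping from time constant to smoothing coefficient used here is a common choice and an assumption, as the excerpt does not give the exact recursion.

```python
import math


def power_profile(samples, sample_rate_hz=8000.0, time_constant_s=0.040):
    """Recursive mean-square average of a signal, one output value per sample."""
    # Assumed mapping: the estimate closes (1 - 1/e) of a step within one time constant.
    alpha = 1.0 - math.exp(-1.0 / (time_constant_s * sample_rate_hz))
    power = 0.0
    profile = []
    for x in samples:
        power += alpha * (x * x - power)
        profile.append(power)
    return profile
```

The ratio of two such profiles, taken at subframe boundaries, gives a measured gain of the kind that can be compared with the desired gain.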
  • the lower plot shows the LD-ALC gain (blue, dashed line) at the end of each subframe; also shown is the ratio of the processed power to the original power at the end of each subframe. In the regions where the speech signal is strong, the amplification of the signal corresponds quite closely to the desired gain.

EP00946994A 1999-07-02 2000-06-30 Adaptive pegelregelung komprimierter sprache im kodebereich Withdrawn EP1190494A1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US14213699P 1999-07-02 1999-07-02
US142136P 1999-07-02
PCT/US2000/018293 WO2001003317A1 (en) 1999-07-02 2000-06-30 Coded domain adaptive level control of compressed speech

Publications (1)

Publication Number Publication Date
EP1190494A1 true EP1190494A1 (de) 2002-03-27

Family

ID=22498680

Family Applications (3)

Application Number Title Priority Date Filing Date
EP00946994A Withdrawn EP1190494A1 (de) 1999-07-02 2000-06-30 Adaptive pegelregelung komprimierter sprache im kodebereich
EP00948555A Withdrawn EP1190495A1 (de) 1999-07-02 2000-06-30 Echo-regelung im kodebereich
EP00946954A Pending EP1208413A2 (de) 1999-07-02 2000-06-30 Kodierte domain rauschsteuerung.

Country Status (5)

Country Link
EP (3) EP1190494A1 (de)
JP (3) JP2003503760A (de)
AU (3) AU6203300A (de)
CA (3) CA2378035A1 (de)
WO (3) WO2001003317A1 (de)

Also Published As

Publication number Publication date
JP2003504669A (ja) 2003-02-04
WO2001003316A1 (en) 2001-01-11
JP2003533902A (ja) 2003-11-11
AU6067100A (en) 2001-01-22
CA2378012A1 (en) 2001-01-11
JP2003503760A (ja) 2003-01-28
EP1208413A2 (de) 2002-05-29
WO2001002929A2 (en) 2001-01-11
CA2378062A1 (en) 2001-01-11
AU6203300A (en) 2001-01-22
WO2001002929A3 (en) 2001-07-19
AU6063600A (en) 2001-01-22
WO2001003317A1 (en) 2001-01-11
EP1190495A1 (de) 2002-03-27
CA2378035A1 (en) 2001-01-11

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20020111

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN

18W Application withdrawn

Effective date: 20040428