EP1190494A1 - Coded domain adaptive level control of compressed speech - Google Patents
Coded domain adaptive level control of compressed speech
- Publication number
- EP1190494A1 EP00946994A
- Authority
- EP
- European Patent Office
- Prior art keywords
- parameter
- adjusted
- near end
- characteristic
- code
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M9/00—Arrangements for interconnection not involving centralised switching
- H04M9/08—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
- H04M9/082—Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0316—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
- G10L21/0364—Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B3/00—Line transmission systems
- H04B3/02—Details
- H04B3/20—Reducing echo effects or singing; Opening or closing transmitting path; Conditioning for transmission in one direction or the other
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/0001—Systems modifying transmission characteristics according to link quality, e.g. power backoff
- H04L1/0014—Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the source coding
Definitions
- the present invention relates to coded domain enhancement of compressed speech and in particular to coded domain adaptive level control and noise reduction in the coded domain.
- Network enhancement of coded speech would normally require decoding, linear processing and re-encoding of the processed signal. Such a method is illustrated in Figure 1 and is very expensive. Moreover, the encoding process is often an order of magnitude more computationally intensive than the speech enhancement methods. Speech compression is increasingly used in telecommunications, especially in cellular telephony and voice over packet networks.
- Past network speech enhancement techniques that operate in the linear domain have several shortcomings. For example, they require decoding the compressed speech, performing the necessary enhancements, and re-encoding the speech.
- PSTN Public Switched Telephone Network
- Telephony customers expect a comfortable listening level to maximize comprehension of their conversation.
- the transmitted speech level from a telephone instrument depends on the speaker's volume and the position of the speaker relative to the microphone. If volume control is available on the telephone instrument, the listener could manually adjust it to a desirable level. However, for historical reasons, most telephone instruments do not have volume controls. Also, direct volume control by the listener does not address the need to maintain appropriate levels for network equipment. Furthermore, as technology is progressing towards the era of hands-free telephony especially in the case of mobile phones in vehicles, manual adjustment is considered cumbersome and potentially hazardous to the vehicle operators.
- Maintaining speech quality has generally been the responsibility of network service providers; telephone instrument manufacturers typically have played a relatively minor role in meeting such responsibility.
- network service providers have provided tight specifications for equipment and networks with regard to speech levels.
- the network service providers have to ensure the proper speech levels with lesser influence over specifications and equipment used in other networks.
- Figure 2 shows the network configuration of a linear domain ALC device 202.
- the ALC device processes the near-end speech signal (at port Sin).
- the far-end signal (at port Rin) is used for determining double-talk.
- ALC device 202 processes a digital near end speech signal in a typical transmission network and determines the gain required to attain a target speech level by measuring the current speech level. Numerous algorithms can be devised to determine a suitable gain. For example, the ALC device could use a voice activity detector and apply new gain values only at the beginning of speech bursts. Furthermore, the maximum and minimum gain, and the maximum rate of change of the gain, may all be constrained. In general, ALC devices utilize (1) some form of power measurement scheme on the near end signal to determine the current speech level, (2) a voice activity detector on the near end signal to demarcate speech bursts, and possibly (3) a double-talk detector on the far and near end signals to determine whether the near end signal contains echo.
- the ALC device determines the gain required to attain the target speech level by measuring the current speech level. Each digitized speech sample is multiplied by a gain factor. The double-talk information is used to prevent adjusting the gain factor erroneously based on echo.
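- The gain determination described above can be sketched as follows. This is only an illustrative sketch, not the algorithm of any particular ALC product: the function name `update_gain_db`, the dB units, and the default limits (plus or minus 12 dB, 3 dB per update) are assumptions chosen for the example.

```python
def update_gain_db(current_gain_db, measured_level_db, target_level_db,
                   speech_burst_start, double_talk,
                   min_gain_db=-12.0, max_gain_db=12.0, max_step_db=3.0):
    """Return the gain (in dB) to apply from now on.

    All parameter names and default limits are hypothetical.
    """
    # Apply a new gain only at the beginning of a speech burst, and never
    # while double-talk is flagged (the measured level may be echo).
    if not speech_burst_start or double_talk:
        return current_gain_db

    # Gain needed to move the measured level to the target level.
    desired = target_level_db - measured_level_db

    # Constrain the maximum and minimum gain.
    desired = max(min_gain_db, min(max_gain_db, desired))

    # Constrain the rate of change of the gain.
    step = desired - current_gain_db
    step = max(-max_step_db, min(max_step_db, step))
    return current_gain_db + step


# Speech is 10 dB too quiet, but the gain may rise by at most 3 dB per update.
print(update_gain_db(0.0, -30.0, -20.0, True, False))  # 3.0
```

- In this sketch the gain is simply held whenever no new speech burst starts or double-talk is detected, which is one way to keep echo from inflating the gain.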
- Tellabs algorithms/products for level control include
- TFO Tandem Free Operation
- GSM Global System for Mobile Communications
- the TFO standard applies to mobile-to-mobile calls.
- the speech signal is conveyed between mobiles in a compressed form after a brief negotiation period.
- the compressed speech is contained in TFO frames which bypass the transcoders in the network.
- the elimination of tandem codecs is known to improve speech quality in the case where the original signal is clean. Even in the case of clean speech, it may still be desirable to adjust the speech level to a suitable loudness level.
- Traditional methods for such level control would require decoding, processing and re-encoding the speech, which results in tandeming and is computationally-intensive.
- the coded domain approach avoids such tandeming and eliminates the need for full re-encoding. This document describes methods for speech level control in the coded domain.
- level control in conjunction with the GSM FR and EFR coders is addressed.
- One preferred embodiment is useful in a communications system for transmitting digital signals using a compression code comprising a predetermined plurality of parameters including a first parameter, the parameters representing an audio signal comprising a plurality of audio characteristics including a first characteristic, the first parameter being related to the first characteristic, the compression code being decodable by a plurality of decoding steps including a first decoding step for decoding the parameters related to the first characteristic.
- the first characteristic may be adjusted by reading at least the first parameter in response to the digital signals. At least a first parameter value is derived from the first parameter. An adjusted first parameter value representing an adjustment of the first characteristic is generated in response to the digital signals and the first parameter value.
- An adjusted first parameter is derived in response to the adjusted first parameter value, and the first parameter of the compression code is replaced with the adjusted first parameter.
- the preceding steps of reading, deriving, generating and replacing preferably are performed by a processor. As a result of the foregoing technique, the delay required to adjust the first characteristic may be reduced.
- a second preferred embodiment is useful in a communication system for transmitting digital signals comprising code samples comprising first bits using a compression code and second bits using a linear code.
- The code samples represent an audio signal having a plurality of audio characteristics, including a first characteristic.
- the first characteristic may be adjusted without decoding the compression code by adjusting the first bits and the second bits in response to the second bits.
- the adjusting preferably is performed with a processor.
- Figure 1 is a schematic block diagram of a system for network enhancement of coded speech in the linear domain.
- Figure 2 is a schematic block diagram of a system for automatic level control (ALC).
- ALC automatic level control
- FIG. 3 is a schematic block diagram of a linear predictive coding (LPC) speech synthesis model.
- LPC linear predictive coding
- Figure 4 is a schematic block diagram distinguishing coded domain digital speech parameters from linear domain digital speech samples.
- Figure 5 is a schematic block diagram of a coded domain ALC system.
- Figure 6 is a graph illustrating GSM full rate codec quantization levels for block maxima.
- Figure 7a is a schematic block diagram of a backward adaptive standard deviation based quantizer.
- Figure 7b is a schematic block diagram of a backward adaptive differential based quantizer.
- Figure 8 is a schematic block diagram of an adaptive differential quantizer using a linear predictor.
- Figure 9 is a schematic block diagram of a GSM enhanced full rate SLRP quantizer.
- Figure 10 is a graph illustrating GSM enhanced full rate codec quantization levels for a gain correction factor.
- Figure 11 is a schematic block diagram of one technique for performing ALC.
- Figure 12 is a schematic block diagram of one technique for coded domain ALC.
- Figure 13 is a flow diagram illustrating a technique for overflow/underflow prevention.
- Figure 14 is a schematic block diagram of a preferred form of ALC system using feedback of the realized gain in ALC algorithms requiring past gain values.
- Figure 15 is a schematic block diagram of one form of a coded domain ALC device.
- Figure 16 is a schematic block diagram of a system for instantaneous scalar requantization for a GSM FR codec.
- Figure 17 is a schematic block diagram of a system for differential scalar requantization for a GSM EFR codec.
- Figure 18a is a graph showing a step in desired gain.
- Figure 18b is a graph showing actual realized gain superimposed on the desired gain with a quantizer in the feedback loop.
- Figure 18c is a graph showing actual realized gain superimposed on the desired gain resulting from placing a quantizer outside the feedback loop shown in Figure 19.
- Figure 19 is a schematic block diagram of an ALC device showing a quantizer placed outside the feedback loop.
- Figure 20 is a schematic block diagram of a simplified version of the ALC device shown in Figure 19.
- Figure 21a is a schematic block diagram of a coded domain ALC implementation for ALC algorithms using feedback of past gain values with a quantizer in the feedback loop.
- Figure 21b is a schematic block diagram of a coded domain ALC implementation for ALC algorithms using feedback of past gain values with a quantizer outside the feedback loop.
- Figure 22 is a graph showing spacing between adjacent $R_i$ values in an EFR codec, and more specifically showing EFR codec SLRPs: $(R_{i+1} - R_i)$ against $i$.
- Figure 23a is a diagram of a compressed speech frame of an EFR encoder illustrating the times at which various bits are received and the earliest possible decoding of samples as a buffer is filled from left to right.
- Figure 23b is a diagram of a compressed speech frame of an FR encoder illustrating the times at which various bits are received and the earliest possible decoding of samples as a buffer is filled from left to right.
- Figure 24 is a schematic block diagram of a preferred form of coded domain ALC system made in accordance with the invention.
- Figure 25 is a schematic block diagram of a preferred form of SLRP quantization in GSM EFR.
- Figure 26 is a schematic block diagram of an alternative form of SLRP quantization in GSM EFR.
- Figure 27 is a schematic block diagram of a preferred form of re-encoding the SLRP in GSM EFR.
- Figure 28 is a graph illustrating an exemplary speech signal.
- Figure 29 is a graph illustrating exemplary speech level adjustment with CD-ALC for FR.
- GSM 06.10, "Digital cellular telecommunication system (Phase 2); Full rate speech; Part 2: Transcoding", March 1998.
- GSM 06.60, "Digital cellular telecommunications system (Phase 2); Enhanced Full Rate (EFR) speech transcoding", June 1998.
- speech signals are digitally sampled prior to transmission.
- Such digital (i.e. discrete-time, discrete-valued) signals are referred to in this specification as being in the linear domain or in linear mode.
- the adjustment of the speech levels in such linear domain signals is accomplished by multiplying every sample of the signal by an appropriate gain factor to attain the desired target speech level.
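- As a minimal sketch of this linear domain adjustment, the following multiplies every sample by a gain factor. The 16-bit signed sample range and the saturation at that range are assumptions made for the example, as is the function name `apply_gain`.

```python
def apply_gain(samples, gain_db, lo=-32768, hi=32767):
    """Scale linear-domain samples by a gain given in dB.

    The 16-bit range (lo, hi) is an assumption; results are saturated to
    that range so that amplification cannot wrap around.
    """
    factor = 10.0 ** (gain_db / 20.0)  # dB to linear amplitude factor
    return [max(lo, min(hi, int(round(s * factor)))) for s in samples]


# A gain of 6 dB roughly doubles each sample; the last sample saturates.
print(apply_gain([1000, -1000, 30000], 6.0))  # [1995, -1995, 32767]
```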
- Linear echo or acoustic echo may be present in the near end signal depending on the type of end path in the network. If such echo has significant power and is not already cancelled by an echo canceller, then a double-talk detector may also be required. This is to ensure that the gain is not inadvertently increased due to the echo of the far end speech signal.
- Digital speech signals that are typically carried in telephony networks usually undergo a basic form of compression such as pulse code modulation (PCM) before transmission.
- PCM pulse code modulation
- Such compression schemes are very inexpensive in terms of computations and delay. It is a relatively simple matter for the ALC device to convert the compressed digital samples to the linear domain, process the linear samples, and then compress the processed samples before transmission. As such, these signals can effectively be considered to be in the linear domain.
- compressed, or coded speech will refer to speech that is compressed using advanced compression techniques that require significant computational complexity.
- linear code and compression code mean the following:
- Linear code: By a linear code, we mean a compression technique that results in one coded parameter or coded sample for each sample of the audio signal. Examples of linear codes are PCM (A-law and μ-law), ADPCM (adaptive differential pulse code modulation)
- Compression code: By a compression code, we mean a technique that results in fewer than one coded parameter for each sample of the audio signal. Typically, compression codes result in a small set of coded parameters for each block or frame of audio signal samples. Examples of compression codes are linear predictive coding based vocoders such as the GSM vocoders (HR, FR, EFR).
- Speech compression, which falls under the category of lossy source coding, is commonly referred to as speech coding.
- Speech coding is performed to minimize the bandwidth necessary for speech transmission. This is especially important in wireless telephony where bandwidth is a scarce resource.
- speech coding is still important to minimize network delay and jitter. This is because speech communication, unlike data, is highly intolerant of delay. Hence a smaller packet size eases the transmission through a packet network.
- Several industry standard speech codecs (coder-decoder pairs) are listed in Table 1 for reference.
- a set of consecutive digital speech samples is referred to as a speech frame.
- A speech encoder determines a small set of parameters for a speech synthesis model. With these speech parameters and the speech synthesis model, a speech frame can be reconstructed that appears and sounds very similar to the original speech frame. The reconstruction is performed by the speech decoder. It should be noted that, in most speech coders, the encoding process is much more computationally intensive than the decoding process. Furthermore, the MIPS required to attain good quality speech coding are very high. Only in recent years have the processing capabilities of digital signal processing chipsets advanced sufficiently to enable the widespread use of speech coding in applications such as cellular telephone handsets.
- the speech parameters determined by the speech encoder depend on the speech synthesis model used.
- the coders in Table 1 utilize linear predictive coding (LPC) models.
- a block diagram of a simplified view of the LPC speech synthesis model is shown in Figure 3.
- This model can be used to generate speech-like signals by specifying the model parameters appropriately.
- the parameters include the time-varying filter coefficients, pitch periods, excitation vectors and gain factors. Basically, the excitation vector, c(n), is first scaled by the gain factor, G. The result is then filtered by a pitch synthesis
- Other models such as the multiband excitation model are also used in speech coding. In this context, it suffices to note that the speech parameters together with the assumed model provide a means to remove the redundancies in the digital speech signal so as to achieve compression.
- the overall DC gain is provided by G and ALC would primarily involve modifying G.
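- The role of G can be illustrated with a small sketch. It is a simplified stand-in for the model of Figure 3: the pitch synthesis stage is omitted, only a single all-pole filter is used, and the function name `lpc_synthesize` and the coefficient values are assumptions for the example. Because the filter is linear, scaling G scales every output sample by the same factor.

```python
def lpc_synthesize(excitation, gain, lpc_coeffs):
    """All-pole synthesis: s(n) = G*c(n) + sum_k a_k * s(n-k).

    Simplified sketch: no pitch synthesis filter, fixed coefficients.
    """
    out = []
    for n, c in enumerate(excitation):
        s = gain * c  # excitation scaled by the gain factor G
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                s += a * out[n - k]  # feedback from past output samples
        out.append(s)
    return out


base = lpc_synthesize([1.0, 0.0, 0.0, 0.0], 1.0, [0.5])
louder = lpc_synthesize([1.0, 0.0, 0.0, 0.0], 2.0, [0.5])
print(base)    # [1.0, 0.5, 0.25, 0.125]
print(louder)  # [2.0, 1.0, 0.5, 0.25]
```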
- SLRPs speech level related parameters
- The first three GSM codecs in Table 1 will now be discussed. All of the first three coders process speech sampled at 8 kHz and assume that the samples are obtained as 13-bit linear PCM values.
- The frame length is 160 samples (20 ms).
- the SLRP may be specified each subframe (e.g. the GSM FR and EFR codecs) or once per frame (e.g. the GSM HR codec).
- $\gamma_{gc}$ and $\hat{\gamma}_{gc}$ are the unquantized and quantized gain correction factors in the GSM
- the quantized and corresponding unquantized parameters are related through
- the quantization function is a many-to-one transformation and is not
- Figure 4 distinguishes the coded domain from the linear domain.
- the coded domain refers to the output of speech encoders or the input of the speech decoders, which should be identical if there are no channel errors.
- the coded domain includes both the speech parameters and the methods used to quantize or dequantize these parameters.
- the speech parameters that are determined by the encoder undergo a quantization process prior to transmission. This quantization is critical to achieving bit rates lower than that required by the original digital speech signal. The quantization process often involves the use of look-up tables. Furthermore, different speech parameters may be quantized using different techniques.
- Processing of speech in the coded domain involves directly modifying the quantized speech parameters to a different set of quantized values allowed by the quantizer for each of the parameters.
- the parameters being modified are the SLRPs.
- the coded domain counterpart to the linear domain ALC configuration of Figure 2 is shown in Figure 5. Note that the codecs used for the two directions of transmission shown may not be identical. Furthermore, the codecs used may change over time. Hence the coded domain ALC algorithm preferably operates robustly under such changing conditions.
- the quantization of a single speech parameter is termed scalar quantization.
- Vector quantization is usually applied to a set of parameters that are related to each other in some way such as the LPC coefficients.
- Scalar quantization is generally applied to a parameter that is relatively independent of the other parameters. A mixture of both types of quantization methods is also possible.
- Since SLRPs are usually scalar quantized, focus is placed on the most commonly used scalar quantization techniques.
- the quantization process is independent of the past and future values of the parameter. Only the current value of the parameter is used in the quantization process.
- the parameter to be quantized is compared to a set of permitted quantization levels.
- the quantization level that best matches the given parameter in terms of some closeness measure is chosen to represent that parameter.
- the permitted quantization levels are stored in a look-up table at both the encoder and the decoder.
- the index into the table of the chosen quantization level is transmitted by the encoder to the decoder.
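- The table look-up form of instantaneous scalar quantization can be sketched as follows. The four-entry table and the absolute-difference closeness measure are assumptions chosen for illustration; they are not the tables of any GSM codec.

```python
# Hypothetical quantization table, shared by encoder and decoder.
LEVELS = [0.5, 1.0, 2.0, 4.0]


def quantize(value, levels=LEVELS):
    """Encoder side: return the index of the closest permitted level."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - value))


def dequantize(index, levels=LEVELS):
    """Decoder side: recover the quantization level from its index."""
    return levels[index]


index = quantize(1.4)           # only this index is transmitted
print(index, dequantize(index)) # 1 1.0
```

- Only the current parameter value enters `quantize`, which is what makes the scheme instantaneous; 1.4 and 1.2 both map to index 1, showing the many-to-one nature of quantization.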
- the quantization level may be determined using a mathematical formula.
- the quantization levels are usually spaced non-uniformly in the case of
- Adaptive quantizers are often used in speech coding to minimize the quantization error at the cost of greater computational complexity.
- Adaptive quantizers may utilize forward adaptation or backward adaptation.
- In forward adaptation schemes, extra side information regarding the dynamic range has to be transmitted periodically to the decoder in addition to the quantization table index. Thus, such schemes are usually not used in speech coders.
- Backward adaptive quantizers are preferred because they do not require transmission of any side information.
- Two general types of backward adaptive quantizers are commonly used: standard deviation based and differential. These are depicted in
- a quantized version of the normalization factor is used at both the quantizer and dequantizer. In some variations of this scheme, decisions to expand or compress the quantization intervals may be based simply on the previous parameter input only.
- differential quantization scheme can also be represented as in Figure 7 when a linear predictor, P(z), is used. Note that if we approximate the transfer function P(z)/[1-P(z)] by the linear predictor,
- $g_c(n)$ denotes the gain factor that is used to scale the
- A 32-level non-uniform quantization is performed on $\gamma_{gc}(n)$ to obtain $\hat{\gamma}_{gc}(n)$.
- The decoder can thus obtain the predicted gain in the same manner as the encoder using (3) once the current subframe
- R(n) denotes the prediction error given by
- the actual information transmitted from the encoder to the decoder are the bits representing the look-up table index of the quantized R(n) parameter,
- the quantization of the SLRP at the encoder is performed indirectly by using the mean-removed excitation vector energy each subframe.
- $E(n)$ denotes the mean-
- second line of equation (5) is the mean excitation vector energy, $E_I(n)$, i.e.
- the excitation vector $\{c(i)\}$ is decoded at the decoder prior to the
- $\tilde{E}(n)$ is the predicted
- the decoder decodes the excitation vector and computes $E_I(n)$ using equation
- $g_c(n) = \hat{\gamma}_{gc}(n)\, g_c'(n)$.
- the 32 quantization levels for $\hat{\gamma}_{gc}(n)$ are
- Note that the vertical axis in Figure 10, which represents the quantization levels, is plotted on a logarithmic scale.
- Figure 5 illustrated a preferred location of an ALC device operating on coded speech. With reference to this Figure, possible implementations of the ALC device will be discussed.
- the quantized SLRP is decoded (e.g., read) from the coded domain signal (e.g., compression code signal) and multiplied (e.g., adjusted) by a gain factor determined by the ALC algorithm.
- the SLRP may be considered an adjusted SLRP value.
- the result is then requantized (e.g., to form an adjusted SLRP).
- the coded domain signal is appropriately modified to reflect the change in the SLRP.
- the adjusted SLRP may be substituted for the original SLRP.
- any form of error protection used on the coded domain signal must be appropriately reinstated.
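- The read, adjust, requantize and substitute steps above can be sketched for an instantaneously quantized SLRP. The table values and the function name `requantize_slrp` are hypothetical, and reinstating error protection is outside the scope of the sketch.

```python
def requantize_slrp(index, gain, levels):
    """Return (new_index, realized_gain) for one quantized SLRP.

    index  : table index read from the coded domain signal
    gain   : desired linear gain from the ALC algorithm
    levels : quantization table (hypothetical values in this example)
    """
    original = levels[index]       # decode (read) the quantized SLRP
    adjusted = original * gain     # adjusted SLRP value
    # Requantize: pick the permitted level closest to the adjusted value.
    new_index = min(range(len(levels)),
                    key=lambda i: abs(levels[i] - adjusted))
    realized_gain = levels[new_index] / original
    return new_index, realized_gain


levels = [1.0, 2.0, 4.0, 8.0]
print(requantize_slrp(1, 1.8, levels))  # (2, 2.0)
```

- In the example the desired gain is 1.8 but the closest permitted level yields a gain of 2.0, so the gain realized after requantization can differ from the one requested.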
- The ALC device may require measures of the speech level, voice activity and double-talk activity to determine the gain that is to be applied to the SLRP. This may require the decoding of the coded domain signal to some extent. For most codecs, only a partial decoding of the coded speech is necessary to perform ALC. The speech is decoded to the extent necessary to extract (e.g., read) the SLRP as well as other parameters essential for obtaining sufficiently accurate speech level, voice activity and double-talk measurements.
- Some examples of situations where only partial decoding suffices include:
- a post-filtering process i.e., decoding step
- This post-filtering helps to reduce quantization noise but does not change the overall power level of the signal.
- the post-filtering process i.e., decoding step
- A silence suppression scheme is often used in cellular telephony and voice over packet networks.
- coded speech frames are transmitted only during voice activity and very little transmission is performed during silence.
- the decoders automatically insert some comfort noise during the silence periods to mimic the background noise from the other end.
- One example of such a scheme used in GSM cellular networks is called discontinuous transmission (DTX).
- DTX discontinuous transmission
- the decoder in the ALC device can completely avoid decoding the signal during silence. In such cases, the determination of voice and double-talk activities can also be simplified in the ALC device.
- the coded speech bits for each channel will be carried through the wireline network between base stations at 64 kbits/sec. This bitstream can be divided into 8-bit samples. The 2 least significant bits of each sample will contain the coded speech bits while the upper 6 bits will contain the bits corresponding to the appropriate PCM samples.
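- The bit layout just described can be sketched with simple masking and shifting. The function names are hypothetical, and the sketch covers only the split into 6 upper PCM bits and 2 lower coded speech bits, not TFO frame synchronization or signalling.

```python
def split_tfo_sample(sample):
    """Split one 8-bit sample into (upper 6 PCM bits, lower 2 coded bits)."""
    coded_bits = sample & 0b11   # 2 least significant bits: coded speech
    pcm_bits = sample >> 2       # upper 6 bits: truncated PCM sample
    return pcm_bits, coded_bits


def merge_tfo_sample(pcm_bits, coded_bits):
    """Rebuild the 8-bit sample, e.g. after modifying the coded bits."""
    return ((pcm_bits & 0x3F) << 2) | (coded_bits & 0b11)


pcm, coded = split_tfo_sample(0b10110110)
print(pcm, coded)                    # 45 2
print(merge_tfo_sample(pcm, coded))  # 182

# 64 kbit/s in 8-bit samples is 8000 samples/s, so 2 bits per sample
# leave 16 kbit/s for the coded speech bits.
print(64000 // 8 * 2)  # 16000
```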
- the conversion of the PCM information to linear speech is very inexpensive and provides a somewhat noisy version of the linear speech signal. It is possible to use this noisy linear domain speech signal to perform the necessary voice activity, double-talk and speech level measurements as is usually done in linear domain ALC algorithms. Thus, in this case, only a minimal amount of decoding of the coded domain speech parameters is necessary.
- the SLRP and any other parameters that are required for the requantization of the SLRP would have to be decoded.
- the other parameters would be decoded only to the extent necessary for requantization of the SLRP. This will be clear from the examples that will follow in later sections.
- SLRP that usually differs from the desired value.
- the desired gain that was applied by the Gain Determination block will differ from the gain that will be realized when the signal is decoded.
- Overflow or underflow problems may arise due to this difference because the speech signal may be over-amplified or over-suppressed, respectively.
- some ALC algorithms may utilize the past desired gain values to determine current and future desired gain values. Since the desired gain values do not reflect the actual realized gain values, such algorithms may perform erroneously when applied as shown in Figure 12.
- the requantization process can sometimes result in undesirable reverberations in the SLRP. This can cause the speech level to be modulated unintentionally, resulting in a distorted speech signal.
- Such SLRP reverberations are encountered in feedback quantization schemes such as differential quantization.
- the iterative scheme of Figure 13 can be incorporated in the Gain Determination block.
- the realized gain value after requantization of the SLRP may be computed.
- the realized gain is checked to see if overflow or underflow problems could occur. This could be accomplished, for example, by determining what the new speech level would be by multiplying the realized gain by the original speech level.
- a speech decoder could be used in the ALC device to see whether overflow/underflow actually occurs. Either way, if the realized gain value is deemed to be too high or too low, the new SLRP is reduced or increased, respectively, until the danger of overflow/underflow is considered to be no longer present.
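The iterative check described above can be sketched as follows. This is only an illustration under stated assumptions, not the patent's implementation: the quantization table `SLRP_TABLE`, the limits `MAX_LEVEL` and `MIN_LEVEL`, and the function name are hypothetical, and the new speech level is predicted by multiplying the realized gain by the original speech level rather than by running a decoder.

```python
# Hypothetical SLRP quantization table (linear values, ascending order).
SLRP_TABLE = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
MAX_LEVEL = 20000.0  # assumed level above which overflow is considered a risk
MIN_LEVEL = 50.0     # assumed level below which underflow is considered a risk


def requantize_with_guard(index, desired_gain, speech_level):
    """Requantize the SLRP, then step the index until the predicted level is safe.

    Returns (new_index, realized_gain), where realized_gain is a linear factor.
    """
    original = SLRP_TABLE[index]
    target = original * desired_gain
    # Requantization: choose the table entry closest to the scaled SLRP.
    new_index = min(range(len(SLRP_TABLE)),
                    key=lambda i: abs(SLRP_TABLE[i] - target))
    # Bounded iteration so the loop always terminates.
    for _ in range(len(SLRP_TABLE)):
        realized_gain = SLRP_TABLE[new_index] / original
        new_level = realized_gain * speech_level
        if new_level > MAX_LEVEL and new_index > 0:
            new_index -= 1   # realized gain too high: reduce the new SLRP
        elif new_level < MIN_LEVEL and new_index < len(SLRP_TABLE) - 1:
            new_index += 1   # realized gain too low: increase the new SLRP
        else:
            break
    return new_index, SLRP_TABLE[new_index] / original
```

For example, with an original index of 3 and a speech level of 4000, a desired gain of 8 would predict a level of 32000, so the sketch backs off by one table entry and realizes a gain of 4 instead.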
- the gain that is fed back should be the realized gain after the SLRP requantization process, not the desired gain.
- a preferred approach is shown in Figure 14. If the desired gain was used in the feedback loop instead of the realized gain, the controller would not be tracking the actual decoded speech signal level, resulting in erroneous level control.
- these methods preferably include the integration of the gain determination and SLRP requantization techniques.
- FIG. 15 illustrates the general configuration of an ALC device that uses joint gain determination and SLRP requantization. The details will depend on the particular ALC device.
- the requantization of the SLRPs for these particular cases will be described while noting that the approaches may be easily extended to any other quantization scheme.
- the joint determination of the gain and SLRP requantization in the ALC device configuration of Figure 15 may utilize the requantization techniques described here.
- the original value of the quantized SLRP will be denoted by ⁇ (n) , where n is
- the desired gain determined by the ALC device will be denoted by g(n) .
- the realized gain after SLRP requantization will be denoted by
- overflow and underflow prevention are desired, then the iterative scheme described in Figure 13 may be used.
- the partial decoding of the speech samples using the requantized SLRP may be performed to the extent necessary. This, of course, involves additional complexity in the algorithm. The decoded samples can then be directly inspected to ensure that overflow or underflow has not taken place.
- These desired gain values preferably have the same spacing as the SLRP quantization values, with 0 dB being one of the gains. This ensures that the desired and realized gain values will always be aligned so that equation (8) would not have to be evaluated for each table value. Hence the requantization is greatly simplified.
- the original quantization index of the SLRP is simply increased or decreased by a value corresponding to the desired gain value divided by the SLRP quantization table spacing. For instance, suppose that the SLRP quantization table spacing is denoted by $\Delta$.
- the discrete set of permitted desired gain values would be $1+\{\ldots, -2\Delta, -\Delta, 0, \Delta, 2\Delta, \ldots\}$ if the SLRP quantization table values are uniformly spaced linearly, and $0+\{\ldots, -2\Delta, -\Delta, 0, \Delta, 2\Delta, \ldots\}$ if the SLRP quantization table values are uniformly spaced logarithmically.
- $\Delta$ would be the average spacing between adjacent quantization table values, where the average is performed appropriately using either linear or logarithmic distances between the values.
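The index-shift requantization described above can be sketched as follows. This is a minimal illustration that assumes a table whose entries are uniformly spaced in dB; the function name, the clamping at the table ends, and the parameter names are hypothetical and not taken from the patent.

```python
def shift_slrp_index(index, desired_gain_db, spacing_db, table_size):
    """Requantize the SLRP by shifting its quantization index.

    The shift is the desired gain divided by the table spacing. The result
    is clamped to the valid index range (an assumption of this sketch).
    Returns (new_index, realized_gain_db).
    """
    steps = round(desired_gain_db / spacing_db)
    new_index = max(0, min(table_size - 1, index + steps))
    realized_gain_db = (new_index - index) * spacing_db
    return new_index, realized_gain_db
```

When the desired gain is a multiple of the spacing and the table ends are not reached, the realized gain equals the desired gain, which is the alignment property the text relies on. Near the table ends the two differ, so a caller would still need to use the realized value.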
- An example of instantaneous scalar requantization is shown for the GSM FR
- This codec's SLRP is the block maximum, $x_{max}$, which is
- the $Q$ and $Q^{-1}$ blocks represent the SLRP requantization
- the index of the block maximum is first
- the index of the requantized $x_{max}$ is then substituted for the original value in the
- FIG. 17 shows a general coded domain ALC technique with only the components relevant to ALC being shown.
- G(n) denotes the original logarithmic gain value determined by the encoder.
- G(n) is equal to
- the SLRP, $R(n)$, is modified by the ALC device to $R_{ALC}(n)$ based on the desired gain.
- the realized gain, $\Delta R(n)$, is the
- $\Delta R(n) = R_{ALC}(n) - R(n)$ (9) Note that this is different from the actual gain realized at the decoder which,
- the actual realized gain is essentially an amplified version of the SLRP realized gain due to the decoding process, under steady-state conditions.
- steady-state it is meant that $\Delta G(n)$ is kept constant for a period of time that is sufficiently long so
- $\Delta R(n)$ is either steady or oscillates in a regular manner about a particular level.
- This method for differential scalar requantization basically attempts to mimic the operation of the encoder at the ALC device. If the presence of the quantizers at the encoder and the ALC device is ignored, then both the encoder and the ALC device
- $G_{ALC}(n) = G(n) + \Delta G(n) + \text{quantization error}$
- the feedback of the SLRP realized gain, $\Delta R(n)$, in the ALC device can cause
- Figure 18(a) shows the step in the desired gain.
- Figure 18(b) shows the actual realized gain superimposed on the desired gain.
- the reverberations in the SLRP realized gain shown in Figure 18(b) cause a modulation of the speech signal and can result in audible distortions. Thus, depending on the ALC specifications, such reverberations may be undesirable.
- the reverberations can be eliminated by 'moving' the quantizer outside the feedback loop
- the average errors during steady-state operation of the requantizer with and without the quantizer in the feedback loop are 0.39 dB and 1.03 dB, respectively.
- the ALC apparatus of Figure 19 can be simplified as shown in Figure 20, resulting in savings in computation. This is done by replacing the linear system
- Some ALC algorithms may utilize past gain values to determine current and future gain values.
- the gain that is fed back should be the actual realized gain after the SLRP requantization process, not the desired gain. This was discussed above in conjunction with Figure 14.
- Differential scalar requantization for such feedback-based ALC algorithms can be implemented as shown in Figure 21.
- the ALC device is mimicking the actions of the decoder to determine the actual realized gain.
- any of the methods described earlier that have quantizers within the feedback loop may be used.
- any of the methods described earlier that have quantizers outside the feedback loop may be used.
- the earliest point at which the first sample can be decoded is after the reception of bit 91 as shown in Figure 23(a). This represents a buffering delay of approximately 7.46ms. It turns out that sufficient information is received to decode not just the first sample but the entire first subframe at this point. Similarly, the entire first subframe can be decoded after about 7.11ms of buffering delay in the FR decoder.
- each subframe has an associated SLRP in both the EFR and FR coding schemes. This is generally true for most other codecs where the encoder operates at a subframe level.
- ALC in the coded domain can be performed subframe-by-subframe rather than frame-by-frame.
- the new SLRP computed by the ALC device can replace the original SLRP in the received bitstream.
- the delay incurred before the SLRP can be decoded is determined by the position of the bits corresponding to the SLRP in the received bitstream. In the case of the FR and EFR codecs, the position of the SLRP bits for the first subframe determines this delay.
- the ALC algorithm must be designed to determine the gain for the current subframe based on previous subframes only. In this way, almost no buffering delay will be necessary to modify the SLRP.
- the bits corresponding to the SLRP in a given subframe are received, they will first be decoded. Then the new SLRP will be computed based on the original SLRP and information from the previous subframes only. The original SLRP bits will be replaced with the new SLRP bits. There is no need to wait until all the bits necessary to decode the current subframe are received.
- the buffering delay incurred by the algorithm will depend on the processing delay which is small. Information about the speech level is derived from the current subframe only after replacement of the SLRP for the current subframe.
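The ordering of operations in this near-zero buffering delay method can be sketched as follows. The sketch only illustrates the sequencing: the callbacks `modify_slrp` and `update_level` are hypothetical placeholders for the codec-specific SLRP modification and the speech level measurement, neither of which is specified here.

```python
def process_subframes(slrp_values, modify_slrp, update_level):
    """Process SLRPs one subframe at a time.

    modify_slrp(slrp, state) -> new SLRP, using previous subframes only.
    update_level(state, slrp, new_slrp) -> new state, run after replacement.
    Both callbacks are hypothetical placeholders for this sketch.
    """
    state = None          # information gathered from previous subframes
    replaced = []
    for slrp in slrp_values:
        # 1. The new SLRP depends on the original SLRP and past subframes only.
        new_slrp = modify_slrp(slrp, state)
        # 2. The original SLRP is replaced immediately, without waiting for
        #    the remaining bits of the current subframe.
        replaced.append(new_slrp)
        # 3. Only now is the level information updated from this subframe.
        state = update_level(state, slrp, new_slrp)
    return replaced


# Toy usage: raise the SLRP by one step whenever the previous subframe was quiet.
boost_if_quiet = lambda slrp, prev: slrp + 1 if prev is not None and prev < 5 else slrp
remember_last = lambda prev, slrp, new_slrp: new_slrp
result = process_subframes([3, 3, 6, 4], boost_if_quiet, remember_last)
```

Because step 1 never looks at the current subframe's decoded speech, the only delay the loop adds is its own processing time.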
- the SLRP computed for the next subframe can be appropriately set to minimize the likelihood of continued overflows.
- This near-zero buffering delay method is especially applicable to the FR codec since the decoding of the SLRP for this codec does not involve decoding any other parameters.
- the subframe excitation vector is also needed to decode the SLRP and the more complex differential requantization techniques have to be used for requantizing the SLRP. Even in this case, significant reduction in the delay is attained by performing the speech level update based on the current subframe after the SLRP is replaced for the current subframe.
- the received bitstream can be divided into 8-bit samples.
- the 2 least significant bits of each sample will contain the coded speech bits while the upper 6 bits will contain the bits corresponding to the appropriate PCM samples.
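The sample layout described above can be illustrated with a small bit-manipulation sketch. The function names are hypothetical; only the split into 2 least significant coded speech bits and 6 upper PCM bits comes from the text.

```python
def split_sample(sample):
    """Split one 8-bit sample into (pcm_upper_bits, coded_bits)."""
    if not 0 <= sample <= 0xFF:
        raise ValueError("expected an 8-bit value")
    coded_bits = sample & 0b11     # 2 least significant bits: coded speech
    pcm_upper_bits = sample >> 2   # upper 6 bits: PCM information
    return pcm_upper_bits, coded_bits


def replace_coded_bits(sample, new_coded_bits):
    """Write new coded speech bits into the 2 LSBs, keeping the upper 6 bits."""
    return (sample & 0b11111100) | (new_coded_bits & 0b11)
```

A device that modifies only the coded speech bits would use something like `replace_coded_bits` so that the PCM portion of each sample is carried through untouched.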
- a noisy version of the linear speech samples is available to the ALC device in this case. It is possible to use this noisy linear domain speech signal to perform the necessary voice activity, double-talk and speech level measurements as is usually done in linear domain ALC algorithms.
- only a minimal amount of decoding of the coded domain speech parameters is necessary. Only parameters that are required for the determination and requantization of the SLRP would have to be decoded. Partial decoding of the speech signal is unnecessary as the noisy linear domain speech samples can be relied upon to measure the speech level as well as perform voice activity and double-talk detection.
- a processor which may include a microprocessor, a microcontroller or a digital signal processor, as well as other logic units capable of logical and arithmetic operations.
- Speech compression, which falls under the category of lossy source coding, is commonly referred to as speech coding.
- Speech coding is performed to minimize the bandwidth necessary for speech transmission. This is especially important in wireless telephony where bandwidth is scarce. In the relatively bandwidth abundant packet networks, speech coding is still important to minimize network delay and jitter. This is because speech communication, unlike data, is highly intolerant of delay. Hence a smaller packet size eases the transmission through a packet network.
- the four ETSI GSM standards of concern are listed in Table 3. Each of the standards defines a linear predictive code. Table 3 is a subset of the speech codecs identified in Table 1. Table 3: GSM Speech Codecs
- a set of consecutive digital speech samples is referred to as a speech frame.
- the GSM coders operate on a frame size of 20ms (160 samples at 8kHz sampling rate). Given a speech frame, a speech encoder determines a small set of parameters for a speech synthesis model. With these speech parameters and the speech synthesis model, a speech frame can be reconstructed that appears and sounds very similar to the original speech frame. The reconstruction is performed by the speech decoder. In the GSM speech coders listed above, the encoding process is much more computationally intensive than the decoding process.
- the speech parameters determined by the speech encoder depend on the speech synthesis model used.
- the GSM coders in Table 3 utilize linear predictive coding (LPC) models.
- a block diagram of a simplified view of the LPC speech synthesis model is shown in Figure 3.
- the Figure 3 model can be used to generate speech-like signals by specifying the model parameters appropriately.
- the parameters include the time-varying filter coefficients, pitch periods, codebook vectors and the gain factors.
- the synthetic speech is generated as follows.
- An appropriate codebook vector, c( ⁇ ) is first scaled by the
- pitch synthesis filter whose parameters include the pitch gain, g .
- the pitch synthesis filter provides the harmonic
- the total excitation vector is then filtered by the LPC synthesis filter which specifies the broad spectral shape of the speech frame.
- the parameters are usually updated more than once. For instance, in the GSM FR and EFR coders, the codebook vector, codebook gain and the pitch synthesis filter parameters are determined every subframe (5ms).
- LPC synthesis filter parameters are determined twice per frame (every 10ms) in EFR and once per frame in FR.
- a typical speech encoder executes the following sequence of steps:
- a typical speech decoder executes the following sequence of steps:
- G specifies the DC gain of the transfer function. This, in turn, implies that G can be modified to adjust the overall speech level in an approximately linear manner. Hence, G is termed the
- GSM coders use speech level related parameters (SLRPs). These SLRPs correspond to G in the general speech synthesis model of Figure 3.
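To see why scaling the gain factor adjusts the overall speech level, the simplified synthesis chain (scaled codebook vector, then pitch synthesis filter, then LPC synthesis filter) can be sketched in a few lines. This is a toy model under assumptions: the filter forms, sign conventions, and zero initial filter memories are choices made for the sketch and do not reproduce any particular GSM codec.

```python
def synthesize(codebook, gain, pitch_gain, pitch_lag, lpc):
    """Toy LPC synthesis: scaled codebook vector -> pitch filter -> LPC filter.

    All filter memories start at zero (an assumption of this sketch).
    """
    # Pitch synthesis filter: e(n) = gain * c(n) + pitch_gain * e(n - lag)
    excitation = []
    for n, c in enumerate(codebook):
        past = excitation[n - pitch_lag] if n >= pitch_lag else 0.0
        excitation.append(gain * c + pitch_gain * past)
    # LPC synthesis filter: s(n) = e(n) + sum_k a_k * s(n - k)
    speech = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n >= k:
                acc += a * speech[n - k]
        speech.append(acc)
    return speech


vector = [1.0, 0.0, -0.5, 0.25, 0.0, 0.5, -1.0, 0.0]
s1 = synthesize(vector, 1.0, 0.5, 3, [0.9, -0.2])
s2 = synthesize(vector, 2.0, 0.5, 3, [0.9, -0.2])  # gain doubled
```

With zero initial memories, every sample of `s2` is exactly twice the corresponding sample of `s1`. In a real decoder the filter memories carry over from earlier subframes that used other gains, which is one reason the relationship is only approximately linear.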
- CD-ALC coded domain ALC
- SLRP modification algorithm For each codec, a different coded domain SLRP modification algorithm must be devised.
- preferred algorithms for the FR and EFR coders are described.
- the quantization of a single speech parameter is termed scalar quantization.
- vector quantization is usually applied to a set of parameters that are related to each other in some way such as the LPC coefficients.
- Scalar quantization is generally applied to a parameter that is relatively independent of the other parameters, such as the codebook gain. For the purposes of implementing CD-ALC, the discussion is limited to scalar quantization only.
- Both the FR and EFR coders utilize scalar quantization for their respective codebook gains (which we are also referring to as the SLRPs).
- the FR coder performs
- the EFR coder performs an adaptive differential scalar quantization
- ALC is shown in Figure 24.
- a communications system 10 transmits near end digital signals from a near end handset 12 over a network 14 using a compression code, such as any of the codes used by the codecs identified in Table 2.
- the compression code is generated by an encoder 16 from linear audio signals generated by the near end handset 12.
- the compression code comprises parameters, such as the parameters labeled SLRP in Table 2.
- the parameters represent an audio signal comprising a plurality of audio characteristics, including audio level.
- the audio level is related to the parameters labeled SLRP in Table 2.
- the compression code is decodable by various decoding steps, including one or more steps for decoding the parameters related to audio level.
- system 10 adjusts the audio level with minimal delay and minimal, if any, decoding of the compression code parameter relating to audio level.
- Near end digital signals using the compression code are received on a near end terminal 20 and send in port Sin, and an adjusted compression code is transmitted by a near end terminal 22 and send out port Sout over a network 24 to a far end handset 26 which includes a decoder 28 of the compression code.
- a linear far end audio signal is encoded by a far end encoder 30 to generate far end digital signals using the same compression code as encoder 16, and is transmitted over a network 32 to a far end terminal 34 and receive in port Rin.
- Network 34 also transmits the far end signals to a terminal 36 and a receive out port Rout.
- a decoder 18 of near end handset 12 decodes the far end digital signals. As shown in Figure 24, echo signals from the far end signals may find their way to encoder 16 of the near end handset 12.
- a processor 40 performs various operations on the near end and far end compression code.
- Processor 40 may be a microprocessor, microcontroller, digital signal processor, or other type of logic unit capable of arithmetic and logical operations.
- a different coded domain SLRP modification algorithm is executed by processor 40.
- a linear domain level control algorithm 42 executed by processor 40 is in operation at all times - under native mode and linear mode, during TFO as well as non-TFO.
- a partial decoder 48 decodes enough of the compression code to form linear code from which the audio level of the audio signal represented by the compression code can be determined. Decoder 48 also reads a compression code parameter related to audio level, such as one of the parameters identified in Table 2. The read parameter is dequantized to form a parameter value.
- the linear domain level control algorithm determines the gain factor for level adjustment and writes it to a predetermined memory location within processor 40. This gain factor is read by the appropriate codec-dependent coded domain SLRP modification algorithm 44 also executed by processor 40. Algorithm 44 modifies the read SLRP parameter (i.e., the gain factor) to form an adjusted SLRP parameter value
- the adjusted parameter value is quantized to form an adjusted SLRP parameter which is written into the bit-stream received at terminal 20.
- the adjusted SLRP parameter is substituted for the original read SLRP parameter.
- the partial decoders 46 and 48 shown within the Network ALC Device are algorithms executed by processor 40 and are codec-dependent. In the case of GSM EFR, the decoder post-filtering operations except for upscaling are unnecessary.
- for GSM FR, the complete decoder is implemented.
- a modular approach has the advantage that any existing or new linear domain level control algorithm can be incorporated with little or no modification with the coded domain SLRP modification algorithms.
- a coder-specific level control method might provide more accurate level adjustments. However, it may require a significant re-design of the existing linear domain level control algorithms to ensure smooth transitions when switching from native to linear mode (and vice versa). Note that there is a small risk that some undesirable artifacts may be occasionally introduced when switching between coded and linear modes when using the modular approach.
- the preferred embodiment includes a minimal delay technique.
- Large buffering, processing and transmission delays are already present in cellular networks without any network voice quality enhancement processing. Further network processing of the coded speech for speech enhancement purposes will add additional delay. If linear domain processing is performed on coded speech during TFO, more than a frame of delay (20ms) will be added due to buffering and processing requirements for decoding and re-encoding.
- CD-ALC can be performed with a buffering delay that is much less than one frame for FR and EFR coders.
- the delay reduction under CD-ALC is achieved for FR and EFR by performing level control a subframe at a time rather than frame-by-frame.
- the linear domain ALC algorithm can send the gain factor to the coded domain SLRP modification algorithm 44.
- the first subframe requires more than 5ms of delay before decoding can begin.
- Table 5 and Table 6 provide the earliest possible points at which decoding of samples can be performed as the bit-stream is received for the FR and EFR coders, respectively, and correspond to the illustration in Figure 23. Note that there are 260 bits/frame for the FR and 244 bits/frame for the EFR. The table assumes that the incoming bits are spread out evenly over 20ms, for the sake of simplicity. With this approximation, the first subframe requires 7.11ms for the FR and 7.46ms for the EFR. All other subframes require less delay.
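The EFR figure can be checked with a short calculation under the same simplifying assumption that the bits of a frame arrive evenly over 20 ms. The function name is hypothetical. The bit position 91 and the 244 bits per frame are the EFR values stated in the text.

```python
FRAME_MS = 20.0  # frame duration used by the GSM coders


def buffering_delay_ms(bits_received, bits_per_frame):
    """Delay until the given number of bits of a frame has arrived, assuming
    the bits are spread evenly over the frame duration."""
    return FRAME_MS * bits_received / bits_per_frame


print(round(buffering_delay_ms(91, 244), 2))  # EFR, first subframe: prints 7.46
```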
- table specifies a six bit index for each range of values. The six bit index is re-inserted in the appropriate positions for each subframe.
- the quantized SLRP values are shown in Figure 6.
- the range of the quantized values is 31 to 32767. This represents a dynamic range of about 60 dB ($20\log_{10}(32767/31)$).
- each subframe of the SLRP is as follows: (1) Both the near-end and far-end compression coded speech subframes are fully decoded by decoders 46 and 48. That is, the digital signals transmitted to terminals 20 and 34 are both fully decoded by decoders 46 and 48 to generate near end decoded signals and far end decoded signals indicative of audio level.
- the x ⁇ ' value is read from the coded near end signal by partial decoder
- the near end decoded signals and far end decoded signals are processed by the Linear Domain ALC (LD-ALC) algorithm 42 to determine the proper audio level.
- only the double-talk information based on the far-end signal received at terminal 34 may be actually passed into the LD-ALC algorithm 42.
- CD-ALC 44 This may be achieved by writing to a predetermined memory location to be read by CD-ALC.
- CD-ALC 44 extracts the 6-bit table index for the current subframe according to
- the decoder code may be modified to pass this value to CD-ALC 44.
- $\gamma_{gc}$ is considered the actual compression code SLRP because it is the only
- $E_I(n)$ depends only on the subframe's fixed codebook vector
- the quantized gain factor is computed using (12) as
- the adaptive differential quantization of the SLRP, $\gamma_{gc}$, is performed in the
- R(n) is quantized by the
- the quantization of the SLRP at the encoder is performed indirectly by using the mean-removed codebook vector energy each subframe.
- E(n) denotes the mean-
- the codebook vector ⁇ c(i) ⁇ is required in order to decode the SLRP. Note that
- the decoding of the codebook vector is independent of the decoding of the SLRP.
- E( ⁇ ) is the predicted energy given by
- the decoder decodes the excitation vector and computes $E_I(n)$ using (16).
- E( ) is computed using previously decoded gain correction factors using
- the correction factor for the current subframe is used to obtain f from the look-up
- the quantized SLRP values are shown in Figure 10. Differences between adjacent quantization levels are shown in Figure 22. The range of the quantized values is 159 to 27485. This represents a dynamic range of about 45 dB ($20\log_{10}(27485/159)$).
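The dynamic range figures quoted for the two quantization tables can be reproduced with a one-line formula. The function name is hypothetical. The endpoint values are the ones stated in the text for the FR table (31 to 32767) and the EFR table (159 to 27485).

```python
import math


def dynamic_range_db(min_value, max_value):
    """Dynamic range in dB between the smallest and largest quantized value."""
    return 20.0 * math.log10(max_value / min_value)


fr_range_db = dynamic_range_db(31, 32767)     # FR table endpoints
efr_range_db = dynamic_range_db(159, 27485)   # EFR table endpoints
```

Rounded to the nearest decibel, these evaluate to the "about 60 dB" and "about 45 dB" figures given for FR and EFR respectively.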
- the table of quantized SLRP values and the logarithms are also provided in Table 9. This table is necessary for re-encoding the SLRP.
- Table 9 Table of SLRP quantization values for GSM EFR
- the CD-ALC processing of the SLRP of each subframe is as follows:
- the near end decoded and far end decoded signals are processed by the Linear Domain ALC (LD-ALC) algorithm to determine the proper audio level.
- only the double-talk information based on the far-end signal may be actually passed into the LD-ALC algorithm 42.
- CD-ALC 44 extracts the 5-bit table index for the current subframe according to
- the decoder code may be modified to pass this value to CD-ALC 44.
- R new (n) denotes the new or adjusted SLRP value.
- $\mathrm{Gain}_{actual}(n) = 20\log_{10} g_{ALC}(n) - \mathrm{Gain}_{predicted}(n)$ (19)
- PastDeltaR[3] = PastDeltaR[2]
- PastDeltaR[2] = PastDeltaR[1]
- PastDeltaR[1] = PastDeltaR[0]
- PastDeltaR[0] = $\mathrm{Gain}_{actual}(n)$
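The gain computation of (19) and the history update that follows it can be sketched as below. The sketch assumes that the four `PastDeltaR` entries form a shift register with the newest value at index 0, as the assignments suggest. The function name and the way the predicted gain is supplied are hypothetical, since the prediction itself is not reproduced here.

```python
import math


def update_gain_history(past_delta_r, g_alc, gain_predicted_db):
    """Compute the actual gain per (19) and shift it into the 4-entry history.

    past_delta_r is modified in place. Index 0 holds the newest value.
    """
    gain_actual_db = 20.0 * math.log10(g_alc) - gain_predicted_db
    past_delta_r[3] = past_delta_r[2]
    past_delta_r[2] = past_delta_r[1]
    past_delta_r[1] = past_delta_r[0]
    past_delta_r[0] = gain_actual_db
    return gain_actual_db


history = [1.0, 2.0, 3.0, 4.0]
newest = update_gain_history(history, 10.0, 5.0)  # 20 dB minus 5 dB predicted
```

After the call, `history` holds the new value followed by the three most recent old values, and the oldest entry has been discarded.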
- R new (n) is quantized to obtain an adjusted parameter R new (n) using Table 9.
- R new (n) is assigned the value that is closest in terms of the absolute difference
- R new (n) is inserted (e.g., written or substituted) appropriately back into the coded
- 201og(g c x / ) 201og c + 201og g ⁇ C in the SLRP encoding process. Since our
- the gain factor changes are generally small and
- the gain factor adjustment steps should be limited to ±3 dB for operation in conjunction with GSM FR codecs, which is the same as the usual LD-ALC step size. (In some versions of LD-ALC, 6 dB steps were possible; this should be avoided.) Hence the possible dB gain values should be restricted to {-6, -3, 0, 3, 6, 9,
- the gain factor adjustment steps should be limited to ±3.39 dB for operation in conjunction with GSM EFR codecs. (In some versions of LD-ALC, 6 dB steps were possible; this should be avoided.) This step size is optimized specifically for EFR to minimize the transient effects and maximize accuracy. Hence the possible dB gain values should be restricted to {-6.77, -3.39, 0, 3.39, 6.77, 10.16, 13.55, 16.93}.
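One way to enforce such a restriction is to map whatever gain the level control algorithm requests onto the permitted set. The sketch below uses the EFR gain values listed above. Choosing the nearest permitted value is an assumption of this sketch, as are the function and constant names; the text does not say how an off-grid request should be handled.

```python
# Permitted EFR gain values in dB, as listed in the text.
EFR_GAINS_DB = [-6.77, -3.39, 0.0, 3.39, 6.77, 10.16, 13.55, 16.93]


def restrict_gain_db(desired_db, permitted=EFR_GAINS_DB):
    """Return the permitted gain closest to the desired gain (assumed rule)."""
    return min(permitted, key=lambda g: abs(g - desired_db))
```

A request of 5 dB, for instance, would be mapped to 3.39 dB, and any request above the largest entry would be held at 16.93 dB.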
- Any gain changes should be restricted to occur only at the beginning of a subframe boundary. This ensures that the sample at which a gain change occurs is identical in both the linear (upper 6 PCM bits) and coded signals.
- a subframe (40 samples) of speech should be processed at a time for efficiency.
- the CD-ALC algorithm utilizes an LD-ALC algorithm to determine the gain adjustments, the CD-ALC algorithm performance is, in a sense, upper bounded by the LD-ALC performance. Thus, even if the LD-ALC algorithm complies with
- Figure 29 shows the results for a case when CD-ALC is used in conjunction with FR.
- the upper plot shows power profiles of the original (dashed line) and processed (solid line) signals. A 40ms time constant was used in the recursive mean- square averaging of the signals to obtain the power profiles.
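A recursive mean-square average of the kind mentioned above can be sketched as a one-pole smoother of the squared samples. The exact recursion used for the plots is not given, so the form below, the mapping from the 40 ms time constant to the smoothing coefficient, and the function name are all assumptions. The 8 kHz sampling rate is the one stated for the GSM coders.

```python
import math


def power_profile(samples, time_constant_ms=40.0, sample_rate_hz=8000.0):
    """Recursive mean-square average: p(n) = a*p(n-1) + (1-a)*x(n)^2.

    The coefficient a = exp(-1 / (time_constant * sample_rate)) is an
    assumed mapping from the time constant to the recursion.
    """
    alpha = math.exp(-1.0 / (time_constant_ms * 1e-3 * sample_rate_hz))
    power = 0.0
    profile = []
    for x in samples:
        power = alpha * power + (1.0 - alpha) * x * x
        profile.append(power)
    return profile


profile = power_profile([2.0] * 8000)  # one second of a constant signal
```

For a constant input the profile rises smoothly toward the squared amplitude, which is the behaviour expected of a power estimate with a finite time constant.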
- the lower plot shows the LD-ALC gain (blue, dashed line) at the end of each subframe; also shown is the ratio of the processed power to the original power at the end of each subframe. In the regions where the speech signal is strong, the amplification of the signal corresponds quite closely to the desired gain.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14213699P | 1999-07-02 | 1999-07-02 | |
US142136P | 1999-07-02 | ||
PCT/US2000/018293 WO2001003317A1 (en) | 1999-07-02 | 2000-06-30 | Coded domain adaptive level control of compressed speech |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1190494A1 true EP1190494A1 (de) | 2002-03-27 |
Family
ID=22498680
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP00946994A Withdrawn EP1190494A1 (de) | 1999-07-02 | 2000-06-30 | Coded domain adaptive level control of compressed speech |
EP00948555A Withdrawn EP1190495A1 (de) | 1999-07-02 | 2000-06-30 | Coded domain echo control |
EP00946954A Pending EP1208413A2 (de) | 1999-07-02 | 2000-06-30 | Coded domain noise control |
Country Status (5)
Country | Link |
---|---|
EP (3) | EP1190494A1 (de) |
JP (3) | JP2003503760A (de) |
AU (3) | AU6203300A (de) |
CA (3) | CA2378035A1 (de) |
WO (3) | WO2001003317A1 (de) |
-
2000
- 2000-06-30 EP EP00946994A (EP1190494A1, de): not active, Withdrawn
- 2000-06-30 CA CA002378035A (CA2378035A1, en): not active, Abandoned
- 2000-06-30 AU AU62033/00A (AU6203300A, en): not active, Abandoned
- 2000-06-30 JP JP2001508064A (JP2003503760A, ja): active, Pending
- 2000-06-30 AU AU60671/00A (AU6067100A, en): not active, Abandoned
- 2000-06-30 JP JP2001508667A (JP2003504669A, ja): active, Pending
- 2000-06-30 AU AU60636/00A (AU6063600A, en): not active, Abandoned
- 2000-06-30 CA CA002378012A (CA2378012A1, en): not active, Abandoned
- 2000-06-30 JP JP2001508063A (JP2003533902A, ja): active, Pending
- 2000-06-30 CA CA002378062A (CA2378062A1, en): not active, Abandoned
- 2000-06-30 WO PCT/US2000/018293 (WO2001003317A1, en): not active, Application Discontinuation
- 2000-06-30 EP EP00948555A (EP1190495A1, de): not active, Withdrawn
- 2000-06-30 WO PCT/US2000/018165 (WO2001002929A2, en): not active, Application Discontinuation
- 2000-06-30 WO PCT/US2000/018104 (WO2001003316A1, en): not active, Application Discontinuation
- 2000-06-30 EP EP00946954A (EP1208413A2, de): active, Pending
Non-Patent Citations (1)
Title |
---|
See references of WO0103317A1 * |
Also Published As
Publication number | Publication date |
---|---|
JP2003504669A (ja) | 2003-02-04 |
WO2001003316A1 (en) | 2001-01-11 |
JP2003533902A (ja) | 2003-11-11 |
AU6067100A (en) | 2001-01-22 |
CA2378012A1 (en) | 2001-01-11 |
JP2003503760A (ja) | 2003-01-28 |
EP1208413A2 (de) | 2002-05-29 |
WO2001002929A2 (en) | 2001-01-11 |
CA2378062A1 (en) | 2001-01-11 |
AU6203300A (en) | 2001-01-22 |
WO2001002929A3 (en) | 2001-07-19 |
AU6063600A (en) | 2001-01-22 |
WO2001003317A1 (en) | 2001-01-11 |
EP1190495A1 (de) | 2002-03-27 |
CA2378035A1 (en) | 2001-01-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7362811B2 (en) | | Audio enhancement communication techniques |
RU2325707C2 (ru) | | Method and device for efficient frame erasure concealment in linear-prediction-based speech codecs |
US7539615B2 (en) | | Audio signal quality enhancement in a digital network |
EP1190494A1 (de) | | Adaptive level control of compressed speech in the coded domain |
US8543388B2 (en) | | Efficient speech stream conversion |
US7907977B2 (en) | | Echo canceller with correlation using pre-whitened data values received by downlink codec |
US20070160154A1 (en) | | Method and apparatus for injecting comfort noise in a communications signal |
US6026356A (en) | | Methods and devices for noise conditioning signals representative of audio information in compressed and digitized form |
US20040243404A1 (en) | | Method and apparatus for improving voice quality of encoded speech signals in a network |
US20030065507A1 (en) | | Network unit and a method for modifying a digital signal in the coded domain |
US8144862B2 (en) | | Method and apparatus for the detection and suppression of echo in packet based communication networks using frame energy estimation |
EP1020848A2 (de) | | Method for transmitting additional information in a vocoder data stream |
US5812944A (en) | | Mobile speech level reduction circuit responsive to base transmitted signal |
EP1544848A2 (de) | | Quality enhancement of an audio signal in the coded domain |
Chandran et al. | | Compressed domain noise reduction and echo suppression for network speech enhancement |
US20050102136A1 (en) | | Speech codecs |
Kondoz et al. | | A high quality voice coder with integrated echo canceller and voice activity detector for VSAT systems |
CN100369108C (zh) | | Method and apparatus for audio enhancement in the coded domain |
Åkerberg et al. | | Audio Techniques |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| 17P | Request for examination filed | Effective date: 20020111 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| AX | Request for extension of the European patent | Free format text: AL;LT;LV;MK;RO;SI |
| STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| 18W | Application withdrawn | Effective date: 20040428 |