EP0954851A1 - Multistage speech coder with transform coding of prediction residuals using quantization based on auditory models - Google Patents
- Publication number
- EP0954851A1 (application EP97907830A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- signal
- speech
- pitch
- lpc
- quantized
- Prior art date
- Legal status
- Withdrawn
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/002—Dynamic bit allocation
- G10L19/0212—Analysis-synthesis using spectral analysis, e.g. transform vocoders or subband vocoders, using orthogonal transformation
- G10L19/12—Determination or coding of the excitation function or long-term prediction parameters, the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
Definitions
- the present invention relates to the compression (coding) of audio signals, for example, speech signals, using a predictive coding system.
- Telephone-bandwidth (3.4 kHz) speech coding at or below 16 kb/s has been dominated by time-domain predictive coders. These coders use speech production models to predict the speech waveforms to be coded. Predicted waveforms are then subtracted from the actual (original) waveforms to reduce redundancy in the original signal. Reduction in signal redundancy provides coding gain.
- Examples of such predictive speech coders include Adaptive Predictive Coding, Multi-Pulse Linear Predictive Coding, and Code-Excited Linear Prediction (CELP) Coding, all well known in the art of speech signal compression.
- Music coders use elaborate human hearing models to code only those parts of the signal that are perceptually relevant. That is, unlike speech coders, which commonly use speech production models, music coders employ hearing (sound reception) models to obtain coding gain.
- Noise masking capability refers to how much quantization noise can be introduced into a music signal without a listener noticing the noise. This noise masking capability is then used to set quantizer resolution (e.g., quantizer stepsize). Generally, the more "tonelike" music is, the poorer the music will be at masking quantization noise and, therefore, the smaller the required stepsize will be, and vice versa. Smaller stepsizes correspond to smaller coding gains, and vice versa. Examples of such music coders include AT&T's Perceptual Audio Coder (PAC) and the ISO MPEG audio coding standard.
- wideband speech coding In between telephone-bandwidth speech coding and wideband music coding, there lies wideband speech coding, where the speech signal is sampled at 16 kHz and has a bandwidth of 7 kHz.
- the advantage of 7 kHz wideband speech is that the resulting speech quality is much better than telephone-bandwidth speech, and yet it requires a much lower bit-rate to code than a 20 kHz audio signal.
- Some use time-domain predictive coding, some use frequency-domain transform or sub-band coding, and some use a mixture of time-domain and frequency-domain techniques.
- The use of perceptual criteria in predictive speech coding, wideband or otherwise, has been limited to the use of a perceptual weighting filter in the context of selecting the best synthesized speech signal from among a plurality of candidate synthesized speech signals. See, e.g., U.S. Patent No. Re. 32,580 to Atal et al. Such filters accomplish a type of noise shaping which is useful in reducing noise in the coding process.
- One known coder attempts to improve upon this technique by employing a perceptual model in the formation of that perceptual weighting filter.
- the present invention combines a predictive coding system with a quantization process which quantizes a signal based on a noise masking signal determined with a model of human auditory sensitivity to noise.
- the output of the predictive coding system is thus quantized with a quantizer having a resolution (e.g., stepsize in a uniform scalar quantizer, or the number of bits used to identify vectors in a vector quantizer) which is a function of a noise masking signal determined in accordance with an audio perceptual model.
- a signal is generated which represents an estimate (or prediction) of a signal representing speech information.
- the term "original signal representing speech information" is broad enough to refer not only to speech itself, but also to speech signal derivatives commonly found in speech coding systems (such as linear prediction and pitch prediction residual signals).
- the estimate signal is then compared to the original signal to form a signal representing the difference between said compared signals.
- This signal representing the difference between the compared signals is then quantized in accordance with a perceptual noise masking signal which is generated by a model of human audio perception.
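As a sketch of the idea above, the code below quantizes a difference signal with a uniform scalar quantizer whose stepsize is scaled per coefficient by a masking threshold. The square-root mapping from threshold to stepsize is an illustrative assumption, not the patent's rule; the function and variable names are hypothetical.

```python
import numpy as np

def quantize_difference(diff, mask_threshold, base_step=1.0):
    # Stepsize grows with the masking threshold: where more noise is
    # masked, coarser quantization is tolerated.  The sqrt mapping is
    # an assumed illustration only.
    step = base_step * np.sqrt(mask_threshold)
    indices = np.round(diff / step).astype(int)  # transmitted to decoder
    recon = indices * step                       # decoder reconstruction
    return indices, recon

diff = np.array([0.9, -2.3, 0.2, 4.1])
mask = np.array([1.0, 4.0, 0.25, 1.0])  # higher = more noise tolerated
idx, recon = quantize_difference(diff, mask)
```

Note that the reconstruction error at each coefficient is bounded by half the local stepsize, so it stays below the masking threshold where that threshold is large.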
- TPC: Transform Predictive Coding
- TPC encodes 7 kHz wideband speech at a target bit-rate of 16 to 32 kb/s.
- TPC combines transform coding and predictive coding techniques in a single coder. More specifically, the coder uses linear prediction to remove the redundancy from the input speech waveform and then uses transform coding techniques to encode the resulting prediction residual.
- the transformed prediction residual is quantized based on knowledge of human auditory perception, expressed in terms of an auditory perceptual model, to encode what is audible and discard what is inaudible.
- One important feature of the illustrative embodiment concerns the way in which perceptual noise masking capability (e.g., the perceptual threshold of "just noticeable distortion") of the signal is determined and subsequent bit allocation is performed.
- the noise masking threshold and bit allocation of the embodiment are determined based on the frequency response of a quantized synthesis filter - in the embodiment, a quantized LPC synthesis filter.
- This feature provides an advantage to the system of not having to communicate bit allocation signals, from the encoder to the decoder, in order for the decoder to replicate the perceptual threshold and bit allocation processing needed for decoding the received coded wideband speech information. Instead, synthesis filter coefficients, which are being communicated for other purposes, are exploited to save bit rate.
- Another important feature of the illustrative embodiment concerns how the TPC coder allocates bits among coder frequencies and how the decoder generates a quantized output signal based on the allocated bits.
- the TPC coder allocates bits only to a portion of the audio band (for example, bits may be allocated only to coefficients between 0 and 4 kHz). No bits are allocated to represent coefficients between 4 kHz and 7 kHz and, thus, the decoder gets no coefficients in this frequency range.
- the TPC coder has to operate at very low bit rates, e.g., 16 kb/s.
- Despite having no bits representing the coded signal in the 4 kHz to 7 kHz frequency range, the decoder must still synthesize a signal in this range if it is to provide a wideband response.
- the decoder generates - that is, synthesizes - coefficient signals in this range of frequencies based on other available information: a ratio of an estimate of the signal spectrum (obtained from LPC parameters) to a noise masking threshold at frequencies in the range. Phase values for the coefficients are selected at random.
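The high-band synthesis just described can be sketched as follows. The structure (magnitude derived from the LPC-spectrum-to-masking-threshold ratio, phase drawn at random) follows the text, but the exact magnitude scaling is an assumption for illustration.

```python
import numpy as np

def synthesize_high_band(lpc_spectrum, mask_threshold, rng):
    # Magnitude from the spectrum/threshold ratio (assumed sqrt scaling);
    # phase chosen at random, as the text describes.
    magnitude = np.sqrt(lpc_spectrum / mask_threshold)
    phase = rng.uniform(0.0, 2.0 * np.pi, size=magnitude.shape)
    return magnitude * np.exp(1j * phase)

rng = np.random.default_rng(0)
coeffs = synthesize_high_band(np.array([4.0, 1.0]), np.array([1.0, 4.0]), rng)
```

Because only the magnitude carries information, no bits need to be spent on this band; the decoder can regenerate it from parameters it already has.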
- the potential applications of a wideband speech coder include ISDN video-conferencing or audio-conferencing, multimedia audio, "hi-fi” telephony, and simultaneous voice and data (SVD) over dial-up lines using modems at 28.8 kb/s or higher.
- Figure 1 presents an illustrative coder embodiment of the present invention.
- Figure 2 presents an illustrative decoder embodiment of the present invention.
- FIG. 3 presents a detailed block diagram of the LPC parameter processor of Figure 1.
- For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising individual functional blocks (including functional blocks labeled as "processors"). The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For example, the functions of the processors presented in Figures 1 to 4 may be provided by a single shared processor. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.)
- Illustrative embodiments may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16 or DSP32C, read-only memory (ROM) for storing software performing the operations discussed below, and random access memory (RAM) for storing DSP results.
- DSP digital signal processor
- ROM read-only memory
- RAM random access memory
- VLSI Very large scale integration
- the sequence of digital input speech samples is partitioned into consecutive 20 ms blocks called frames, and each frame is further subdivided into 5 equal subframes of 4 ms each. Assuming a sampling rate of 16 kHz, as is common for wideband speech signals, this corresponds to a frame size of 320 samples and a subframe size of 64 samples.
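The frame and subframe sizes quoted above follow directly from the sampling rate:

```python
SAMPLE_RATE = 16000            # Hz, wideband speech
FRAME_MS, SUBFRAMES = 20, 5    # 20 ms frames, 5 subframes each

frame_size = SAMPLE_RATE * FRAME_MS // 1000  # samples per frame
subframe_size = frame_size // SUBFRAMES      # samples per 4 ms subframe

assert frame_size == 320
assert subframe_size == 64
```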
- the TPC speech coder buffers and processes the input speech signal frame-by-frame, and within each frame certain encoding operations are performed subframe-by-subframe.
- FIG. 1 presents an illustrative TPC speech coder embodiment of the present invention. Refer to the embodiment shown in Figure 1. Once every 20 ms frame, the LPC parameter processor 10 derives the Line Spectral Pair (LSP) parameters from the input speech signal s, quantizes these LSP parameters, interpolates them for each 4 ms subframe, and then converts them to the LPC predictor coefficient array a for each subframe. Short-term redundancy is removed from the input speech signal, s, by the LPC prediction error filter 20.
- the shaping filter coefficient processor 30 derives the shaping filter coefficients awe from quantized LPC filter coefficients a.
- the shaping filter 40 filters the LPC prediction residual signal d to produce a perceptually weighted speech signal sw.
- the zero-input response processor 50 calculates the zero-input response, zir, of the shaping filter.
- the subtracting unit 60 then subtracts zir from sw to obtain tp, the target signal for pitch prediction.
- the open-loop pitch extractor and interpolator 70 uses the LPC prediction residual d to extract a pitch period for each 20 ms frame, and then calculates the interpolated pitch period kpi for each 4 ms sub-frame.
- the closed-loop pitch tap quantizer and pitch predictor 80 uses this interpolated pitch period kpi to select a set of 3 pitch predictor taps from a codebook of candidate sets of pitch taps. The selection is done such that when the previously quantized LPC residual signal dt is filtered by the corresponding 3- tap pitch synthesis filter and then by a shaping filter with zero initial memory, the output signal hd is closest to the target signal tp in a mean-square error (MSE) sense.
- the subtracting unit 90 subtracts hd from tp to obtain tt, the target signal for transform coding.
- the shaping filter magnitude response processor 100 calculates the signal mag, the magnitude of the frequency response of the shaping filter.
- the transform processor 110 performs a linear transform, such as the Fast Fourier Transform (FFT), on the signal tt. Then, it normalizes the transform coefficients using mag and the quantized versions of gain values which are calculated over three different frequency bands. The result is the normalized transform coefficient signal tc.
- the transform coefficient quantizer 120 then quantizes the signal tc using the adaptive bit allocation signal ba, which is determined by the hearing model quantizer control processor 130 according to the time-varying perceptual importance of transform coefficients at different frequencies.
- At a lower bit-rate, such as 16 kb/s, processor 130 allocates bits only to the lower half of the frequency band (0 to 4 kHz). In this case, the high-frequency synthesis processor 140 synthesizes the transform coefficients in the high-frequency band (4 to 8 kHz), and combines them with the quantized low-frequency transform coefficient signal dtc to produce the final quantized full-band transform coefficient signal qtc.
- At a higher bit-rate, such as 24 or 32 kb/s, each transform coefficient in the entire frequency band is allowed to receive bits in the adaptive bit allocation process, although some coefficients may eventually receive no bits at all due to the scarcity of the available bits.
- In that case, the high-frequency synthesis processor 140 simply detects those frequencies in the 4 to 8 kHz band that receive no bits, and fills in such "spectral holes" with low-level noise to avoid a type of "swirling" distortion typically found in adaptive transform coders.
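A minimal sketch of the noise fill-in step, assuming a fixed illustrative noise level (the patent does not specify one at this point in the text):

```python
import numpy as np

def fill_spectral_holes(coeffs, bits_per_coeff, rng, noise_level=0.01):
    # Coefficients that received zero bits are replaced with low-level
    # random noise so the decoder does not leave silent "holes" that
    # cause swirling artifacts.  noise_level is an assumed constant.
    out = coeffs.copy()
    holes = bits_per_coeff == 0
    out[holes] = noise_level * rng.standard_normal(np.count_nonzero(holes))
    return out

rng = np.random.default_rng(1)
coeffs = np.array([1.0, 0.0, 2.0, 0.0])
bits = np.array([3, 0, 2, 0])
filled = fill_spectral_holes(coeffs, bits, rng)
```

Coefficients that did receive bits pass through unchanged; only the zero-bit positions are filled.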
- the inverse transform processor 150 takes the quantized transform coefficient signal qtc, and applies a linear transform which is the inverse operation of the linear transform employed in the transform processor 110 (an inverse FFT in our particular illustrative embodiment here). This results in a time-domain signal qtt, which is the quantized version of tt, the target signal for transform coding.
- the inverse shaping filter 160 then filters qtt to obtain the quantized excitation signal et.
- the adder 170 adds et to the signal dh (which is the pitch-predicted version of the LPC prediction residual d) produced by the pitch predictor inside block 80.
- the resulting signal dt is the quantized version of the LPC prediction residual d. It is used to update the filter memory of the shaping filter inside the zero-input response processor 50 and the memory of the pitch predictor inside block 80. This completes the signal loop.
- Codebook indices representing the LPC predictor parameters (IL), the pitch predictor parameters (IP and IT), the transform gain levels (IG), and the quantized transform coefficients (IC) are multiplexed into a bit stream by the multiplexer 180 and transmitted over a channel to a decoder.
- the channel may comprise any suitable communication channel, including wireless channels, computer and data networks, and telephone networks, and may include or consist of memory, such as solid-state memories (for example, semiconductor memory), optical memory systems (such as CD-ROM), magnetic memories (for example, disk memory), etc.
- FIG. 2 presents an illustrative TPC speech decoder embodiment of the present invention.
- the demultiplexer 200 separates the codebook indices IL, IP, IT, IG, and IC.
- the pitch decoder and interpolator 205 decodes IP and calculates the interpolated pitch period kpi.
- the pitch tap decoder and pitch predictor 210 decodes IT to obtain the pitch predictor taps array b, and it also calculates the signal dh, or the pitch-predicted version of the LPC prediction residual d.
- the LPC parameter decoder and interpolator 215 decodes IL and then calculates the interpolated LPC filter coefficient array a.
- Blocks 220 through 255 perform exactly the same operations as their counterparts in Figure 1 to produce the quantized LPC residual signal dt.
- the long-term postfilter 260 enhances the pitch periodicity in dt and produces a filtered version fdt as its output.
- This signal is passed through the LPC synthesis filter 265, and the resulting signal st is further filtered by the short-term postfilter 270, which produces a final filtered output speech signal fst.
- Open-loop quantization means the quantizer attempts to minimize the difference between the unquantized parameter and its quantized version, without regard to the effects on the output speech quality. This is in contrast to, for example, CELP coders, where the pitch predictor, the gain, and the excitation are usually closed-loop quantized.
- the quantizer codebook search attempts to minimize the distortion in the final reconstructed output speech. Naturally, this generally leads to a better output speech quality, but at the price of a higher codebook search complexity.
- the TPC coder uses closed-loop quantization only for the 3 pitch predictor taps.
- the quantization operations leading to the quantized excitation signal et are basically similar to open-loop quantization, but their effect on the output speech is close to that of closed-loop quantization.
- This approach is similar in spirit to the approach used in the TCX coder by Lefebvre et al., "High Quality Coding of Wideband Audio Signals Using Transform Coded Excitation (TCX)," Proc. IEEE International Conf. Acoustics, Speech, Signal Processing, 1994, pp. I-193 to I-196, although there are also important differences.
- the features of the current invention that are not in the TCX coder include normalization of the transform coefficients by a shaping filter magnitude response, adaptive bit allocation controlled by a hearing model, and the high-frequency synthesis and noise fill-in procedures.
- Processor 10 comprises a windowing and autocorrelation processor 310; a spectral smoothing and white noise correction processor 315; a Levinson-Durbin recursion processor 320; a bandwidth expansion processor 325; an LPC to LSP conversion processor 330; an LPC power spectrum processor 335; an LSP quantizer 340; an LSP sorting processor 345; an LSP interpolation processor 350; and an LSP to LPC conversion processor 355.
- Windowing and autocorrelation processor 310 begins the process of LPC coefficient generation.
- Processor 310 generates autocorrelation coefficients, r, in conventional fashion, once every 20 ms, from which LPC coefficients are subsequently computed, as discussed below. See Rabiner, L. R. et al., Digital Processing of Speech Signals, Prentice-Hall, Inc., 1978.
- the LPC frame size is 20 ms (or 320 speech samples at a 16 kHz sampling rate). Each 20 ms frame is further divided into 5 subframes, each 4 ms (or 64 samples) long. The LPC analysis processor uses a 24 ms Hamming window which is centered at the last 4 ms subframe of the current frame, in conventional fashion.
- A spectral smoothing technique (SST) is applied by the spectral smoothing and white noise correction processor 315 before LPC analysis. The SST, well known in the art (Tohkura, Y. et al., "Spectral Smoothing Technique in PARCOR Speech Analysis-Synthesis"), involves multiplying the calculated autocorrelation coefficient array (from processor 310) by a Gaussian window whose Fourier transform corresponds to the probability density function (pdf) of a Gaussian distribution with a standard deviation of 40 Hz.
- the white noise correction, also conventional (Chen, J.-H., "A Robust Low-Delay CELP Speech Coder at 16 kbit/s," Proc. IEEE Global Comm. Conf., pp. 1237-1241, Dallas, TX, November 1989), increases the zero-lag autocorrelation coefficient (i.e., the energy term) by 0.001%.
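The two conditioning steps above (Gaussian lag-windowing of the autocorrelation and the 0.001% white-noise correction) can be sketched as follows. The lag-window formula used here is the standard one implied by a 40 Hz spectral standard deviation and is an assumption; the patent states only the spectral-domain description.

```python
import numpy as np

def condition_autocorrelation(r, fs=16000, sigma_hz=40.0, wnc=1e-5):
    # SST: multiply autocorrelation lags by a Gaussian window whose
    # Fourier transform is a Gaussian pdf with std sigma_hz (assumed
    # standard formula).  Then apply white-noise correction: raise the
    # zero-lag (energy) term by 0.001% (wnc = 1e-5).
    lags = np.arange(len(r))
    window = np.exp(-0.5 * (2.0 * np.pi * sigma_hz * lags / fs) ** 2)
    r_s = r * window
    r_s[0] *= (1.0 + wnc)
    return r_s

out = condition_autocorrelation(np.ones(5))
```

The white-noise correction effectively adds a tiny noise floor, which guards the subsequent Levinson-Durbin recursion against ill-conditioning.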
- the coefficients generated by processor 315 are then provided to the Levinson-Durbin recursion processor 320, which computes the LPC predictor coefficients in conventional fashion.
- LPC predictor coefficients are converted to the Line Spectral Pair (LSP) coefficients by LPC to LSP conversion processor 330 in conventional fashion.
- Vector quantization (VQ) is used by the LSP quantizer 340 to quantize the resulting LSP coefficients.
- the specific VQ technique employed by processor 340 is similar to the split VQ proposed in Paliwal, K. K. et al., "Efficient Vector Quantization of LPC Parameters at 24 bits/frame," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 661-664, Toronto, Canada, May 1991 (Paliwal et al.), which is incorporated by reference as if set forth fully herein.
- the 16-dimensional LSP vector is split into 7 smaller sub-vectors having the dimensions of 2, 2, 2, 2, 2, 3, 3, counting from the low- frequency end.
- Each of the 7 sub-vectors is quantized to 7 bits (i.e., using a VQ codebook of 128 codevectors).
- This produces codebook indices IL(1) - IL(7), each index being seven bits in length, for a total of 49 bits per frame used in LPC parameter quantization.
- These 49 bits are provided to the multiplexer 180 for transmission to the decoder as side information.
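The bit accounting for the split VQ works out as follows:

```python
# 16-dimensional LSP vector split into 7 sub-vectors, low frequencies first
SPLIT_DIMS = (2, 2, 2, 2, 2, 3, 3)
BITS_PER_SUBVECTOR = 7            # 128-entry codebook per sub-vector

assert sum(SPLIT_DIMS) == 16      # the splits cover the whole LSP vector
total_bits = BITS_PER_SUBVECTOR * len(SPLIT_DIMS)
assert total_bits == 49           # bits per frame for LPC parameters
```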
- Processor 340 performs its search through the VQ codebook using a conventional weighted mean-square error (WMSE) distortion measure, as described in Paliwal et al.
- the LPC power spectrum processor 335 is used to calculate the weights in this WMSE distortion measure.
- the codebook used in processor 340 is designed with conventional codebook generation techniques well-known in the art.
- a conventional MSE distortion measure can also be used instead of the WMSE measure to reduce the coder's complexity without significant degradation in the output speech quality.
- the LSP sorting processor 345 sorts the quantized LSP coefficients to restore the monotonically increasing order and ensure stability.
- the quantized LSP coefficients are used in the last subframe of the current frame. Linear interpolation between these LSP coefficients and those from the last subframe of the previous frame is performed by the LSP interpolation processor 350, as is conventional, to provide LSP coefficients for the first four subframes. The interpolated and quantized LSP coefficients are then converted back to the LPC predictor coefficients for use in each subframe by the LSP to LPC conversion processor 355 in conventional fashion. This is done in both the encoder and the decoder. The LSP interpolation is important in maintaining the smooth reproduction of the output speech: it allows the LPC predictor coefficients to be updated once per subframe (4 ms) in a smooth fashion. The resulting LPC predictor coefficient array a is used in the LPC prediction error filter 20 to predict the coder's input signal. The difference between the input signal and its predicted version is the LPC prediction residual, d.
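The subframe LSP interpolation can be sketched as below. The patent states linear interpolation without giving the weights; the uniform linear weights used here are the conventional scheme and an assumption.

```python
import numpy as np

def interpolate_lsp(prev_lsp, curr_lsp, n_subframes=5):
    # prev_lsp: quantized LSPs for the last subframe of the previous frame.
    # curr_lsp: quantized LSPs for the last subframe of the current frame.
    # Subframe m blends the two linearly; subframe 5 uses curr_lsp exactly.
    out = []
    for m in range(1, n_subframes + 1):
        w = m / n_subframes
        out.append((1.0 - w) * prev_lsp + w * curr_lsp)
    return np.array(out)

lsps = interpolate_lsp(np.zeros(3), np.ones(3))
```

Interpolating in the LSP domain (rather than directly on LPC coefficients) keeps each subframe's synthesis filter stable, since any ordered LSP set maps to a stable filter.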
- the shaping filter coefficient processor 30 computes the first three autocorrelation coefficients of the LPC predictor coefficient array a, then uses the Levinson-Durbin recursion to solve for the coefficients cj, j = 0, 1, 2 of the corresponding optimal second-order all-pole predictor. These predictor coefficients are then bandwidth-expanded by a factor of 0.7 (i.e., the j-th coefficient cj is replaced by cj(0.7)^j). Next, processor 30 also performs bandwidth expansion of the 16th-order all-pole LPC predictor coefficient array a, but this time by a factor of 0.8.
- the shaping filter coefficient array awe is calculated by convolving the two bandwidth-expanded coefficient arrays (2nd-order and 16th-order) mentioned above to get a direct-form 18th-order filter.
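The construction of the 18th-order shaping filter can be sketched as follows, with placeholder coefficient values (the real arrays come from the Levinson-Durbin recursions described above). The arrays here include the leading 1 of each predictor polynomial, which is an assumption about the direct-form representation.

```python
import numpy as np

def bandwidth_expand(coeffs, gamma):
    # Replace the j-th coefficient c_j by c_j * gamma**j, as in the text.
    return coeffs * gamma ** np.arange(len(coeffs))

# Placeholder polynomials: 2nd-order (3 taps) and 16th-order (17 taps).
second_order = bandwidth_expand(np.array([1.0, -0.9, 0.2]), 0.7)
lpc_16th = bandwidth_expand(np.concatenate(([1.0], 0.1 * np.ones(16))), 0.8)

# Convolving the two polynomials yields the direct-form 18th-order filter.
awe = np.convolve(second_order, lpc_16th)
```

Convolution of a 2nd-order and a 16th-order polynomial gives 3 + 17 - 1 = 19 taps, i.e., an 18th-order direct-form filter, matching the text.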
- When the shaping filter 40 is cascaded with the LPC prediction error filter, as is shown in Figure 1, the two filters effectively form a perceptual weighting filter whose frequency response is roughly the inverse of the desired coding noise spectrum.
- the output of the shaping filter 40 is called the perceptually weighted speech signal sw.
- the zero-input response processor 50 contains a shaping filter. At the beginning of each 4 ms subframe, it performs shaping filtering by feeding the filter with 4 ms worth of zero input signal. In general, the corresponding output signal vector zir is non-zero because the filter generally has non-zero memory (except during the very first subframe after coder initialization, or when the coder's input signal has been exactly zero since the coder started up).
- Processor 60 subtracts zir from the weighted speech vector sw; the resulting signal vector tp is the target vector for closed-loop pitch prediction.
- the pitch period of the LPC prediction residual is determined by the open-loop pitch extractor and interpolator 70 using a modified version of the efficient two-stage search technique discussed in U.S. Patent No. 5,327,520, entitled "Method of Use of Voice Message Coder/Decoder," and incorporated by reference as if set forth fully herein.
- Processor 70 first passes the LPC residual through a third-order elliptic lowpass filter to limit the bandwidth to about 700 Hz, and then performs 8:1 decimation of the lowpass filter output.
- the correlation coefficients of the decimated signal are calculated for time lags ranging from 3 to 34, which correspond to time lags of 24 to 272 samples in the undecimated signal domain.
- the allowable range for the pitch period is 1.5 ms to 17 ms, or 59 Hz to 667 Hz in terms of the pitch frequency. This is sufficient to cover the normal pitch range of most speakers, including low-pitched males and high-pitched children.
- the first major peak of the correlation coefficients which has the lowest time lag is identified. This is the first-stage search. Let the resulting time lag be t. This value t is multiplied by 8 to obtain the time lag in the undecimated signal domain. The resulting time lag, 8t, points to the neighborhood where the true pitch period is most likely to lie. To retain the original time resolution in the undecimated signal domain, a second-stage pitch search is conducted in the range of t-4 to t+4. The correlation coefficients of the original undecimated LPC residual, d, are calculated for the time lags of t-4 to t+4 (subject to the lower bound of 24 samples and upper bound of 272 samples).
- the time lag corresponding to the maximum correlation coefficient in this range is then identified as the final pitch period.
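The lag bookkeeping of the two-stage search above can be sketched as follows; `corr` stands in for the correlation measure computed on the undecimated residual, and is a hypothetical callable here.

```python
def refine_pitch(decimated_lag, corr, lo=24, hi=272):
    # Stage 1 found `decimated_lag` in the 8:1 decimated domain; map it
    # back by the decimation factor, then search +/-4 samples around it
    # in the undecimated domain, clipped to the allowed 24..272 sample
    # range (1.5 ms to 17 ms at 16 kHz).
    center = 8 * decimated_lag
    candidates = [t for t in range(center - 4, center + 5) if lo <= t <= hi]
    return max(candidates, key=corr)

# Decimated lags 3..34 map exactly onto the undecimated bounds 24..272.
assert 8 * 3 == 24 and 8 * 34 == 272

# Toy correlation peaking at lag 100; stage 1 found decimated lag 12.
best = refine_pitch(12, lambda t: -abs(t - 100))
```

The two-stage structure keeps the search cheap: the coarse pass runs on a signal decimated by 8, and only 9 candidate lags need full-resolution correlations.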
- Processor 70 determines the pitch period kpi for each subframe in the following way. If the difference between the extracted pitch period of the current frame and that of the last frame is greater than 20%, the extracted pitch period described above is used for every subframe in the current frame. On the other hand, if this relative pitch change is less than 20%, then the extracted pitch period is used for the last 3 subframes of the current frame, while the pitch periods of the first 2 subframes are obtained by linear interpolation between the extracted pitch period of the last frame and that of the current frame.
- the closed-loop pitch tap quantizer and pitch predictor 80 performs the following operations subframe-by-subframe: (1) closed-loop quantization of the 3 pitch taps, (2) generation of dh, the pitch-predicted version of the LPC prediction residual d in the current subframe, and (3) generation of hd, the closest match to the target signal tp.
- Processor 80 has an internal buffer that stores previous samples of the signal dt, which can be regarded as the quantized version of the LPC prediction residual d. For each subframe, processor 80 uses the pitch period kpi to extract three 64-dimensional vectors from the dt buffer.
- the inner product of the resulting vector with each of the 64 pre-computed and stored 9-dimensional vectors is calculated.
- the vector in the stored table which gives the maximum inner product is the winner, and the three quantized pitch predictor taps are derived from it. Since there are 64 vectors in the stored table, a 6-bit index, IT(m) for the m-th subframe, is sufficient to represent the three quantized pitch predictor taps. Since there are 5 subframes in each frame, a total of 30 bits per frame are used to represent the three pitch taps used for all subframes. These 30 bits are provided to the multiplexer 180 for transmission to the decoder as side information.
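The patent does not spell out the contents of the stored 9-dimensional vectors, but the standard closed-loop derivation is consistent with them: minimizing the squared error over tap triples reduces to maximizing an inner product between a 9-dimensional correlation vector (3 cross terms with the target plus 6 unique pairwise energy terms) and a table entry precomputable from each tap triple. A sketch, under that assumption:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def best_taps(t, x, codebook):
    """Pick the tap triple b = (b1, b2, b3) from the codebook that
    minimizes ||t - b1*x1 - b2*x2 - b3*x3||^2.  Expanding the norm
    shows this equals maximizing 2*b.c - b'Rb, computable as an inner
    product against a 9-dimensional correlation vector."""
    c = [dot(xi, t) for xi in x]                                  # 3 cross terms
    r = [dot(x[i], x[j]) for i in range(3) for j in range(i, 3)]  # 6 energy terms

    def score(b):
        s = sum(2 * b[i] * c[i] for i in range(3))
        k = 0
        for i in range(3):
            for j in range(i, 3):
                s -= (1 if i == j else 2) * b[i] * b[j] * r[k]
                k += 1
        return s

    return max(codebook, key=score)
```

With a real codebook the per-entry weights (2*b_i, -b_i*b_j) would be precomputed off-line, so the search costs one 9-point inner product per entry.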
- the pitch-predicted version of d is calculated as dh = b1 x1 + b2 x2 + b3 x3, where b1, b2, and b3 are the three quantized pitch taps and x1, x2, and x3 are the three vectors extracted from the dt buffer at kpi - 1, kpi, and kpi + 1 samples before the current subframe.
- the output signal vector hd is calculated as
- This vector hd is subtracted from the vector tp by the subtracting unit 90.
- the result is tt, the target vector for transform coding.
- the target vector tt is encoded subframe-by-subframe by blocks 100 through 150 using a transform coding approach.
- the shaping filter magnitude response processor 100 calculates the signal mag in the following way. First, it takes the shaping filter coefficient array awe of the last subframe of the current frame, zero-pads it to 64 samples, and then performs a 64-point FFT on the resulting 64-dimensional vector. Then, it calculates the magnitudes of the 33 FFT coefficients which correspond to the frequency range of 0 to 8 kHz.
- the result vector mag is the magnitude response of the shaping filter for the last subframe. To save computation, the mag vectors for the first four subframes are obtained by a linear interpolation between the mag vector of the last subframe of the last frame and that of the last subframe of the current frame.
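A sketch of the mag computation and its per-subframe interpolation; a naive DFT stands in for the radix-2 FFT a real implementation would use, and the 1/5 ... 5/5 interpolation weights are an assumption.

```python
import cmath

def shaping_filter_mag(awe, n=64):
    """Zero-pad the coefficient array to n points, take an n-point DFT,
    and keep the magnitudes of bins 0..n/2 (0 to 8 kHz at 16 kHz
    sampling for n = 64)."""
    padded = list(awe) + [0.0] * (n - len(awe))
    return [abs(sum(padded[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                    for m in range(n)))
            for k in range(n // 2 + 1)]

def interpolate_mag(prev_mag, cur_mag, n_sub=5):
    """Linear interpolation of the per-subframe mag vectors between the
    last subframe of the previous frame and that of the current frame
    (the exact weights are an assumption)."""
    out = []
    for m in range(1, n_sub + 1):
        w = m / float(n_sub)
        out.append([(1 - w) * p + w * c for p, c in zip(prev_mag, cur_mag)])
    return out
```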
- the transform processor 110 performs several operations, as described below. It first transforms the 64-dimensional vector tt in the current subframe by using a 64-point FFT. This transform size of 64 samples (or 4 ms) avoids the so-called "pre-echo" distortion well-known in the audio coding art. See Jayant, N. et al., "Signal Compression Based on Models of Human Perception," Proc. IEEE, pp. 1385-1422, October 1993, which is incorporated by reference as if set forth fully herein. Each of the first 33 complex FFT coefficients is then divided by the corresponding element in the mag vector.
- the resulting normalized FFT coefficient vector is partitioned into 3 frequency bands: (1) the low-frequency band consisting of the first 6 normalized FFT coefficients (i.e. from 0 to 1250 Hz), (2) the mid-frequency band consisting of the next 10 normalized FFT coefficients (from 1500 to 3750 Hz), and (3) the high-frequency band consisting of the remaining 17 normalized FFT coefficients (from 4000 to 8000 Hz).
- the total energy in each of the 3 bands is calculated and then converted to a dB value, called the log gain of that band.
- the log gain of the low-frequency band is quantized using a 5-bit scalar quantizer designed using the Lloyd algorithm well known in the art.
- the quantized low-frequency log gain is subtracted from the log gains of the mid- and high- frequency bands.
- the resulting level-adjusted mid- and high-frequency log gains are concatenated to form a 2-dimensional vector, which is then quantized by a 7-bit vector quantizer, with a codebook designed by the generalized Lloyd algorithm, again well-known in the art.
- the quantized low-frequency log gain is then added back to the quantized versions of the level-adjusted mid- and high-frequency log gains to obtain the quantized log gains of the mid- and high-frequency bands.
- all three quantized log gains are converted from the logarithmic (dB) domain back to the linear domain.
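The gain pipeline of the preceding steps can be sketched as follows. Uniform rounding stands in for the Lloyd-designed 5-bit scalar and 7-bit vector quantizers, and converting the log gains to amplitude-domain linear gains via 10^(g/20) is an assumption; the patent only says the log gains are converted back to the linear domain.

```python
import math

LOW, MID, HIGH = slice(0, 6), slice(6, 16), slice(16, 33)  # band edges from the text

def log_gain(coeffs):
    """Band energy in dB (log gain); bands are assumed nonzero,
    floored here only to avoid log10(0)."""
    e = sum(abs(x) ** 2 for x in coeffs)
    return 10.0 * math.log10(max(e, 1e-12))

def quantize_band_gains(tc, step=1.5):
    """Quantize the low-band log gain, subtract it from the mid/high
    log gains, quantize the level-adjusted pair, add it back, and
    return the three linear gains.  Uniform rounding with the given
    step is a stand-in for the patent's trained quantizers."""
    g_low, g_mid, g_high = (log_gain(tc[b]) for b in (LOW, MID, HIGH))
    q_low = step * round(g_low / step)             # stand-in 5-bit SQ
    adj = [g_mid - q_low, g_high - q_low]          # level adjustment
    q_adj = [step * round(a / step) for a in adj]  # stand-in 7-bit VQ
    q_mid, q_high = q_adj[0] + q_low, q_adj[1] + q_low
    # back to the linear (amplitude) domain -- an assumption
    return [10.0 ** (g / 20.0) for g in (q_low, q_mid, q_high)]
```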
- Each of the 33 normalized FFT coefficients (normalized by mag as described above) is then further divided by the corresponding quantized linear gain of the frequency band in which the FFT coefficient lies.
- the result is the final normalized transform coefficient vector tc, which contains 33 complex numbers representing frequencies from 0 to 8000 Hz.
- the coder might be used at one of three different rates: 16, 24 and 32 kb/s. At a sampling rate of 16 kHz, these three target rates translate to 1, 1.5, and 2 bits/sample, or 64, 96, and 128 bits/subframe, respectively.
- Adaptive Bit Allocation. In accordance with the principles of the present invention, adaptive bit allocation is performed to assign these remaining bits to various parts of the frequency spectrum with different quantization accuracy, in order to enhance the perceptual quality of the output speech at the TPC decoder. This is done by using a model of human sensitivity to noise in audio signals. Such models are known in the art of perceptual audio coding. See, e.g., Tobias, J.
- Hearing model and quantizer control processor 130 performs adaptive bit allocation and generates an output vector ba which tells the transform coefficient quantizer 120 how many bits should be used to quantize each of the 33 normalized transform coefficients contained in tc. While adaptive bit allocation might be performed once every subframe, the illustrative embodiment of the present invention performs bit allocation once per frame in order to reduce computational complexity. Rather than using the unquantized input signal to derive the noise masking threshold and bit allocation, as is done in conventional music coders, the noise masking threshold and bit allocation of the illustrative embodiment are determined from the frequency response of the quantized LPC synthesis filter (which is often referred to as the "LPC spectrum").
- the LPC spectrum can be considered an approximation of the spectral envelope of the input signal within the 24 ms LPC analysis window.
- the LPC spectrum is determined based on the quantized LPC coefficients.
- the quantized LPC coefficients are provided by the LPC parameter processor 10 to the hearing model and quantizer control processor 130, which determines the LPC spectrum as follows.
- the quantized LPC filter coefficients a are first transformed by a 64-point FFT.
- the power of each of the first 33 FFT coefficients is determined and the reciprocal is then calculated.
- the result is the LPC power spectrum which has the frequency resolution of a 64-point FFT.
- an estimated noise masking threshold, TM, is calculated using a modified version of the method described in U.S. Patent No. 5,314,457, which is incorporated by reference as if fully set forth herein.
- Processor 130 scales the 33 samples of LPC power spectrum by a frequency-dependent attenuation function empirically determined from subjective listening experiments. The attenuation function starts at 12 dB for the DC term of the LPC power spectrum, increases to about 15 dB between 700 and 800 Hz, then decreases monotonically toward high frequencies, and finally reduces to 6 dB at 8000 Hz.
- Each of the 33 attenuated LPC power spectrum samples is then used to scale a "basilar membrane spreading function" derived for that particular frequency to calculate the masking threshold.
- a spreading function for a given frequency corresponds to the shape of the masking threshold in response to a single-tone masker signal at that frequency. Equation (5) of Schroeder et al., which is incorporated by reference as if set forth fully herein, describes such spreading functions in terms of the "bark", or critical-band, frequency scale.
- the scaling process begins with the first 33 frequencies of a 64-point FFT (i.e., 0 Hz, 250 Hz, 500 Hz, . . . , 8000 Hz) being converted to the "bark" frequency scale.
- the corresponding spreading function is sampled at these 33 bark values using equation (5) of Schroeder et al.
- the 33 resulting spreading functions are stored in a table, which may be done as part of an off-line process.
- each of the 33 spreading functions is multiplied by the corresponding sample value of the attenuated LPC power spectrum, and the resulting 33 scaled spreading functions are summed together. The result is the estimated masking threshold function. It should be noted that this technique for estimating the masking threshold is not the only technique available.
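For concreteness, the bark conversion and spreading-function machinery can be sketched with the formulas from Schroeder et al. as commonly reported in the literature (the mapping f = 650 sinh(z/7) and the equation-(5) constants below are taken from that literature, not from the patent, so treat them as assumptions about the exact values used here):

```python
import math

def hz_to_bark(f):
    """Critical-band rate via Schroeder et al.'s mapping f = 650*sinh(z/7)."""
    return 7.0 * math.asinh(f / 650.0)

def spreading_db(dz):
    """Spreading-function level in dB at bark distance dz from the
    masker (masker at dz = 0), after equation (5) of Schroeder et al."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)

def masking_threshold(power, freqs):
    """Sum the spreading functions, each scaled by the (attenuated)
    LPC power-spectrum sample at its own frequency."""
    barks = [hz_to_bark(f) for f in freqs]
    return [sum(p * 10.0 ** (spreading_db(zi - zj) / 10.0)
                for p, zj in zip(power, barks))
            for zi in barks]
```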
- processor 130 uses a "greedy” algorithm to perform adaptive bit allocation.
- the technique is “greedy” in the sense that it allocates one bit at a time to the most "needy" frequency component without regard to its potential influence on future bit allocation.
- the LPC power spectrum is assumed to be the power spectrum of the coding noise.
- the noise loudness at each of the 33 frequencies of a 64-point FFT is estimated using the masking threshold calculated above and a simplified version of the noise loudness calculation method in Schroeder et al.
- the simplified noise loudness at each of the 33 frequencies is calculated as follows. First, the critical bandwidth Bi at the i-th frequency is calculated using linear interpolation of the critical bandwidths listed in Table 1 of Scharf's book chapter in Tobias. The result is the approximated value of the term df/dx in equation (3) of Schroeder et al.
- the 33 critical bandwidth values are pre-computed and stored in a table. Then, for the i-th frequency, the noise power Ni is compared with the masking threshold Mi. If Ni < Mi, the noise loudness at that frequency is set to zero; otherwise it is computed from Ni, Mi, and Si, where Si is the sample value of the LPC power spectrum at the i-th frequency.
- the frequency with the maximum noise loudness is identified and one bit is assigned to this frequency.
- the noise power at this frequency is then reduced by a factor which is empirically determined from the signal-to-noise ratio (SNR) obtained during the design of the VQ codebook for quantizing the normalized FFT coefficients. (Illustrative values for the reduction factor are between 4 and 5 dB).
- the noise loudness at this frequency is then updated using the reduced noise power.
- the maximum is again identified from the updated noise loudness array, and one bit is assigned to the corresponding frequency. This process continues until all available bits are exhausted. For the 32 and 24 kb/s TPC coder, each of the 33 frequencies can receive bits during adaptive bit allocation.
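The greedy loop can be sketched as follows; scaling the loudness value directly by the SNR-derived factor is a simplification of reducing the noise power and recomputing the loudness, and the per-frequency bit cap is omitted.

```python
def greedy_bit_alloc(loudness, total_bits, drop_db=4.5):
    """Give one bit at a time to the frequency with the largest noise
    loudness, then reduce that entry by drop_db (4.5 dB sits inside
    the 4-5 dB range quoted in the text)."""
    factor = 10.0 ** (-drop_db / 10.0)
    bits = [0] * len(loudness)
    loud = list(loudness)          # work on a copy; input is not mutated
    for _ in range(total_bits):
        i = max(range(len(loud)), key=lambda k: loud[k])
        bits[i] += 1
        loud[i] *= factor          # simplification: loudness tracks noise power
    return bits
```

Because the decoder can rebuild the same loudness array from the quantized LPC parameters, running this same loop there reproduces the bit allocation without any side information.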
- For the 16 kb/s coder, bits are assigned only to the frequency range of 0 to 4 kHz (i.e., the first 16 FFT coefficients), and the remaining FFT coefficients in the higher frequency band of 4 to 8 kHz are synthesized using the high-frequency synthesis processor 140.
- the TPC decoder can locally duplicate the encoder's adaptive bit allocation operation to obtain such bit allocation information.
- the transform coefficient quantizer 120 quantizes the transform coefficients contained in tc using the bit allocation signal ba.
- the DC term of the FFT is a real number, and it is scalar quantized if it ever receives any bit during bit allocation.
- the maximum number of bits it can receive is 4.
- a conventional two-dimensional vector quantizer is used to quantize the real and imaginary parts jointly.
- the maximum number of bits for this 2-dimension VQ is 6 bits.
- a conventional 4-dimensional vector quantizer is used to jointly quantize the real and imaginary parts of two adjacent FFT coefficients.
- the resulting VQ codebook index array IC contains the main information of the TPC encoder. This index array IC is provided to the multiplexer 180, where it is combined with side information bits. The result is the final bit-stream, which is transmitted through a communication channel to the TPC decoder.
- the transform coefficient quantizer 120 also decodes the quantized values of the normalized transform coefficients. It then restores the original gain levels of these transform coefficients by multiplying each of these coefficients by the corresponding elements of mag and the quantized linear gain of the corresponding frequency band. The result is the output vector dtc.
- the hearing model quantizer control processor 130 first calculates the ratio between the LPC power spectrum and the masking threshold, or the signal-to-masking-threshold ratio (SMR), for the frequencies in the 4 to 7 kHz band.
- the 17th through the 29th FFT coefficients (4 to 7 kHz) are synthesized using phases which are random and magnitude values that are controlled by the SMR.
- the magnitude of the FFT coefficients is set to the quantized linear gain of the high-frequency band.
- the magnitude is 2 dB below the quantized linear gain of the high-frequency band. From the 30th through the 33rd FFT coefficients, the magnitude ramps down from 2 dB to 30 dB below the quantized linear gain of the high-frequency band, and the phase is again random.
- bit allocation is performed for the entire frequency band as described. However, some frequencies in the 4 to 8 kHz band may still receive no bits. In this case, the high-frequency synthesis and noise fill-in procedure described above is applied only to those frequencies receiving no bits.
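A sketch of the high-frequency synthesis rule described above follows. The 10 dB SMR decision threshold and the use of the SMR as a binary switch between the two magnitude levels are assumptions; the patent text does not spell out the exact decision rule.

```python
import cmath
import math
import random

def synthesize_high_band(high_gain_lin, smr_db, rng=None):
    """Fill bins 16..32 (4-8 kHz) with random-phase coefficients.
    Bins 16..28 (the 17th-29th coefficients): magnitude at the quantized
    high-band linear gain, or 2 dB below it, depending on the SMR
    (threshold assumed).  Bins 29..32 (the 30th-33rd): magnitude ramps
    from 2 dB to 30 dB below that gain."""
    rng = rng or random.Random(0)
    coeffs = []
    for k in range(16, 33):
        if k <= 28:
            level_db = 0.0 if smr_db[k - 16] >= 10.0 else -2.0
        else:
            level_db = -2.0 - 28.0 * (k - 29) / 3.0   # -2 dB down to -30 dB
        mag = high_gain_lin * 10.0 ** (level_db / 20.0)
        coeffs.append(cmath.rect(mag, rng.uniform(0.0, 2.0 * math.pi)))
    return coeffs
```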
- the resulting output vector qtc contains the quantized version of the transform coefficients before normalization.
- the inverse transform processor 150 performs the inverse FFT on the 64-element complex vector represented by the half-size 33-element vector qtc. This results in an output vector qtt, which is the quantized version of tt, the time-domain target vector for transform coding.
- the inverse shaping filter 160, which is an all-zero filter having awe as its coefficient array, filters the vector qtt to produce an output vector et.
- the adder 170 then adds dh to et to obtain the quantized LPC prediction residual dt.
- This dt vector is then used to update the internal storage buffer in the closed-loop pitch tap quantizer and pitch predictor 80. It is also used to excite the internal shaping filter inside the zero-input response processor 50 in order to establish the correct filter memory in preparation for the zero-input response generation for the next subframe.
- An illustrative decoder embodiment of the present invention is shown in FIG. 2.
- the demultiplexer 200 separates all main and side information components from the received bit-stream.
- the main information, the transform coefficient index array IC, is provided to the transform coefficient decoder 235.
- adaptive bit allocation must be performed to determine how many of the main information bits are associated with each quantized transform coefficient.
- the first step in adaptive bit allocation is the generation of quantized LPC coefficients (upon which allocation depends).
- the demultiplexer 200 provides the seven LSP codebook indices IL(1) to IL(7) to the LPC parameter decoder 215, which performs table look-up from the 7 LSP VQ codebooks to obtain the 16 quantized LSP coefficients.
- the LPC parameter decoder 215 then performs the same sorting, interpolation, and LSP-to-LPC coefficient conversion operations as in blocks 345, 350, and 355 in Figure 3.
- the hearing model quantizer control processor 220 determines the bit allocation (based on the quantized LPC parameters) for each FFT coefficient in the same way as processor 130 in the TPC encoder (Figure 1).
- the shaping filter coefficient processor 225 and the shaping filter magnitude response processor 230 are also replicas of the corresponding processors 30 and 100, respectively, in the TPC encoder.
- Processor 230 produces mag, the magnitude response of the shaping filter, for use by the transform coefficient decoder 235.
- the transform coefficient decoder 235 can then correctly decode the main information and obtain the quantized versions of the normalized transform coefficients.
- the decoder 235 also decodes the gains using the gain index array IG. For each subframe, there are two gain indices (5 and 7 bits), which are decoded into the quantized log gain of the low-frequency band and the quantized versions of the level-adjusted log gains of the mid- and high-frequency bands. The quantized low-frequency log gain is then added back to the quantized versions of the level-adjusted mid- and high-frequency log gains to obtain the quantized log gains of the mid- and high-frequency bands.
- All three quantized log gains are then converted from the logarithmic (dB) domain back to the linear domain.
- Each of the three quantized linear gains is used to multiply the quantized versions of the normalized transform coefficients in the corresponding frequency band.
- Each of the resulting 33 gain-scaled, quantized transform coefficients is then further multiplied by the corresponding element in shaping filter magnitude response array mag. After these two stages of scaling, the result is the decoded transform coefficient array dtc.
- the high-frequency synthesis processor 240, inverse transform processor 245, and the inverse shaping filter 250 are again exact replicas of the corresponding blocks (140, 150, and 160) in the TPC encoder. Together they perform high-frequency synthesis, noise fill-in, inverse transformation, and inverse shaping filtering to produce the quantized excitation vector et.
- the pitch decoder and interpolator 205 decodes the 8-bit pitch index IP to get the pitch period for the last 3 subframes, and then interpolates the pitch period for the first two subframes in the same way as is done in the corresponding block 70 of the TPC encoder.
- the pitch tap decoder and pitch predictor 210 decodes the pitch tap index IT for each subframe to get the three quantized pitch predictor taps b1k, b2k, and b3k. It then uses the interpolated pitch period kpi to extract the same three vectors x1, x2, and x3 as described in the encoder section. (These three vectors are respectively kpi - 1, kpi, and kpi + 1 samples earlier than the current frame of dt.) Next, it computes the pitch-predicted version of the LPC residual as dh = b1k x1 + b2k x2 + b3k x3.
- the adder 255 adds dh and et to get dt, the quantized version of the LPC prediction residual d.
- This dt vector is fed back to the pitch predictor inside block 210 to update its internal storage buffer for dt (the filter memory of the pitch predictor).
- the long-term postfilter 260 is basically similar to the long-term postfilter used in the ITU-T G.728 standard 16 kb/s Low-Delay CELP coder.
- the main difference is that it uses the sum of the three quantized pitch taps, b1k + b2k + b3k, as the voicing indicator, and that the scaling factor for the long-term postfilter coefficient is 0.4 rather than 0.15 as in G.728. If this voicing indicator is less than 0.5, the postfiltering operation is skipped, and the output vector fdt is identical to the input vector dt. If this indicator is 0.5 or more, the postfiltering operation is carried out.
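The voicing-indicator gating can be sketched as follows. The filter form shown is a generic G.728-style single-tap long-term postfilter with the 0.4 coefficient scale from the text; the tap clamping and gain normalization are assumptions, not the exact G.728 arithmetic.

```python
def longterm_postfilter(dt, taps, pitch, coeff_scale=0.4):
    """Skip postfiltering when the voicing indicator (sum of the three
    quantized pitch taps) is below 0.5; otherwise apply a simple
    single-tap long-term postfilter y[n] = g*(x[n] + c*x[n-pitch])."""
    v = sum(taps)
    if v < 0.5:
        return list(dt)                  # fdt identical to dt
    c = coeff_scale * min(v, 1.0)        # clamp at 1.0 is an assumption
    g = 1.0 / (1.0 + c)                  # keep roughly unity gain
    return [g * (dt[n] + c * (dt[n - pitch] if n >= pitch else 0.0))
            for n in range(len(dt))]
```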
- the LPC synthesis filter 265 is the standard LPC filter — an all-pole, direct-form filter with the quantized LPC coefficient array a. It filters the signal fdt and produces the long-term postfiltered, quantized speech vector st.
- This st vector is passed through the short-term postfilter 270 to produce the final TPC decoder output speech signal fst.
- this short-term postfilter 270 is very similar to the short-term postfilter used in G.728. The only differences are the following. First, the pole-controlling factor, the zero-controlling factor, and the spectral-tilt controlling factor are 0.7, 0.55, and 0.4, respectively, rather than the corresponding values of 0.75, 0.65, and 0.15 in G.728. Second, the coefficient of the first-order spectral-tilt compensation filter is linearly interpolated sample-by-sample between frames. This helps to avoid occasionally audible clicks due to discontinuity at frame boundaries.
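The sample-by-sample interpolation of the tilt-compensation coefficient can be sketched as follows; the choice of the frame endpoints as interpolation targets is an assumption.

```python
def interpolated_tilt(prev_mu, cur_mu, frame_len):
    """Per-sample linear interpolation of the first-order spectral-tilt
    compensation coefficient across a frame, avoiding the clicks a
    coefficient jump at the frame boundary can cause."""
    return [prev_mu + (cur_mu - prev_mu) * (n + 1) / frame_len
            for n in range(frame_len)]

def tilt_filter(x, prev_mu, cur_mu):
    """First-order tilt-compensation filter y[n] = x[n] + mu[n]*x[n-1]
    with the per-sample interpolated coefficient."""
    mu = interpolated_tilt(prev_mu, cur_mu, len(x))
    return [x[n] + mu[n] * (x[n - 1] if n > 0 else 0.0)
            for n in range(len(x))]
```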
- the long-term and short-term postfilters have the effect of reducing the perceived level of coding noise in the output signal fst, thus enhancing the speech quality.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US1229696P | 1996-02-26 | 1996-02-26 | |
US12296P | 1996-02-26 | ||
PCT/US1997/002898 WO1997031367A1 (en) | 1996-02-26 | 1997-02-26 | Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models |
Publications (2)
Publication Number | Publication Date |
---|---|
EP0954851A4 EP0954851A4 (de) | 1999-11-10 |
EP0954851A1 true EP0954851A1 (de) | 1999-11-10 |
Family
ID=21754300
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP97907830A Withdrawn EP0954851A1 (de) | 1996-02-26 | 1997-02-26 | Mehrstufiger sprachkodierer mit transformationskodierung von prädiktionsresiduen mittels quantisierung anhand auditiver modelle |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP0954851A1 (de) |
JP (1) | JPH11504733A (de) |
CA (1) | CA2219358A1 (de) |
MX (1) | MX9708203A (de) |
WO (1) | WO1997031367A1 (de) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6397178B1 (en) | 1998-09-18 | 2002-05-28 | Conexant Systems, Inc. | Data organizational scheme for enhanced selection of gain parameters for speech coding |
US6778953B1 (en) * | 2000-06-02 | 2004-08-17 | Agere Systems Inc. | Method and apparatus for representing masked thresholds in a perceptual audio coder |
CN1244904C (zh) * | 2001-05-08 | 2006-03-08 | 皇家菲利浦电子有限公司 | 声频信号编码方法和设备 |
EP1672618B1 (de) * | 2003-10-07 | 2010-12-15 | Panasonic Corporation | Verfahren zur entscheidung der zeitgrenze zur codierung der spektro-hülle und frequenzauflösung |
DE102006022346B4 (de) * | 2006-05-12 | 2008-02-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Informationssignalcodierung |
WO2012000882A1 (en) | 2010-07-02 | 2012-01-05 | Dolby International Ab | Selective bass post filter |
WO2012161675A1 (en) * | 2011-05-20 | 2012-11-29 | Google Inc. | Redundant coding unit for audio codec |
EP2772911B1 (de) * | 2011-10-24 | 2017-12-20 | LG Electronics Inc. | Verfahren und vorrichtung zur quantisierung von sprachsignalen in einer bandselektiven weise |
CN111862995A (zh) * | 2020-06-22 | 2020-10-30 | 北京达佳互联信息技术有限公司 | 一种码率确定模型训练方法、码率确定方法及装置 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5012517A (en) * | 1989-04-18 | 1991-04-30 | Pacific Communication Science, Inc. | Adaptive transform coder having long term predictor |
FR2700632B1 (fr) * | 1993-01-21 | 1995-03-24 | France Telecom | Système de codage-décodage prédictif d'un signal numérique de parole par transformée adaptative à codes imbriqués. |
-
1997
- 1997-02-26 WO PCT/US1997/002898 patent/WO1997031367A1/en not_active Application Discontinuation
- 1997-02-26 JP JP9530382A patent/JPH11504733A/ja active Pending
- 1997-02-26 MX MX9708203A patent/MX9708203A/es unknown
- 1997-02-26 EP EP97907830A patent/EP0954851A1/de not_active Withdrawn
- 1997-02-26 CA CA 2219358 patent/CA2219358A1/en not_active Abandoned
Non-Patent Citations (2)
Title |
---|
No further relevant documents disclosed * |
See also references of WO9731367A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO1997031367A1 (en) | 1997-08-28 |
EP0954851A4 (de) | 1999-11-10 |
JPH11504733A (ja) | 1999-04-27 |
MX9708203A (es) | 1997-12-31 |
CA2219358A1 (en) | 1997-08-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 19971127 |
|
A4 | Supplementary search report drawn up and despatched |
Effective date: 19980519 |
|
AK | Designated contracting states |
Kind code of ref document: A4 Designated state(s): DE FR GB Kind code of ref document: A1 Designated state(s): DE FR GB |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
|
18D | Application deemed to be withdrawn |
Effective date: 20000901 |