WO1996027869A1 - Voice-band compression system - Google Patents
- Publication number: WO1996027869A1
- Application: PCT/CA1996/000127
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- band
- signal
- voice
- bands
- compression apparatus
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
        - G10L19/02—using spectral analysis, e.g. transform vocoders or subband vocoders
          - G10L19/032—Quantisation or dequantisation of spectral components
          - G10L19/0204—using subband decomposition
Definitions
- This invention relates to a telephone voice-band compression system.
- Digital telephone systems employ voice compression in order to make the most effective use of available bandwidth.
- Most common telephone voice-band compression systems rely on modeling of the vocal tract to eliminate redundant information.
- The psycho-acoustic approach, by contrast, relies on modeling the hearing mechanism to eliminate redundant, non-audible information.
- A psycho-acoustically based voice compression system exploits the fact that the human hearing mechanism is incapable of perceiving some sounds in the presence of others. This phenomenon, called the masking effect, is most readily analyzed in the frequency domain.
- A voice compression apparatus comprising means for decomposing a voice signal into a bank of narrow frequency band signals; an estimator for estimating the instantaneous energy in each band; a comparator for comparing the energy in each band to that of all other bands to estimate the audibility threshold of the signal in each band; and means for independently quantizing the signal in each band on the basis of the estimated audibility threshold.
- A final number coding stage can be used to further compress the voice signal.
- The present system uses an optimally selected short-time analysis window for each critical band.
- The novel system guarantees a low processing delay combined with low computational requirements. Also, because the compressed signal is composed of frequency-independent information, the new system uncouples the information to be propagated, making it better suited to packet transport techniques.
- The invention also provides a method of compressing a voice signal comprising the steps of decomposing the voice signal into a bank of narrow frequency band signals, estimating the instantaneous energy in each band, comparing the energy in each band to that of all other bands to estimate the audibility threshold of the signal in each band, and independently quantizing the signal in each band on the basis of the estimated audibility threshold. More generally, and in its broadest aspect, the invention provides a voice compression scheme based on human auditory response characteristics, as opposed to the vocal-tract characteristics used in prior-art compression schemes.
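As an illustration only, the claimed steps (decompose, estimate energy, compare against the other bands, quantize independently) can be sketched in Python; the patent's own appendices are Matlab. The band splitter is assumed given, and the 0.05 spreading factor, the bit formula, and the 8-bit cap are invented for this sketch, not taken from the patent.

```python
import math

def compress_frame(bands, spread=0.05, max_bits=8):
    """Sketch of the claimed method: per-band energy -> audibility
    threshold from the *other* bands -> independent quantizer per band.
    'bands' is a list of lists of samples, one list per narrow band."""
    energies = [sum(s * s for s in b) / max(len(b), 1) for b in bands]
    quantized = []
    for i, band in enumerate(bands):
        # Crude audibility threshold for band i: masking contributed by
        # all other bands, scaled by an illustrative spreading factor.
        mask = spread * sum(e for j, e in enumerate(energies) if j != i)
        ratio = energies[i] / mask if mask > 0 else float('inf')
        # More bits where the signal stands far above its mask.
        bits = min(max_bits, max(0, int(math.log2(ratio) / 2) + 1)) if ratio > 1 else 0
        peak = max((abs(s) for s in band), default=0.0)
        if bits == 0 or peak == 0:
            quantized.append((0, []))      # band judged inaudible
            continue
        step = 2 * peak / (2 ** bits)      # uniform quantizer step
        quantized.append((bits, [int(round(s / step)) for s in band]))
    return quantized
```

A loud band next to a near-silent one illustrates the point: the quiet band falls below the threshold estimated from its neighbour and is dropped entirely.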
- Figure 1 shows an analysis/synthesis filter bank for wavelet transform
- Figure 2 shows the time-scale plane for wavelet transform showing multi-resolution representation
- Figure 3 shows a time-frequency grid for short-time Fourier transform showing same resolution at all frequencies/time
- Figure 4 shows masking threshold as a function of frequency
- Figure 5 shows the actual bands used in signal decomposition
- Figure 6 is a block diagram of an encoder for compressing a voice signal in accordance with the invention.
- Figure 7 is a detailed diagram of an encoder for a single band.
- In order to assist in the understanding of the invention, a brief discussion of the underlying theory will now be presented.
- Spectral analysis has long been based on Fourier analysis or more specifically the short-time Discrete Fourier Transform.
- A finite-length segment x_N(n) of a signal x(n) is defined by multiplying the signal with a pre-selected window w(n).
- The frequency content of this windowed signal is then determined using the DFT.
- The segment x_N(n) is expressed as the weighted sum of complex exponentials, the weights being the coefficients of the discrete Fourier transform of that segment. Those coefficients indicate the presence or absence of specific sinusoids and their relative magnitudes. However, the frequencies of the sinusoids used in the expansion are equally spaced across the frequency range, providing the same frequency resolution at all frequencies. The larger N, i.e. the longer the window in the time domain, the more points are calculated in the frequency domain, i.e. the higher the frequency resolution of the DFT.
- The resolution in the time domain is also determined by the length of the window. The longer the window, the longer the effective observation period and the lower the time resolution. So even though a longer window (more DFT points) reveals the frequencies present in the signal with higher resolution, the actual time of their occurrence is then known only with lower resolution, since the observation period considered is longer.
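The inverse relation between the two resolutions can be made concrete with a small sketch: for an N-point window at sampling rate fs, the DFT bin spacing is fs/N while the observation period is N/fs, so their product is fixed at 1 regardless of N.

```python
def dft_resolution(fs_hz, n):
    """Frequency-bin spacing and time span of an N-point windowed DFT."""
    freq_res_hz = fs_hz / n        # spacing between adjacent DFT bins
    time_span_s = n / fs_hz        # effective observation period
    return freq_res_hz, time_span_s

# At 8 kHz sampling, doubling the window halves the bin spacing but
# doubles the observation period; the product is always 1.
for n in (64, 256, 1024):
    f, t = dft_resolution(8000, n)
```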
- Variable-length windows are therefore needed: ones with shorter effective time support for the high-frequency components of the signal and others of longer effective time support to analyze the lower-frequency components.
- The choice of the type (function) of these windows has to be made very carefully for the transform to be meaningful and invertible.
- The wavelet transform provides a trade-off between time resolution and frequency resolution by varying the length of the window.
- For a pre-selected window function, the duration of the window is chosen shorter for higher time/lower frequency resolution on the higher-frequency components, and longer for higher frequency/lower time resolution on the lower-frequency components.
- The discrete wavelet transform is basically an expansion of a given signal in terms of a set of (almost) orthogonal basis functions.
- The signal is then expressed as the weighted sum of those functions, the weights being the coefficients of the wavelet transform.
- This provides the reconstruction equation for reproducing x(n) from its wavelet transform coefficients.
- The coefficients themselves can be computed as the inner product of the signal and each of the basis functions individually. Up to this point, this is the same as any other signal expansion; it is the conditions on the basis functions (the wavelets) that make the wavelet transform different from other transforms.
- The wavelet transform uses a set of basis functions ψ_{j,k}(t) that are dilates (expanded or contracted versions) and translates of a mother wavelet ψ(t).
- From the mother wavelet ψ(t), the wavelets are generated as ψ_{j,k}(t) = 2^{j/2} ψ(2^j t − k).
- The scaling by 2^j provides the dilation and the shift by k provides the translation.
- The different wavelets are identical in nature but have an effective support in the time domain that depends on the scale parameter j, and a position that is a function of the scale j as well as of the translation parameter k.
- As j increases, the effective time support of the wavelet is reduced and more shifted wavelets are used to cover the duration of the signal. This provides higher capacity for representing finer details at larger j; the opposite is true for lower j.
- The wavelets are orthogonal to each other so that together they span the whole signal space.
- The wavelets have to be chosen so that the wavelet transform (or expansion) provides information that can be directly related to the original signal, and so that it is invertible.
- A function f(t) may then be expressed in terms of these dilates and translates of ψ(t): f(t) = Σ_j Σ_k c_{j,k} ψ_{j,k}(t), where j is a measure of the compression/expansion and k specifies the function's location in time.
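A minimal Python illustration of dilates and translates, using the Haar function as the mother wavelet. The patent itself favours Daubechies wavelets; Haar is used here only because it can be written in two lines. Numeric inner products confirm the orthonormality described above.

```python
def haar(t):
    """Haar mother wavelet: +1 on [0, .5), -1 on [.5, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def wavelet(j, k, t):
    """Dilate/translate: psi_{j,k}(t) = 2^{j/2} * psi(2^j t - k)."""
    return (2.0 ** (j / 2)) * haar((2.0 ** j) * t - k)

def inner(j1, k1, j2, k2, steps=4096):
    """Midpoint Riemann-sum inner product over [0, 1)."""
    dt = 1.0 / steps
    return sum(wavelet(j1, k1, (i + 0.5) * dt) *
               wavelet(j2, k2, (i + 0.5) * dt) for i in range(steps)) * dt
```

Unit norm for ψ_{1,0}, zero overlap between the two translates at scale 1, and zero overlap across scales, as the orthogonality condition requires.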
- The wavelet transform can be recast in basic DSP (Digital Signal Processing) notation as shown in Figure 1, which goes from scale j+1 to scale j.
- The signal is basically split into high- and low-frequency components in splitters 1, 2.
- The decimated high-frequency output 3 (the detail) is then kept as is, while the decimated low-frequency output 4 is split again in splitters 5, 6. This process is continued in further splitters 7, 8 as far down as needed.
- The first highpass filter output covers the frequency range from π/2 to π.
- The second highpass filter output covers π/4 to π/2, the next π/8 to π/4, and so on.
- The filters thus have a bandwidth that is a constant fraction of the filter center frequency (a constant-Q bandwidth), giving narrow-bandwidth filters at the lower frequencies and wider-bandwidth filters at the higher frequencies.
- Figure 2 shows the corresponding time-scale division.
- With T the original sampling period and N input samples, the output of the first highpass filter/decimator has N/2 samples separated by 2T.
- The second highpass filter/decimator output has N/4 samples separated by 4T, and so on. Fewer points and a longer sampling period are used to represent the lower frequencies; in the limit, the zero frequency is represented by just one point. The higher the frequency, the higher the time resolution used, which is exactly what is needed.
- This is shown in Figure 2 as grid points and is generally referred to as tiling of the time-scale plane.
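The sample counts above can be reproduced with a sketch of the Figure-1 bank, here using the Haar pair (pairwise average/difference) as a stand-in for the patent's Daubechies filters: splitting only the low branch yields detail sequences of N/2, N/4, N/8, ... samples, matching the tiling just described.

```python
def haar_split(x):
    """One stage of the analysis bank with Haar filters:
    lowpass = pairwise average, highpass = pairwise difference,
    both decimated by 2."""
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return low, high

def dyadic_tree(x, levels):
    """Repeatedly split only the low branch (the dyadic tree)."""
    details = []
    for _ in range(levels):
        x, d = haar_split(x)
        details.append(d)
    return x, details

x = [float(i) for i in range(16)]       # N = 16 samples
approx, details = dyadic_tree(x, 3)
# details hold N/2, N/4, N/8 coefficients; approx holds the N/8 residue.
```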
- A typical short-time Fourier transform would instead result in uniform tiling, with the same resolution maintained across the time and frequency axes, as shown in Figure 3.
- The wavelet transform is thus a smart way of splitting a signal into frequency bands on a logarithmic frequency scale using constant-Q filters (bandwidth proportional to center frequency), providing higher frequency resolution and lower time resolution at the lower frequencies while providing lower frequency resolution and higher time resolution at the higher frequencies.
- The crucial point is the choice of the filter used (i.e. the wavelet function) to ensure invertibility (the ability to reconstruct the original signal) and optimality in some sense.
- Here the wavelet is chosen to provide the best energy compaction possible, i.e. to require the fewest transform coefficients to represent a given signal.
- The Daubechies wavelet is optimized for just that.
- The coefficients for these Daubechies filters are generated using the Matlab program in Appendix A.
- The signal can also be decomposed into other bands, not necessarily on a logarithmic scale, as needed.
- The logarithmic-scale decomposition is obtained using a dyadic tree where only the low-frequency band is repeatedly split.
- Basic sub-band coding would result if both high and low bands were split.
- Other trees result when different combinations of the tree branches are divided into smaller bands.
- The actual number of bands per branch is also a variable, resulting in a general M-band decomposition rather than the basic two-band case of the dyadic tree.
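The band layout produced by the dyadic tree can be computed directly; the function below is an illustrative sketch, not code from the patent's appendices.

```python
def dyadic_band_edges(fs_hz, levels):
    """Frequency bands (low_hz, high_hz) produced when only the low
    half of the spectrum is re-split at each level."""
    bands = []
    hi = fs_hz / 2.0                    # Nyquist frequency
    for _ in range(levels):
        bands.append((hi / 2.0, hi))    # kept highpass branch
        hi /= 2.0                       # descend into the low branch
    bands.append((0.0, hi))             # final lowpass residue
    return bands[::-1]                  # list bands low to high

# At 8 kHz sampling and 3 levels this gives the octave bands
# 0-500, 500-1000, 1000-2000 and 2000-4000 Hz.
```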
- Audio compression is traditionally a very different field from speech compression due to the very wide range of possible signal sources: speech as well as all possible musical instruments.
- Audio compression therefore cannot directly build on the achievements of speech processing, where in many cases the compression was based on some form of modeling of the speech production system and exploiting the associated redundancies. Rather than model the source of the signal, efforts in audio compression were directed at modeling the receiver of the music signal, the human ear.
- Psychoacoustic principles have been extensively applied to identify what the ear can and cannot hear.
- The wavelet transform came in as an almost custom-made representation for audio coding since it provides the information in a form that directly emulates the way the ear hears, and as such provides a very compatible representation. It should be noted that the quality required for music signals is significantly higher than the traditional toll quality required for speech transmission over telephone lines.
- CD-quality audio signals have a bandwidth of about 20 kHz and are sampled at 44.1 kHz, 16 bits/sample, giving an uncompressed bit rate of 705.6 kb/s per channel. It has been shown that, using wavelets and psychoacoustically based bit allocation algorithms, a compression factor of over 10 can be achieved, enabling transmission at the standard 64 kb/s telephone rate. One thing that helped achieve such compression ratios was the fact that higher delays and considerable transmitter complexity can be afforded compared with what is possible in two-way speech communication.
- The fundamental idea of psychoacoustics is that the ear has very definite masking abilities. A signal cannot be heard unless its amplitude exceeds a certain hearing threshold; this threshold specifies the absolute hearing level of the ear. However, the actual audibility threshold at a given frequency can increase depending on other signals present at neighbouring frequencies. As an example, a given tone at one frequency f0 can effectively mask another tone at f1 unless the latter's magnitude exceeds a threshold, as shown in Figure 4.
- This complex ability of the ear to mask certain frequencies in the presence of others means that a 20 dB SNR (signal-to-noise ratio) yields near-transparent coding for perceptually shaped noise (noise placed where it is masked by the ear), while more than 60 dB SNR would be required for additive white noise.
- For example, transparency is achieved for a 1 kHz tone with white noise at 90 dB SNR, while the same transparency is achievable at 25 dB SNR with psychoacoustically shaped noise.
- The masking threshold in Figure 4 is calculated on a Bark scale, which is effectively a logarithmic frequency scale at the higher frequencies, i.e. the same scale as provided by the dyadic-tree wavelet transform.
- Thresholds exist for tones masking noise as well as for noise masking tones/noise. Since the signal is normally neither pure tone nor pure noise, some measure of the tonality of the signal is used to determine a compromise threshold between the values provided by the two extremes. This masking threshold has to be determined for each signal segment. To do that, the signal spectrum is determined and transformed to the Bark scale. The masking threshold due to the individual signal component in each of the bands on that scale is then computed, and it is assumed that all those masking thresholds add up to provide the total masking curve for that segment.
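A hedged sketch of the additive per-segment masking computation just described: each band's power contributes a mask that decays with Bark-band distance, attenuated by a tonality-dependent offset (tone maskers mask less than noise maskers), and the contributions add in power. The 10 dB-per-band spreading slope and the 24 dB / 6 dB offsets are invented placeholder values for illustration, not figures from the patent.

```python
def masking_curve(band_powers, tonality, spread_db_per_band=10.0):
    """Total masking curve for one segment on a Bark-like band axis.
    tonality in [0, 1]: 1 = pure tone (weak masker), 0 = pure noise."""
    offset_db = tonality * 24.0 + (1.0 - tonality) * 6.0
    n = len(band_powers)
    curve = [0.0] * n
    for i, p in enumerate(band_powers):
        for j in range(n):
            # Mask from band i at band j: offset plus spreading loss.
            atten_db = offset_db + spread_db_per_band * abs(i - j)
            curve[j] += p * 10.0 ** (-atten_db / 10.0)  # add in power
    return curve
```

A single masker produces a curve that peaks at its own band and falls off monotonically with distance, as in Figure 4.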
- The number of bits used to represent each transform coefficient is determined for a specified overall bit rate.
- The bit allocation algorithm is a dynamic one that has to be updated periodically based on the spectral content of the current signal segment. This information has to be transmitted to the receiver, specifying the number of bits used to represent the coefficients in each of the Bark bands, to allow reconstruction on the other side. It can be seen that the transmitter is significantly more complex, since it requires computation of the spectrum on the Bark scale, the masking threshold and the bit allocation algorithm, while the receiver simply reconstructs the signal from the various frequency bands.
- The quantized wavelet coefficients, along with the number of bits used per coefficient, are then transmitted.
- At the receiver, the inverse wavelet transform is implemented to reconstruct the speech.
- In the present system, the masking threshold is obtained directly from the wavelet transform. This eliminates the need for the DFT-based spectrum computation otherwise performed in parallel with the wavelet transform, and the associated translation of the regular spectrum into the Bark-scale spectrum needed for determination of the masking threshold. Quantizing the logarithm of each coefficient improves the quality of the signal for the same number of bits, since the ear hears on a logarithmic magnitude scale: in regular codecs for PCM, 8-bit μ-law quantization is effectively equivalent to 12-bit linear quantization, and at least the same improvement is expected here.
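The μ-law point can be checked numerically. The sketch below uses the standard μ = 255 companding curve (an assumption for illustration; the patent only cites the 8-bit versus 12-bit equivalence) and compares round-trip error on a quiet sample, where logarithmic quantization pays off most.

```python
import math

MU = 255.0

def mulaw_quantize(x, bits=8):
    """Compress with the mu-law curve, quantize uniformly, expand.
    x is assumed to lie in [-1, 1]."""
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    levels = 2 ** (bits - 1)
    y_q = round(y * levels) / levels          # uniform step in log domain
    return math.copysign(math.expm1(abs(y_q) * math.log1p(MU)) / MU, y_q)

def linear_quantize(x, bits=8):
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

x = 0.01                                      # quiet signal
err_mu = abs(mulaw_quantize(x) - x)
err_lin = abs(linear_quantize(x) - x)
# mu-law spends its levels where the ear listens: err_mu << err_lin
```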
- The code to decompose a signal into Bark bands, recombine them and reconstruct the signal is given in Appendix B.
- The input is one sentence from a voice file, which has to be loaded prior to running the program.
- The program works on the file spll.lin ("Tom's birthday is in June"; male speaker).
- Figure 5 gives the actual bands used in the decomposition.
- Appendix C gives details of the filters used for decomposition and reconstruction of the input signal.
- The bit allocation algorithm is crucial in obtaining the best quality signal: it assigns the minimum possible number of bits to the coefficients in each frequency band such that the resulting noise in that band is masked by the signal.
- The number of bits assigned to each frequency band is determined so as to force the quantization-noise-to-masked-power ratio to be roughly the same for all bands. This noise-to-mask ratio (NMR), a measure of noise power over masked power in a given band, can be written for band i as NMR_i = σ²_{q,i} / σ²_{m,i}, where σ²_{m,i} is the masked power, σ²_{q,i} ≈ p_i² · 2^(−2b_i) / 3 is the uniform-quantizer noise power, p_i is the peak value, and b_i is the number of bits assigned to the i-th band.
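One way to realize the equal-NMR rule is a greedy allocator: each additional bit halves a band's quantizer step, cutting that band's noise power by about 6 dB, so the next bit always goes to the band whose noise-to-mask ratio is currently worst. This is an illustrative reconstruction, not the patent's appendix code.

```python
def allocate_bits(peaks, masked_powers, total_bits):
    """Greedy bit allocation that roughly equalizes the NMR across
    bands. Uniform-quantizer noise model: step = 2p/2^b, noise = step^2/12."""
    n = len(peaks)
    bits = [0] * n

    def nmr(i):
        step = 2.0 * peaks[i] / (2.0 ** bits[i])
        noise = step * step / 12.0
        return noise / masked_powers[i]

    for _ in range(total_bits):
        worst = max(range(n), key=nmr)     # band with the worst NMR
        bits[worst] += 1                   # one more bit: ~6 dB less noise
    return bits
```

With equal peaks but a 20 dB smaller mask in the second band, the allocator gives that band roughly 3.3 extra bits, as the 6 dB-per-bit rule predicts.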
- Referring to Figure 6, a voice signal A to be compressed is applied to a frequency band splitter 20, which splits the signal into a series of narrow band signals C.
- Instantaneous energy estimator 21 receives at its input the narrow band signals C and for each one outputs an estimate E of the energy content.
- The estimates E of the energy content are then applied to the respective inputs of the perceptual masking estimators 22, which in turn output signals to the bit allocator 23, which outputs a compressed voice signal G.
- Within a single band, as shown in Figure 7, the narrow band signal C is applied to instantaneous energy estimator 21, whose output is applied to perceptual comparator 25 along with the energy estimates from the other bands.
- The comparator in turn controls quantizer 26, whose output is applied to number coder 27, which produces the coded, compressed output signal G.
- The decoder (not shown) recovers from the number-coded signal the quantized signal of each band and regenerates a perceptually close representation of the original voice signal.
- The described system is particularly suitable for use in any voice transport system, such as modern telephones, where the voice signal is carried over a digital link.
- APPENDIX A: MATLAB CODE FOR DETERMINATION OF THE DAUBECHIES FILTERS
- % 3-band and 5-band filters are binomial filters, used for simplicity; the others are Daubechies. Filter order/type is not optimized. Block processing is performed.
- % stage 1: 0-2k; 2-4k
- % stage 2: 0-1, 1-2; 2-3,
- % stage 3: 0-.5, .5-1; 1-1.5, 1.5-2; 2-2.333, 2.333-2.667, 2.667-3; 3-3.5, 3.5-4
- % stage 4: 0-.1, .1-.2, .2-.3, .3-.4, .4-.5
- Ndb = 16; % length of Daub filter for half-band lp/hp
- N = 2^14; % number of data points (one full sentence)
- INP2 = spll(1:16384);
- % stage 5: i/p is y51-y52, .75-1
- x45 = intfilt(hlps, y51, N/32) + intfilt(hhps, y52, N/32);
- % stage 5: i/p is y53-y54, .5-.75
- x46 = intfilt(hhps, y53, N/32) + intfilt(hlps, y54, N/32);
- % stage 4: i/p is y41-y42, 1.5-2K
- x36 = intfilt(hlps, y41, N/16) + intfilt(hhps, y42, N/16); % 1.75-2, 1.5-1.75
- % stage 4: i/p is
- The analysis system implements the filters needed to split the signal into the Bark bands of Figure 5 using the arrangement shown in Figure 1. Note, however, that when splitting the high-frequency bands, the locations of the highpass and lowpass filters are reversed to keep the outputs in the desired order after decimation.
- The specific filters are as follows:
- Lowpass and highpass filters for the two-band splits are all 12th-order Daubechies filters as generated by the function daub.
- 3-band splits use binomial filters as generated by the function band3 (binomial, normalized).
- The lowpass and highpass filters are both symmetric and thus unaffected by coefficient reversal, but the bandpass filter is not.
- hb3bpa is used for the analysis filter while hb3bp is used for the synthesis one.
- hb3lp = [.25 .5 .25];
- hb3hp = [.25 -.5 .25];
- hb3bp = [.3536 0 -.3536];
- 5-band splits use binomial filters as generated by the function band5 (binomial, normalized).
- hb = [1 2 0 -2 -1]/8;
- hba = [-1 -2 0 2 1]/8;
- hc = [.1531 0 -.3062 .1531];
- hca = [.1531 -.3062 0 .1531];
- hd = [1 -2 0 2 -1]/8;
- hda = [-1 2 0 -2 1]/8;
- he = [1 -4 6 -4 1]/16;
- offset(i) = alpha*(14.5+i) + 5.5*(1-alpha);
- %%%%%%%%%%%%%%%%% quantization
- % Sigma is defined as the sqrt of the masked power (linear)
- % Masked power here is calculated based on Johnston. Need to verify or to use
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU47806/96A AU4780696A (en) | 1995-03-04 | 1996-03-01 | Voice-band compression system |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9504377A GB9504377D0 (en) | 1995-03-04 | 1995-03-04 | Voice-band compression system |
GB9504377.4 | 1995-03-04 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1996027869A1 (en) | 1996-09-12 |
Family
ID=10770654
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CA1996/000127 WO1996027869A1 (en) | 1995-03-04 | 1996-03-01 | Voice-band compression system |
Country Status (4)
Country | Link |
---|---|
AU (1) | AU4780696A (en) |
CA (1) | CA2211402A1 (en) |
GB (1) | GB9504377D0 (en) |
WO (1) | WO1996027869A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2648567A1 (en) * | 1989-05-24 | 1990-12-21 | Inst Nat Sante Rech Med | Method for the digital processing of a signal by reversible transformation into wavelets |
US5388182A (en) * | 1993-02-16 | 1995-02-07 | Prometheus, Inc. | Nonlinear method and apparatus for coding and decoding acoustic signals with data compression and noise suppression using cochlear filters, wavelet analysis, and irregular sampling reconstruction |
- 1995-03-04: GB GB9504377A patent/GB9504377D0/en active Pending
- 1996-03-01: AU AU47806/96A patent/AU4780696A/en not_active Abandoned
- 1996-03-01: CA CA2211402 patent/CA2211402A1/en not_active Abandoned
- 1996-03-01: WO PCT/CA1996/000127 patent/WO1996027869A1/en active Application Filing
Non-Patent Citations (3)
Title |
---|
D'Alessandro, C. et al.: "Transformation en ondelettes sur une échelle fréquentielle auditive" [Wavelet transform on an auditory frequency scale], Colloque sur le Traitement du Signal et des Images (GRETSI), Juan-les-Pins, 16-20 September 1991, vol. 2, colloque 13, pp. 745-748, XP000242883 * |
Vishwanath, M.: "The recursive pyramid algorithm for the discrete wavelet transform", IEEE Transactions on Signal Processing, vol. 42, no. 3, 1 March 1994, pp. 673-676, XP000450724 * |
Sen, D. et al.: "Use of an auditory model to improve speech coders", Speech Processing, Minneapolis, 27-30 April 1993, vol. 2 of 5, pp. II-411-414, IEEE, XP000427813 * |
Also Published As
Publication number | Publication date |
---|---|
AU4780696A (en) | 1996-09-23 |
CA2211402A1 (en) | 1996-09-12 |
GB9504377D0 (en) | 1995-04-26 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IS JP KE KG KP KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG US UZ VN AM AZ BY KG KZ MD RU |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): KE LS MW SD SZ UG AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
ENP | Entry into the national phase |
Ref document number: 2211402 Country of ref document: CA Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 1997 894940 Country of ref document: US Date of ref document: 19970904 Kind code of ref document: A |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
122 | Ep: pct application non-entry in european phase |