WO1997031367A1 - Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models - Google Patents

Info

Publication number
WO1997031367A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
speech
pitch
lpc
quantized
Prior art date
Application number
PCT/US1997/002898
Other languages
English (en)
French (fr)
Inventor
Juin-Hwey Chen
Original Assignee
AT&T Corp.
Priority date
Filing date
Publication date
Application filed by AT&T Corp.
Priority to JP9530382A (published as JPH11504733A)
Priority to MX9708203A (published as MX9708203A)
Priority to EP97907830A (published as EP0954851A1)
Publication of WO1997031367A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/002 Dynamic bit allocation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Definitions

  • This vector hd is subtracted from the vector tp by the subtracting unit 90.
  • the result is tt, the target vector for transform coding.
  • the resulting normalized FFT coefficient vector is partitioned into 3 frequency bands: (1) the low-frequency band consisting of the first 6 normalized FFT coefficients (i.e. from 0 to 1250 Hz), (2) the mid-frequency band consisting of the next 10 normalized FFT coefficients (from 1500 to 3750 Hz), and (3) the high-frequency band consisting of the remaining 17 normalized FFT coefficients (from 4000 to 8000 Hz).
  • processor 130 uses a "greedy" algorithm to perform adaptive bit allocation.
  • the technique is "greedy" in the sense that it allocates one bit at a time to the most "needy" frequency component without regard to its potential influence on future bit allocation.
  • the LPC power spectrum is assumed to be the power spectrum of the coding noise.
  • the noise loudness at each of the 33 frequencies of a 64-point FFT is estimated using the masking threshold calculated above and a simplified version of the noise loudness calculation method in Schroeder et al.
  • the simplified noise loudness at each of the 33 frequencies is calculated as follows. First, the critical bandwidth B_i at the i-th frequency is calculated using linear interpolation of the critical bandwidths listed in Table 1 of Scharf's book chapter in Tobias. The result is the approximated value of the term df/dx in equation (3) of Schroeder et al.
  • the 33 critical bandwidth values are pre-computed and stored in a table. Then, for the i-th frequency, the noise power N_i is compared with the masking threshold M_i. If N, ⁇ M
  • the transform coefficient quantizer 120 quantizes the transform coefficients contained in tc using the bit allocation signal ba.
  • the DC term of the FFT is a real number, and it is scalar quantized if it ever receives any bit during bit allocation.
  • the maximum number of bits it can receive is 4.
  • a conventional two-dimensional vector quantizer is used to quantize the real and imaginary parts jointly.
  • the maximum number of bits for this 2-dimension VQ is 6 bits.
  • a conventional 4-dimensional vector quantizer is used to jointly quantize the real and imaginary parts of two adjacent FFT coefficients.
  • the resulting VQ codebook index array IC contains the main information of the TPC encoder. This index array IC is provided to the multiplexer 180, where it is combined with side information bits. The result is the final bit-stream, which is transmitted through a communication channel to the TPC decoder.
  • the transform coefficient quantizer 120 also decodes the quantized values of the normalized transform coefficients. It then restores the original gain levels of these transform coefficients by multiplying each of these coefficients by the corresponding elements of mag and the quantized linear gain of the corresponding frequency band. The result is the output vector dtc.
  • An illustrative decoder embodiment of the present invention is shown in Figure 2.
  • the demultiplexer 200 separates all main and side information components from the received bit-stream.
  • the main information, the transform coefficient index array IC, is provided to the transform coefficient decoder 235.
  • adaptive bit allocation must be performed to determine how many of the main information bits are associated with each quantized transform coefficient.
  • the transform coefficient decoder 235 can then correctly decode the main information and obtain the quantized versions of the normalized transform coefficients.
  • the decoder 235 also decodes the gains using the gain index array IG. For each subframe, there are two gain indices (5 and 7 bits), which are decoded into the quantized log gain of the low-frequency band and the quantized versions of the level-adjusted log gains of the mid- and high-frequency bands. The quantized low-frequency log gain is then added back to the quantized versions of the level-adjusted mid- and high-frequency log gains to obtain the quantized log gains of the mid- and high-frequency bands.
  • the high-frequency synthesis processor 240, inverse transform processor 245, and the inverse shaping filter 250 are again exact replicas of the corresponding blocks (140, 150, and 160) in the TPC encoder. Together they perform high-frequency synthesis, noise fill-in, inverse transformation, and inverse shaping filtering to produce the quantized excitation vector et.
  • the adder 255 adds dh and et to get dt, the quantized version of the LPC prediction residual d.
  • This dt vector is fed back to the pitch predictor inside block 210 to update its internal storage buffer for dt (the filter memory of the pitch predictor).
  • the long-term postfilter 260 is basically similar to the long-term postfilter used in the ITU-T G.728 standard 16 kb/s Low-Delay CELP coder.
  • the main difference is that it uses the sum of the three quantized pitch taps (Σ b_i for i = 1 to 3) as the voicing indicator, and that the scaling factor for the long-term postfilter coefficient is 0.4 rather than 0.15 as in G.728. If this voicing indicator is less than 0.5, the postfiltering operation is skipped, and the output vector fdt is identical to the input vector dt. If this indicator is 0.5 or more, the postfiltering operation is carried out.
  • the LPC synthesis filter 265 is the standard LPC filter — an all-pole, direct-form filter with the quantized LPC coefficient array a. It filters the signal fdt and produces the long-term postfiltered, quantized speech vector st.
  • This st vector is passed through the short-term postfilter 270 to produce the final TPC decoder output speech signal fst.
  • this short-term postfilter 270 is very similar to the short-term postfilter used in G.728. The only differences are the following. First, the pole-controlling factor, the zero-controlling factor, and the spectral-tilt controlling factor are 0.7, 0.55, and 0.4, respectively, rather than the corresponding values of 0.75, 0.65, and 0.15 in G.728. Second, the coefficient of the first-order spectral-tilt compensation filter is linearly interpolated sample-by-sample between frames. This helps to avoid occasionally audible clicks due to discontinuity at frame boundaries.
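
The three-band partition of the normalized FFT coefficients described in the excerpts above can be checked with a few lines of arithmetic. The sketch below assumes a 16 kHz sampling rate, which is inferred from the statement that the 33 bins of the 64-point FFT span 0 to 8000 Hz; the names `BANDS` and `band_edges_hz` are illustrative and do not come from the patent.

```python
# Assumption: 16 kHz sampling, inferred from 33 bins of a 64-point FFT spanning 0-8000 Hz.
SAMPLE_RATE_HZ = 16000
FFT_SIZE = 64
NUM_BINS = FFT_SIZE // 2 + 1                 # 33 non-redundant bins for a real input
BIN_SPACING_HZ = SAMPLE_RATE_HZ / FFT_SIZE   # 250 Hz between adjacent bins

# Bin index ranges as stated in the text: first 6, next 10, remaining 17.
BANDS = {
    "low": range(0, 6),
    "mid": range(6, 16),
    "high": range(16, NUM_BINS),
}

def band_edges_hz(bins):
    """Return the (lowest, highest) bin center frequency of a band in Hz."""
    return (bins[0] * BIN_SPACING_HZ, bins[-1] * BIN_SPACING_HZ)

band_ranges = {name: band_edges_hz(bins) for name, bins in BANDS.items()}
```

Under that assumption the computed edges agree with the frequency ranges quoted in the text: 0 to 1250 Hz, 1500 to 3750 Hz, and 4000 to 8000 Hz.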
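
The "greedy" bit allocation performed by processor 130 can be illustrated with a minimal sketch. The excerpts say only that one bit at a time goes to the most "needy" frequency component and that some components have a maximum bit count (4 for the DC term, 6 for the 2-dimensional VQ). The neediness measure used here is a stand-in: the sketch assumes each allocated bit divides a component's need by 4 (roughly 6 dB of noise power per bit), whereas the patent bases the decision on noise loudness. The function name and parameters are hypothetical.

```python
import heapq

def greedy_bit_allocation(need, total_bits, max_bits, drop_per_bit=0.25):
    """Allocate bits one at a time to the currently neediest component.

    need         -- initial neediness per component (hypothetical stand-in
                    for the noise-loudness measure used in the patent)
    total_bits   -- bit budget to distribute
    max_bits     -- per-component cap on the number of bits
    drop_per_bit -- assumed factor by which one bit reduces the need
    """
    bits = [0] * len(need)
    # heapq is a min-heap, so store negated need; the index breaks ties.
    heap = [(-n, i) for i, n in enumerate(need)]
    heapq.heapify(heap)
    remaining = total_bits
    while remaining > 0 and heap:
        neg_need, i = heapq.heappop(heap)
        bits[i] += 1
        remaining -= 1
        # A component that hit its cap is not pushed back and gets no more bits.
        if bits[i] < max_bits[i]:
            heapq.heappush(heap, (neg_need * drop_per_bit, i))
    return bits

# Component 0 plays the role of the DC term (cap of 4 bits).
allocation = greedy_bit_allocation([16.0, 4.0, 1.0], total_bits=4, max_bits=[4, 6, 6])
```

Because the decoder must repeat the same allocation to parse the main information bits, any real implementation would need the encoder and decoder to compute identical neediness values; the deterministic tie-break on index in this sketch reflects that requirement.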
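
The gain decoding step in decoder 235 amounts to two table lookups followed by two additions. The sketch below is illustrative only: it assumes the 5-bit index selects the low-frequency log gain and the 7-bit index selects a pair of level-adjusted mid- and high-frequency log gains, and both codebooks are filled with placeholder values, since the excerpts do not give the actual tables.

```python
# Placeholder codebooks (hypothetical values; the real tables are not in the text).
LOW_GAIN_CODEBOOK = [2.0 * i for i in range(32)]            # 5-bit index -> 32 entries
ADJ_GAIN_CODEBOOK = [(-1.0 * (j % 16), -1.0 * (j // 16))    # 7-bit index -> 128 pairs
                     for j in range(128)]

def decode_log_gains(low_index, adj_index):
    """Return the quantized (low, mid, high) log gains for one subframe."""
    low = LOW_GAIN_CODEBOOK[low_index]
    adj_mid, adj_high = ADJ_GAIN_CODEBOOK[adj_index]
    # The level-adjusted gains are stored relative to the low-band gain,
    # so the low-band log gain is added back to recover the absolute values.
    return low, low + adj_mid, low + adj_high

low_gain, mid_gain, high_gain = decode_log_gains(10, 35)
```

Coding the mid- and high-band gains relative to the low-band gain means the second codebook only has to cover the spread between bands rather than the full dynamic range of the signal level.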
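
The sample-by-sample interpolation of the spectral-tilt compensation coefficient in the short-term postfilter 270 can be sketched as follows. Two details are assumptions rather than statements from the excerpts: the interpolation reaches the new coefficient on the last sample of the frame, and the first-order filter has the FIR form y[n] = x[n] + mu[n] * x[n-1]. Function names are hypothetical.

```python
def interpolate_tilt(prev_coeff, curr_coeff, frame_len):
    """Linearly ramp the tilt coefficient from the previous frame's value
    to the current frame's value, one step per sample."""
    step = (curr_coeff - prev_coeff) / frame_len
    return [prev_coeff + step * (n + 1) for n in range(frame_len)]

def tilt_compensate(frame, coeffs, prev_sample=0.0):
    """Apply y[n] = x[n] + mu[n] * x[n-1] with a per-sample coefficient mu[n].
    (Assumed first-order FIR form; prev_sample is the last input of the
    previous frame.)"""
    out = []
    for x, mu in zip(frame, coeffs):
        out.append(x + mu * prev_sample)
        prev_sample = x
    return out

mu = interpolate_tilt(0.0, 0.4, 4)
y = tilt_compensate([1.0, 1.0, 1.0, 1.0], mu)
```

With a single coefficient held constant per frame, the filter response would jump at each frame boundary; ramping the coefficient spreads that change over the frame, which is the motivation the text gives for avoiding audible clicks.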

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
PCT/US1997/002898 1996-02-26 1997-02-26 Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models WO1997031367A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP9530382A JPH11504733A (ja) 1996-02-26 1997-02-26 Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models
MX9708203A MX9708203A (es) 1996-02-26 1997-02-26 Quantization of speech signals using human auditory models in predictive coding systems.
EP97907830A EP0954851A1 (en) 1996-02-26 1997-02-26 Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US1229696P 1996-02-26 1996-02-26
US60/012,296 1996-02-26

Publications (1)

Publication Number Publication Date
WO1997031367A1 (en) 1997-08-28

Family

ID=21754300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/002898 WO1997031367A1 (en) 1996-02-26 1997-02-26 Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models

Country Status (5)

Country Link
EP (1) EP0954851A1
JP (1) JPH11504733A
CA (1) CA2219358A1
MX (1) MX9708203A
WO (1) WO1997031367A1

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6778953B1 (en) * 2000-06-02 2004-08-17 Agere Systems Inc. Method and apparatus for representing masked thresholds in a perceptual audio coder
DE102006022346B4 (de) * 2006-05-12 2008-02-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Information signal coding
KR102052144B1 (ko) * 2011-10-24 2019-12-05 LG Electronics Inc. Method and device for band-selective quantization of a speech signal
KR20230116503A (ko) * 2022-01-28 2023-08-04 Electronics and Telecommunications Research Institute Encoding method and encoding device, and decoding method and decoding device, using scalar quantization and vector quantization

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5012517A (en) * 1989-04-18 1991-04-30 Pacific Communication Science, Inc. Adaptive transform coder having long term predictor
US5583963A (en) * 1993-01-21 1996-12-10 France Telecom System for predictive coding/decoding of a digital speech signal by embedded-code adaptive transform

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ACOUSTICS, SPEECH & SIGNAL PROCESSING CONFERENCE, IEEE ICASSP '88, DAVIDSON G. et al., "Multiple-Stage Vector Excitation Coding of Speech Waveforms", pages 163-166. *
ACOUSTICS, SPEECH & SIGNAL PROCESSING CONFERENCE, IEEE ICASSP '89, OFER et al., "A Unified Framework for LPC Excitation Representation in Residual Speech Coders", pages 44-44. *
GLOBAL TELECOMMUNICATIONS CONFERENCE, IEEE GLOBECOM 90, JOHNSON et al., "Pitch-Orthogonal Code-Excited LPC", pages 542-546. *
See also references of EP0954851A4 *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000017858A1 (en) * 1998-09-18 2000-03-30 Conexant Systems, Inc. Robust fast search for two-dimensional gain vector quantizer
US6397178B1 (en) 1998-09-18 2002-05-28 Conexant Systems, Inc. Data organizational scheme for enhanced selection of gain parameters for speech coding
WO2002091363A1 (en) * 2001-05-08 2002-11-14 Koninklijke Philips Electronics N.V. Audio coding
KR100871999B1 (ko) * 2001-05-08 2008-12-05 Koninklijke Philips Electronics N.V. Audio coding
US7483836B2 (en) 2001-05-08 2009-01-27 Koninklijke Philips Electronics N.V. Perceptual audio coding on a priority basis
US7451091B2 (en) 2003-10-07 2008-11-11 Matsushita Electric Industrial Co., Ltd. Method for determining time borders and frequency resolutions for spectral envelope coding
US9552824B2 (en) 2010-07-02 2017-01-24 Dolby International Ab Post filter
US9858940B2 (en) 2010-07-02 2018-01-02 Dolby International Ab Pitch filter for audio signals
US9343077B2 (en) 2010-07-02 2016-05-17 Dolby International Ab Pitch filter for audio signals
US9396736B2 (en) 2010-07-02 2016-07-19 Dolby International Ab Audio encoder and decoder with multiple coding modes
US11996111B2 (en) 2010-07-02 2024-05-28 Dolby International Ab Post filter for audio signals
US9558754B2 (en) 2010-07-02 2017-01-31 Dolby International Ab Audio encoder and decoder with pitch prediction
US9558753B2 (en) 2010-07-02 2017-01-31 Dolby International Ab Pitch filter for audio signals
US9595270B2 (en) 2010-07-02 2017-03-14 Dolby International Ab Selective post filter
US9830923B2 (en) 2010-07-02 2017-11-28 Dolby International Ab Selective bass post filter
US9224403B2 (en) 2010-07-02 2015-12-29 Dolby International Ab Selective bass post filter
US10236010B2 (en) 2010-07-02 2019-03-19 Dolby International Ab Pitch filter for audio signals
US10811024B2 (en) 2010-07-02 2020-10-20 Dolby International Ab Post filter for audio signals
US11610595B2 (en) 2010-07-02 2023-03-21 Dolby International Ab Post filter for audio signals
US11183200B2 (en) 2010-07-02 2021-11-23 Dolby International Ab Post filter for audio signals
WO2012161675A1 (en) * 2011-05-20 2012-11-29 Google Inc. Redundant coding unit for audio codec
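CN111862995A (zh) * 2020-06-22 2020-10-30 Beijing Dajia Internet Information Technology Co., Ltd. Bit rate determination model training method, bit rate determination method, and device
CN116052695A (zh) * 2022-10-28 2023-05-02 Shaanxi Normal University Audio auditory cryptography method, system, and device based on wave operations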

Also Published As

Publication number Publication date
EP0954851A1 (en) 1999-11-10
CA2219358A1 (en) 1997-08-28
MX9708203A (es) 1997-12-31
JPH11504733A (ja) 1999-04-27
EP0954851A4 1999-11-10

Similar Documents

Publication Publication Date Title
US5790759A (en) Perceptual noise masking measure based on synthesis filter frequency response
EP0764941B1 (en) Speech signal quantization using human auditory models in predictive coding systems
EP0764939B1 (en) Synthesis of speech signals in the absence of coded parameters
RU2262748C2 Multimode encoding device
US6735567B2 (en) Encoding and decoding speech signals variably based on signal classification
US6574593B1 (en) Codebook tables for encoding and decoding
US6581032B1 (en) Bitstream protocol for transmission of encoded voice signals
JP4662673B2 Gain smoothing in wideband speech and audio signal decoder
EP0503684B1 (en) Adaptive filtering method for speech and audio
EP0465057B1 (en) Low-delay code-excited linear predictive coding of wideband speech at 32kbits/sec
US5307441A (en) Wear-toll quality 4.8 kbps speech codec
JP3490685B2 Method and apparatus for adaptive bandwidth pitch search in coding wideband signals
US6098036A (en) Speech coding system and method including spectral formant enhancer
US5699382A (en) Method for noise weighting filtering
MXPA96004161A (en) Quantification of speech signals using human auiditive models in predict encoding systems
JP4176349B2 Multimode speech encoder
KR20030046451A Codebook structure and search method for speech coding
EP0954851A1 (en) Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models
GB2352949A (en) Speech coder for communications unit
CA2303711C (en) Method for noise weighting filtering
AU2757602A (en) Multimode speech encoder

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP MX US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

ENP Entry into the national phase

Ref document number: 2219358

Country of ref document: CA

Ref country code: CA

Ref document number: 2219358

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: PA/a/1997/008203

Country of ref document: MX

ENP Entry into the national phase

Ref country code: JP

Ref document number: 1997 530382

Kind code of ref document: A

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1997907830

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office

Ref document number: 1997907830

Country of ref document: EP

WWW Wipo information: withdrawn in national office

Ref document number: 1997907830

Country of ref document: EP