WO1997031367A1 - Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models

Publication number
WO1997031367A1
WO1997031367A1 PCT/US1997/002898 US9702898W WO9731367A1 WO 1997031367 A1 WO1997031367 A1 WO 1997031367A1 US 9702898 W US9702898 W US 9702898W WO 9731367 A1 WO9731367 A1 WO 9731367A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
speech
pitch
lpc
quantized
Prior art date
Application number
PCT/US1997/002898
Other languages
English (en)
Inventor
Juin-Hwey Chen
Original Assignee
At & T Corp.
Priority date
1996-02-26
Filing date
1997-02-26
Publication date
1997-08-28
Application filed by At & T Corp.
Priority to EP97907830A (EP0954851A1)
Priority to JP9530382A (JPH11504733A)
Priority to MX9708203A (MX9708203A)
Publication of WO1997031367A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/002 Dynamic bit allocation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Definitions

  • This vector hd is subtracted from the vector tp by the subtracting unit 90.
  • the result is tt, the target vector for transform coding.
  • the resulting normalized FFT coefficient vector is partitioned into 3 frequency bands: (1) the low-frequency band consisting of the first 6 normalized FFT coefficients (i.e. from 0 to 1250 Hz), (2) the mid-frequency band consisting of the next 10 normalized FFT coefficients (from 1500 to 3750 Hz), and (3) the high-frequency band consisting of the remaining 17 normalized FFT coefficients (from 4000 to 8000 Hz).
  • processor 130 uses a "greedy" algorithm to perform adaptive bit allocation.
  • the technique is “greedy” in the sense that it allocates one bit at a time to the most "needy" frequency component without regard to its potential influence on future bit allocation.
  • the LPC power spectrum is assumed to be the power spectrum of the coding noise.
  • the noise loudness at each of the 33 frequencies of a 64-point FFT is estimated using the masking threshold calculated above and a simplified version of the noise loudness calculation method in Schroeder et al.
  • the simplified noise loudness at each of the 33 frequencies is calculated as follows. First, the critical bandwidth B_i at the i-th frequency is calculated using linear interpolation of the critical bandwidth listed in table 1 of Scharf's book chapter in Tobias. The result is the approximated value of the term df/dx in equation (3) of Schroeder et al.
  • the 33 critical bandwidth values are pre-computed and stored in a table. Then, for the i-th frequency, the noise power N_i is compared with the masking threshold M_i. If N_i ...
  • the transform coefficient quantizer 120 quantizes the transform coefficients contained in tc using the bit allocation signal ba.
  • the DC term of the FFT is a real number, and it is scalar quantized if it ever receives any bit during bit allocation.
  • the maximum number of bits it can receive is 4.
  • a conventional two-dimensional vector quantizer is used to quantize the real and imaginary parts jointly.
  • the maximum number of bits for this 2-dimension VQ is 6 bits.
  • a conventional 4-dimensional vector quantizer is used to jointly quantize the real and imaginary parts of two adjacent FFT coefficients.
  • the resulting VQ codebook index array IC contains the main information of the TPC encoder. This index array IC is provided to the multiplexer 180, where it is combined with side information bits. The result is the final bit-stream, which is transmitted through a communication channel to the TPC decoder.
  • the transform coefficient quantizer 120 also decodes the quantized values of the normalized transform coefficients. It then restores the original gain levels of these transform coefficients by multiplying each of these coefficients by the corresponding elements of mag and the quantized linear gain of the corresponding frequency band. The result is the output vector dtc.
  • An illustrative decoder embodiment of the present invention is shown in Figure 2.
  • the demultiplexer 200 separates all main and side information components from the received bit-stream.
  • the main information, the transform coefficient index array IC, is provided to the transform coefficient decoder 235.
  • adaptive bit allocation must be performed to determine how many of the main information bits are associated with each quantized transform coefficient.
  • the transform coefficient decoder 235 can then correctly decode the main information and obtain the quantized versions of the normalized transform coefficients.
  • the decoder 235 also decodes the gains using the gain index array IG. For each subframe, there are two gain indices (5 and 7 bits), which are decoded into the quantized log gain of the low-frequency band and the quantized, level-adjusted log gains of the mid- and high-frequency bands. The quantized low-frequency log gain is then added back to the quantized level-adjusted mid- and high-frequency log gains to obtain the quantized log gains of the mid- and high-frequency bands.
  • the high-frequency synthesis processor 240, inverse transform processor 245, and the inverse shaping filter 250 are again exact replicas of the corresponding blocks (140, 150, and 160) in the TPC encoder. Together they perform high-frequency synthesis, noise fill-in, inverse transformation, and inverse shaping filtering to produce the quantized excitation vector et.
  • the adder 255 adds dh and et to get dt, the quantized version of the LPC prediction residual d.
  • This dt vector is fed back to the pitch predictor inside block 210 to update its internal storage buffer for dt (the filter memory of the pitch predictor).
  • the long-term postfilter 260 is basically similar to the long-term postfilter used in the ITU-T G.728 standard 16 kb/s Low-Delay CELP coder.
  • the main difference is that it uses $\sum_{i=1}^{3} b_{ik}$, the sum of the three quantized pitch taps, as the voicing indicator, and that the scaling factor for the long-term postfilter coefficient is 0.4 rather than 0.15 as in G.728. If this voicing indicator is less than 0.5, the postfiltering operation is skipped, and the output vector fdt is identical to the input vector dt. If this indicator is 0.5 or more, the postfiltering operation is carried out.
  • the LPC synthesis filter 265 is the standard LPC filter — an all-pole, direct-form filter with the quantized LPC coefficient array a. It filters the signal fdt and produces the long-term postfiltered, quantized speech vector st.
  • This st vector is passed through the short-term postfilter 270 to produce the final TPC decoder output speech signal fst.
  • this short-term postfilter 270 is very similar to the short-term postfilter used in G.728. The only differences are the following. First, the pole-controlling factor, the zero-controlling factor, and the spectral-tilt controlling factor are 0.7, 0.55, and 0.4, respectively, rather than the corresponding values of 0.75, 0.65, and 0.15 in G.728. Second, the coefficient of the first-order spectral-tilt compensation filter is linearly interpolated sample-by-sample between frames. This helps to avoid occasionally audible clicks due to discontinuity at frame boundaries.
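The three-band partition of the normalized FFT coefficients described above can be checked with a short sketch. It assumes a 64-point FFT at the 16 kHz sampling rate, which gives 33 non-redundant bins spaced 250 Hz apart; the names `BANDS`, `band_edges_hz`, and `split_bands` are hypothetical and not taken from the patent.

```python
FS_HZ = 16_000                 # sampling rate
FFT_SIZE = 64                  # FFT length
BIN_HZ = FS_HZ / FFT_SIZE      # 250 Hz between adjacent bins
NUM_BINS = FFT_SIZE // 2 + 1   # 33 bins from DC to 8000 Hz

# Bin index ranges for the three bands: 6 + 10 + 17 = 33 coefficients.
BANDS = {
    "low": range(0, 6),
    "mid": range(6, 16),
    "high": range(16, 33),
}

def band_edges_hz(bins):
    """Return the frequencies of the first and last bin of a band."""
    return bins[0] * BIN_HZ, bins[-1] * BIN_HZ

def split_bands(coeffs):
    """Split a 33-element coefficient vector into the three bands."""
    if len(coeffs) != NUM_BINS:
        raise ValueError(f"expected {NUM_BINS} coefficients")
    return {name: [coeffs[i] for i in idx] for name, idx in BANDS.items()}

if __name__ == "__main__":
    for name, idx in BANDS.items():
        print(name, len(idx), band_edges_hz(idx))
```

The band edges computed this way agree with the 0 to 1250 Hz, 1500 to 3750 Hz, and 4000 to 8000 Hz ranges quoted in the text.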
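The "greedy" bit allocation described above, one bit at a time to the neediest component, can be sketched as follows. This is only an illustration under stated assumptions: the patent measures need through a noise loudness estimate, whereas this sketch substitutes the simpler excess of noise power over the masking threshold, and it assumes each added bit lowers the noise power by a factor of four (about 6 dB). The function name and parameters are hypothetical.

```python
def greedy_bit_allocation(noise_power, mask_threshold, total_bits, max_bits):
    """Allocate bits one at a time to the currently neediest component.

    noise_power:    estimated coding-noise power per component at zero bits
    mask_threshold: masking threshold per component
    total_bits:     number of bits available for the frame
    max_bits:       per-component cap (e.g. 4 for the DC term, 6 for a 2-D VQ)
    """
    n = len(noise_power)
    bits = [0] * n
    noise = list(noise_power)
    for _ in range(total_bits):
        # Components that have reached their cap no longer compete.
        open_idx = [i for i in range(n) if bits[i] < max_bits[i]]
        if not open_idx:
            break
        # ASSUMPTION: "need" is the noise excess over the masking threshold,
        # a stand-in for the noise loudness measure used in the patent.
        neediest = max(open_idx, key=lambda i: noise[i] - mask_threshold[i])
        bits[neediest] += 1
        # ASSUMPTION: one extra bit cuts the noise power by 4 (about 6 dB).
        noise[neediest] /= 4.0
    return bits

if __name__ == "__main__":
    print(greedy_bit_allocation([16.0, 4.0, 1.0], [0.0, 0.0, 0.0], 3, [4, 4, 4]))
```

Each decision looks only at the current state, which is what makes the procedure greedy: it never revisits an earlier allocation.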
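The gain decoding step described above, where the low-frequency log gain is added back to the level-adjusted mid- and high-frequency log gains, amounts to two additions in the log domain. The sketch below assumes the three values have already been looked up from the gain indices; the function name is hypothetical, and the codebook lookup itself is not shown because the text does not specify it.

```python
def reconstruct_log_gains(low_log_gain, adj_mid_log_gain, adj_high_log_gain):
    """Undo the level adjustment of the mid- and high-band log gains.

    The mid and high values are stored relative to the low-band log gain,
    so adding the low-band log gain back recovers the absolute log gains.
    """
    mid_log_gain = low_log_gain + adj_mid_log_gain
    high_log_gain = low_log_gain + adj_high_log_gain
    return low_log_gain, mid_log_gain, high_log_gain

if __name__ == "__main__":
    print(reconstruct_log_gains(3.0, -1.0, -2.5))
```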
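Two of the postfilter details described above lend themselves to a small sketch: the voicing test that gates the long-term postfilter, and the sample-by-sample linear interpolation of the spectral-tilt coefficient between frames. The 0.5 threshold comes from the text; the function names and the exact interpolation schedule (reaching the new value on the last sample of the frame) are assumptions for illustration.

```python
VOICING_THRESHOLD = 0.5  # from the text: below this, postfiltering is skipped

def long_term_postfilter_enabled(pitch_taps):
    """Return True when the sum of the three quantized pitch taps is >= 0.5."""
    if len(pitch_taps) != 3:
        raise ValueError("expected three pitch taps")
    return sum(pitch_taps) >= VOICING_THRESHOLD

def interpolate_tilt_coefficient(prev_coeff, curr_coeff, frame_len):
    """Linearly interpolate the tilt coefficient across one frame.

    ASSUMPTION: the coefficient moves in equal steps from the previous
    frame's value and reaches the current frame's value on the last sample.
    """
    delta = curr_coeff - prev_coeff
    return [prev_coeff + delta * (n + 1) / frame_len for n in range(frame_len)]

if __name__ == "__main__":
    print(long_term_postfilter_enabled([0.2, 0.2, 0.2]))
    print(interpolate_tilt_coefficient(0.0, 0.4, 4))
```

Because the coefficient changes gradually rather than jumping at the frame boundary, the filter response has no abrupt discontinuity, which is the stated reason for interpolating.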

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present invention relates to a speech compression system called Transform Predictive Coding (TPC), which encodes 7 kHz wideband speech (sampled at 16 kHz) at a bit rate of 16 to 32 kb/s, that is, 1 to 2 bits per sample. The system uses short-term and long-term prediction to remove redundancy. The prediction residual is transformed and coded in the frequency domain, as shown in the figure, by the transform processor (110), which takes time-domain data from the adder (60) and parameters from the shaping filter magnitude response processor (100) that shape the spectrum according to auditory perception. The TPC coder uses only open-loop quantization, as exemplified by the pitch extractor/interpolator (70), and therefore has low complexity. Speech quality is transparent at 32 kb/s, very good at 24 kb/s, and acceptable at 16 kb/s.
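The relation between bit rate and bits per sample in the abstract is simple arithmetic: at a 16 kHz sampling rate, 1 to 2 bits per sample corresponds to 16 to 32 kbit/s. A minimal sketch of that check follows; the function name is hypothetical.

```python
SAMPLE_RATE_HZ = 16_000  # sampling rate given in the abstract

def bits_per_sample(bit_rate_bps, sample_rate_hz=SAMPLE_RATE_HZ):
    """Average number of coded bits available per input sample."""
    return bit_rate_bps / sample_rate_hz

if __name__ == "__main__":
    for rate in (16_000, 24_000, 32_000):
        print(f"{rate // 1000} kbit/s -> {bits_per_sample(rate)} bits/sample")
```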
PCT/US1997/002898 1996-02-26 1997-02-26 Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models WO1997031367A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP97907830A EP0954851A1 (en) 1996-02-26 1997-02-26 Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models
JP9530382A JPH11504733A (en) 1996-02-26 1997-02-26 Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models
MX9708203A MX9708203A (en) 1996-02-26 1997-02-26 Quantization of speech signals using human auditory models in predictive coding systems.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US1229696P 1996-02-26 1996-02-26
US60/012,296 1996-02-26

Publications (1)

Publication Number Publication Date
WO1997031367A1 1997-08-28

Family

ID=21754300

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1997/002898 WO1997031367A1 (en) 1996-02-26 1997-02-26 Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models

Country Status (5)

Country Link
EP (1) EP0954851A1 (fr)
JP (1) JPH11504733A (fr)
CA (1) CA2219358A1 (fr)
MX (1) MX9708203A (fr)
WO (1) WO1997031367A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000017858A1 (en) * 1998-09-18 2000-03-30 Conexant Systems, Inc. Robust fast search for two-dimensional gain vector quantizer
WO2002091363A1 (en) * 2001-05-08 2002-11-14 Koninklijke Philips Electronics N.V. Audio coding
US7451091B2 (en) 2003-10-07 2008-11-11 Matsushita Electric Industrial Co., Ltd. Method for determining time borders and frequency resolutions for spectral envelope coding
WO2012161675A1 (en) * 2011-05-20 2012-11-29 Google Inc. Redundant coding unit for audio codec
US9224403B2 (en) 2010-07-02 2015-12-29 Dolby International Ab Selective bass post filter
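CN111862995A (en) * 2020-06-22 2020-10-30 Beijing Dajia Internet Information Technology Co., Ltd. Code rate determination model training method, code rate determination method, and device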

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6778953B1 (en) * 2000-06-02 2004-08-17 Agere Systems Inc. Method and apparatus for representing masked thresholds in a perceptual audio coder
DE102006022346B4 (en) * 2006-05-12 2008-02-28 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Information signal coding
US9390722B2 (en) * 2011-10-24 2016-07-12 Lg Electronics Inc. Method and device for quantizing voice signals in a band-selective manner

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5012517A (en) * 1989-04-18 1991-04-30 Pacific Communication Science, Inc. Adaptive transform coder having long term predictor
US5583963A (en) * 1993-01-21 1996-12-10 France Telecom System for predictive coding/decoding of a digital speech signal by embedded-code adaptive transform

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ACOUSTICS, SPEECH & SIGNAL PROCESSING CONFERENCE, IEEE ICASSP '88, DAVIDSON G. et al., "Multiple-Stage Vector Excitation Coding of Speech Waveforms", pages 163-166. *
ACOUSTICS, SPEECH & SIGNAL PROCESSING CONFERENCE, IEEE ICASSP '89, OFER et al., "A Unified Framework for LPC Excitation Representation in Residual Speech Coders", pages 44-44. *
GLOBAL TELECOMMUNICATIONS CONFERENCE, IEEE GLOBECOM 90, JOHNSON et al., "Pitch-Orthogonal Code-Excited LPC", pages 542-546. *
See also references of EP0954851A4 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000017858A1 (en) * 1998-09-18 2000-03-30 Conexant Systems, Inc. Robust fast search for two-dimensional gain vector quantizer
US6397178B1 (en) 1998-09-18 2002-05-28 Conexant Systems, Inc. Data organizational scheme for enhanced selection of gain parameters for speech coding
WO2002091363A1 (en) * 2001-05-08 2002-11-14 Koninklijke Philips Electronics N.V. Audio coding
KR100871999B1 (en) * 2001-05-08 2008-12-05 Koninklijke Philips Electronics N.V. Audio coding
US7483836B2 (en) 2001-05-08 2009-01-27 Koninklijke Philips Electronics N.V. Perceptual audio coding on a priority basis
US7451091B2 (en) 2003-10-07 2008-11-11 Matsushita Electric Industrial Co., Ltd. Method for determining time borders and frequency resolutions for spectral envelope coding
US9552824B2 (en) 2010-07-02 2017-01-24 Dolby International Ab Post filter
US10236010B2 (en) 2010-07-02 2019-03-19 Dolby International Ab Pitch filter for audio signals
US9343077B2 (en) 2010-07-02 2016-05-17 Dolby International Ab Pitch filter for audio signals
US9396736B2 (en) 2010-07-02 2016-07-19 Dolby International Ab Audio encoder and decoder with multiple coding modes
US11996111B2 (en) 2010-07-02 2024-05-28 Dolby International Ab Post filter for audio signals
US9558753B2 (en) 2010-07-02 2017-01-31 Dolby International Ab Pitch filter for audio signals
US9558754B2 (en) 2010-07-02 2017-01-31 Dolby International Ab Audio encoder and decoder with pitch prediction
US9595270B2 (en) 2010-07-02 2017-03-14 Dolby International Ab Selective post filter
US9830923B2 (en) 2010-07-02 2017-11-28 Dolby International Ab Selective bass post filter
US9858940B2 (en) 2010-07-02 2018-01-02 Dolby International Ab Pitch filter for audio signals
US9224403B2 (en) 2010-07-02 2015-12-29 Dolby International Ab Selective bass post filter
US10811024B2 (en) 2010-07-02 2020-10-20 Dolby International Ab Post filter for audio signals
US11610595B2 (en) 2010-07-02 2023-03-21 Dolby International Ab Post filter for audio signals
US11183200B2 (en) 2010-07-02 2021-11-23 Dolby International Ab Post filter for audio signals
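WO2012161675A1 (en) * 2011-05-20 2012-11-29 Google Inc. Redundant coding unit for audio codec
CN111862995A (en) * 2020-06-22 2020-10-30 Beijing Dajia Internet Information Technology Co., Ltd. Code rate determination model training method, code rate determination method, and device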

Also Published As

Publication number Publication date
EP0954851A1 (fr) 1999-11-10
EP0954851A4 (fr) 1999-11-10
JPH11504733A (ja) 1999-04-27
MX9708203A (es) 1997-12-31
CA2219358A1 (fr) 1997-08-28

Similar Documents

Publication Publication Date Title
US5790759A (en) Perceptual noise masking measure based on synthesis filter frequency response
EP0764941B1 (en) Speech signal quantization using human auditory models in predictive coding systems
EP0764939B1 (en) Synthesis of speech signals in the absence of coded parameters
RU2262748C2 (en) Multi-mode encoding device
US6735567B2 (en) Encoding and decoding speech signals variably based on signal classification
US6574593B1 (en) Codebook tables for encoding and decoding
US6581032B1 (en) Bitstream protocol for transmission of encoded voice signals
JP4662673B2 (en) Gain smoothing in wideband speech and audio signal decoder
EP0503684B1 (en) Adaptive filtering method for speech and audio
EP0465057B1 (en) Low-delay code-excited linear predictive coding of wideband speech at 32 kb/s
US5307441A (en) Wear-toll quality 4.8 kbps speech codec
JP3490685B2 (en) Method and apparatus for adaptive bandwidth pitch search in coding wideband signals
US6098036A (en) Speech coding system and method including spectral formant enhancer
US5699382A (en) Method for noise weighting filtering
MXPA96004161A (en) Quantization of speech signals using human auditory models in predictive coding systems
EP1214706B1 (en) Multimode speech encoder
KR20030046451A (en) Codebook structure and search method for speech coding
WO1997031367A1 (en) Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models
JPH01261930A (en) Post noise shaping filter for speech decoder
CA2303711C (en) Method for noise weighting filtering
GB2352949A (en) Speech coder for communications unit
AU2757602A (en) Multimode speech encoder

Legal Events

Code Title (Description)
AK Designated states (Kind code of ref document: A1; Designated state(s): CA JP MX US)
AL Designated countries for regional patents (Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE)
ENP Entry into the national phase (Ref document number: 2219358; Country of ref document: CA; Ref country code: CA; Kind code of ref document: A; Format of ref document f/p: F)
WWE Wipo information: entry into national phase (Ref document number: PA/a/1997/008203; Country of ref document: MX)
ENP Entry into the national phase (Ref country code: JP; Ref document number: 1997 530382; Kind code of ref document: A; Format of ref document f/p: F)
WWE Wipo information: entry into national phase (Ref document number: 1997907830; Country of ref document: EP)
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWP Wipo information: published in national office (Ref document number: 1997907830; Country of ref document: EP)
WWW Wipo information: withdrawn in national office (Ref document number: 1997907830; Country of ref document: EP)