EP1431962B1 - Device and method for wideband speech coding - Google Patents

Device and method for wideband speech coding

Info

Publication number
EP1431962B1
EP1431962B1 EP04100553A EP04100553A EP1431962B1 EP 1431962 B1 EP1431962 B1 EP 1431962B1 EP 04100553 A EP04100553 A EP 04100553A EP 04100553 A EP04100553 A EP 04100553A EP 1431962 B1 EP1431962 B1 EP 1431962B1
Authority
EP
European Patent Office
Prior art keywords
speech
highband
lowband
khz
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP04100553A
Other languages
German (de)
English (en)
Other versions
EP1431962A3 (fr)
EP1431962A2 (fr)
Inventor
Alan V Mccree
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Texas Instruments Inc
Original Assignee
Texas Instruments Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Texas Instruments Inc
Priority claimed from EP01000172A (EP1158495B1)
Publication of EP1431962A2
Publication of EP1431962A3
Application granted
Publication of EP1431962B1
Anticipated expiration
Expired - Lifetime (current legal status)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038: Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Definitions

  • the present invention relates to electronic devices, and, more particularly, to speech coding, transmission, storage, and decoding/synthesis methods and systems.
  • the performance of digital speech systems using low bit rates has become increasingly important with current and foreseeable digital communications.
  • Both dedicated channel and packetized-over-network (VoIP) transmission benefit from compression of speech signals.
  • the widely-used linear prediction (LP) digital speech coding compression method models the vocal tract as a time-varying filter and a time-varying excitation of the filter to mimic human speech.
  • M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network (PSTN) sampling for digital transmission); and the number of samples {s(n)} in a frame is often 80 or 160 (10 or 20 ms frames).
  • Various windowing operations may be applied to the samples of the input speech frame.
  • Minimizing Σ r(n)^2 over the frame yields the {a(j)} which furnish the best linear prediction.
  • the coefficients ⁇ a(j) ⁇ may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage.
  • the {r(n)} form the LP residual for the frame, and ideally the LP residual would be the excitation for the synthesis filter 1/A(z) where A(z) is the transfer function of equation (1).
  • the LP residual is not available at the decoder; thus the task of the encoder is to represent the LP residual so that the decoder can generate an LP excitation from the encoded parameters.
  • the LP compression approach basically only transmits/stores updates for the (quantized) filter coefficients, the (quantized) residual (waveform or parameters such as pitch), and the (quantized) gain.
  • a receiver regenerates the speech with the same perceptual characteristics as the input speech.
  • Figure 9 shows the blocks in an LP system. Periodic updating of the quantized items requires fewer bits than direct representation of the speech signal, so a reasonable LP coder can operate at bit rates as low as 2-3 kb/s (kilobits per second).
  • the ITU standard G.729 Annex E with a bit rate of 11.8 kb/s uses LP analysis with codebook excitation (CELP) to compress voiceband speech and has performance comparable to the 64 kb/s PCM used for PSTN digital transmission.
  • Another approach uses split-band CELP or MPLPC by coding a 4-8 kHz highband separately from the 0-4 kHz lowband and with fewer bits allocated to the highband; see Drogo de Jacovo et al, Some Experiments of 7 kHz Audio Coding at 16 kbit/s, IEEE ICASSP 1989, pp.192-195. Similarly, Tucker, Low Bit-Rate Frequency Extension Coding, IEE Colloquium on Audio and Music Technology 1998, pp.3/1-3/5, provides standard coding of the lowband 0-4 kHz plus codes the 4-8 kHz highband speech only for unvoiced frames (as determined in the lowband) and uses an LP filter of order 2-4 with noise excitation.
  • the present invention provides a method of wideband speech coding, comprising: (a) partitioning a frame of digital speech into a lowband and a highband; (b) decimating the sampling rate of both said lowband and said highband; (c) encoding said decimated lowband from step (b) including a first method of quantization; (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and (e) encoding the results of step (d) including said first method of quantization.
  • a wideband speech decoder comprising: (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook; (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
  • the preferred embodiment systems include preferred embodiment encoders and decoders that process a wideband speech frame as the sum of a lowband signal and a highband signal in which the lowband signal has standalone speech encoding/decoding and the highband signal has encoding/decoding incorporating information from the lowband signal to modulate a noise excitation. This allows for a minimal number of bits to sufficiently encode the highband and yields an embedded coder.
  • Figure 1a shows in functional block format a first preferred embodiment system for wideband speech encoding, transmission (storage), and decoding including first preferred embodiment encoders and decoders.
  • the encoders and decoders use CELP lowband encoding and decoding plus a highband encoding and decoding incorporating information from the (decoded) lowband for modulation of a noise excitation with LP coding.
  • first preferred embodiment encoders proceed as follows.
  • the baseband of the decimated highband has a reversed spectrum because the baseband is an aliased image; see Figure 3b.
  • encode the first baseband (decimated lowband) signal with a (standard) narrowband speech coder.
  • Decoding reverses the encoding process by separating the highband and lowband code, using information from the decoded lowband to help decode the highband, and adding the decoded highband to the decoded lowband speech to synthesize wideband speech. See Figure 1c.
  • This split-band approach allows most of the code bits to be allocated to the lowband; for example, the lowband may consume 11.8 kb/s and the highband may add 2.2 kb/s for a total of 14 kb/s.
  • Figures 2a-2b illustrate the typical magnitudes of voiced and unvoiced speech, respectively, as functions of frequency over the range 0-8 kHz.
  • the bulk of the energy in voiced speech resides in the 0-3 kHz band.
  • the pitch structure (the fundamental frequency is about 125 Hz in Figure 2a) clearly appears in the range 0-3.5 kHz and persists (although jumbled) at higher frequencies.
  • the perceptual critical bandwidth at higher frequencies is roughly 10% of a band center frequency, so the individual pitch harmonics become indistinguishable and should require fewer bits for inclusion in a highband code.
  • the higher band (above 4 kHz) should require fewer bits to encode than the lower band (0-4 kHz).
  • This underlies the preferred embodiment methods of partitioning wideband (0-8 kHz) speech into a lowband (0-4 kHz) and a highband (4-8 kHz), recognizing that the lowband may be encoded by any convenient narrowband coder, and separately coding the highband with a relatively small number of bits as described in the following sections.
  • Figure 1b illustrates the flow of a first preferred embodiment speech coder which encodes at 14 kb/s with the following steps.
  • a first preferred embodiment decoding method essentially reverses the encoding steps for a bitstream encoded by the first preferred embodiment method.
  • For a coded frame in the bitstream:
  • Figures 8-9 show in functional block form preferred embodiment systems that use the preferred embodiment encoding and decoding.
  • the encoding and decoding can be performed with digital signal processors (DSPs) or general purpose programmable processors or application specific circuitry or systems on a chip such as both a DSP and RISC processor on the same chip with the RISC processor controlling.
  • Codebooks would be stored in memory at both the encoder and decoder, and a stored program in an onboard ROM or external flash EEPROM for a DSP or programmable processor could perform the signal processing.
  • Analog-to-digital converters and digital-to-analog converters provide coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) provide coupling for transmission waveforms.
  • the encoded speech can be packetized and transmitted over networks such as the Internet.
  • the preferred embodiments may be modified in various ways while retaining the features of separately coding a lowband from a wideband signal and using information from the lowband to help encode the highband (remainder of the wideband) and/or using spectrum reversal for decimated highband LP coefficient quantization in order to obtain efficiency comparable to that for the lowband LP coefficient quantization.
  • the upper (2.8-3.8 kHz) portion of the lowband (0-4 kHz) could be replaced by some other portion(s) of the lowband for use as a modulation for the highband excitation.
  • the wideband may be partitioned into a lowband plus two or more highbands; the lowband coder could be a parametric or even non-LP coder and a highband coder could be a waveform coder; and so forth.
  • the scope of the invention is hereby only limited by the appended claims.
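
The linear prediction analysis described above (minimizing the residual energy to obtain the coefficients a(j), then forming the residual r(n)) can be illustrated with a minimal Python sketch. It assumes the usual definition r(n) = s(n) - Σ a(j) s(n-j) and uses the autocorrelation method with the Levinson-Durbin recursion, which is one common way to do the minimization and is not stated in this excerpt to be the patent's procedure. The function names are hypothetical, and windowing and quantization are omitted.

```python
def autocorrelation(frame, max_lag):
    """Return R(0..max_lag) for one frame of samples."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for a(1..order).

    Returns (a, err): a[j-1] multiplies s(n-j) in the prediction,
    err is the remaining prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros): stop early
        acc = r[m + 1] - sum(a[j] * r[m - j] for j in range(m))
        k = acc / err                      # reflection coefficient
        prev = a[:]
        a[m] = k
        for j in range(m):
            a[j] = prev[j] - k * prev[m - 1 - j]
        err *= (1.0 - k * k)
    return a, err


def lp_residual(frame, a):
    """r(n) = s(n) - sum_j a(j) s(n-j), with samples before the frame taken as 0."""
    res = []
    for n, s in enumerate(frame):
        pred = sum(a[j] * frame[n - 1 - j]
                   for j in range(len(a)) if n - 1 - j >= 0)
        res.append(s - pred)
    return res


# Illustrative input: the impulse response of 1 / (1 - 0.9 z^-1).
frame = [0.9 ** n for n in range(200)]
coeffs, energy = levinson_durbin(autocorrelation(frame, 2), 2)
residual = lp_residual(frame, coeffs)
# The first coefficient comes out close to 0.9 and the second close to 0,
# so the residual is essentially a single impulse at n = 0.
```

A real coder would use order 10-12 on 80- or 160-sample frames, as the text states, and would convert the coefficients to LSFs before quantization.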

Claims (4)

  1. A method of wideband speech coding, comprising:
    (a) partitioning a frame of digital speech into a lowband and a highband;
    (b) decimating the sampling rate of both said lowband and said highband;
    (c) encoding said decimated lowband from step (b), including a first method of quantization;
    (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and
    (e) encoding the results of step (d), including said first method of quantization.
  2. A method of wideband speech decoding, comprising:
    (a) decoding a first portion of an input signal as a lowband speech signal, including use of a first codebook;
    (b) decoding a second portion of an input signal as a highband speech signal, including use of said first codebook; and
    (c) combining the results of the foregoing steps (a) and (b) to form a decoded wideband speech signal.
  3. A wideband speech encoder, comprising:
    (a) a lowband filter and a highband filter for digital speech;
    (b) a first encoder with an input from said lowband filter, said first encoder using a first quantizer;
    (c) a second encoder with an input from said highband filter, said second encoder including said first quantizer; and
    (d) a combiner for said first encoder and said second encoder, to output encoded wideband speech.
  4. A wideband speech decoder, comprising:
    (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook;
    (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
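
Steps (b) and (d) of claim 1 (decimating the highband and reversing the spectrum of its baseband image) can be illustrated with a small Python sketch. It is a toy under stated assumptions rather than the patented encoder: a single 5 kHz tone stands in for highband speech so that no highpass filter is needed, spectrum reversal is done by multiplying by (-1)^n, and all function names are hypothetical.

```python
import cmath
import math


def decimate_by_2(x):
    """Keep every other sample (assumes x is already band-limited to one band)."""
    return x[::2]


def reverse_spectrum(x):
    """Multiply by (-1)^n, which maps frequency f to fs/2 - f."""
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]


def peak_frequency(x, fs):
    """Frequency in Hz of the largest DFT bin in 0..fs/2 (direct DFT)."""
    n = len(x)
    best_bin, best_mag = 0, -1.0
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                  for i, v in enumerate(x))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * fs / n


fs_wide = 16000   # wideband sampling rate covering 0-8 kHz
tone_hz = 5000    # toy "highband" content, 1 kHz above the 4 kHz band edge
wide = [math.cos(2 * math.pi * tone_hz * n / fs_wide) for n in range(320)]

high_dec = decimate_by_2(wide)   # now at 8 kHz sampling; the tone aliases
fs_dec = fs_wide // 2
aliased_hz = peak_frequency(high_dec, fs_dec)
restored_hz = peak_frequency(reverse_spectrum(high_dec), fs_dec)
print(aliased_hz, restored_hz)
```

With these values the tone appears at 3 kHz after decimation (the aliased image is frequency-reversed) and at 1 kHz after spectrum reversal, so content 1 kHz above the 4 kHz band edge ends up 1 kHz above zero in the baseband. Restoring this ordering is what lets the highband reuse the quantization designed for the lowband.
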
EP04100553A 2000-05-22 2001-05-22 Device and method for wideband speech coding Expired - Lifetime EP1431962B1 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US20615600P 2000-05-22 2000-05-22
US206156P 2000-05-22
EP01000172A EP1158495B1 (fr) 2000-05-22 2001-05-22 Device and method for wideband speech coding

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
EP01000172A Division EP1158495B1 (fr) 2000-05-22 2001-05-22 Device and method for wideband speech coding
EP01000172A Division-Into EP1158495B1 (fr) 2000-05-22 2001-05-22 Device and method for wideband speech coding

Publications (3)

Publication Number Publication Date
EP1431962A2 EP1431962A2 (fr) 2004-06-23
EP1431962A3 EP1431962A3 (fr) 2004-12-01
EP1431962B1 EP1431962B1 (fr) 2006-04-05

Family

ID=32395343

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04100553A Expired - Lifetime EP1431962B1 (fr) 2000-05-22 2001-05-22 Device and method for wideband speech coding

Country Status (1)

Country Link
EP (1) EP1431962B1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8996362B2 (en) 2008-01-31 2015-03-31 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Device and method for a bandwidth extension of an audio signal

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006025313A1 (fr) 2004-08-31 2006-03-09 Matsushita Electric Industrial Co., Ltd. Audio encoding apparatus, audio decoding apparatus, communication apparatus, and audio encoding method
WO2006030864A1 (fr) 2004-09-17 2006-03-23 Matsushita Electric Industrial Co., Ltd. Audio encoding apparatus, audio decoding apparatus, communication apparatus, and audio encoding method
EP2273494A3 (fr) 2004-09-17 2012-11-14 Panasonic Corporation Scalable encoding apparatus, scalable decoding apparatus
WO2008081777A1 (fr) 2006-12-25 2008-07-10 Kyushu Institute Of Technology High-frequency signal interpolation device and high-frequency signal interpolation method

Also Published As

Publication number Publication date
EP1431962A3 (fr) 2004-12-01
EP1431962A2 (fr) 2004-06-23

Similar Documents

Publication Publication Date Title
US7330814B2 (en) Wideband speech coding with modulated noise highband excitation system and method
US7136810B2 (en) Wideband speech coding system and method
US6795805B1 (en) Periodicity enhancement in decoding wideband signals
US8374853B2 (en) Hierarchical encoding/decoding device
CA2862715C (fr) Multi-mode audio codec and CELP coding adapted to this codec
US6260009B1 (en) CELP-based to CELP-based vocoder packet translation
US8260620B2 (en) Device for perceptual weighting in audio encoding/decoding
US7184953B2 (en) Transcoding method and system between CELP-based speech codes with externally provided status
CN103384900B (zh) Low-delay sound coding alternating between predictive coding and transform coding
EP1273005B1 (fr) Wideband speech codec using different sampling rates
KR20090104846A (ko) Improved coding/decoding of digital audio signals
JP2003514267A (ja) Gain smoothing in wideband speech and audio signal decoder
EP0981816A1 (fr) Audio coding methods and systems
EP1222659A1 (fr) Linear predictive coding (LPC) harmonic vocoder with superframe structure
EP1158495B1 (fr) Device and method for wideband speech coding
US6847929B2 (en) Algebraic codebook system and method
CN101131820A (zh) Coding device, decoding device, coding method, and decoding method
TW463143B (en) Low-bit rate speech encoding method
US20040111257A1 (en) Transcoding apparatus and method between CELP-based codecs using bandwidth extension
EP1431962B1 (fr) Device and method for wideband speech coding
JP3092653B2 (ja) Wideband speech coding apparatus, speech decoding apparatus, and speech coding/decoding apparatus
Schnitzler A 13.0 kbit/s wideband speech codec based on SB-ACELP
US6801887B1 (en) Speech coding exploiting the power ratio of different speech signal components
Esteban et al. 9.6/7.2 kbps voice excited predictive coder (VEPC)
Gournay et al. A 1200 bits/s HSX speech coder for very-low-bit-rate communications

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AC Divisional application: reference to earlier application

Ref document number: 1158495

Country of ref document: EP

Kind code of ref document: P

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

RIC1 Information provided on ipc code assigned before grant

Ipc: 7G 10L 21/02 A

Ipc: 7G 10L 19/02 B

17P Request for examination filed

Effective date: 20050601

AKX Designation fees paid

Designated state(s): DE FR GB

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AC Divisional application: reference to earlier application

Ref document number: 1158495

Country of ref document: EP

Kind code of ref document: P

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 60118627

Country of ref document: DE

Date of ref document: 20060518

Kind code of ref document: P

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20070108

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 16

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 17

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20170426

Year of fee payment: 17

Ref country code: FR

Payment date: 20170418

Year of fee payment: 17

Ref country code: DE

Payment date: 20170531

Year of fee payment: 17

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 60118627

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20180522

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180522

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181201

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180531