EP1431962B1 - Device and method for wideband coding of speech signals (Vorrichtung und Verfahren zur Breitbandcodierung von Sprachsignalen)

Info

Publication number
EP1431962B1
EP1431962B1 EP04100553A EP04100553A EP1431962B1 EP 1431962 B1 EP1431962 B1 EP 1431962B1 EP 04100553 A EP04100553 A EP 04100553A EP 04100553 A EP04100553 A EP 04100553A EP 1431962 B1 EP1431962 B1 EP 1431962B1
Authority
EP
European Patent Office
Prior art keywords
speech
highband
lowband
khz
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP04100553A
Other languages
English (en)
French (fr)
Other versions
EP1431962A2 (de)
EP1431962A3 (de)
Inventor
Alan V Mccree
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Texas Instruments Inc
Original Assignee
Texas Instruments Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Texas Instruments Inc
Priority claimed from EP01000172A (EP1158495B1)
Publication of EP1431962A2
Publication of EP1431962A3
Application granted
Publication of EP1431962B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Definitions

  • the present invention relates to electronic devices, and, more particularly, to speech coding, transmission, storage, and decoding/synthesis methods and systems.
  • the performance of digital speech systems using low bit rates has become increasingly important with current and foreseeable digital communications.
  • Both dedicated channel and packetized-over-network (VoIP) transmission benefit from compression of speech signals.
  • the widely-used linear prediction (LP) digital speech coding compression method models the vocal tract as a time-varying filter and a time-varying excitation of the filter to mimic human speech.
  • M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network (PSTN) sampling for digital transmission); and the number of samples {s(n)} in a frame is often 80 or 160 (10 or 20 ms frames).
  • Various windowing operations may be applied to the samples of the input speech frame.
  • Minimizing $\sum r(n)^2$ over the frame yields the {a(j)} which furnish the best linear prediction.
  • the coefficients {a(j)} may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage.
  • the {r(n)} form the LP residual for the frame, and ideally the LP residual would be the excitation for the synthesis filter 1/A(z), where A(z) is the transfer function of equation (1).
  • the LP residual is not available at the decoder; thus the task of the encoder is to represent the LP residual so that the decoder can generate an LP excitation from the encoded parameters.
  • the LP compression approach basically only transmits/stores updates for the (quantized) filter coefficients, the (quantized) residual (waveform or parameters such as pitch), and the (quantized) gain.
  • a receiver regenerates the speech with the same perceptual characteristics as the input speech.
  • Figure 9 shows the blocks in an LP system. Periodic updating of the quantized items requires fewer bits than direct representation of the speech signal, so a reasonable LP coder can operate at bit rates as low as 2-3 kb/s (kilobits per second).
  • the ITU standard G.729 Annex E with a bit rate of 11.8 kb/s uses LP analysis with codebook excitation (CELP) to compress voiceband speech and has performance comparable to the 64 kb/s PCM used for PSTN digital transmission.
  • Another approach uses split-band CELP or MPLPC by coding a 4-8 kHz highband separately from the 0-4 kHz lowband and with fewer bits allocated to the highband; see Drogo de Jacovo et al, Some Experiments of 7 kHz Audio Coding at 16 kbit/s, IEEE ICASSP 1989, pp.192-195. Similarly, Tucker, Low Bit-Rate Frequency Extension Coding, IEE Colloquium on Audio and Music Technology 1998, pp.3/1-3/5, provides standard coding of the lowband 0-4 kHz plus codes the 4-8 kHz highband speech only for unvoiced frames (as determined in the lowband) and uses an LP filter of order 2-4 with noise excitation.
  • the present invention provides a method of wideband speech coding, comprising: (a) partitioning a frame of digital speech into a lowband and a highband; (b) decimating the sampling rate of both said lowband and said highband; (c) encoding said decimated lowband from step (b) including a first method of quantization; (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and (e) encoding the results of step (d) including said first method of quantization.
  • a wideband speech decoder comprising: (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook; (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
  • the preferred embodiment systems include preferred embodiment encoders and decoders that process a wideband speech frame as the sum of a lowband signal and a highband signal in which the lowband signal has standalone speech encoding/decoding and the highband signal has encoding/decoding incorporating information from the lowband signal to modulate a noise excitation. This allows for a minimal number of bits to sufficiently encode the highband and yields an embedded coder.
  • Figure 1a shows in functional block format a first preferred embodiment system for wideband speech encoding, transmission (storage), and decoding including first preferred embodiment encoders and decoders.
  • the encoders and decoders use CELP lowband encoding and decoding plus a highband encoding and decoding incorporating information from the (decoded) lowband for modulation of a noise excitation with LP coding.
  • first preferred embodiment encoders proceed as follows.
  • the baseband of the decimated highband has a reversed spectrum because the baseband is an aliased image; see Figure 3b.
  • encode the first baseband (decimated lowband) signal with a (standard) narrowband speech coder.
  • Decoding reverses the encoding process by separating the highband and lowband code, using information from the decoded lowband to help decode the highband, and adding the decoded highband to the decoded lowband speech to synthesize wideband speech. See Figure 1c.
  • This split-band approach allows most of the code bits to be allocated to the lowband; for example, the lowband may consume 11.8 kb/s and the highband may add 2.2 kb/s for a total of 14 kb/s.
  • Figures 2a-2b illustrate the typical magnitudes of voiced and unvoiced speech, respectively, as functions of frequency over the range 0-8 kHz.
  • the bulk of the energy in voiced speech resides in the 0-3 kHz band.
  • the pitch structure (the fundamental frequency is about 125 Hz in Figure 2a) clearly appears in the range 0-3.5 kHz and persists (although jumbled) at higher frequencies.
  • the perceptual critical bandwidth at higher frequencies is roughly 10% of a band center frequency, so the individual pitch harmonics become indistinguishable and should require fewer bits for inclusion in a highband code.
  • the higher band (above 4 kHz) should require fewer bits to encode than the lower band (0-4 kHz).
  • This underlies the preferred embodiment methods of partitioning wideband (0-8 kHz) speech into a lowband (0-4 kHz) and a highband (4-8 kHz), recognizing that the lowband may be encoded by any convenient narrowband coder, and separately coding the highband with a relatively small number of bits as described in the following sections.
  • Figure 1b illustrates the flow of a first preferred embodiment speech coder which encodes at 14 kb/s with the following steps.
  • a first preferred embodiment decoding method essentially reverses the encoding steps for a bitstream encoded by the first preferred embodiment method.
  • For a coded frame in the bitstream:
  • Figures 8-9 show in functional block form preferred embodiment systems that use the preferred embodiment encoding and decoding.
  • the encoding and decoding can be performed with digital signal processors (DSPs) or general purpose programmable processors or application specific circuitry or systems on a chip such as both a DSP and RISC processor on the same chip with the RISC processor controlling.
  • Codebooks would be stored in memory at both the encoder and decoder, and a stored program in an onboard ROM or external flash EEPROM for a DSP or programmable processor could perform the signal processing.
  • Analog-to-digital converters and digital-to-analog converters provide coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) provide coupling for transmission waveforms.
  • the encoded speech can be packetized and transmitted over networks such as the Internet.
  • the preferred embodiments may be modified in various ways while retaining the features of separately coding a lowband from a wideband signal and using information from the lowband to help encode the highband (remainder of the wideband) and/or using spectrum reversal for decimated highband LP coefficient quantization in order to obtain efficiency comparable to that for the lowband LP coefficient quantization.
  • the upper (2.8-3.8 kHz) portion of the lowband (0-4 kHz) could be replaced by some other portion(s) of the lowband for use as a modulation for the highband excitation.
  • the wideband may be partitioned into a lowband plus two or more highbands; the lowband coder could be a parametric or even non-LP coder and a highband coder could be a waveform coder; and so forth.
  • the scope of the invention is hereby only limited by the appended claims.
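
The linear prediction analysis described in the definitions above (a frame of samples s(n), a filter order M of about 10-12, optional windowing, and minimization of the squared residual to obtain the a(j)) can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes the autocorrelation method with a Hamming window and the Levinson-Durbin recursion, and it assumes the sign convention r(n) = s(n) + sum over j of a(j) s(n-j), because equation (1) is not reproduced in this excerpt. All function names are hypothetical.

```python
import math

def autocorrelation(x, max_lag):
    """Autocorrelation R(0..max_lag) of a (windowed) frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]

def levinson_durbin(R, order):
    """Solve the LP normal equations for A(z) = 1 + sum_j a(j) z^-j.

    Returns (a, err): a = [a(1), ..., a(order)] and the final
    prediction-error energy. Sign convention is an assumption.
    """
    a = [1.0] + [0.0] * order
    err = R[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        acc = R[i] + sum(a[j] * R[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err

def lp_residual(s, a):
    """r(n) = s(n) + sum_j a(j) s(n-j), with zero history before the frame."""
    M = len(a)
    return [s[n] + sum(a[j - 1] * s[n - j] for j in range(1, M + 1) if n - j >= 0)
            for n in range(len(s))]

def lp_analysis(frame, order=10):
    """Hamming-window the frame, fit LP coefficients, return (a, residual)."""
    N = len(frame)              # assumes N >= 2
    windowed = [frame[n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
                for n in range(N)]
    R = autocorrelation(windowed, order)
    a, _ = levinson_durbin(R, order)
    return a, lp_residual(frame, a)
```

For a 20 ms frame at 8 kHz, `lp_analysis(frame, order=10)` would take 160 samples and return ten coefficients plus a 160-sample residual. A real coder would then quantize the coefficients (for example as LSFs) and encode the residual rather than transmit it directly.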

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Claims (4)

  1. A method of wideband speech coding, comprising:
    (a) partitioning a frame of digital speech into a lowband and a highband;
    (b) decimating the sampling rate of both said lowband and said highband;
    (c) encoding said decimated lowband from step (b) including a first method of quantization;
    (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and
    (e) encoding the results of step (d) including said first method of quantization.
  2. A method of wideband speech decoding, comprising:
    (a) decoding a first portion of an input signal as a lowband speech signal, including use of a first codebook;
    (b) decoding a second portion of an input signal as a highband speech signal, including use of said first codebook; and
    (c) combining the results of the foregoing steps (a) and (b) to form a decoded wideband speech signal.
  3. A wideband speech encoder, comprising:
    (a) a lowband filter and a highband filter for digital speech;
    (b) a first encoder with an input from said lowband filter, said first encoder using a first quantizer;
    (c) a second encoder with an input from said highband filter, said second encoder using said first quantizer; and
    (d) a combiner for said first encoder and said second encoder to output encoded wideband speech.
  4. A wideband speech decoder, comprising:
    (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook;
    (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
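
Steps (b) and (d) of claim 1 (decimating the highband and reversing the spectrum of its baseband image) can be illustrated with pure tones. The sketch below is a hedged illustration under several assumptions that are not taken from the source: a 16 kHz wideband sampling rate (implied by the 0-8 kHz band), test tones that already lie in the 4-8 kHz highband so that the band-splitting filters can be omitted, and spectrum reversal done by negating odd-indexed samples, which is one common technique. All function names are hypothetical.

```python
import math

FS_WIDE = 16000  # assumed wideband sampling rate in Hz (0-8 kHz audio band)

def tone(freq_hz, n_samples, fs):
    """Cosine test tone at freq_hz sampled at fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]

def decimate_by_2(x):
    """Keep every other sample; assumes x is already limited to one half-band."""
    return x[::2]

def reverse_spectrum(x):
    """Mirror the spectrum about fs/4 by negating odd-indexed samples."""
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]

def peak_frequency(x, fs):
    """Frequency in Hz of the strongest DFT bin strictly between 0 and fs/2."""
    n = len(x)
    best_k, best_mag = 0, -1.0
    for k in range(1, n // 2):
        re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
        mag = re * re + im * im
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n

def highband_to_baseband(x):
    """Decimate a highband signal by 2, then undo the aliasing-induced reversal."""
    return reverse_spectrum(decimate_by_2(x))
```

With 640 input samples, a 5 kHz tone (1 kHz above the 4 kHz band edge) shows up at 3 kHz after `decimate_by_2` and at 1 kHz after `highband_to_baseband`, both measured at the 8 kHz decimated rate. A 7 kHz tone shows up at 1 kHz and then 3 kHz. After reversal the highband spectrum has the same low-to-high orientation as the lowband, which is the property the description relies on when it reuses the lowband LP coefficient quantization for the highband.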
EP04100553A 2000-05-22 2001-05-22 Vorrichtung und Verfahren zur Breitbandcodierung von Sprachsignalen Expired - Lifetime EP1431962B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US20615600P 2000-05-22 2000-05-22
US206156P 2000-05-22
EP01000172A EP1158495B1 (de) 2000-05-22 2001-05-22 Vorrichtung und Verfahren zur Breitbandcodierung von Sprachsignalen

Related Parent Applications (2)

Application Number Title Priority Date Filing Date
EP01000172A Division EP1158495B1 (de) 2000-05-22 2001-05-22 Vorrichtung und Verfahren zur Breitbandcodierung von Sprachsignalen
EP01000172A Division-Into EP1158495B1 (de) 2000-05-22 2001-05-22 Vorrichtung und Verfahren zur Breitbandcodierung von Sprachsignalen

Publications (3)

Publication Number Publication Date
EP1431962A2 EP1431962A2 (de) 2004-06-23
EP1431962A3 EP1431962A3 (de) 2004-12-01
EP1431962B1 EP1431962B1 (de) 2006-04-05

Family

ID=32395343

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04100553A Expired - Lifetime EP1431962B1 (de) 2000-05-22 2001-05-22 Vorrichtung und Verfahren zur Breitbandcodierung von Sprachsignalen

Country Status (1)

Country Link
EP (1) EP1431962B1 (de)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8996362B2 (en) 2008-01-31 2015-03-31 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Device and method for a bandwidth extension of an audio signal

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101006495A (zh) 2004-08-31 2007-07-25 松下电器产业株式会社 语音编码装置、语音解码装置、通信装置以及语音编码方法
WO2006030864A1 (ja) 2004-09-17 2006-03-23 Matsushita Electric Industrial Co., Ltd. 音声符号化装置、音声復号装置、通信装置及び音声符号化方法
JP4963963B2 (ja) 2004-09-17 2012-06-27 パナソニック株式会社 スケーラブル符号化装置、スケーラブル復号装置、スケーラブル符号化方法およびスケーラブル復号方法
WO2008081777A1 (ja) 2006-12-25 2008-07-10 Kyushu Institute Of Technology 高域信号補間装置及び高域信号補間方法

Also Published As

Publication number Publication date
EP1431962A2 (de) 2004-06-23
EP1431962A3 (de) 2004-12-01

Similar Documents

Publication Publication Date Title
US7330814B2 (en) Wideband speech coding with modulated noise highband excitation system and method
US7136810B2 (en) Wideband speech coding system and method
US6795805B1 (en) Periodicity enhancement in decoding wideband signals
JP4662673B2 (ja) 広帯域音声及びオーディオ信号復号器における利得平滑化
US8374853B2 (en) Hierarchical encoding/decoding device
CA2862715C (en) Multi-mode audio codec and celp coding adapted therefore
US6260009B1 (en) CELP-based to CELP-based vocoder packet translation
US8260620B2 (en) Device for perceptual weighting in audio encoding/decoding
CN103384900B (zh) 在预测编码与变换编码之间交替的低延迟声音编码
EP1273005B1 (de) Breitband-sprach-codec mit verschiedenen abtastraten
US20050027517A1 (en) Transcoding method and system between celp-based speech codes
KR20090104846A (ko) 디지털 오디오 신호에 대한 향상된 코딩/디코딩
EP0981816A1 (de) Audio-kodier-systeme und verfahren
EP1158495B1 (de) Vorrichtung und Verfahren zur Breitbandcodierung von Sprachsignalen
EP1222659A1 (de) Lpc-harmonischer sprachkodierer mit überrahmenformat
US6847929B2 (en) Algebraic codebook system and method
CN101131820A (zh) 编码设备、解码设备、编码方法和解码方法
US6687667B1 (en) Method for quantizing speech coder parameters
US20040111257A1 (en) Transcoding apparatus and method between CELP-based codecs using bandwidth extension
EP1431962B1 (de) Vorrichtung und Verfahren zur Breitbandcodierung von Sprachsignalen
JP3092653B2 (ja) 広帯域音声符号化装置及び音声復号装置並びに音声符号化復号装置
Schnitzler A 13.0 kbit/s wideband speech codec based on SB-ACELP
US6801887B1 (en) Speech coding exploiting the power ratio of different speech signal components
Esteban et al. 9.6/7.2 kbps voice excited predictive coder (VEPC)
Vass et al. Adaptive forward-backward quantizer for low bit rate high-quality speech coding

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AC Divisional application: reference to earlier application

Ref document number: 1158495

Country of ref document: EP

Kind code of ref document: P

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR

RIC1 Information provided on ipc code assigned before grant

Ipc: 7G 10L 21/02 A

Ipc: 7G 10L 19/02 B

17P Request for examination filed

Effective date: 20050601

AKX Designation fees paid

Designated state(s): DE FR GB

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AC Divisional application: reference to earlier application

Ref document number: 1158495

Country of ref document: EP

Kind code of ref document: P

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 60118627

Country of ref document: DE

Date of ref document: 20060518

Kind code of ref document: P

ET Fr: translation filed
PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20070108

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 16

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 17

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20170426

Year of fee payment: 17

Ref country code: FR

Payment date: 20170418

Year of fee payment: 17

Ref country code: DE

Payment date: 20170531

Year of fee payment: 17

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 60118627

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20180522

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180522

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20181201

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20180531