EP1431962B1 - Device and method for wideband speech coding - Google Patents
Device and method for wideband speech coding
- Publication number
- EP1431962B1
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- highband
- lowband
- khz
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
Definitions
- the present invention relates to electronic devices, and, more particularly, to speech coding, transmission, storage, and decoding/synthesis methods and systems.
- the performance of digital speech systems using low bit rates has become increasingly important with current and foreseeable digital communications.
- Both dedicated channel and packetized-over-network (VoIP) transmission benefit from compression of speech signals.
- the widely-used linear prediction (LP) digital speech coding compression method models the vocal tract as a time-varying filter and a time-varying excitation of the filter to mimic human speech.
- M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network (PSTN) sampling for digital transmission); and the number of samples {s(n)} in a frame is often 80 or 160 (10 or 20 ms frames).
- Various windowing operations may be applied to the samples of the input speech frame.
- minimizing Σ r(n)² over the frame yields the {a(j)} which furnish the best linear prediction.
- the coefficients ⁇ a(j) ⁇ may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage.
- the {r(n)} form the LP residual for the frame, and ideally the LP residual would be the excitation for the synthesis filter 1/A(z), where A(z) is the transfer function of equation (1).
- the LP residual is not available at the decoder; thus the task of the encoder is to represent the LP residual so that the decoder can generate an LP excitation from the encoded parameters.
- the LP compression approach basically only transmits/stores updates for the (quantized) filter coefficients, the (quantized) residual (waveform or parameters such as pitch), and the (quantized) gain.
- a receiver regenerates the speech with the same perceptual characteristics as the input speech.
- Figure 9 shows the blocks in an LP system. Periodic updating of the quantized items requires fewer bits than direct representation of the speech signal, so a reasonable LP coder can operate at bit rates as low as 2-3 kb/s (kilobits per second).
- the ITU standard G.729 Annex E with a bit rate of 11.8 kb/s uses LP analysis with codebook excitation (CELP) to compress voiceband speech and has performance comparable to the 64 kb/s PCM used for PSTN digital transmission.
- CELP codebook excitation
- Another approach uses split-band CELP or MPLPC by coding a 4-8 kHz highband separately from the 0-4 kHz lowband and with fewer bits allocated to the highband; see Drogo de Jacovo et al, Some Experiments of 7 kHz Audio Coding at 16 kbit/s, IEEE ICASSP 1989, pp.192-195. Similarly, Tucker, Low Bit-Rate Frequency Extension Coding, IEE Colloquium on Audio and Music Technology 1998, pp.3/1-3/5, provides standard coding of the lowband 0-4 kHz plus codes the 4-8 kHz highband speech only for unvoiced frames (as determined in the lowband) and uses an LP filter of order 2-4 with noise excitation.
- the present invention provides a method of wideband speech coding, comprising: (a) partitioning a frame of digital speech into a lowband and a highband; (b) decimating the sampling rate of both said lowband and said highband; (c) encoding said decimated lowband from step (b) including a first method of quantization; (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and (e) encoding the results of step (d) including said first method of quantization.
- a wideband speech decoder comprising: (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook; (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
- the preferred embodiment systems include preferred embodiment encoders and decoders that process a wideband speech frame as the sum of a lowband signal and a highband signal in which the lowband signal has standalone speech encoding/decoding and the highband signal has encoding/decoding incorporating information from the lowband signal to modulate a noise excitation. This allows for a minimal number of bits to sufficiently encode the highband and yields an embedded coder.
- Figure 1a shows in functional block format a first preferred embodiment system for wideband speech encoding, transmission (storage), and decoding including first preferred embodiment encoders and decoders.
- the encoders and decoders use CELP lowband encoding and decoding plus a highband encoding and decoding incorporating information from the (decoded) lowband for modulation of a noise excitation with LP coding.
- first preferred embodiment encoders proceed as follows.
- the baseband of the decimated highband has a reversed spectrum because the baseband is an aliased image; see Figure 3b.
- encode the first baseband (decimated lowband) signal with a (standard) narrowband speech coder.
- Decoding reverses the encoding process by separating the highband and lowband code, using information from the decoded lowband to help decode the highband, and adding the decoded highband to the decoded lowband speech to synthesize wideband speech. See Figure 1c.
- This split-band approach allows most of the code bits to be allocated to the lowband; for example, the lowband may consume 11.8 kb/s and the highband may add 2.2 kb/s for a total of 14 kb/s.
- Figures 2a-2b illustrate the typical magnitudes of voiced and unvoiced speech, respectively, as functions of frequency over the range 0-8 kHz.
- the bulk of the energy in voiced speech resides in the 0-3 kHz band.
- the pitch structure (the fundamental frequency is about 125 Hz in Figure 2a) clearly appears in the range 0-3.5 kHz and persists (although jumbled) at higher frequencies.
- the perceptual critical bandwidth at higher frequencies is roughly 10% of a band center frequency, so the individual pitch harmonics become indistinguishable and should require fewer bits for inclusion in a highband code.
- the higher band (above 4 kHz) should require fewer bits to encode than the lower band (0-4 kHz).
- This underlies the preferred embodiment methods of partitioning wideband (0-8 kHz) speech into a lowband (0-4 kHz) and a highband (4-8 kHz), recognizing that the lowband may be encoded by any convenient narrowband coder, and separately coding the highband with a relatively small number of bits as described in the following sections.
- Figure 1b illustrates the flow of a first preferred embodiment speech coder which encodes at 14 kb/s with the following steps.
- a first preferred embodiment decoding method essentially reverses the encoding steps for a bitstream encoded by the first preferred embodiment method.
- For a coded frame in the bitstream:
- Figures 8-9 show in functional block form preferred embodiment systems that use the preferred embodiment encoding and decoding.
- the encoding and decoding can be performed with digital signal processors (DSPs), general-purpose programmable processors, application-specific circuitry, or systems on a chip, such as a DSP and a RISC processor on the same chip with the RISC processor controlling.
- Codebooks would be stored in memory at both the encoder and decoder, and a stored program in an onboard ROM or external flash EEPROM for a DSP or programmable processor could perform the signal processing.
- Analog-to-digital converters and digital-to-analog converters provide coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) provide coupling for transmission waveforms.
- the encoded speech can be packetized and transmitted over networks such as the Internet.
- the preferred embodiments may be modified in various ways while retaining the features of separately coding a lowband from a wideband signal and using information from the lowband to help encode the highband (remainder of the wideband) and/or using spectrum reversal for decimated highband LP coefficient quantization in order to obtain efficiency comparable to that for the lowband LP coefficient quantization.
- the upper (2.8-3.8 kHz) portion of the lowband (0-4 kHz) could be replaced by some other portion(s) of the lowband for use as a modulation for the highband excitation.
- the wideband may be partitioned into a lowband plus two or more highbands; the lowband coder could be a parametric or even non-LP coder and a highband coder could be a waveform coder; and so forth.
- the scope of the invention is hereby only limited by the appended claims.
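The linear prediction analysis described in the definitions above (choosing the {a(j)} that minimize the summed squared residual over a frame) can be illustrated with a minimal sketch. It assumes the common autocorrelation method with the Levinson-Durbin recursion and the sign convention r(n) = s(n) - Σ a(j) s(n-j); equation (1) is not reproduced in this excerpt, so that convention, the function names, and the test signal are all assumptions for illustration rather than the patent's own procedure.

```python
def autocorrelation(frame, max_lag):
    """R[k] = sum over n of frame[n] * frame[n - k], for k = 0..max_lag."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(max_lag + 1)]


def levinson_durbin(R, order):
    """Solve the LP normal equations for a(1..order).

    Assumed convention: s(n) is predicted as sum_j a(j) * s(n - j).
    Returns (a, error): a[j - 1] holds a(j); error is the remaining
    prediction-error energy.
    """
    a = [0.0] * order
    error = R[0]
    for i in range(order):
        if error <= 0.0:
            break  # degenerate frame (e.g. all zeros)
        acc = R[i + 1] - sum(a[j] * R[i - j] for j in range(i))
        k = acc / error  # reflection coefficient for this stage
        prev = a[:]
        a[i] = k
        for j in range(i):
            a[j] = prev[j] - k * prev[i - 1 - j]
        error *= 1.0 - k * k
    return a, error


def lp_residual(frame, a):
    """r(n) = s(n) - sum_j a(j) * s(n - j); samples before the frame count as 0."""
    residual = []
    for n, s in enumerate(frame):
        pred = sum(a[j] * frame[n - 1 - j]
                   for j in range(len(a)) if n - 1 - j >= 0)
        residual.append(s - pred)
    return residual


# Hypothetical test signal: a decaying exponential s(n) = 0.9**n, which a
# first-order predictor with a(1) close to 0.9 predicts almost perfectly.
frame = [0.9 ** n for n in range(200)]
R = autocorrelation(frame, 2)
a, err = levinson_durbin(R, 2)
residual = lp_residual(frame, a)
print("a =", a)
print("signal energy =", R[0],
      "residual energy =", sum(r * r for r in residual))
```

A real coder would use M of about 10-12 on windowed speech frames and would then quantize the coefficients (for example as LSFs); the order of 2 here only keeps the example easy to check by hand.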
Claims (4)
- A method of wideband speech coding, comprising: (a) partitioning a frame of digital speech into a lowband and a highband; (b) decimating the sampling rate of both said lowband and said highband; (c) encoding said decimated lowband from step (b), including a first method of quantization; (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and (e) encoding the results of step (d), including said first method of quantization.
- A method of wideband speech decoding, comprising: (a) decoding a first portion of an input signal as a lowband speech signal, including use of a first codebook; (b) decoding a second portion of an input signal as a highband speech signal, including use of said first codebook; and (c) combining the results of the foregoing steps (a) and (b) to form a decoded wideband speech signal.
- A wideband speech encoder, comprising: (a) a lowband filter and a highband filter for digital speech; (b) a first encoder with an input from said lowband filter, said first encoder using a first quantizer; (c) a second encoder with an input from said highband filter, said second encoder including said first quantizer; and (d) a combiner for said first encoder and said second encoder, to output encoded wideband speech.
- A wideband speech decoder, comprising: (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook; (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
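Steps (a), (b) and (d) of the coding method claim (band partitioning, decimation, and spectrum reversal of the decimated highband) can be sketched as follows. This is an illustration under stated assumptions, not the patent's filter design: it assumes 16 kHz input sampling (implied by the 0-8 kHz wideband range), a 63-tap Hamming-windowed sinc filter pair, decimation by 2, and spectrum reversal by multiplying the decimated highband by (-1)^n. All function names are hypothetical.

```python
import math


def lowpass_taps(num_taps=63, cutoff=0.25):
    """Hamming-windowed sinc lowpass; cutoff in cycles/sample (0.25 = fs/4)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def fir(signal, taps):
    """Direct-form FIR filter with zero initial state."""
    return [sum(t * signal[n - k] for k, t in enumerate(taps) if n - k >= 0)
            for n in range(len(signal))]


def split_band(frame):
    """Return (lowband, highband, spectrum-reversed highband), each decimated by 2."""
    low_taps = lowpass_taps()
    # Modulating the lowpass taps by (-1)^n moves the passband to fs/4..fs/2.
    high_taps = [t * (-1) ** n for n, t in enumerate(low_taps)]
    low = fir(frame, low_taps)[::2]
    high = fir(frame, high_taps)[::2]
    # The decimated highband is an aliased, frequency-reversed baseband image;
    # multiplying by (-1)^n flips it back to natural frequency order.
    high_reversed = [x * (-1) ** n for n, x in enumerate(high)]
    return low, high, high_reversed


def peak_frequency(signal, fs):
    """Frequency in Hz of the largest DFT magnitude bin between 0 and fs/2."""
    n = len(signal)
    best_bin, best_mag = 0, -1.0
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        mag = re * re + im * im
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * fs / n


# Hypothetical test frame: a 1 kHz tone (lowband) plus a 5 kHz tone (highband).
fs = 16000
frame = [math.cos(2 * math.pi * 1000 * n / fs) + math.cos(2 * math.pi * 5000 * n / fs)
         for n in range(512)]
low, high, high_reversed = split_band(frame)
print(peak_frequency(low, fs // 2),
      peak_frequency(high, fs // 2),
      peak_frequency(high_reversed, fs // 2))
```

With these assumptions the 5 kHz tone, which lies 1 kHz above the 4 kHz band edge, should alias to about 3 kHz in the decimated highband and move to about 1 kHz after reversal. That natural ordering is what would let the highband reuse the quantization method designed for the lowband, as steps (c) and (e) require.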
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US20615600P | 2000-05-22 | 2000-05-22 | |
US206156P | 2000-05-22 | | |
EP01000172A EP1158495B1 (fr) | 2000-05-22 | 2001-05-22 | Device and method for wideband speech coding |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01000172A Division EP1158495B1 (fr) | 2000-05-22 | 2001-05-22 | Device and method for wideband speech coding |
EP01000172A Division-Into EP1158495B1 (fr) | 2000-05-22 | 2001-05-22 | Device and method for wideband speech coding |
Publications (3)
Publication Number | Publication Date |
---|---|
EP1431962A2 (fr) | 2004-06-23 |
EP1431962A3 (fr) | 2004-12-01 |
EP1431962B1 (fr) | 2006-04-05 |
Family
ID=32395343
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP04100553A Expired - Lifetime EP1431962B1 (fr) | 2000-05-22 | 2001-05-22 | Device and method for wideband speech coding |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP1431962B1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8996362B2 (en) | 2008-01-31 | 2015-03-31 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Device and method for a bandwidth extension of an audio signal |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006025313A1 (fr) | 2004-08-31 | 2006-03-09 | Matsushita Electric Industrial Co., Ltd. | Audio encoding apparatus, audio decoding apparatus, communication apparatus and audio encoding method |
WO2006030864A1 (fr) | 2004-09-17 | 2006-03-23 | Matsushita Electric Industrial Co., Ltd. | Audio encoding apparatus, audio decoding apparatus, communication apparatus and audio encoding method |
EP2273494A3 (fr) | 2004-09-17 | 2012-11-14 | Panasonic Corporation | Scalable encoding apparatus, scalable decoding apparatus |
WO2008081777A1 (fr) | 2006-12-25 | 2008-07-10 | Kyushu Institute Of Technology | High-frequency signal interpolation device and high-frequency signal interpolation method |
-
2001
- 2001-05-22 EP EP04100553A patent/EP1431962B1/fr not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
EP1431962A3 (fr) | 2004-12-01 |
EP1431962A2 (fr) | 2004-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7330814B2 (en) | Wideband speech coding with modulated noise highband excitation system and method | |
US7136810B2 (en) | Wideband speech coding system and method | |
US6795805B1 (en) | Periodicity enhancement in decoding wideband signals | |
US8374853B2 (en) | Hierarchical encoding/decoding device | |
CA2862715C (fr) | Multimode audio codec and CELP coding adapted to this codec | |
US6260009B1 (en) | CELP-based to CELP-based vocoder packet translation | |
US8260620B2 (en) | Device for perceptual weighting in audio encoding/decoding | |
US7184953B2 (en) | Transcoding method and system between CELP-based speech codes with externally provided status | |
CN103384900B (zh) | Low-delay sound coding alternating between predictive coding and transform coding | |
EP1273005B1 (fr) | Wideband speech codec using different sampling frequencies | |
KR20090104846A (ko) | Improved coding/decoding of digital audio signals | |
JP2003514267A (ja) | Gain smoothing in wideband speech and audio signal decoder | |
EP0981816A1 (fr) | Audio coding methods and systems | |
EP1222659A1 (fr) | Linear predictive coding (LPC) harmonic vocoder with superframe structure | |
EP1158495B1 (fr) | Device and method for wideband speech coding | |
US6847929B2 (en) | Algebraic codebook system and method | |
CN101131820A (zh) | Encoding device, decoding device, encoding method and decoding method | |
TW463143B (en) | Low-bit rate speech encoding method | |
US20040111257A1 (en) | Transcoding apparatus and method between CELP-based codecs using bandwidth extension | |
EP1431962B1 (fr) | Device and method for wideband speech coding | |
JP3092653B2 (ja) | Wideband speech encoding device, speech decoding device, and speech encoding/decoding device | |
Schnitzler | A 13.0 kbit/s wideband speech codec based on SB-ACELP | |
US6801887B1 (en) | Speech coding exploiting the power ratio of different speech signal components | |
Esteban et al. | 9.6/7.2 kbps voice excited predictive coder (VEPC) | |
Gournay et al. | A 1200 bits/s HSX speech coder for very-low-bit-rate communications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
AC | Divisional application: reference to earlier application |
Ref document number: 1158495 Country of ref document: EP Kind code of ref document: P |
|
AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
PUAL | Search report despatched |
Free format text: ORIGINAL CODE: 0009013 |
|
AK | Designated contracting states |
Kind code of ref document: A3 Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
|
RIC1 | Information provided on ipc code assigned before grant |
Ipc: 7G 10L 21/02 A Ipc: 7G 10L 19/02 B |
|
17P | Request for examination filed |
Effective date: 20050601 |
|
AKX | Designation fees paid |
Designated state(s): DE FR GB |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AC | Divisional application: reference to earlier application |
Ref document number: 1158495 Country of ref document: EP Kind code of ref document: P |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): DE FR GB |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REF | Corresponds to: |
Ref document number: 60118627 Country of ref document: DE Date of ref document: 20060518 Kind code of ref document: P |
|
ET | Fr: translation filed | ||
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
26N | No opposition filed |
Effective date: 20070108 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 16 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 17 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20170426 Year of fee payment: 17 Ref country code: FR Payment date: 20170418 Year of fee payment: 17 Ref country code: DE Payment date: 20170531 Year of fee payment: 17 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R119 Ref document number: 60118627 Country of ref document: DE |
|
GBPC | Gb: european patent ceased through non-payment of renewal fee |
Effective date: 20180522 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: GB Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180522 Ref country code: DE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20181201 Ref country code: FR Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20180531 |