EP1431962B1 - Device and method for wideband coding of speech signals - Google Patents
- Publication number
- EP1431962B1 (application EP04100553A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- speech
- highband
- lowband
- khz
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
Definitions
- The present invention relates to electronic devices and, more particularly, to speech coding, transmission, storage, and decoding/synthesis methods and systems.
- The performance of digital speech systems using low bit rates has become increasingly important with current and foreseeable digital communications.
- Both dedicated channel and packetized-over-network (VoIP) transmission benefit from compression of speech signals.
- The widely used linear prediction (LP) digital speech coding compression method models the vocal tract as a time-varying filter and a time-varying excitation of the filter to mimic human speech.
- M, the order of the linear prediction filter, is taken to be about 10-12; the sampling rate to form the samples s(n) is typically taken to be 8 kHz (the same as the public switched telephone network (PSTN) sampling for digital transmission); and the number of samples {s(n)} in a frame is often 80 or 160 (10 or 20 ms frames).
- Various windowing operations may be applied to the samples of the input speech frame.
- Minimizing Σ r(n)^2 over the frame yields the {a(j)} which furnish the best linear prediction.
- The coefficients {a(j)} may be converted to line spectral frequencies (LSFs) for quantization and transmission or storage.
- The {r(n)} form the LP residual for the frame, and ideally the LP residual would be the excitation for the synthesis filter 1/A(z), where A(z) is the transfer function of equation (1).
- The LP residual is not available at the decoder; thus the task of the encoder is to represent the LP residual so that the decoder can generate an LP excitation from the encoded parameters.
- The LP compression approach basically only transmits/stores updates for the (quantized) filter coefficients, the (quantized) residual (waveform or parameters such as pitch), and the (quantized) gain.
- A receiver regenerates the speech with the same perceptual characteristics as the input speech.
- Figure 9 shows the blocks in an LP system. Periodic updating of the quantized items requires fewer bits than direct representation of the speech signal, so a reasonable LP coder can operate at bit rates as low as 2-3 kb/s (kilobits per second).
- The ITU standard G.729 Annex E with a bit rate of 11.8 kb/s uses LP analysis with codebook excitation (CELP) to compress voiceband speech and has performance comparable to the 64 kb/s PCM used for PSTN digital transmission.
- CELP codebook excitation
- Another approach uses split-band CELP or MPLPC by coding a 4-8 kHz highband separately from the 0-4 kHz lowband and with fewer bits allocated to the highband; see Drogo de Jacovo et al, Some Experiments of 7 kHz Audio Coding at 16 kbit/s, IEEE ICASSP 1989, pp.192-195. Similarly, Tucker, Low Bit-Rate Frequency Extension Coding, IEE Colloquium on Audio and Music Technology 1998, pp.3/1-3/5, provides standard coding of the lowband 0-4 kHz plus codes the 4-8 kHz highband speech only for unvoiced frames (as determined in the lowband) and uses an LP filter of order 2-4 with noise excitation.
- The present invention provides a method of wideband speech coding, comprising: (a) partitioning a frame of digital speech into a lowband and a highband; (b) decimating the sampling rate of both said lowband and said highband; (c) encoding said decimated lowband from step (b) including a first method of quantization; (d) reversing the spectrum of a baseband image of said decimated highband from step (b); and (e) encoding the results of step (d) including said first method of quantization.
- A wideband speech decoder comprising: (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook; (b) a second speech decoder with an input for encoded highband speech, said second decoder using said LP codebook.
- The preferred embodiment systems include preferred embodiment encoders and decoders that process a wideband speech frame as the sum of a lowband signal and a highband signal. The lowband signal has standalone speech encoding/decoding, and the highband signal has encoding/decoding that incorporates information from the lowband signal to modulate a noise excitation. This allows a minimal number of bits to sufficiently encode the highband and yields an embedded coder.
- Figure 1a shows in functional block format a first preferred embodiment system for wideband speech encoding, transmission (storage), and decoding including first preferred embodiment encoders and decoders.
- The encoders and decoders use CELP lowband encoding and decoding plus a highband encoding and decoding incorporating information from the (decoded) lowband for modulation of a noise excitation with LP coding.
- First preferred embodiment encoders proceed as follows.
- The baseband of the decimated highband has a reversed spectrum because the baseband is an aliased image; see Figure 3b.
- Encode the first baseband (decimated lowband) signal with a (standard) narrowband speech coder.
- Decoding reverses the encoding process by separating the highband and lowband code, using information from the decoded lowband to help decode the highband, and adding the decoded highband to the decoded lowband speech to synthesize wideband speech. See Figure 1c.
- This split-band approach allows most of the code bits to be allocated to the lowband; for example, the lowband may consume 11.8 kb/s and the highband may add 2.2 kb/s for a total of 14 kb/s.
- Figures 2a-2b illustrate the typical magnitudes of voiced and unvoiced speech, respectively, as functions of frequency over the range 0-8 kHz.
- The bulk of the energy in voiced speech resides in the 0-3 kHz band.
- The pitch structure (the fundamental frequency is about 125 Hz in Figure 2a) clearly appears in the range 0-3.5 kHz and persists (although jumbled) at higher frequencies.
- The perceptual critical bandwidth at higher frequencies is roughly 10% of a band center frequency, so the individual pitch harmonics become indistinguishable and should require fewer bits for inclusion in a highband code.
- The higher band (above 4 kHz) should require fewer bits to encode than the lower band (0-4 kHz).
- This underlies the preferred embodiment methods of partitioning wideband (0-8 kHz) speech into a lowband (0-4 kHz) and a highband (4-8 kHz), recognizing that the lowband may be encoded by any convenient narrowband coder, and separately coding the highband with a relatively small number of bits as described in the following sections.
- Figure 1b illustrates the flow of a first preferred embodiment speech coder which encodes at 14 kb/s with the following steps.
- A first preferred embodiment decoding method essentially reverses the encoding steps for a bitstream encoded by the first preferred embodiment method.
- For a coded frame in the bitstream:
- Figures 8-9 show in functional block form preferred embodiment systems that use the preferred embodiment encoding and decoding.
- The encoding and decoding can be performed with digital signal processors (DSPs), general-purpose programmable processors, application-specific circuitry, or systems on a chip such as both a DSP and a RISC processor on the same chip with the RISC processor controlling.
- Codebooks would be stored in memory at both the encoder and decoder, and a stored program in an onboard ROM or external flash EEPROM for a DSP or programmable processor could perform the signal processing.
- Analog-to-digital converters and digital-to-analog converters provide coupling to the real world, and modulators and demodulators (plus antennas for air interfaces) provide coupling for transmission waveforms.
- The encoded speech can be packetized and transmitted over networks such as the Internet.
- The preferred embodiments may be modified in various ways while retaining the features of separately coding a lowband from a wideband signal and using information from the lowband to help encode the highband (remainder of the wideband) and/or using spectrum reversal for decimated highband LP coefficient quantization in order to obtain efficiency comparable to that for the lowband LP coefficient quantization.
- The upper (2.8-3.8 kHz) portion of the lowband (0-4 kHz) could be replaced by some other portion(s) of the lowband for use as a modulation for the highband excitation.
- The wideband may be partitioned into a lowband plus two or more highbands; the lowband coder could be a parametric or even non-LP coder and a highband coder could be a waveform coder; and so forth.
- The scope of the invention is limited only by the appended claims.
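The linear prediction analysis described in the definitions above (choosing the {a(j)} that minimize the summed squared residual over a frame) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the patent text: the sign convention r(n) = s(n) + Σ a(j) s(n-j) (equation (1) is not reproduced in this excerpt), the autocorrelation method with a Levinson-Durbin recursion as the solver, no windowing, and a synthetic autoregressive test frame in place of real speech. All function names are hypothetical.

```python
import random


def synthetic_frame(num_samples=160, seed=0):
    """Hypothetical stand-in for a 20 ms frame at 8 kHz: a stable AR(2) process."""
    rng = random.Random(seed)
    history = [0.0, 0.0]
    for _ in range(num_samples):
        history.append(1.3 * history[-1] - 0.6 * history[-2] + rng.gauss(0.0, 1.0))
    return history[2:]


def autocorrelation(frame, max_lag):
    """Frame autocorrelation for lags 0..max_lag (samples outside the frame are zero)."""
    return [
        sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(acf, order):
    """Return (a, err): a[0] = 1 and a[1..order] minimize the summed squared residual.

    Assumed convention: r(n) = s(n) + sum_j a[j] * s(n - j).
    """
    a = [1.0] + [0.0] * order
    err = acf[0]
    for m in range(1, order + 1):
        acc = acf[m] + sum(a[j] * acf[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient for stage m
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k  # prediction error energy shrinks at each stage
    return a, err


def lp_residual(frame, a):
    """Filter the frame through A(z), treating samples before the frame as zero."""
    order = len(a) - 1
    return [
        sum(a[j] * frame[n - j] for j in range(min(order, n) + 1))
        for n in range(len(frame))
    ]


frame = synthetic_frame()
acf = autocorrelation(frame, 10)          # M = 10, within the 10-12 range cited above
a, err = levinson_durbin(acf, 10)
residual = lp_residual(frame, a)
print("residual energy / frame energy:", sum(r * r for r in residual) / acf[0])
```

The printed ratio falls below 1 because the residual carries less energy than the frame itself, which is what makes coding the residual (or a parametric stand-in for it) cheaper than coding the waveform directly.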
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Claims (4)
- A method of wideband speech coding, comprising: (a) partitioning a frame of digital speech into a lowband and a highband; (b) decimating the sampling rate of both the lowband and the highband; (c) encoding the decimated lowband from step (b), including a first method of quantization; (d) reversing the spectrum of a baseband image of the decimated highband from step (b); and (e) encoding the results of step (d), including the first method of quantization.
- A method of wideband speech decoding, comprising: (a) decoding a first portion of an input signal as a lowband speech signal, including use of a first codebook; (b) decoding a second portion of an input signal as a highband speech signal, including use of the first codebook; and (c) combining the results of the foregoing steps (a) and (b) to form a decoded wideband speech signal.
- A wideband speech encoder, comprising: (a) a lowband filter and a highband filter for digital speech; (b) a first encoder with an input from the lowband filter, the first encoder using a first quantizer; (c) a second encoder with an input from the highband filter, the second encoder using the first quantizer; and (d) a combiner for the first encoder and the second encoder to output coded wideband speech.
- A wideband speech decoder, comprising: (a) a first speech decoder with an input for encoded narrowband speech and an LP codebook; (b) a second speech decoder with an input for encoded highband speech, the second decoder using the LP codebook.
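Steps (b) and (d) of the first claim (decimating the highband, then reversing the spectrum of its baseband image) can be illustrated with a small sketch. The assumptions here are not stated in this excerpt: wideband speech sampled at 16 kHz, a highband that has already been highpass filtered to 4-8 kHz, decimation by 2 by dropping every other sample, and spectrum reversal by multiplying sample n by (-1)^n, which is a standard signal-processing technique rather than a detail confirmed by the source. Function names are hypothetical, and a single tone stands in for highband speech.

```python
import cmath
import math


def tone(freq_hz, num_samples, fs_hz):
    """Cosine test tone standing in for already-filtered highband content."""
    return [math.cos(2.0 * math.pi * freq_hz * n / fs_hz) for n in range(num_samples)]


def decimate_by_two(x):
    """Keep every other sample; a 4-8 kHz band aliases down to 0-4 kHz, mirrored."""
    return x[::2]


def reverse_spectrum(x):
    """Multiply by (-1)^n, which maps frequency f to fs/2 - f."""
    return [v if n % 2 == 0 else -v for n, v in enumerate(x)]


def dft_magnitude(x, freq_hz, fs_hz):
    """Magnitude of a single DFT-style correlation at freq_hz, normalized by length."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq_hz * n / fs_hz) for n, v in enumerate(x))
    return abs(acc) / len(x)


def dominant_frequency(x, fs_hz, step_hz=500):
    """Strongest frequency on a coarse grid from 0 Hz to fs/2."""
    return max(range(0, fs_hz // 2 + 1, step_hz), key=lambda f: dft_magnitude(x, f, fs_hz))


# A 5 kHz tone sits 1 kHz above the 4 kHz band edge of the 16 kHz wideband signal.
highband = tone(5000, 320, 16000)
aliased = decimate_by_two(highband)      # now sampled at 8 kHz, spectrum mirrored
restored = reverse_spectrum(aliased)     # mirror undone
print(dominant_frequency(aliased, 8000), dominant_frequency(restored, 8000))
```

In this sketch the 5 kHz tone appears at 3 kHz after decimation and at 1 kHz after reversal, so content just above the 4 kHz band edge ends up at the low end of the baseband. That ordering resembles a lowband spectrum, which is consistent with the text's point that spectrum reversal lets the highband reuse the lowband quantization method.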
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US20615600P | 2000-05-22 | 2000-05-22 | |
US206156P | 2000-05-22 | | |
EP01000172A EP1158495B1 (de) | 2000-05-22 | 2001-05-22 | Device and method for wideband coding of speech signals |
Related Parent Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01000172A Division EP1158495B1 (de) | 2000-05-22 | 2001-05-22 | Device and method for wideband coding of speech signals |
EP01000172A Division-Into EP1158495B1 (de) | 2000-05-22 | 2001-05-22 | Device and method for wideband coding of speech signals |
Publications (3)
Publication Number | Publication Date |
---|---|
EP1431962A2 (de) | 2004-06-23 |
EP1431962A3 (de) | 2004-12-01 |
EP1431962B1 (de) | 2006-04-05 |
Family
ID=32395343
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP04100553A Expired - Lifetime EP1431962B1 (de) | 2000-05-22 | 2001-05-22 | Device and method for wideband coding of speech signals |
Country Status (1)
Country | Link |
---|---|
EP (1) | EP1431962B1 (de) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8996362B2 (en) | 2008-01-31 | 2015-03-31 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Device and method for a bandwidth extension of an audio signal |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
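CN101006495A (zh) | 2004-08-31 | 2007-07-25 | 松下电器产业株式会社 | Speech coding apparatus, speech decoding apparatus, communication apparatus, and speech coding method |
WO2006030864A1 (ja) | 2004-09-17 | 2006-03-23 | Matsushita Electric Industrial Co., Ltd. | Speech coding apparatus, speech decoding apparatus, communication apparatus, and speech coding method |
JP4963963B2 (ja) | 2004-09-17 | 2012-06-27 | パナソニック株式会社 | Scalable coding apparatus, scalable decoding apparatus, scalable coding method, and scalable decoding method |
WO2008081777A1 (ja) | 2006-12-25 | 2008-07-10 | Kyushu Institute Of Technology | High-frequency signal interpolation apparatus and high-frequency signal interpolation method |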
- 2001-05-22: EP application EP04100553A, patent EP1431962B1 (de), not active, Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
EP1431962A2 (de) | 2004-06-23 |
EP1431962A3 (de) | 2004-12-01 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US7330814B2 (en) | Wideband speech coding with modulated noise highband excitation system and method | |
US7136810B2 (en) | Wideband speech coding system and method | |
US6795805B1 (en) | Periodicity enhancement in decoding wideband signals | |
JP4662673B2 (ja) | Gain smoothing in wideband speech and audio signal decoder | |
US8374853B2 (en) | Hierarchical encoding/decoding device | |
CA2862715C (en) | Multi-mode audio codec and celp coding adapted therefore | |
US6260009B1 (en) | CELP-based to CELP-based vocoder packet translation | |
US8260620B2 (en) | Device for perceptual weighting in audio encoding/decoding | |
CN103384900B (zh) | Low-delay sound coding alternating between predictive coding and transform coding | |
EP1273005B1 (de) | Wideband speech codec using different sampling rates | |
US20050027517A1 (en) | Transcoding method and system between celp-based speech codes | |
KR20090104846A (ko) | Improved coding/decoding of digital audio signals | |
EP0981816A1 (de) | Audio coding systems and methods | |
EP1158495B1 (de) | Device and method for wideband coding of speech signals | |
EP1222659A1 (de) | LPC-harmonic speech coder with superframe format | |
US6847929B2 (en) | Algebraic codebook system and method | |
CN101131820A (zh) | Coding apparatus, decoding apparatus, coding method, and decoding method | |
US6687667B1 (en) | Method for quantizing speech coder parameters | |
US20040111257A1 (en) | Transcoding apparatus and method between CELP-based codecs using bandwidth extension | |
EP1431962B1 (de) | Device and method for wideband coding of speech signals | |
JP3092653B2 (ja) | Wideband speech coding apparatus, speech decoding apparatus, and speech coding and decoding apparatus | |
Schnitzler | A 13.0 kbit/s wideband speech codec based on SB-ACELP | |
US6801887B1 (en) | Speech coding exploiting the power ratio of different speech signal components | |
Esteban et al. | 9.6/7.2 kbps voice excited predictive coder (VEPC) | |
Vass et al. | Adaptive forward-backward quantizer for low bit rate high-quality speech coding |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
AC | Divisional application: reference to earlier application | Ref document number: 1158495; Country of ref document: EP; Kind code of ref document: P |
AK | Designated contracting states | Kind code of ref document: A2; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
PUAL | Search report despatched | Free format text: ORIGINAL CODE: 0009013 |
AK | Designated contracting states | Kind code of ref document: A3; Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE TR |
RIC1 | Information provided on ipc code assigned before grant | Ipc: 7G 10L 21/02 A; Ipc: 7G 10L 19/02 B |
17P | Request for examination filed | Effective date: 20050601 |
AKX | Designation fees paid | Designated state(s): DE FR GB |
GRAP | Despatch of communication of intention to grant a patent | Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
GRAS | Grant fee paid | Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AC | Divisional application: reference to earlier application | Ref document number: 1158495; Country of ref document: EP; Kind code of ref document: P |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): DE FR GB |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: FG4D |
REF | Corresponds to: | Ref document number: 60118627; Country of ref document: DE; Date of ref document: 20060518; Kind code of ref document: P |
ET | Fr: translation filed | |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | Effective date: 20070108 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 16 |
REG | Reference to a national code | Ref country code: FR; Ref legal event code: PLFP; Year of fee payment: 17 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20170426; Year of fee payment: 17. Ref country code: FR; Payment date: 20170418; Year of fee payment: 17. Ref country code: DE; Payment date: 20170531; Year of fee payment: 17 |
REG | Reference to a national code | Ref country code: DE; Ref legal event code: R119; Ref document number: 60118627; Country of ref document: DE |
GBPC | Gb: european patent ceased through non-payment of renewal fee | Effective date: 20180522 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20180522. Ref country code: DE; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20181201. Ref country code: FR; Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES; Effective date: 20180531 |