WO2005114653A1 - Quantization method for a very low bit rate speech coder (Procede de quantification d'un codeur de parole a tres bas debit)
- Publication number
- WO2005114653A1 (application PCT/EP2005/051661)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pitch
- voicing
- information
- parameters
- frames
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/087—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC
- G10L19/09—Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
- G10L2019/0001—Codebooks
- G10L2019/0004—Design or structure of the codebook
- G10L2019/0005—Multi-stage vector quantisation
Definitions
- The invention relates to a speech coding method. It applies in particular to the design of very low bit rate vocoders, of the order of 600 bits per second. It can be used, for example, with the MELP (Mixed Excitation Linear Prediction) coder described in references [1, 2, 3, 4].
- The method is implemented, for example, in satellite communications, internet telephony, static answering machines, voice pagers, etc.
- The objective of these vocoders is to reconstruct a signal that is as close as possible, as perceived by the human ear, to the original speech signal, using the lowest possible bit rate.
- The parameters concerned are: the voicing, which describes the harmonic character of voiced sounds or the stochastic character of unvoiced sounds; the fundamental frequency of voiced sounds, also known as the "pitch"; the temporal evolution of the energy; and the spectral envelope of the signal, used to excite and parameterize the synthesis filters.
- The spectral parameters used are the LSF (Line Spectral Frequencies) coefficients derived from a linear prediction analysis (LPC, Linear Predictive Coding). For the conventional rate of 2400 bits/s, the analysis is performed every 22.5 ms.
- The additional information extracted during modelling is: the fundamental frequency or pitch; the gains; the sub-band voicing information; the Fourier coefficients calculated on the residual signal after linear prediction.
- The object of the present invention is, in particular, to extend the MELP model to a rate of 600 bits/s.
- The parameters selected are, for example, the pitch, the LSF spectral coefficients, the gains and the voicing.
- The frames are grouped, for example, into a super-frame of 90 ms, that is, 4 consecutive 22.5 ms frames of the initial (commonly used) scheme.
- A bit rate of 600 bits/s is obtained by optimizing the quantization scheme of the various parameters (pitch, LSF coefficients, gains, voicing).
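The bit budget implied by these figures can be checked with a few lines of arithmetic. The sketch below only restates numbers given in the text (22.5 ms frames, 4 frames per super-frame, 600 and 2400 bits/s); the variable names are illustrative.

```python
FRAME_MS = 22.5              # elementary MELP frame duration, from the text
FRAMES_PER_SUPERFRAME = 4    # frames grouped into one super-frame, from the text

# Duration of one super-frame in milliseconds.
superframe_ms = FRAME_MS * FRAMES_PER_SUPERFRAME

# Bits available per super-frame at 600 bits/s.
bits_per_superframe_600 = 600 * superframe_ms / 1000.0

# Bits available per elementary frame at the conventional 2400 bits/s.
bits_per_frame_2400 = 2400 * FRAME_MS / 1000.0

print(superframe_ms, bits_per_superframe_600, bits_per_frame_2400)
```

Under these figures, a 600 bits/s super-frame carries the same number of bits as a single 2400 bits/s frame while covering four times the duration, which is why the quantization scheme has to be reworked rather than merely rescaled.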
- The invention relates to a speech coding and decoding method for voice communications using a very low bit rate vocoder comprising an analysis part for coding and transmitting the parameters of the speech signal, such as the sub-band voicing information, the pitch, the gains and the LSF spectral parameters, and a synthesis part for receiving and decoding the transmitted parameters and reconstructing the speech signal.
- It is characterized in that it comprises at least the following steps: • grouping the voicing, pitch, gain and LSF coefficient parameters over N consecutive frames to form a super-frame; • performing a vector quantization of the voicing information for each super-frame by building a classification from the voicing sequence observed over a sub-multiple of N consecutive elementary frames, the voicing information making it possible to identify classes of sounds for which the bit rate allocation and the associated dictionaries are optimized; • coding the pitch, the gains and the LSF coefficients using the classification obtained.
- The classification is, for example, built from the voicing sequence observed over 2 consecutive elementary frames.
- Figure 1: a general diagram of the method according to the invention for the coder part
- Figure 2: the block diagram of the vector quantization of the voicing information
- Figures 3 and 4: the block diagram of the vector quantization of the pitch
- Figure 5: the block diagram of the vector quantization of the parameters
- Figure 6: the block diagram of the vector quantization of the gains
- Figure 7: a diagram applied to the decoder part
- The following detailed example relates to a MELP coder adapted to a bit rate of 600 bits/s.
- The method according to the invention relates in particular to the coding of the parameters that make it possible to reproduce the full complexity of the speech signal as faithfully as possible with a minimum bit rate.
- The parameters selected are, for example, the pitch, the LSF spectral coefficients, the gains and the voicing.
- The method uses, in particular, a vector quantization procedure with classification.
- Figure 1 gives an overall view of the various operations performed in a speech coder.
- The method according to the invention comprises 7 main steps. Speech signal analysis step: Step 1 analyzes the signal using a MELP-type algorithm known to those skilled in the art.
- A voicing decision is made for each 22.5 ms frame and for 5 predefined frequency sub-bands.
- Parameter grouping step: In step 2, the method groups the selected parameters (voicing, pitch, gains and LSF coefficients) over N consecutive 22.5 ms frames to form a super-frame (90 ms for N = 4).
- Voicing information quantization step (detailed in Figure 2): Over the span of a super-frame, the voicing information is represented by a matrix of binary components (0: unvoiced; 1: voiced) of size 5 × 4 (5 MELP sub-bands, 4 frames).
- The distance used is a weighted Euclidean distance, in order to favor the bands located at low frequencies.
- A weighting vector such as [1.0; 1.0; 0.7; 0.4; 0.1] is used.
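As a minimal sketch of the weighted distance described above, the following code compares a 5 × 4 voicing matrix with the entries of a dictionary and returns the closest one. The function names and the toy dictionary are hypothetical, and applying the weights to the squared differences is an assumption; the text only states that the distance is a weighted Euclidean distance with the weights shown.

```python
# Per sub-band weights, low band first (values from the text).
WEIGHTS = [1.0, 1.0, 0.7, 0.4, 0.1]

def weighted_sq_distance(a, b, weights=WEIGHTS):
    """Weighted squared Euclidean distance between two voicing matrices.

    Each matrix is a list of 4 columns (frames); each column holds
    5 binary flags (sub-bands, low to high frequency).
    """
    return sum(w * (x - y) ** 2
               for col_a, col_b in zip(a, b)
               for w, x, y in zip(weights, col_a, col_b))

def nearest_codeword(matrix, codebook):
    """Index of the dictionary entry closest to `matrix`."""
    return min(range(len(codebook)),
               key=lambda i: weighted_sq_distance(matrix, codebook[i]))

# Toy dictionary (hypothetical): fully unvoiced and fully voiced super-frames.
UNVOICED = [[0, 0, 0, 0, 0] for _ in range(4)]
VOICED = [[1, 1, 1, 1, 1] for _ in range(4)]
toy_codebook = [UNVOICED, VOICED]

# A super-frame voiced in the three lowest bands of every frame.
observed = [[1, 1, 1, 0, 0] for _ in range(4)]
print(nearest_codeword(observed, toy_codebook))
```

Because the low bands carry most of the weight, an error in the highest band costs ten times less than an error in the lowest band, which steers the search toward codewords that preserve low-frequency voicing.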
- The quantized voicing information makes it possible to identify classes of sounds for which the bit rate allocation and the associated dictionaries are optimized. This voicing information is then used for the vector quantization of the spectral parameters and of the gains, with pre-classification.
- The method may include a step of applying constraints.
- The method uses, for example, the following 4 vectors: [0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1], giving the voicing from the low band to the high band.
- Each column of the voicing matrix, which holds the voicing of one of the 4 frames of the super-frame, is compared with each of these 4 vectors and replaced by the nearest one when training the dictionary.
- During coding, the same constraint is applied (choice among the 4 preceding vectors) and the vector quantization (VQ) is performed using the previously trained dictionary. This yields the voicing indices.
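The constraint step can be illustrated as follows: every column of the voicing matrix is replaced by the nearest of the 4 allowed patterns. This is a sketch only. The helper names are hypothetical, and reusing the weights [1.0; 1.0; 0.7; 0.4; 0.1] to measure "nearest" is an assumption, since the text does not say which distance is used for this comparison.

```python
# The 4 allowed per-frame voicing patterns, low band first (from the text).
ALLOWED = [
    (0, 0, 0, 0, 0),
    (1, 0, 0, 0, 0),
    (1, 1, 1, 0, 0),
    (1, 1, 1, 1, 1),
]
# Assumed weighting for the comparison (same values as the VQ distance).
WEIGHTS = (1.0, 1.0, 0.7, 0.4, 0.1)

def snap_column(column):
    """Replace one frame's 5 voicing flags by the nearest allowed pattern."""
    def dist(pattern):
        return sum(w * (c - p) ** 2 for w, c, p in zip(WEIGHTS, column, pattern))
    return min(ALLOWED, key=dist)

def constrain(matrix):
    """Apply the constraint to every frame (column) of a super-frame."""
    return [snap_column(col) for col in matrix]

superframe = [(0, 0, 0, 0, 1), (1, 1, 0, 0, 0), (1, 1, 1, 1, 0), (1, 1, 1, 1, 1)]
print(constrain(superframe))
```

With only 4 patterns per frame, a constrained super-frame has at most 4^4 = 256 distinct voicing matrices, which keeps the dictionary training set small.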
- The classification information is therefore available at the decoder at no additional cost in terms of bit rate.
- The dictionaries are optimized.
- The method defines, for example, 6 voicing classes over a span of 2 elementary frames.
- The classification is, for example, determined from the voicing sequence observed over a sub-multiple of N consecutive elementary frames, for example over 2 consecutive elementary frames. Each super-frame is thus represented by 2 voicing classes.
- The 6 voicing classes thus defined are, for example:
- Table 1 groups the different quantization modes according to the voicing class, and Table 2 gives the voicing information for each of the 6 quantization modes.
- The method uses a multi-stage quantization method, such as the MSVQ (Multi-Stage Vector Quantization) method known to those skilled in the art.
- A super-frame consists of 4 vectors of 10 LSF coefficients, and the vector quantization is applied to each group of 2 elementary frames (2 sub-vectors of 20 coefficients).
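To make the multi-stage principle concrete, here is a minimal sketch of MSVQ coding of one 20-coefficient sub-vector. Several points are assumptions rather than statements from the text: the stage sizes of 7, 5 and 4 bits (one reading of the dictionary notation (7, 5, 4)), the greedy stage-by-stage search (practical coders often keep several candidates per stage), the randomly generated codebooks, and the function names.

```python
import random

def nearest(vec, codebook):
    """Index of the codebook entry with the smallest squared error to vec."""
    return min(range(len(codebook)),
               key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, codebook[i])))

def msvq_encode(vec, stages):
    """Greedy multi-stage VQ: each stage quantizes the residual of the previous one."""
    residual = list(vec)
    indices = []
    for codebook in stages:
        i = nearest(residual, codebook)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices

def msvq_decode(indices, stages):
    """Reconstruct the vector as the sum of the selected entry of each stage."""
    out = [0.0] * len(stages[0][0])
    for i, codebook in zip(indices, stages):
        out = [o + c for o, c in zip(out, codebook[i])]
    return out

# Hypothetical random codebooks standing in for trained dictionaries.
rng = random.Random(0)
DIM = 20                 # 2 frames x 10 LSF coefficients
STAGE_BITS = (7, 5, 4)   # assumed bits per stage
stages = [[[rng.gauss(0.0, 1.0 / (s + 1)) for _ in range(DIM)]
           for _ in range(2 ** bits)]
          for s, bits in enumerate(STAGE_BITS)]

two_frames = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
indices = msvq_encode(two_frames, stages)
reconstructed = msvq_decode(indices, stages)
print(indices, sum(STAGE_BITS), "bits")
```

Under this reading, three stages of 2^7, 2^5 and 2^4 entries store 176 vectors in total, whereas a single-stage dictionary with the same 16 bits would need 65536 entries.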
- Pitch quantization step (Figures 3 and 4): The pitch is quantized differently depending on the mode. In mode 1 (unvoiced, number of voiced frames equal to 0), no pitch information is transmitted.
- In mode 2, only one frame is considered voiced; it is identified by the voicing information. The pitch is then represented on 6 bits (scalar quantization of the pitch period after logarithmic compression). In the other modes: 5 bits are used to transmit a pitch value (scalar quantization of the pitch period after logarithmic compression); 2 bits are used to position the pitch value on one of the 4 frames; 1 bit is used to characterize the evolution profile.
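Scalar quantization after logarithmic compression can be sketched as a uniform quantizer applied to the logarithm of the pitch period. The pitch range of 20 to 160 samples used below is an assumption (the text gives no range), as are the function names; only the bit counts (6 bits in mode 2, 5 bits in the other voiced modes) come from the text.

```python
import math

# Assumed pitch-period range in samples; not stated in the text.
P_MIN, P_MAX = 20.0, 160.0

def quantize_pitch(period, bits):
    """Uniform scalar quantization of log(period) on 2**bits levels."""
    levels = 2 ** bits
    x = (math.log(period) - math.log(P_MIN)) / (math.log(P_MAX) - math.log(P_MIN))
    index = round(x * (levels - 1))
    return max(0, min(levels - 1, index))   # clamp out-of-range periods

def dequantize_pitch(index, bits):
    """Inverse mapping: index back to a pitch period in samples."""
    levels = 2 ** bits
    x = index / (levels - 1)
    return math.exp(math.log(P_MIN) + x * (math.log(P_MAX) - math.log(P_MIN)))

idx6 = quantize_pitch(57.3, 6)   # mode 2: 6 bits
idx5 = quantize_pitch(57.3, 5)   # other voiced modes: 5 bits
print(idx6, dequantize_pitch(idx6, 6), idx5, dequantize_pitch(idx5, 5))
```

Quantizing in the log domain keeps the relative error roughly constant over the whole range, which matches the way pitch differences are perceived better than a uniform step on the period itself.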
- Figure 4 shows the evolution profile of the pitch. The transmitted pitch value, its position and the evolution profile are determined by minimizing a least-squares criterion on the pitch trajectory estimated by the analysis.
- The trajectories considered are obtained, for example, by linear interpolation between the last pitch value of the previous super-frame and the pitch value to be transmitted. If the transmitted pitch value is not positioned on the last frame, the evolution profile indicator completes the trajectory either by holding the value reached or by returning toward the "initial pitch" value (the last pitch value of the previous super-frame).
- All positions are considered, as well as all the pitch values between the quantized pitch value immediately below the minimum pitch estimated over the super-frame and the quantized pitch value immediately above the maximum pitch estimated over the super-frame.
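The search described above can be sketched as an exhaustive loop over candidate values, positions and profiles, keeping the combination with the smallest squared error. The function names are hypothetical, and the exact shape of the "return" profile (a linear return reaching the initial pitch on the last frame) is an assumption; the text only says the trajectory returns toward the initial value.

```python
def trajectory(prev_pitch, value, position, hold, n_frames=4):
    """Pitch track over one super-frame.

    Linear interpolation from prev_pitch (last pitch of the previous
    super-frame) to `value` at frame `position` (0-based). After that frame,
    either hold the value or return linearly to prev_pitch by the last frame
    (assumed shape of the return profile).
    """
    track = []
    for k in range(n_frames):
        if k <= position:
            t = (k + 1) / (position + 1)
            track.append(prev_pitch + t * (value - prev_pitch))
        elif hold:
            track.append(value)
        else:
            t = (k - position) / (n_frames - 1 - position)
            track.append(value + t * (prev_pitch - value))
    return track

def best_pitch_code(prev_pitch, estimated, candidates):
    """Least-squares search over (value, position, profile).

    `candidates` is the set of quantized pitch values to try; returns
    (error, value, position, hold) for the best trajectory found.
    """
    best = None
    for value in candidates:
        for position in range(len(estimated)):
            for hold in (True, False):
                track = trajectory(prev_pitch, value, position, hold, len(estimated))
                err = sum((a - b) ** 2 for a, b in zip(track, estimated))
                if best is None or err < best[0]:
                    best = (err, value, position, hold)
    return best

# Pitch rises to 70 on the second frame, then returns to the initial value 50.
print(best_pitch_code(50.0, [60.0, 70.0, 60.0, 50.0], [60.0, 70.0, 80.0]))
```

The 4 positions and 2 profiles correspond to the 2 position bits and the 1 profile bit of the voiced modes, so the search space per candidate value is only 8 trajectories.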
- The bit rate is allocated primarily to the higher voicing class, higher voicing meaning a greater or equal number of voiced sub-bands.
- In mode 4, the two consecutive unvoiced frames are represented using the dictionary (6, 4, 4), while the two consecutive voiced frames are represented using the dictionary (7, 5, 4).
- In mode 2, the two consecutive mixed frames are represented using the dictionary (7, 5, 4) and the two consecutive unvoiced frames using the dictionary (6, 4, 4).
- Table 4 gives the memory size associated with the dictionaries.
- The abbreviation VQ stands for vector quantization and MSVQ for multi-stage vector quantization.
- Bit rate evaluation: Table 6 gives the bit allocation for the 600 bits/s MELP-type speech coder, with a super-frame of 54 bits (90 ms).
- Figure 8 shows the diagram of the decoder part of the vocoder.
- The voicing index transmitted by the coder part is used to generate the quantization modes.
- The voicing, pitch, gain and LSF spectral parameters transmitted by the coder part are dequantized using the quantization modes obtained.
- The different steps are performed according to a scheme similar to that described for the coder part of the system.
- The dequantized parameters are then grouped before being passed to the synthesis part of the decoder to reconstruct the speech signal.
- L. Supplee, R. Cohn, J. Collura, A. McCree, "MELP: The New Federal Standard at 2400 bps", Proceedings of IEEE ICASSP, pp. 1591-1594, 1997.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
- Magnetic Resonance Imaging Apparatus (AREA)
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP05733605A EP1756806B1 (fr) | 2004-04-19 | 2005-04-14 | Procede de quantification d'un codeur de parole a tres bas debit |
PL05733605T PL1756806T3 (pl) | 2004-04-19 | 2005-04-14 | Sposób kwantyzacji kodera mowy o bardzo małej przepływności |
AT05733605T ATE453909T1 (de) | 2004-04-19 | 2005-04-14 | Verfahren zum quantifizieren eines sprachcodierers mit ultraniedriger rate |
DE602005018637T DE602005018637D1 (de) | 2004-04-19 | 2005-04-14 | Verfahren zum quantifizieren eines sprachcodierers mit ultraniedriger rate |
CA2567162A CA2567162C (fr) | 2004-04-19 | 2005-04-14 | Procede de quantification d'un codeur de parole a tres bas debit |
US11/578,663 US7716045B2 (en) | 2004-04-19 | 2005-04-14 | Method for quantifying an ultra low-rate speech coder |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR0404105A FR2869151B1 (fr) | 2004-04-19 | 2004-04-19 | Procede de quantification d'un codeur de parole a tres bas debit |
FR04/04105 | 2004-04-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2005114653A1 true WO2005114653A1 (fr) | 2005-12-01 |
Family
ID=34945858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2005/051661 WO2005114653A1 (fr) | 2004-04-19 | 2005-04-14 | Procede de quantification d'un codeur de parole a tres bas debit |
Country Status (9)
Country | Link |
---|---|
US (1) | US7716045B2 (fr) |
EP (1) | EP1756806B1 (fr) |
AT (1) | ATE453909T1 (fr) |
CA (1) | CA2567162C (fr) |
DE (1) | DE602005018637D1 (fr) |
ES (1) | ES2338801T3 (fr) |
FR (1) | FR2869151B1 (fr) |
PL (1) | PL1756806T3 (fr) |
WO (1) | WO2005114653A1 (fr) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008092473A1 (fr) * | 2007-01-31 | 2008-08-07 | Telecom Italia S.P.A. | Procédé et système personnalisables de reconnaissance d'émotions |
PT2313887T (pt) * | 2008-07-10 | 2017-11-14 | Voiceage Corp | Dispositivo e método de quantificação de filtro de lpc de taxa de bits variável e quantificação inversa |
CN114333862B (zh) * | 2021-11-10 | 2024-05-03 | 腾讯科技(深圳)有限公司 | 音频编码方法、解码方法、装置、设备、存储介质及产品 |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1995010760A2 (fr) * | 1993-10-08 | 1995-04-20 | Comsat Corporation | Codeurs vocaux a bas debit binaire ameliores et procedes pour leur utilisation |
US6263307B1 (en) * | 1995-04-19 | 2001-07-17 | Texas Instruments Incorporated | Adaptive weiner filtering using line spectral frequencies |
US5774837A (en) * | 1995-09-13 | 1998-06-30 | Voxware, Inc. | Speech coding system and method using voicing probability determination |
US5806027A (en) * | 1996-09-19 | 1998-09-08 | Texas Instruments Incorporated | Variable framerate parameter encoding |
US6081776A (en) * | 1998-07-13 | 2000-06-27 | Lockheed Martin Corp. | Speech coding system and method including adaptive finite impulse response filter |
US6377915B1 (en) * | 1999-03-17 | 2002-04-23 | Yrp Advanced Mobile Communication Systems Research Laboratories Co., Ltd. | Speech decoding using mix ratio table |
US7315815B1 (en) * | 1999-09-22 | 2008-01-01 | Microsoft Corporation | LPC-harmonic vocoder with superframe structure |
US6475145B1 (en) * | 2000-05-17 | 2002-11-05 | Baymar, Inc. | Method and apparatus for detection of acid reflux |
-
2004
- 2004-04-19 FR FR0404105A patent/FR2869151B1/fr not_active Expired - Fee Related
-
2005
- 2005-04-14 AT AT05733605T patent/ATE453909T1/de not_active IP Right Cessation
- 2005-04-14 WO PCT/EP2005/051661 patent/WO2005114653A1/fr active Application Filing
- 2005-04-14 EP EP05733605A patent/EP1756806B1/fr active Active
- 2005-04-14 ES ES05733605T patent/ES2338801T3/es active Active
- 2005-04-14 US US11/578,663 patent/US7716045B2/en not_active Expired - Fee Related
- 2005-04-14 PL PL05733605T patent/PL1756806T3/pl unknown
- 2005-04-14 CA CA2567162A patent/CA2567162C/fr not_active Expired - Fee Related
- 2005-04-14 DE DE602005018637T patent/DE602005018637D1/de active Active
Non-Patent Citations (4)
Title |
---|
NANDKUMAR S ET AL: "Robust speech mode based LSF vector quantization for low bit rate coders", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, PROCEEDINGS, 12 May 1998 (1998-05-12), SEATTLE, WA, USA, pages 41 - 44, XP010279049, ISBN: 0-7803-4428-6 * |
PADELLINI M ET AL: "Codage de la parole a très bas débit par indexation d'unités de taille variable", RENCONTRES JEUNES CHERCHEURS EN PAROLE, XX, XX, 23 September 2003 (2003-09-23), pages 1 - 3, XP002285303 * |
STACHURSKI J ET AL: "High quality MELP coding at bit-rates around 4 kb/s", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, 15 March 1999 (1999-03-15), PHOENIX, AZ, USA, pages 485 - 488, XP010327975, ISBN: 0-7803-5041-3 * |
ULPU SINERVO ET AL: "Multi-Mode Matrix Quantizer for Low Bit Rate LSF Quantization", EUROSPEECH, September 2003 (2003-09-01), GENEVA, CH, pages 1073 - 1076, XP007006802 * |
Also Published As
Publication number | Publication date |
---|---|
DE602005018637D1 (de) | 2010-02-11 |
US7716045B2 (en) | 2010-05-11 |
FR2869151B1 (fr) | 2007-01-26 |
EP1756806A1 (fr) | 2007-02-28 |
PL1756806T3 (pl) | 2010-06-30 |
CA2567162A1 (fr) | 2005-12-01 |
CA2567162C (fr) | 2013-07-23 |
FR2869151A1 (fr) | 2005-10-21 |
EP1756806B1 (fr) | 2009-12-30 |
US20070219789A1 (en) | 2007-09-20 |
ATE453909T1 (de) | 2010-01-15 |
ES2338801T3 (es) | 2010-05-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP1222659B1 (fr) | Vocodeur harmonique a codage predictif lineaire (lpc) avec structure a supertrame | |
US6260009B1 (en) | CELP-based to CELP-based vocoder packet translation | |
CN101180676B (zh) | 用于谱包络表示的向量量化的方法和设备 | |
EP1576585B1 (fr) | Procede et dispositif pour une quantification fiable d'un vecteur de prediction de parametres de prediction lineaire dans un codage vocal a debit binaire variable | |
US8515767B2 (en) | Technique for encoding/decoding of codebook indices for quantized MDCT spectrum in scalable speech and audio codecs | |
US8364495B2 (en) | Voice encoding device, voice decoding device, and methods therefor | |
US7191125B2 (en) | Method and apparatus for high performance low bit-rate coding of unvoiced speech | |
US20150302859A1 (en) | Scalable And Embedded Codec For Speech And Audio Signals | |
EP1692689B1 (fr) | Procede de codage multiple optimise | |
US20100023324A1 (en) | Device and Method for Quanitizing and Inverse Quanitizing LPC Filters in a Super-Frame | |
US6754630B2 (en) | Synthesis of speech from pitch prototype waveforms by time-synchronous waveform interpolation | |
WO1999016050A1 (fr) | Codec a geometrie variable et integree pour signaux de parole et de son | |
EP1125283B1 (fr) | Procede de quantification des parametres d'un codeur de parole | |
EP1597721B1 (fr) | Transcodage 600 bps a prediction lineaire avec excitation mixte (melp) | |
CA2567162C (fr) | Procede de quantification d'un codeur de parole a tres bas debit | |
JPH09508479A (ja) | バースト励起線形予測 | |
KR0155798B1 (ko) | 음성신호 부호화 및 복호화 방법 | |
Ojala et al. | Variable model order LPC quantization | |
KR100757366B1 (ko) | Zinc 함수를 이용한 음성 부호화기 및 그의 표준파형추출 방법 | |
Marie | Docteur en Sciences |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DPEN | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed from 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 11578663 Country of ref document: US Ref document number: 2007219789 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: DE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2005733605 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2567162 Country of ref document: CA |
|
WWP | Wipo information: published in national office |
Ref document number: 2005733605 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 11578663 Country of ref document: US |