WO2004072949A3 - Pitch quantization for distributed speech recognition - Google Patents

Pitch quantization for distributed speech recognition

Publication number
WO2004072949A3
Authority
WO
WIPO (PCT)
Prior art keywords
frame
pitch
class
calculated
quantizing
Prior art date
Application number
PCT/US2004/003425
Other languages
English (en)
Other versions
WO2004072949A2 (fr)
Inventor
Tenkasi V Ramabadran
Alexander Sorin
Original Assignee
Motorola Inc
Ibm
Tenkasi V Ramabadran
Alexander Sorin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc, Ibm, Tenkasi V Ramabadran, Alexander Sorin
Priority to EP04708630A (EP1595244B1)
Priority to BRPI0406956-0A (BRPI0406956B1)
Priority to ES04708630T (ES2395717T3)
Priority to CN2004800036741A (CN1748244B)
Publication of WO2004072949A2
Publication of WO2004072949A3

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/09 Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 Quantisation or dequantisation of spectral components
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals
    • G10L2025/935 Mixed voiced class; Transitions

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a system, a method, and a computer-readable medium for quantizing pitch information of audio. The method comprises: capturing audio representing a numbered frame of a plurality of numbered frames; calculating a class of the frame, the class being either voiced or unvoiced; if the frame is of a voiced class, calculating a pitch for the frame; if the frame is an even-numbered frame of a voiced class, calculating a codeword of a first length by quantizing the pitch of the frame absolutely; if the frame is an odd-numbered frame of a voiced class and a reliable frame is available, calculating a codeword of a second length by quantizing the pitch of the frame differentially; and, if no reliable frame is available, calculating a codeword of the second length by quantizing the pitch of the frame absolutely.
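
The per-frame decision described in the abstract can be sketched as a small selection function. This is only an illustrative sketch, not the patented implementation: the function name `plan_pitch_codeword`, the `PitchCodePlan` type, and the two bit lengths are assumptions introduced here, since the abstract states only that a "first length" and a "second length" exist. The abstract also does not say what is emitted for an unvoiced frame, so the sketch returns `None` in that case.

```python
from typing import NamedTuple, Optional

# Assumed codeword lengths in bits; the abstract names only a "first" and a
# "second" length without giving values.
FIRST_LENGTH_BITS = 7
SECOND_LENGTH_BITS = 5


class PitchCodePlan(NamedTuple):
    """Hypothetical result type: which quantizer to use and how many bits."""
    method: str  # "absolute" or "differential"
    bits: int


def plan_pitch_codeword(
    frame_number: int,
    voiced: bool,
    reliable_frame_available: bool,
) -> Optional[PitchCodePlan]:
    """Select the pitch quantization scheme for one frame.

    Returns None for unvoiced frames, because the abstract only describes
    pitch codewords for frames of a voiced class.
    """
    if not voiced:
        return None
    if frame_number % 2 == 0:
        # Even-numbered voiced frame: absolute quantization, first length.
        return PitchCodePlan("absolute", FIRST_LENGTH_BITS)
    if reliable_frame_available:
        # Odd-numbered voiced frame with a reliable reference frame:
        # differential quantization, second length.
        return PitchCodePlan("differential", SECOND_LENGTH_BITS)
    # Odd-numbered voiced frame with no reliable frame: absolute
    # quantization, still using the second (shorter) length.
    return PitchCodePlan("absolute", SECOND_LENGTH_BITS)
```

For example, `plan_pitch_codeword(3, True, False)` selects absolute quantization at the second length, which is the fallback the abstract describes when no reliable frame is available.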
PCT/US2004/003425 2003-02-07 2004-02-05 Pitch quantization for distributed speech recognition WO2004072949A2 (fr)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP04708630A EP1595244B1 (fr) 2003-02-07 2004-02-05 Fundamental frequency quantization for distributed speech recognition
BRPI0406956-0A BRPI0406956B1 (pt) 2003-02-07 2004-02-05 Quantization of pitch information for distributed speech recognition
ES04708630T ES2395717T3 (es) 2003-02-07 2004-02-05 Fundamental frequency quantization for distributed speech recognition
CN2004800036741A CN1748244B (zh) 2003-02-07 2004-02-05 Pitch quantization for distributed speech recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US10/360,581 2003-02-07
US10/360,581 US6915256B2 (en) 2003-02-07 2003-02-07 Pitch quantization for distributed speech recognition

Publications (2)

Publication Number Publication Date
WO2004072949A2 (fr) 2004-08-26
WO2004072949A3 (fr) 2004-12-09

Family

ID=32867946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/003425 WO2004072949A2 (fr) 2003-02-07 2004-02-05 Pitch quantization for distributed speech recognition

Country Status (9)

Country Link
US (1) US6915256B2 (fr)
EP (1) EP1595244B1 (fr)
KR (1) KR100641673B1 (fr)
CN (1) CN1748244B (fr)
BR (1) BRPI0406956B1 (fr)
ES (1) ES2395717T3 (fr)
RU (1) RU2331932C2 (fr)
TW (1) TWI333640B (fr)
WO (1) WO2004072949A2 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6961696B2 (en) * 2003-02-07 2005-11-01 Motorola, Inc. Class quantization for distributed speech recognition
US8249873B2 (en) 2005-08-12 2012-08-21 Avaya Inc. Tonal correction of speech
US7783488B2 (en) * 2005-12-19 2010-08-24 Nuance Communications, Inc. Remote tracing and debugging of automatic speech recognition servers by speech reconstruction from cepstra and pitch information
TWI299133B (en) * 2006-01-23 2008-07-21 Realtek Semiconductor Corp Webcasting system and audio regulating methods therefor
KR101317269B1 (ko) 2007-06-07 2013-10-14 삼성전자주식회사 Sinusoidal audio coding method and apparatus, and sinusoidal audio decoding method and apparatus
US8670990B2 (en) * 2009-08-03 2014-03-11 Broadcom Corporation Dynamic time scale modification for reduced bit rate audio coding
EP2596494B1 (fr) * 2010-07-20 2020-08-05 Fraunhofer-Gesellschaft zur Förderung der Angewandten Forschung e.V. Audio decoder, audio decoding method and computer program
US8645128B1 (en) 2012-10-02 2014-02-04 Google Inc. Determining pitch dynamics of an audio signal
US9454976B2 (en) 2013-10-14 2016-09-27 Zanavox Efficient discrimination of voiced and unvoiced sounds

Citations (5)

Publication number Priority date Publication date Assignee Title
US5081681A (en) * 1989-11-30 1992-01-14 Digital Voice Systems, Inc. Method and apparatus for phase synthesis for speech processing
US5091944A (en) * 1989-04-21 1992-02-25 Mitsubishi Denki Kabushiki Kaisha Apparatus for linear predictive coding and decoding of speech using residual wave form time-access compression
US6041297A (en) * 1997-03-10 2000-03-21 At&T Corp Vocoder for coding speech by using a correlation between spectral magnitudes and candidate excitations
US6199037B1 (en) * 1997-12-04 2001-03-06 Digital Voice Systems, Inc. Joint quantization of speech subframe voicing metrics and fundamental frequencies
US6377916B1 (en) * 1999-11-29 2002-04-23 Digital Voice Systems, Inc. Multiband harmonic transform coder

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
GB9811019D0 (en) * 1998-05-21 1998-07-22 Univ Surrey Speech coders
US6094629A (en) * 1998-07-13 2000-07-25 Lockheed Martin Corp. Speech coding system and method including spectral quantizer

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
US5091944A (en) * 1989-04-21 1992-02-25 Mitsubishi Denki Kabushiki Kaisha Apparatus for linear predictive coding and decoding of speech using residual wave form time-access compression
US5081681A (en) * 1989-11-30 1992-01-14 Digital Voice Systems, Inc. Method and apparatus for phase synthesis for speech processing
US5081681B1 (en) * 1989-11-30 1995-08-15 Digital Voice Systems Inc Method and apparatus for phase synthesis for speech processing
US6041297A (en) * 1997-03-10 2000-03-21 At&T Corp Vocoder for coding speech by using a correlation between spectral magnitudes and candidate excitations
US6199037B1 (en) * 1997-12-04 2001-03-06 Digital Voice Systems, Inc. Joint quantization of speech subframe voicing metrics and fundamental frequencies
US6377916B1 (en) * 1999-11-29 2002-04-23 Digital Voice Systems, Inc. Multiband harmonic transform coder

Non-Patent Citations (1)

Title
See also references of EP1595244A4 *

Also Published As

Publication number Publication date
BRPI0406956B1 (pt) 2018-02-27
RU2005127863A (ru) 2006-01-27
KR20050097929A (ko) 2005-10-10
WO2004072949A2 (fr) 2004-08-26
TW200506814A (en) 2005-02-16
RU2331932C2 (ru) 2008-08-20
CN1748244B (zh) 2010-09-29
KR100641673B1 (ko) 2006-11-10
TWI333640B (en) 2010-11-21
ES2395717T3 (es) 2013-02-14
EP1595244A2 (fr) 2005-11-16
US6915256B2 (en) 2005-07-05
EP1595244A4 (fr) 2006-03-08
EP1595244B1 (fr) 2012-11-14
CN1748244A (zh) 2006-03-15
US20040172243A1 (en) 2004-09-02
BRPI0406956A (pt) 2006-01-03

Similar Documents

Publication Publication Date Title
CY1114289T1 (el) Low-complexity audio transcoding
CN101836253B (zh) Apparatus and method for calculating bandwidth extension data using a spectral-tilt-controlled framing technique
US7627471B2 (en) Providing translations encoded within embedded digital information
US8135590B2 (en) Position-dependent phonetic models for reliable pronunciation identification
EP1447792A3 (fr) Method and apparatus for modeling a speech recognition system and for predicting a word error rate from text
EP1262956A3 (fr) Method and apparatus for speech coding
US20150262587A1 (en) Pitch Synchronous Speech Coding Based on Timbre Vectors
CA2572715A1 (fr) Method and apparatus for equalizing a speech signal generated within a self-contained breathing apparatus system
CN106098078B (zh) Speech recognition method and system capable of filtering loudspeaker noise
EP1168306A3 (fr) Method and apparatus for improving the intelligibility of digitally compressed speech signals
WO2005034080A3 (fr) Method for making a window type decision based on MDCT data in audio coding
WO2004072949A3 (fr) Pitch quantization for distributed speech recognition
EP1533791A3 (fr) Voice activity detection and speech intelligibility enhancement
Esposito et al. Text independent methods for speech segmentation
WO2004072948A3 (fr) Voicing class quantization for distributed speech recognition
DE602004007953D1 (de) System and method for audio signal processing
WO2005031534A3 (fr) Method and apparatus for split range coding
US6704701B1 (en) Bi-directional pitch enhancement in speech coding systems
JPS63282795A (ja) Multi-pulse coding apparatus
WO2019216187A1 (fr) Pitch enhancement device, and associated method and program
WO2004034355A3 (fr) System and method for comparing elements
WO2007111649A3 (fr) Open-loop pitch track smoothing
JPH11175096A (ja) Speech signal processing apparatus
JP2644789B2 (ja) Image transmission system
WO2003042648A1 (fr) Speech signal encoder, speech signal decoder, speech signal encoding method, and speech signal decoding method

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase

Ref document number: 1020057012455

Country of ref document: KR

WWE Wipo information: entry into national phase

Ref document number: 20048036741

Country of ref document: CN

WWE Wipo information: entry into national phase

Ref document number: 2004708630

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2005127863

Country of ref document: RU

WWP Wipo information: published in national office

Ref document number: 1020057012455

Country of ref document: KR

WWP Wipo information: published in national office

Ref document number: 2004708630

Country of ref document: EP

ENP Entry into the national phase

Ref document number: PI0406956

Country of ref document: BR