WO2009055718A1 - Producing phonitos based on feature vectors - Google Patents

Producing phonitos based on feature vectors

Info

Publication number
WO2009055718A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
phoneme
cords
voiced
signal
Prior art date
Application number
PCT/US2008/081187
Other languages
English (en)
Inventor
Joel K. Nyquist
Erik N. Reckase
Matthew D. Robinson
John F. Remillard
James Goodnow
Original Assignee
Red Shift Company, Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US12/256,693 (published as US20090182556A1)
Application filed by Red Shift Company, Llc filed Critical Red Shift Company, Llc
Publication of WO2009055718A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals
    • G10L2025/937 Signal energy in various frequency bands

Definitions

  • A machine-readable medium can have stored thereon a series of instructions which, when executed by a processor, cause the processor to process a signal representing speech by receiving a first frame of the signal representing speech, the first frame comprising a voiced frame.
  • One or more cords can be extracted from the voiced frame based on occurrence of one or more events within the frame (see the cord-extraction sketch following this section).
  • The one or more events can comprise one or more glottal pulses.
  • The one or more cords can collectively comprise less than all of the frame.
  • FIG. 10 is a flowchart illustrating a process for identifying a cord termination according to one embodiment of the present invention.
  • FIG. 1 is a graph illustrating an exemplary electrical signal representing speech.
  • This example illustrates an electrical signal 100 as may be received from a transducer such as a microphone or other device when detecting speech.
  • The signal 100 includes a series of high-amplitude spikes referred to herein as glottal pulses 105.
  • The term glottal pulse is used to describe these spikes because they occur in the electrical signal 100 at the point when the glottis in the speaker's throat causes a sound-generating event.
  • The glottal pulse 105 can be used to identify frames of the signal to be sampled and/or analyzed to determine a spoken sound represented by the signal (see the pulse-detection sketch following this section).
  • A pitch estimation and marking module 230 can be communicatively coupled with the classification module 225. Generally speaking, the pitch estimation and marking module 230 can parse or mark the voiced frame into one or more regions based on an estimated pitch for that region and the occurrence of events, i.e., glottal pulses within the signal. As used herein, the term "region" refers to a portion of a frame of the electrical signal representing speech where the portion has been marked by the pitch marking process. Details of exemplary processes for pitch estimation and marking as may be performed by the pitch estimation and marking module 230 are described below with reference to FIGs. 7 and 8.
  • The cord finder 240 can further parse the region of the signal into one or more cords based on occurrence of one or more events, e.g., the glottal pulses. As will be discussed below with reference to FIG. 9, parsing the voiced region into one or more cords can comprise locating a first glottal pulse and selecting a cord including the first glottal pulse.
  • FIG. 3 is a graph illustrating an exemplary electrical signal representing speech including delineation of portions used for speech recognition according to one embodiment of the present invention.
  • This example illustrates a signal 300 that includes a series of glottal pulses 310 and 330 followed by a series of lesser peaks and a period of transients or echoes just prior to the start of another glottal pulse.
  • Pitch estimation and marking can be performed.
  • The pitch estimation and marking can comprise parsing or marking the voiced frame into one or more regions based on an estimated pitch for that region and the occurrence of events, i.e., glottal pulses within the signal. Details of exemplary processes for pitch estimation and marking are described below with reference to FIGs. 7 and 8.
  • The pitch marking process can be tuned or adjusted. More specifically, such tuning can check the gaps between the marked events within the region. If a gap between any two events exceeds an expected gap, a check can be made for an event occurring between the marked events. For example, the expected gap can be based on the expected distance between events for a given pitch estimate. If the gap equals a multiple of that expected gap, the gap can be considered excessive and a check can be made for an event falling within it (see the gap-check sketch following this section). Also as noted above, such tuning is optional and may be excluded from some implementations.
  • Locating 915 an adjacent glottal pulse can comprise looking forward and backward in the signal. For example, looking backward from B0 can comprise considering the set of local maxima of the region in the range [B0 - 1.2*E_B0, B0 - 0.8*E_B0] (a 20% neighborhood of B0 - E_B0). If there are glottal pulse candidates in this neighborhood, the largest, i.e., highest-amplitude, candidate can be considered the next glottal pulse event, B1. This can be repeated, using the new cord length (B(n-1) - B(n)) as the new pitch estimate for this location, until no glottal pulses are detected or the beginning of the region is reached (see the backward-search sketch following this section).
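
The excerpts above treat glottal pulses as high-amplitude spikes marking sound-generating events. As a rough illustration only, here is a minimal Python sketch of how such spikes might be picked out of a voiced frame; the function name find_glottal_pulses, the simple local-maximum test, and the threshold_ratio value are illustrative assumptions, not details taken from the patent.

```python
import numpy as np

def find_glottal_pulses(frame: np.ndarray, threshold_ratio: float = 0.5) -> np.ndarray:
    """Return sample indices of candidate glottal pulses in a voiced frame.

    Sketch only: a pulse is treated as a local maximum whose amplitude
    exceeds a fraction of the frame's peak amplitude. The 0.5 ratio is
    an assumed tuning value, not a figure from the patent.
    """
    interior = frame[1:-1]
    # Local maxima: samples strictly larger than both neighbors.
    is_peak = (interior > frame[:-2]) & (interior > frame[2:])
    peak_idx = np.where(is_peak)[0] + 1
    # Keep only the high-amplitude spikes.
    threshold = threshold_ratio * np.max(np.abs(frame))
    return peak_idx[frame[peak_idx] >= threshold]
```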
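
Per the abstract, each cord begins at the onset of a glottal pulse and extends toward, but excludes part of the frame before, the next pulse, so the cords collectively cover less than the whole frame. The following sketch slices a frame that way; keep_fraction (the share of each pulse-to-pulse span that is retained) is a hypothetical parameter introduced for illustration.

```python
import numpy as np

def extract_cords(frame: np.ndarray, pulses: np.ndarray,
                  keep_fraction: float = 0.8) -> list:
    """Slice a voiced frame into cords anchored on glottal pulses.

    Each cord starts at a pulse and stops short of the next one, so the
    cords collectively comprise less than all of the frame.
    keep_fraction is an assumed tuning value for this sketch.
    """
    cords = []
    for start, nxt in zip(pulses[:-1], pulses[1:]):
        end = start + int(keep_fraction * (nxt - start))
        cords.append(frame[start:end])  # excludes the tail before the next pulse
    return cords
```

For example, extract_cords(frame, find_glottal_pulses(frame)) would yield one cord per pulse-to-pulse span, each omitting the transient tail just before the next pulse.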
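
The tuning step described above checks whether the gap between two marked events is close to a multiple of the expected gap, which would suggest a missed event. A minimal sketch of that check follows, assuming integer sample indices for the marks; the function name and the 20% tolerance (borrowed from the neighborhood figure quoted above) are assumptions, not values from the patent.

```python
def find_suspect_gaps(marks: list, expected_gap: float,
                      tolerance: float = 0.2) -> list:
    """Return (start, end) pairs of marks whose gap is close to a
    multiple of the expected gap, i.e. spans worth re-examining for an
    event that the marking pass may have missed."""
    suspect = []
    for a, b in zip(marks[:-1], marks[1:]):
        gap = b - a
        multiple = round(gap / expected_gap)
        if multiple >= 2 and abs(gap - multiple * expected_gap) <= tolerance * expected_gap:
            suspect.append((a, b))
    return suspect
```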
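
Finally, the backward search in the last excerpt can be written almost directly from the text: search a 20% neighborhood of B0 - E_B0, take the highest-amplitude candidate, and re-estimate the pitch from the new cord length. A sketch under those assumptions, where b0 plays the role of B0, pitch_estimate the role of E_B0, and candidates holds indices of the region's local maxima (e.g., from find_glottal_pulses above); none of these names come from the patent.

```python
import numpy as np

def mark_pulses_backward(signal: np.ndarray, b0: int, pitch_estimate: float,
                         candidates: np.ndarray) -> list:
    """Walk backward from pulse b0, collecting earlier pulse events.

    Implements the rule quoted above: search [b - 1.2*E, b - 0.8*E]
    (a 20% neighborhood of b - E), take the largest candidate as the
    previous pulse, and use the new spacing as the next pitch estimate,
    until no candidate is found or the start of the region is reached.
    """
    marks = [b0]
    b, e = b0, float(pitch_estimate)
    while b - 1.2 * e >= 0:  # stop at the beginning of the region
        lo, hi = b - 1.2 * e, b - 0.8 * e
        window = candidates[(candidates >= lo) & (candidates <= hi)]
        if window.size == 0:
            break  # no glottal pulse detected in the neighborhood
        b_prev = int(window[np.argmax(signal[window])])  # highest amplitude wins
        e = float(b - b_prev)  # new cord length becomes the pitch estimate
        marks.append(b_prev)
        b = b_prev
    return marks[::-1]  # earliest pulse first
```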

Abstract

Methods, systems, and machine-readable media are disclosed for processing a signal representing speech. According to one embodiment, processing a signal representing speech can comprise receiving a first frame of the signal, the first frame comprising a voiced frame. One or more cords can be extracted from the voiced frame based on the occurrence of one or more events within the frame. For example, the one or more events can comprise one or more glottal pulses. The one or more cords can collectively comprise less than all of the frame. For example, each of the cords can begin with the onset of a glottal pulse and extend to a point prior to the onset of a neighboring glottal pulse, but can exclude a portion of the frame before the onset of the neighboring glottal pulse. A phoneme for the voiced frame can be determined based on at least one of the extracted cords.
PCT/US2008/081187 2007-10-24 2008-10-24 Producing phonitos based on feature vectors WO2009055718A1 (fr)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US98225707P 2007-10-24 2007-10-24
US60/982,257 2007-10-24
US12/256,693 US20090182556A1 (en) 2007-10-24 2008-10-23 Pitch estimation and marking of a signal representing speech
US12/256,710 2008-10-23
US12/256,706 US8315856B2 (en) 2007-10-24 2008-10-23 Identify features of speech based on events in a signal representing spoken sounds
US12/256,710 US8396704B2 (en) 2007-10-24 2008-10-23 Producing time uniform feature vectors
US12/256,729 US20090271196A1 (en) 2007-10-24 2008-10-23 Classifying portions of a signal representing speech
US12/256,706 2008-10-23
US12/256,729 2008-10-23
US12/256,716 2008-10-23
US12/256,716 US8326610B2 (en) 2007-10-24 2008-10-23 Producing phonitos based on feature vectors
US12/256,693 2008-10-23

Publications (1)

Publication Number Publication Date
WO2009055718A1 (fr) 2009-04-30

Family

ID=40580055

Family Applications (3)

Application Number Title Priority Date Filing Date
PCT/US2008/081180 WO2009055715A1 (fr) 2007-10-24 2008-10-24 Producing time uniform speech feature vectors
PCT/US2008/081187 WO2009055718A1 (fr) 2007-10-24 2008-10-24 Producing phonitos based on feature vectors
PCT/US2008/081160 WO2009055701A1 (fr) 2007-10-24 2008-10-24 Processing a signal representing speech

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/US2008/081180 WO2009055715A1 (fr) 2007-10-24 2008-10-24 Producing time uniform speech feature vectors

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/US2008/081160 WO2009055701A1 (fr) 2007-10-24 2008-10-24 Processing a signal representing speech

Country Status (1)

Country Link
WO (3) WO2009055715A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103680516B (zh) * 2013-12-11 2017-07-28 Shenzhen TCL New Technology Co., Ltd. Audio signal processing method and device
US10735876B2 (en) * 2015-03-13 2020-08-04 Sonova Ag Method for determining useful hearing device features
EP3857541B1 (fr) 2018-09-30 2023-07-19 Microsoft Technology Licensing, LLC Speech waveform generation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5400434A (en) * 1990-09-04 1995-03-21 Matsushita Electric Industrial Co., Ltd. Voice source for synthetic speech system
US6006175A (en) * 1996-02-06 1999-12-21 The Regents Of The University Of California Methods and apparatus for non-acoustic speech characterization and recognition
SE512719C2 (sv) * 1997-06-10 2000-05-02 Lars Gustaf Liljeryd En metod och anordning för reduktion av dataflöde baserad på harmonisk bandbreddsexpansion
US6233550B1 (en) * 1997-08-29 2001-05-15 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4kbps
US7272556B1 (en) * 1998-09-23 2007-09-18 Lucent Technologies Inc. Scalable and embedded codec for speech and audio signals

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5799276A (en) * 1995-11-07 1998-08-25 Accent Incorporated Knowledge-based speech recognition system and methods having frame length computed based upon estimated pitch period of vocalic intervals
US20060136206A1 (en) * 2004-11-24 2006-06-22 Kabushiki Kaisha Toshiba Apparatus, method, and computer program product for speech recognition
US20070150277A1 (en) * 2005-12-28 2007-06-28 Samsung Electronics Co., Ltd. Method and system for segmenting phonemes from voice signals

Also Published As

Publication number Publication date
WO2009055701A1 (fr) 2009-04-30
WO2009055715A1 (fr) 2009-04-30

Similar Documents

Publication Publication Date Title
US8326610B2 (en) Producing phonitos based on feature vectors
CN108198547B (zh) Speech endpoint detection method and apparatus, computer device, and storage medium
Hu et al. Pitch‐based gender identification with two‐stage classification
KR101247652B1 (ko) Apparatus and method for removing noise
JPH0990974A (ja) Signal processing method
Ghaemmaghami et al. Noise robust voice activity detection using features extracted from the time-domain autocorrelation function
US8086449B2 (en) Vocal fry detecting apparatus
JPH10133693A (ja) Speech recognition device
Hasan et al. Preprocessing of continuous bengali speech for feature extraction
KR101122590B1 (ko) Apparatus and method for speech recognition by dividing speech data
US6470311B1 (en) Method and apparatus for determining pitch synchronous frames
WO2009055718A1 (fr) Production de phonitos basée sur des vecteurs de particularité
KR100735417B1 (ko) Method and system for aligning a window capable of extracting peak features from a speech signal
Hasija et al. Recognition of Children Punjabi Speech using Tonal Non-Tonal Classifier
Sangeetha et al. Robust automatic continuous speech segmentation for indian languages to improve speech to speech translation
CN114724589A (zh) Voice quality inspection method and apparatus, electronic device, and storage medium
Sudhakar et al. Automatic speech segmentation to improve speech synthesis performance
JP4537821B2 (ja) Audio signal analysis method, audio signal recognition method using the method, audio signal section detection method, apparatuses therefor, program, and recording medium thereof
VH et al. A study on speech recognition technology
Sasou et al. Glottal excitation modeling using HMM with application to robust analysis of speech signal.
Cooper Speech detection using gammatone features and one-class support vector machine
Medhi et al. Different acoustic feature parameters ZCR, STE, LPC and MFCC analysis of Assamese vowel phonemes
Sung et al. A study of knowledge-based features for obstruent detection and classification in continuous Mandarin speech
Manjutha et al. Statistical Model-Based Tamil Stuttered Speech Segmentation Using Voice Activity Detection
Undhad et al. Exploiting speech source information for vowel landmark detection for low resource language

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08840852

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08840852

Country of ref document: EP

Kind code of ref document: A1