EP1394769A3 - Automatic segmentation in speech synthesis - Google Patents

Info

Publication number
EP1394769A3
EP1394769A3 EP03100795A EP03100795A EP1394769A3 EP 1394769 A3 EP1394769 A3 EP 1394769A3 EP 03100795 A EP03100795 A EP 03100795A EP 03100795 A EP03100795 A EP 03100795A EP 1394769 A3 EP1394769 A3 EP 1394769A3
Authority
EP
European Patent Office
Prior art keywords
phone
labels
hmms
corrected
speech synthesis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP03100795A
Other languages
German (de)
French (fr)
Other versions
EP1394769B1 (en)
EP1394769A2 (en)
Inventor
Alistair D. Conkie
Yeon-Jun Kim
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2002-03-29
Filing date
2003-03-27
Publication date
2004-06-09
Application filed by AT&T Corp
Priority to EP07116265A (EP1860645A3)
Priority to EP07116266A (EP1860646A3)
Publication of EP1394769A2
Publication of EP1394769A3
Application granted
Publication of EP1394769B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/06 Elementary speech units used in speech synthesisers; Concatenation rules

Abstract

Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) is initialized using bootstrap data. The HMMs are then re-estimated and aligned to produce phone labels. The phone boundaries of these labels are then corrected using spectral boundary correction. Optionally, the process is repeated, with the spectral-boundary-corrected phone labels replacing the bootstrap data as input, to further reduce mismatches between manual labels and the phone labels assigned by the HMM approach.
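
The loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say how spectral boundary correction is computed, so the rule used here is an assumption: each boundary moves to the nearby frame where the Euclidean distance between adjacent feature frames is largest. The names `correct_boundary`, `correct_labels`, `iterative_segmentation`, the `window` parameter, and the `train_and_align` callable (a stand-in for HMM re-estimation and alignment) are all hypothetical and do not come from the patent.

```python
import math


def spectral_change(frames, i):
    """Euclidean distance between feature frame i-1 and frame i."""
    return math.dist(frames[i - 1], frames[i])


def correct_boundary(frames, boundary, window=2):
    """Move one boundary to the frame of largest spectral change nearby.

    Assumed rule (not taken from the patent): search the frames within
    `window` of the HMM boundary; on ties, prefer the candidate closest
    to the original boundary.
    """
    lo = max(1, boundary - window)
    hi = min(len(frames) - 1, boundary + window)
    return max(
        range(lo, hi + 1),
        key=lambda i: (spectral_change(frames, i), -abs(i - boundary)),
    )


def correct_labels(frames, boundaries, window=2):
    """Apply the boundary correction to every phone boundary."""
    return [correct_boundary(frames, b, window) for b in boundaries]


def iterative_segmentation(frames, bootstrap_labels, train_and_align,
                           rounds=3, window=2):
    """Alternate HMM labelling and boundary correction.

    `train_and_align(frames, labels)` is a hypothetical callable standing
    in for HMM re-estimation and alignment; it returns boundary indices.
    """
    labels = bootstrap_labels
    for _ in range(rounds):
        hmm_labels = train_and_align(frames, labels)
        labels = correct_labels(frames, hmm_labels, window)
    return labels


# Toy one-dimensional "spectrum": the real transition is between frames 2 and 3.
frames = [[0.0], [0.0], [0.1], [1.0], [1.0], [1.0]]
corrected = correct_boundary(frames, boundary=2)
```

In this toy input the boundary placed at frame 2 moves to frame 3, where the frame-to-frame change is largest. The sketch does not prevent neighbouring boundaries from crossing after correction; a fuller version would need to handle that case.
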
EP03100795A 2002-03-29 2003-03-27 Automatic segmentation in speech synthesis Expired - Lifetime EP1394769B1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP07116265A EP1860645A3 (en) 2002-03-29 2003-03-27 Automatic segmentation in speech synthesis
EP07116266A EP1860646A3 (en) 2002-03-29 2003-03-27 Automatic segmentaion in speech synthesis

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US36904302P 2002-03-29 2002-03-29
US369043 2002-03-29
US10/341,869 US7266497B2 (en) 2002-03-29 2003-01-14 Automatic segmentation in speech synthesis
US341869 2003-01-14

Related Child Applications (4)

Application Number Priority Date Filing Date Title
EP07116266A Division EP1860646A3 (en) 2002-03-29 2003-03-27 Automatic segmentaion in speech synthesis
EP07116265A Division EP1860645A3 (en) 2002-03-29 2003-03-27 Automatic segmentation in speech synthesis
EP07116265.5 Division-Into 2007-09-12
EP07116266.3 Division-Into 2007-09-12

Publications (3)

Publication Number Publication Date
EP1394769A2 EP1394769A2 (en) 2004-03-03
EP1394769A3 EP1394769A3 (en) 2004-06-09
EP1394769B1 EP1394769B1 (en) 2011-02-23

Family

ID=28457009

Family Applications (1)

Application Number Priority Date Filing Date Title
EP03100795A Expired - Lifetime EP1394769B1 (en) 2002-03-29 2003-03-27 Automatic segmentation in speech synthesis

Country Status (4)

Country Link
US (3) US7266497B2 (en)
EP (1) EP1394769B1 (en)
CA (1) CA2423144C (en)
DE (1) DE60336102D1 (en)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7369994B1 (en) 1999-04-30 2008-05-06 At&T Corp. Methods and apparatus for rapid acoustic unit selection from a large speech corpus
US6684187B1 (en) 2000-06-30 2004-01-27 At&T Corp. Method and system for preselection of suitable units for concatenative speech
US6505158B1 (en) * 2000-07-05 2003-01-07 At&T Corp. Synthesis-based pre-selection of suitable units for concatenative speech
US7266497B2 (en) * 2002-03-29 2007-09-04 At&T Corp. Automatic segmentation in speech synthesis
JP4150645B2 (en) * 2003-08-27 2008-09-17 株式会社ケンウッド Audio labeling error detection device, audio labeling error detection method and program
TWI220511B (en) * 2003-09-12 2004-08-21 Ind Tech Res Inst An automatic speech segmentation and verification system and its method
US7496512B2 (en) * 2004-04-13 2009-02-24 Microsoft Corporation Refining of segmental boundaries in speech waveforms using contextual-dependent models
US20070203706A1 (en) * 2005-12-30 2007-08-30 Inci Ozkaragoz Voice analysis tool for creating database used in text to speech synthesis system
JP4246790B2 (en) * 2006-06-05 2009-04-02 パナソニック株式会社 Speech synthesizer
US9620117B1 (en) * 2006-06-27 2017-04-11 At&T Intellectual Property Ii, L.P. Learning from interactions for a spoken dialog system
US20080027725A1 (en) * 2006-07-26 2008-01-31 Microsoft Corporation Automatic Accent Detection With Limited Manually Labeled Data
US20080077407A1 (en) * 2006-09-26 2008-03-27 At&T Corp. Phonetically enriched labeling in unit selection speech synthesis
US8321222B2 (en) * 2007-08-14 2012-11-27 Nuance Communications, Inc. Synthesis by generation and concatenation of multi-form segments
CA2657087A1 (en) * 2008-03-06 2009-09-06 David N. Fernandes Normative database system and method
US8095365B2 (en) * 2008-12-04 2012-01-10 At&T Intellectual Property I, L.P. System and method for increasing recognition rates of in-vocabulary words by improving pronunciation modeling
JP5457706B2 (en) * 2009-03-30 2014-04-02 株式会社東芝 Speech model generation device, speech synthesis device, speech model generation program, speech synthesis program, speech model generation method, and speech synthesis method
US8457965B2 (en) * 2009-10-06 2013-06-04 Rothenberg Enterprises Method for the correction of measured values of vowel nasalance
US8630971B2 (en) * 2009-11-20 2014-01-14 Indian Institute Of Science System and method of using Multi Pattern Viterbi Algorithm for joint decoding of multiple patterns
US20140074465A1 (en) * 2012-09-11 2014-03-13 Delphi Technologies, Inc. System and method to generate a narrator specific acoustic database without a predefined script
US20140244240A1 (en) * 2013-02-27 2014-08-28 Hewlett-Packard Development Company, L.P. Determining Explanatoriness of a Segment
US9646613B2 (en) * 2013-11-29 2017-05-09 Daon Holdings Limited Methods and systems for splitting a digital signal
US9240178B1 (en) * 2014-06-26 2016-01-19 Amazon Technologies, Inc. Text-to-speech processing using pre-stored results
US9972300B2 (en) * 2015-06-11 2018-05-15 Genesys Telecommunications Laboratories, Inc. System and method for outlier identification to remove poor alignments in speech synthesis
CN105513597B (en) * 2015-12-30 2018-07-10 百度在线网络技术(北京)有限公司 Voiceprint processing method and processing device
CN108053828A (en) * 2017-12-25 2018-05-18 无锡小天鹅股份有限公司 Determine the method, apparatus and household electrical appliance of control instruction
CN110136691B (en) * 2019-05-28 2021-09-28 广州多益网络股份有限公司 Speech synthesis model training method and device, electronic equipment and storage medium
CN114547551B (en) * 2022-02-23 2023-08-29 阿波罗智能技术(北京)有限公司 Road surface data acquisition method based on vehicle report data and cloud server

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1035537A2 (en) * 1999-03-09 2000-09-13 Matsushita Electric Industrial Co., Ltd. Identification of unit overlap regions for concatenative speech synthesis system

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5390278A (en) * 1991-10-08 1995-02-14 Bell Canada Phoneme based speech recognition
EP0559349B1 (en) * 1992-03-02 1999-01-07 AT&T Corp. Training method and apparatus for speech recognition
US5317673A (en) * 1992-06-22 1994-05-31 Sri International Method and apparatus for context-dependent estimation of multiple probability distributions of phonetic classes with multilayer perceptrons in a speech recognition system
JP3272842B2 (en) * 1992-12-17 2002-04-08 ゼロックス・コーポレーション Processor-based decision method
US5623609A (en) * 1993-06-14 1997-04-22 Hal Trust, L.L.C. Computer system and computer-implemented process for phonology-based automatic speech recognition
JP3450411B2 (en) * 1994-03-22 2003-09-22 キヤノン株式会社 Voice information processing method and apparatus
US5655058A (en) * 1994-04-12 1997-08-05 Xerox Corporation Segmentation of audio data for indexing of conversational speech for real-time or postprocessing applications
US5625749A (en) * 1994-08-22 1997-04-29 Massachusetts Institute Of Technology Segment-based apparatus and method for speech recognition by analyzing multiple speech unit frames and modeling both temporal and spatial correlation
US5687287A (en) * 1995-05-22 1997-11-11 Lucent Technologies Inc. Speaker verification method and apparatus using mixture decomposition discrimination
JP3453456B2 (en) * 1995-06-19 2003-10-06 キヤノン株式会社 State sharing model design method and apparatus, and speech recognition method and apparatus using the state sharing model
JP2871561B2 (en) * 1995-11-30 1999-03-17 株式会社エイ・ティ・アール音声翻訳通信研究所 Unspecified speaker model generation device and speech recognition device
DE69712277T2 (en) * 1996-02-27 2002-12-19 Koninkl Philips Electronics Nv METHOD AND DEVICE FOR AUTOMATIC VOICE SEGMENTATION IN PHONEMIC UNITS
US5913193A (en) * 1996-04-30 1999-06-15 Microsoft Corporation Method and system of runtime acoustic unit selection for speech synthesis
US6076057A (en) * 1997-05-21 2000-06-13 At&T Corp Unsupervised HMM adaptation based on speech-silence discrimination
US5913192A (en) * 1997-08-22 1999-06-15 At&T Corp Speaker identification with user-selected password phrases
US6317716B1 (en) * 1997-09-19 2001-11-13 Massachusetts Institute Of Technology Automatic cueing of speech
US6163769A (en) * 1997-10-02 2000-12-19 Microsoft Corporation Text-to-speech using clustered context-dependent phoneme-based units
US6202047B1 (en) * 1998-03-30 2001-03-13 At&T Corp. Method and apparatus for speech recognition using second order statistics and linear estimation of cepstral coefficients
US6292778B1 (en) * 1998-10-30 2001-09-18 Lucent Technologies Inc. Task-independent utterance verification with subword-based minimum verification error training
JP2002530703A (en) * 1998-11-13 2002-09-17 ルノー・アンド・オスピー・スピーチ・プロダクツ・ナームローゼ・ベンノートシャープ Speech synthesis using concatenation of speech waveforms
WO2000054254A1 (en) * 1999-03-08 2000-09-14 Siemens Aktiengesellschaft Method and array for determining a representative phoneme
US6539354B1 (en) * 2000-03-24 2003-03-25 Fluent Speech Technologies, Inc. Methods and devices for producing and using synthetic visual speech based on natural coarticulation
US7120575B2 (en) * 2000-04-08 2006-10-10 International Business Machines Corporation Method and system for the automatic segmentation of an audio stream into semantic or syntactic units
US7165030B2 (en) * 2001-09-17 2007-01-16 Massachusetts Institute Of Technology Concatenative speech synthesis using a finite-state transducer
US6965861B1 (en) * 2001-11-20 2005-11-15 Burning Glass Technologies, Llc Method for improving results in an HMM-based segmentation system by incorporating external knowledge
US7266497B2 (en) * 2002-03-29 2007-09-04 At&T Corp. Automatic segmentation in speech synthesis
US6928407B2 (en) * 2002-03-29 2005-08-09 International Business Machines Corporation System and method for the automatic discovery of salient segments in speech transcripts
US7089185B2 (en) * 2002-06-27 2006-08-08 Intel Corporation Embedded multi-layer coupled hidden Markov model
KR100486735B1 (en) * 2003-02-28 2005-05-03 삼성전자주식회사 Method of establishing optimum-partitioned classifed neural network and apparatus and method and apparatus for automatic labeling using optimum-partitioned classifed neural network
US7664642B2 (en) * 2004-03-17 2010-02-16 University Of Maryland System and method for automatic speech recognition from phonetic features and acoustic landmarks
US7496512B2 (en) * 2004-04-13 2009-02-24 Microsoft Corporation Refining of segmental boundaries in speech waveforms using contextual-dependent models

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BRUGNARA F ET AL: "AUTOMATIC SEGMENTATION AND LABELING OF SPEECH BASED ON HIDDEN MARKOV MODELS", SPEECH COMMUNICATION, ELSEVIER SCIENCE PUBLISHERS, AMSTERDAM, NL, vol. 12, no. 4, 1 August 1993 (1993-08-01), pages 357 - 370, XP000393652, ISSN: 0167-6393 *
HON H ET AL: "Automatic generation of synthesis units for trainable text-to-speech systems", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 1998. PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON SEATTLE, WA, USA 12-15 MAY 1998, NEW YORK, NY, USA,IEEE, US, 12 May 1998 (1998-05-12), pages 293 - 296, XP010279159, ISBN: 0-7803-4428-6 *
TOLEDANO D T: "Neural network boundary refining for automatic speech segmentation", 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL, vol. 6, 5 June 2000 (2000-06-05), pages 3438 - 3441, XP010505636 *

Also Published As

Publication number Publication date
US8131547B2 (en) 2012-03-06
US20070271100A1 (en) 2007-11-22
DE60336102D1 (en) 2011-04-07
EP1394769B1 (en) 2011-02-23
CA2423144C (en) 2009-06-23
US20030187647A1 (en) 2003-10-02
US7587320B2 (en) 2009-09-08
US20090313025A1 (en) 2009-12-17
EP1394769A2 (en) 2004-03-03
US7266497B2 (en) 2007-09-04
CA2423144A1 (en) 2003-09-29

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK RO

17P Request for examination filed

Effective date: 20040715

AKX Designation fees paid

Designated state(s): DE FI FR GB NL

17Q First examination report despatched

Effective date: 20070504

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: AT&T CORP.

APBK Appeal reference recorded

Free format text: ORIGINAL CODE: EPIDOSNREFNE

APBN Date of receipt of notice of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA2E

APBR Date of receipt of statement of grounds of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA3E

APBV Interlocutory revision of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNIRAPE

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FI FR GB NL

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 60336102

Country of ref document: DE

Date of ref document: 20110407

Kind code of ref document: P

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 60336102

Country of ref document: DE

Effective date: 20110407

REG Reference to a national code

Ref country code: NL

Ref legal event code: VDEP

Effective date: 20110223

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20110223

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20111124

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 60336102

Country of ref document: DE

Effective date: 20111124

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 14

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 60336102

Country of ref document: DE

Representative's name: MARKS & CLERK (LUXEMBOURG) LLP, LU

Ref country code: DE

Ref legal event code: R081

Ref document number: 60336102

Country of ref document: DE

Owner name: AT&T INTELLECTUAL PROPERTY II, L.P., ATLANTA, US

Free format text: FORMER OWNER: AT&T CORP., NEW YORK, N.Y., US

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 15

REG Reference to a national code

Ref country code: GB

Ref legal event code: 732E

Free format text: REGISTERED BETWEEN 20170914 AND 20170920

REG Reference to a national code

Ref country code: FR

Ref legal event code: TP

Owner name: AT&T INTELLECTUAL PROPERTY II, L.P., US

Effective date: 20180104

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 16

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20220203

Year of fee payment: 20

Ref country code: FI

Payment date: 20220309

Year of fee payment: 20

Ref country code: DE

Payment date: 20220203

Year of fee payment: 20

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20220210

Year of fee payment: 20

REG Reference to a national code

Ref country code: DE

Ref legal event code: R071

Ref document number: 60336102

Country of ref document: DE

REG Reference to a national code

Ref country code: GB

Ref legal event code: PE20

Expiry date: 20230326

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION

Effective date: 20230326