GB2484615A - A text to speech method and system - Google Patents

Info

Publication number
GB2484615A
Authority
GB
United Kingdom
Prior art keywords
sequence
language
speech
text
acoustic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB1200335.6A
Other versions
GB201200335D0 (en)
GB2484615B (en)
Inventor
Byung Ha Chun
Sacha Krstulovich
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Europe Ltd
Original Assignee
Toshiba Research Europe Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Research Europe Ltd
Publication of GB201200335D0
Publication of GB2484615A
Application granted
Publication of GB2484615B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04 Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/14 Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142 Hidden Markov Models [HMMs]

Abstract

A text-to-speech method for use in a plurality of languages, said method comprising: inputting text in a selected language; dividing said inputted text into a sequence of acoustic units; converting said sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein said model has a plurality of model parameters describing probability distributions which relate an acoustic unit to a speech vector; and outputting said sequence of speech vectors as audio in said selected language, wherein a parameter of a predetermined type of each probability distribution in said selected language is expressed as a weighted sum of language independent parameters of the same type, and wherein the weighting used is language dependent, such that converting said sequence of acoustic units to a sequence of speech vectors comprises retrieving the language dependent weights for said selected language.
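
The weighted-sum relationship in the abstract can be illustrated with a minimal sketch. It assumes that the "parameter of a predetermined type" is the mean vector of a Gaussian distribution, which the abstract does not state. The names `CLUSTER_MEANS`, `LANGUAGE_WEIGHTS`, `weighted_mean`, and `unit_means`, the language codes, and all numeric values are hypothetical and chosen only for illustration. The sketch stops at the per-unit mean vectors and does not generate speech vectors or audio.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical language-independent mean vectors, three per acoustic unit.
CLUSTER_MEANS: Dict[str, List[Vector]] = {
    "a": [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
    "t": [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]],
}

# Hypothetical language-dependent weights, one per language-independent vector.
LANGUAGE_WEIGHTS: Dict[str, List[float]] = {
    "en": [1.0, 0.5, 0.0],
    "de": [1.0, 0.0, 0.5],
}


def weighted_mean(means: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return the weighted sum of language-independent mean vectors."""
    if len(means) != len(weights):
        raise ValueError("need exactly one weight per language-independent mean")
    dim = len(means[0])
    return [sum(w * m[d] for w, m in zip(weights, means)) for d in range(dim)]


def unit_means(units: Sequence[str], language: str) -> List[Vector]:
    """Compute the language-specific mean for each acoustic unit in a sequence."""
    # Retrieve the language-dependent weights for the selected language.
    weights = LANGUAGE_WEIGHTS[language]
    return [weighted_mean(CLUSTER_MEANS[unit], weights) for unit in units]


if __name__ == "__main__":
    # The same shared parameters give different means for each language.
    print(unit_means(["t", "a"], "en"))
    print(unit_means(["t", "a"], "de"))
```

In this sketch, switching language changes only the small weight vector that is looked up; the mean vectors in `CLUSTER_MEANS` are shared by every language.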

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/GB2009/001464 WO2010142928A1 (en) 2009-06-10 2009-06-10 A text to speech method and system

Publications (3)

Publication Number Publication Date
GB201200335D0 (en) 2012-02-22
GB2484615A (en) 2012-04-18
GB2484615B (en) 2013-05-08

Family

ID=41278515

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
GB1200335.6A Active GB2484615B (en) 2009-06-10 2009-06-10 A text to speech method and system

Country Status (4)

Country Link
US (1) US8825485B2 (en)
JP (1) JP5398909B2 (en)
GB (1) GB2484615B (en)
WO (1) WO2010142928A1 (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9798653B1 (en) * 2010-05-05 2017-10-24 Nuance Communications, Inc. Methods, apparatus and data structure for cross-language speech adaptation
US20130030789A1 (en) * 2011-07-29 2013-01-31 Reginald Dalce Universal Language Translator
US8478278B1 (en) 2011-08-12 2013-07-02 Amazon Technologies, Inc. Location based call routing to subject matter specialist
GB2501062B (en) * 2012-03-14 2014-08-13 Toshiba Res Europ Ltd A text to speech method and system
GB2501067B (en) * 2012-03-30 2014-12-03 Toshiba Kk A text to speech system
JP5706368B2 (en) * 2012-05-17 2015-04-22 日本電信電話株式会社 Speech conversion function learning device, speech conversion device, speech conversion function learning method, speech conversion method, and program
GB2505400B (en) * 2012-07-18 2015-01-07 Toshiba Res Europ Ltd A speech processing system
GB2508417B (en) * 2012-11-30 2017-02-08 Toshiba Res Europe Ltd A speech processing system
GB2508411B (en) * 2012-11-30 2015-10-28 Toshiba Res Europ Ltd Speech synthesis
GB2510200B (en) 2013-01-29 2017-05-10 Toshiba Res Europe Ltd A computer generated head
JP6091938B2 (en) * 2013-03-07 2017-03-08 株式会社東芝 Speech synthesis dictionary editing apparatus, speech synthesis dictionary editing method, and speech synthesis dictionary editing program
GB2516965B (en) 2013-08-08 2018-01-31 Toshiba Res Europe Limited Synthetic audiovisual storyteller
GB2517503B (en) * 2013-08-23 2016-12-28 Toshiba Res Europe Ltd A speech processing system and method
JP6392012B2 (en) 2014-07-14 2018-09-19 株式会社東芝 Speech synthesis dictionary creation device, speech synthesis device, speech synthesis dictionary creation method, and speech synthesis dictionary creation program
CN111566655B (en) * 2018-01-11 2024-02-06 新智株式会社 Multi-language text-to-speech synthesis method
GB201804073D0 (en) * 2018-03-14 2018-04-25 Papercup Tech Limited A speech processing system and a method of processing a speech signal
CN111798832A (en) * 2019-04-03 2020-10-20 北京京东尚科信息技术有限公司 Speech synthesis method, apparatus and computer-readable storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
WO2009026270A2 (en) * 2007-08-20 2009-02-26 Microsoft Corporation Hmm-based bilingual (mandarin-english) tts techniques

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
GB2296846A (en) * 1995-01-07 1996-07-10 Ibm Synthesising speech from text
US7496498B2 (en) 2003-03-24 2009-02-24 Microsoft Corporation Front-end architecture for a multi-lingual text-to-speech system
US8583418B2 (en) * 2008-09-29 2013-11-12 Apple Inc. Systems and methods of detecting language and natural language strings for text to speech synthesis

Non-Patent Citations (3)

Title
BLACK A and SCHULTZ T: Speaker clustering for multilingual synthesis. Multiling-2006, 024, 9 April 2006-11 April 2006, pages 1-5. XP002556503. Stellenbosch, South Africa. Page 2, right-hand column, paragraph 3. Page 4, left-hand column, paragraph 5.1 *
LATORRE J ET AL: New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer. Speech Communication, Elsevier Science Publishers, Amsterdam, NL. Vol 48, no. 10. 1 October 2006, pages 1227-1242. XP025056845. ISSN: 0167-6393. Page 1229, right-hand paragraph 4. *
ZEN H ET AL: Statistical parametric speech synthesis. Speech Communication, Elsevier Science Publishers, Amsterdam, NL. Vol 51, no. 11. 1 November 2009, pages 1039-1064. XP026349492. ISSN: 0167-6393. *

Also Published As

Publication number Publication date
JP5398909B2 (en) 2014-01-29
WO2010142928A1 (en) 2010-12-16
GB201200335D0 (en) 2012-02-22
US20120278081A1 (en) 2012-11-01
JP2012529664A (en) 2012-11-22
GB2484615B (en) 2013-05-08
US8825485B2 (en) 2014-09-02

Similar Documents

Publication Publication Date Title
GB2484615A (en) A text to speech method and system
GB201212783D0 (en) A speech processing system
GB2507674A (en) Statistical enhancement of speech output from statistical text-to-speech synthesis system
WO2018183650A3 (en) End-to-end text-to-speech conversion
US9767788B2 (en) Method and apparatus for speech synthesis based on large corpus
MX2016013015A (en) Methods and systems of handling a dialog with a robot.
CN108231062B (en) Voice translation method and device
PH12016502120B1 (en) Coding vectors decomposed from higher-order ambisonics audio signals
CN106611597A (en) Voice wakeup method and voice wakeup device based on artificial intelligence
EP4318463A3 (en) Multi-modal input on an electronic device
EP2499582A4 (en) System and method for hybrid processing in a natural language voice services environment
WO2013003772A3 (en) Speech recognition using variable-length context
PL401372A1 (en) Hybrid compression of voice data in the text to speech conversion systems
CN105118501A (en) Speech recognition method and system
GB2466674B (en) Speech coding
GB2506278A (en) Voice transformation with encoded information
WO2014052326A3 (en) Apparatus and methods for managing resources for a system using voice recognition
WO2013032252A3 (en) Apparatus and method for translation using a translation tree structure in a portable terminal
Pettorino et al. Transplanting native prosody into second language speech
Cai et al. Fast learning of deep neural networks via singular value decomposition
JP2014048443A (en) Voice synthesis system, voice synthesis method, and voice synthesis program
Yoon et al. An analysis of the vowel formants of the young males in the Buckeye corpus
WO2017082717A3 (en) Method and system for text to speech synthesis
Yang An analysis of short and long syllables of Sino-Korean words produced by college students with Kyungsang dialect
Wang et al. Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words