GB2484615A - A text to speech method and system - Google Patents
A text to speech method and system
- Publication number
- GB2484615A (application GB1200335.6A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- sequence
- language
- speech
- text
- acoustic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
Abstract
A text-to-speech method for use in a plurality of languages, said method comprising: inputting text in a selected language; dividing said inputted text into a sequence of acoustic units; converting said sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein said model has a plurality of model parameters describing probability distributions which relate an acoustic unit to a speech vector; and outputting said sequence of speech vectors as audio in said selected language, wherein a parameter of a predetermined type of each probability distribution in said selected language is expressed as a weighted sum of language independent parameters of the same type, and wherein the weighting used is language dependent, such that converting said sequence of acoustic units to a sequence of speech vectors comprises retrieving the language dependent weights for said selected language.
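The weighted-sum construction in the abstract can be illustrated with a minimal sketch. It assumes the "parameter of a predetermined type" is a mean vector of a probability distribution for one acoustic unit; the abstract does not fix the parameter type. The names `SHARED_MEANS`, `LANGUAGE_WEIGHTS`, `weighted_sum` and `mean_for_language`, the language codes, and all numeric values are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def weighted_sum(weights: Sequence[float], params: Sequence[Vector]) -> Vector:
    """Combine language-independent parameter vectors using one language's weights."""
    if len(weights) != len(params):
        raise ValueError("need exactly one weight per language-independent parameter")
    dim = len(params[0])
    # Each output component is the weighted sum of that component across all vectors.
    return [sum(w * p[d] for w, p in zip(weights, params)) for d in range(dim)]


# Hypothetical language-independent mean vectors for one acoustic unit (toy values).
SHARED_MEANS: List[Vector] = [
    [1.0, 0.0],
    [0.0, 2.0],
    [4.0, 4.0],
]

# Hypothetical language-dependent weights, one weight per shared mean vector.
LANGUAGE_WEIGHTS: Dict[str, List[float]] = {
    "en": [1.0, 0.5, 0.0],
    "fr": [0.0, 1.0, 0.25],
}


def mean_for_language(language: str) -> Vector:
    """Retrieve the weights for the selected language and build its mean vector."""
    weights = LANGUAGE_WEIGHTS[language]
    return weighted_sum(weights, SHARED_MEANS)


print(mean_for_language("en"))  # [1.0, 1.0]
print(mean_for_language("fr"))  # [1.0, 3.0]
```

In this sketch, switching the output language only changes which row of weights is retrieved; the shared vectors stay the same for every language.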
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/GB2009/001464 WO2010142928A1 (en) | 2009-06-10 | 2009-06-10 | A text to speech method and system |
Publications (3)
Publication Number | Publication Date |
---|---|
GB201200335D0 (en) | 2012-02-22 |
GB2484615A (en) | 2012-04-18 |
GB2484615B (en) | 2013-05-08 |
Family
ID=41278515
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB1200335.6A Active GB2484615B (en) | 2009-06-10 | 2009-06-10 | A text to speech method and system |
Country Status (4)
Country | Link |
---|---|
US (1) | US8825485B2 (en) |
JP (1) | JP5398909B2 (en) |
GB (1) | GB2484615B (en) |
WO (1) | WO2010142928A1 (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9798653B1 (en) * | 2010-05-05 | 2017-10-24 | Nuance Communications, Inc. | Methods, apparatus and data structure for cross-language speech adaptation |
US20130030789A1 (en) * | 2011-07-29 | 2013-01-31 | Reginald Dalce | Universal Language Translator |
US8478278B1 (en) | 2011-08-12 | 2013-07-02 | Amazon Technologies, Inc. | Location based call routing to subject matter specialist |
GB2501062B (en) * | 2012-03-14 | 2014-08-13 | Toshiba Res Europ Ltd | A text to speech method and system |
GB2501067B (en) * | 2012-03-30 | 2014-12-03 | Toshiba Kk | A text to speech system |
JP5706368B2 (en) * | 2012-05-17 | 2015-04-22 | 日本電信電話株式会社 | Speech conversion function learning device, speech conversion device, speech conversion function learning method, speech conversion method, and program |
GB2505400B (en) * | 2012-07-18 | 2015-01-07 | Toshiba Res Europ Ltd | A speech processing system |
GB2508417B (en) * | 2012-11-30 | 2017-02-08 | Toshiba Res Europe Ltd | A speech processing system |
GB2508411B (en) * | 2012-11-30 | 2015-10-28 | Toshiba Res Europ Ltd | Speech synthesis |
GB2510200B (en) | 2013-01-29 | 2017-05-10 | Toshiba Res Europe Ltd | A computer generated head |
JP6091938B2 (en) * | 2013-03-07 | 2017-03-08 | 株式会社東芝 | Speech synthesis dictionary editing apparatus, speech synthesis dictionary editing method, and speech synthesis dictionary editing program |
GB2516965B (en) | 2013-08-08 | 2018-01-31 | Toshiba Res Europe Limited | Synthetic audiovisual storyteller |
GB2517503B (en) * | 2013-08-23 | 2016-12-28 | Toshiba Res Europe Ltd | A speech processing system and method |
JP6392012B2 (en) | 2014-07-14 | 2018-09-19 | 株式会社東芝 | Speech synthesis dictionary creation device, speech synthesis device, speech synthesis dictionary creation method, and speech synthesis dictionary creation program |
CN111566655B (en) * | 2018-01-11 | 2024-02-06 | 新智株式会社 | Multi-language text-to-speech synthesis method |
GB201804073D0 (en) * | 2018-03-14 | 2018-04-25 | Papercup Tech Limited | A speech processing system and a method of processing a speech signal |
CN111798832A (en) * | 2019-04-03 | 2020-10-20 | 北京京东尚科信息技术有限公司 | Speech synthesis method, apparatus and computer-readable storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009026270A2 (en) * | 2007-08-20 | 2009-02-26 | Microsoft Corporation | Hmm-based bilingual (mandarin-english) tts techniques |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2296846A (en) * | 1995-01-07 | 1996-07-10 | Ibm | Synthesising speech from text |
US7496498B2 (en) | 2003-03-24 | 2009-02-24 | Microsoft Corporation | Front-end architecture for a multi-lingual text-to-speech system |
US8583418B2 (en) * | 2008-09-29 | 2013-11-12 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
-
2009
- 2009-06-10 WO PCT/GB2009/001464 patent/WO2010142928A1/en active Application Filing
- 2009-06-10 JP JP2012514523A patent/JP5398909B2/en active Active
- 2009-06-10 US US13/377,706 patent/US8825485B2/en active Active
- 2009-06-10 GB GB1200335.6A patent/GB2484615B/en active Active
Non-Patent Citations (3)
Title |
---|
BLACK A and SCHULTZ T: Speaker clustering for multilingual synthesis. Multiling-2006, 024, 9 April 2006-11 April 2006, pages 1-5. XP002556503. Stellenbosch, South Africa. Page 2, right-hand column, paragraph 3. Page 4, left-hand column, paragraph 5.1 * |
LATORRE J ET AL: New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer. Speech Communication, Elsevier Science Publishers, Amsterdam, NL. Vol 48, no. 10. 1 October 2006, pages 1227-1242. XP025056845. ISSN: 0167-6393. Page 1229, right-hand paragraph 4. * |
ZEN H ET AL: Statistical parametric speech synthesis. Speech Communication, Elsevier Science Publishers, Amsterdam, NL. Vol 51, no. 11. 1 November 2009, pages 1039-1064. XP026349492. ISSN: 0167-6393. * |
Also Published As
Publication number | Publication date |
---|---|
JP5398909B2 (en) | 2014-01-29 |
WO2010142928A1 (en) | 2010-12-16 |
GB201200335D0 (en) | 2012-02-22 |
US20120278081A1 (en) | 2012-11-01 |
JP2012529664A (en) | 2012-11-22 |
GB2484615B (en) | 2013-05-08 |
US8825485B2 (en) | 2014-09-02 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
GB2484615A (en) | A text to speech method and system | |
GB201212783D0 (en) | A speech processing system | |
GB2507674A (en) | Statistical enhancement of speech output from statistical text-to-speech synthesis system | |
WO2018183650A3 (en) | End-to-end text-to-speech conversion | |
US9767788B2 (en) | Method and apparatus for speech synthesis based on large corpus | |
MX2016013015A (en) | Methods and systems of handling a dialog with a robot. | |
CN108231062B (en) | Voice translation method and device | |
PH12016502120B1 (en) | Coding vectors decomposed from higher-order ambisonics audio signals | |
CN106611597A (en) | Voice wakeup method and voice wakeup device based on artificial intelligence | |
EP4318463A3 (en) | Multi-modal input on an electronic device | |
EP2499582A4 (en) | System and method for hybrid processing in a natural language voice services environment | |
WO2013003772A3 (en) | Speech recognition using variable-length context | |
PL401372A1 (en) | Hybrid compression of voice data in the text to speech conversion systems | |
CN105118501A (en) | Speech recognition method and system | |
GB2466674B (en) | Speech coding | |
GB2506278A (en) | Voice transformation with encoded information | |
WO2014052326A3 (en) | Apparatus and methods for managing resources for a system using voice recognition | |
WO2013032252A3 (en) | Apparatus and method for translation using a translation tree structure in a portable terminal | |
Pettorino et al. | Transplanting native prosody into second language speech | |
Cai et al. | Fast learning of deep neural networks via singular value decomposition | |
JP2014048443A (en) | Voice synthesis system, voice synthesis method, and voice synthesis program | |
Yoon et al. | An analysis of the vowel formants of the young males in the Buckeye corpus | |
WO2017082717A3 (en) | Method and system for text to speech synthesis | |
Yang | An analysis of short and long syllables of sino-Korean words produced by college students with Kyungsang dialect | |
Wang et al. | Generating Adversarial Samples For Training Wake-up Word Detection Systems Against Confusing Words |