BR112016016310B1 - System for synthesizing speech for a provided text and method for generating parameters - Google Patents
System for synthesizing speech for a provided text and method for generating parameters
- Publication number
- BR112016016310B1 (application BR112016016310-9A)
- Authority
- BR
- Brazil
- Prior art keywords
- parameters
- frame
- speech
- segment
- parameter
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 81
- 230000002194 synthesizing effect Effects 0.000 title claims abstract description 14
- 230000015572 biosynthetic process Effects 0.000 claims abstract description 17
- 238000003786 synthesis reaction Methods 0.000 claims abstract description 17
- 230000008569 process Effects 0.000 claims description 47
- 230000003595 spectral effect Effects 0.000 claims description 25
- 230000008859 change Effects 0.000 claims description 13
- 238000012545 processing Methods 0.000 claims description 8
- 230000006870 function Effects 0.000 claims description 6
- 238000005192 partition Methods 0.000 claims description 5
- 238000000638 solvent extraction Methods 0.000 claims description 3
- 241000269627 Amphiuma means Species 0.000 claims 1
- 230000001131 transforming effect Effects 0.000 claims 1
- 230000003278 mimic effect Effects 0.000 abstract description 4
- 238000012805 post-processing Methods 0.000 abstract description 2
- 230000003068 static effect Effects 0.000 description 6
- 238000010586 diagram Methods 0.000 description 5
- 238000012935 Averaging Methods 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000013179 statistical model Methods 0.000 description 3
- 238000012549 training Methods 0.000 description 3
- 230000005236 sound signal Effects 0.000 description 2
- 238000006243 chemical reaction Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 238000001228 spectrum Methods 0.000 description 1
- 238000010561 standard procedure Methods 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Telephonic Communication Services (AREA)
- Document Processing Apparatus (AREA)
Applications Claiming Priority (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201461927152P | 2014-01-14 | 2014-01-14 | |
| US61/927,152 | 2014-01-14 | | |
| PCT/US2015/011348 WO2015108935A1 (en) | 2014-01-14 | 2015-01-14 | System and method for synthesis of speech from provided text |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| BR112016016310A2 | 2017-08-08 |
| BR112016016310B1 (pt) | 2022-06-07 |
Family
ID=53521887
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| BR112016016310-9A (BR112016016310B1, pt) | 2014-01-14 | 2015-01-14 | System for synthesizing speech for a provided text and method for generating parameters |
Country Status (9)
| Country | Link |
|---|---|
| US (2) | US9911407B2 |
| EP (1) | EP3095112B1 |
| JP (1) | JP6614745B2 |
| AU (2) | AU2015206631A1 |
| BR (1) | BR112016016310B1 |
| CA (1) | CA2934298C |
| CL (1) | CL2016001802A1 |
| WO (1) | WO2015108935A1 |
| ZA (1) | ZA201604177B |
Families Citing this family (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
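| WO2017046887A1 (ja) * | 2015-09-16 | 2017-03-23 | 株式会社東芝 | Speech synthesis device, speech synthesis method, speech synthesis program, speech synthesis model learning device, speech synthesis model learning method, and speech synthesis model learning program |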
| US10249314B1 (en) * | 2016-07-21 | 2019-04-02 | Oben, Inc. | Voice conversion system and method with variance and spectrum compensation |
| US10872598B2 (en) * | 2017-02-24 | 2020-12-22 | Baidu Usa Llc | Systems and methods for real-time neural text-to-speech |
| US10896669B2 (en) | 2017-05-19 | 2021-01-19 | Baidu Usa Llc | Systems and methods for multi-speaker neural text-to-speech |
| US10872596B2 (en) | 2017-10-19 | 2020-12-22 | Baidu Usa Llc | Systems and methods for parallel wave generation in end-to-end text-to-speech |
| CN108962217B (zh) * | 2018-07-28 | 2021-07-16 | 华为技术有限公司 | Speech synthesis method and related device |
| CN109285535A (zh) * | 2018-10-11 | 2019-01-29 | 四川长虹电器股份有限公司 | Speech synthesis method based on front-end design |
| CN109785823B (zh) * | 2019-01-22 | 2021-04-02 | 中财颐和科技发展(北京)有限公司 | Speech synthesis method and system |
| US11514634B2 (en) | 2020-06-12 | 2022-11-29 | Baidu Usa Llc | Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses |
| US11587548B2 (en) * | 2020-06-12 | 2023-02-21 | Baidu Usa Llc | Text-driven video synthesis with phonetic dictionary |
| CN121237074A (zh) * | 2024-06-28 | 2025-12-30 | 腾讯科技(深圳)有限公司 | Audio processing method, related apparatus, and medium |
Family Cites Families (23)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| DE69620967T2 (de) * | 1995-09-19 | 2002-11-07 | At & T Corp., New York | Synthesis of speech signals in the absence of coded parameters |
| US6567777B1 (en) * | 2000-08-02 | 2003-05-20 | Motorola, Inc. | Efficient magnitude spectrum approximation |
| US6970820B2 (en) * | 2001-02-26 | 2005-11-29 | Matsushita Electric Industrial Co., Ltd. | Voice personalization of speech synthesizer |
| US6792407B2 (en) * | 2001-03-30 | 2004-09-14 | Matsushita Electric Industrial Co., Ltd. | Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems |
| GB0113570D0 (en) * | 2001-06-04 | 2001-07-25 | Hewlett Packard Co | Audio-form presentation of text messages |
| US20030028377A1 (en) * | 2001-07-31 | 2003-02-06 | Noyes Albert W. | Method and device for synthesizing and distributing voice types for voice-enabled devices |
| CA2365203A1 (en) * | 2001-12-14 | 2003-06-14 | Voiceage Corporation | A signal modification method for efficient coding of speech signals |
| US7096183B2 (en) | 2002-02-27 | 2006-08-22 | Matsushita Electric Industrial Co., Ltd. | Customizing the speaking style of a speech synthesizer based on semantic analysis |
| US7136816B1 (en) * | 2002-04-05 | 2006-11-14 | At&T Corp. | System and method for predicting prosodic parameters |
| CN1692403A (zh) * | 2002-10-04 | 2005-11-02 | 皇家飞利浦电子股份有限公司 | Speech synthesis apparatus with personalized speech segments |
| US6961704B1 (en) | 2003-01-31 | 2005-11-01 | Speechworks International, Inc. | Linguistic prosodic model-based text to speech |
| US8886538B2 (en) | 2003-09-26 | 2014-11-11 | Nuance Communications, Inc. | Systems and methods for text-to-speech synthesis using spoken example |
| US7567896B2 (en) | 2004-01-16 | 2009-07-28 | Nuance Communications, Inc. | Corpus-based speech synthesis based on segment recombination |
| US7693719B2 (en) * | 2004-10-29 | 2010-04-06 | Microsoft Corporation | Providing personalized voice font for text-to-speech applications |
| US20100030557A1 (en) * | 2006-07-31 | 2010-02-04 | Stephen Molloy | Voice and text communication system, method and apparatus |
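| JP4455610B2 (ja) * | 2007-03-28 | 2010-04-21 | 株式会社東芝 | Prosody pattern generation device, speech synthesis device, program, and prosody pattern generation method |
| JP5457706B2 (ja) * | 2009-03-30 | 2014-04-02 | 株式会社東芝 | Speech model generation device, speech synthesis device, speech model generation program, speech synthesis program, speech model generation method, and speech synthesis method |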
| EP2507794B1 (en) * | 2009-12-02 | 2018-10-17 | Agnitio S.L. | Obfuscated speech synthesis |
| US20120143611A1 (en) * | 2010-12-07 | 2012-06-07 | Microsoft Corporation | Trajectory Tiling Approach for Text-to-Speech |
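| CN102651217A (zh) | 2011-02-25 | 2012-08-29 | 株式会社东芝 | Method and apparatus for synthesizing speech, and acoustic model training method for speech synthesis |
| CN102270449A (zh) | 2011-08-10 | 2011-12-07 | 歌尔声学股份有限公司 | Parametric speech synthesis method and system |
| JP5631915B2 (ja) | 2012-03-29 | 2014-11-26 | 株式会社東芝 | Speech synthesis device, speech synthesis method, speech synthesis program, and learning device |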
| EP3114584B1 (en) | 2014-03-04 | 2021-06-23 | Interactive Intelligence Group, Inc. | Optimization of audio fingerprint search |
2015
- 2015-01-14 CA CA2934298A patent/CA2934298C/en active Active
- 2015-01-14 EP EP15737007.3A patent/EP3095112B1/en active Active
- 2015-01-14 JP JP2016542126A patent/JP6614745B2/ja active Active
- 2015-01-14 US US14/596,628 patent/US9911407B2/en active Active
- 2015-01-14 WO PCT/US2015/011348 patent/WO2015108935A1/en not_active Ceased
- 2015-01-14 AU AU2015206631A patent/AU2015206631A1/en not_active Abandoned
- 2015-01-14 BR BR112016016310-9A patent/BR112016016310B1/pt active IP Right Grant
2016
- 2016-06-21 ZA ZA2016/04177A patent/ZA201604177B/en unknown
- 2016-07-14 CL CL2016001802A patent/CL2016001802A1/es unknown
2018
- 2018-01-18 US US15/874,612 patent/US10733974B2/en active Active
2020
- 2020-05-29 AU AU2020203559A patent/AU2020203559B2/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| AU2015206631A1 (en) | 2016-06-30 |
| WO2015108935A1 (en) | 2015-07-23 |
| CL2016001802A1 (es) | 2016-12-23 |
| EP3095112B1 (en) | 2019-10-30 |
| ZA201604177B (en) | 2018-11-28 |
| US20180144739A1 (en) | 2018-05-24 |
| NZ721092A (en) | 2021-03-26 |
| EP3095112A4 (en) | 2017-09-13 |
| US20150199956A1 (en) | 2015-07-16 |
| EP3095112A1 (en) | 2016-11-23 |
| AU2020203559B2 (en) | 2021-10-28 |
| US10733974B2 (en) | 2020-08-04 |
| JP6614745B2 (ja) | 2019-12-04 |
| JP2017502349A (ja) | 2017-01-19 |
| US9911407B2 (en) | 2018-03-06 |
| CA2934298A1 (en) | 2015-07-23 |
| BR112016016310A2 | 2017-08-08 |
| CA2934298C (en) | 2023-03-07 |
| AU2020203559A1 (en) | 2020-06-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| BR112016016310B1 (pt) | | System for synthesizing speech for a provided text and method for generating parameters |
| US10497362B2 (en) | | System and method for outlier identification to remove poor alignments in speech synthesis |
| US20170309271A1 (en) | | Speaking-rate normalized prosodic parameter builder, speaking-rate dependent prosodic model builder, speaking-rate controlled prosodic-information generation device and prosodic-information generation method able to learn different languages and mimic various speakers' speaking styles |
| Wu et al. | | Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features. |
| AU2020205275B2 (en) | | System and method for outlier identification to remove poor alignments in speech synthesis |
| CN105474307A (zh) | | Quantitative F0 contour generation device and method, and model learning device and method for generating an F0 contour |
| Pradhan et al. | | A syllable based statistical text to speech system |
| Liu et al. | | Modeling partial pronunciation variations for spontaneous Mandarin speech recognition |
| Mustafa et al. | | Emotional speech acoustic model for Malay: iterative versus isolated unit training |
| Mustafa et al. | | Developing an HMM-based speech synthesis system for Malay: a comparison of iterative and isolated unit training |
| Kubo et al. | | Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors. |
| Jokisch et al. | | Multi-level rhythm control for speech synthesis using hybrid data driven and rule-based approaches. |
| Anumanchipalli et al. | | A style capturing approach to F0 transformation in voice conversion |
| Matsuda et al. | | Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis |
| NZ721092B2 (en) | | System and method for synthesis of speech from provided text |
| Astrinaki et al. | | sHTS: A streaming architecture for statistical parametric speech synthesis |
| Kuczmarski | | Overview of HMM-based Speech Synthesis Methods |
| Wu et al. | | Development of hmm-based malay text-to-speech system |
| Hentschel et al. | | Exploiting imbalanced textual and acoustic data for training prosodically-enhanced RNNLMs |
| Shah et al. | | Deterministic annealing EM algorithm for developing TTS system in Gujarati |
| Petrov | | Structured Acoustic Models for Speech Recognition |
| Majji | | Building a Tamil Text-to-Speech Synthesizer using Festival |
| Sazhok | | Phoneme Recognition Output Post-Processing for Word Sequences Decoding |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | B06U | Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette] | |
| | B09A | Decision: intention to grant [chapter 9.1 patent gazette] | |
| | B16A | Patent or certificate of addition of invention granted [chapter 16.1 patent gazette] | Free format text: term of validity: 20 (twenty) years counted from 14/01/2015, subject to the legal conditions |