BR112016016310B1 - System for synthesizing speech for a provided text and method for generating parameters - Google Patents


Info

Publication number
BR112016016310B1
Authority
BR
Brazil
Prior art keywords
parameters
frame
speech
segment
parameter
Application number
BR112016016310-9A
Other languages
English (en)
Portuguese (pt)
Other versions
BR112016016310A2
Inventor
Yingyi Tan
Aravind Ganapathiraju
Felix Immanuel Wyss
Original Assignee
Interactive Intelligence Group, Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-01-14
Filing date
2015-01-14
Publication date
2022-06-07
Application filed by Interactive Intelligence Group, Inc
Publication of BR112016016310A2
Publication of BR112016016310B1


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)
  • Document Processing Apparatus (AREA)
BR112016016310-9A 2014-01-14 2015-01-14 System for synthesizing speech for a provided text and method for generating parameters BR112016016310B1 (pt)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201461927152P 2014-01-14 2014-01-14
US61/927,152 2014-01-14
PCT/US2015/011348 WO2015108935A1 (en) 2014-01-14 2015-01-14 System and method for synthesis of speech from provided text

Publications (2)

Publication Number Publication Date
BR112016016310A2 2017-08-08
BR112016016310B1 (pt) 2022-06-07

Family

ID=53521887

Family Applications (1)

Application Number Title Priority Date Filing Date
BR112016016310-9A BR112016016310B1 (pt) 2014-01-14 2015-01-14 System for synthesizing speech for a provided text and method for generating parameters

Country Status (9)

Country Link
US (2) US9911407B2
EP (1) EP3095112B1
JP (1) JP6614745B2
AU (2) AU2015206631A1
BR (1) BR112016016310B1
CA (1) CA2934298C
CL (1) CL2016001802A1
WO (1) WO2015108935A1
ZA (1) ZA201604177B

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
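WO2017046887A1 (ja) * 2015-09-16 2017-03-23 Toshiba Corporation Speech synthesis device, speech synthesis method, speech synthesis program, speech synthesis model training device, speech synthesis model training method, and speech synthesis model training program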
US10249314B1 (en) * 2016-07-21 2019-04-02 Oben, Inc. Voice conversion system and method with variance and spectrum compensation
US10872598B2 (en) * 2017-02-24 2020-12-22 Baidu Usa Llc Systems and methods for real-time neural text-to-speech
US10896669B2 (en) 2017-05-19 2021-01-19 Baidu Usa Llc Systems and methods for multi-speaker neural text-to-speech
US10872596B2 (en) 2017-10-19 2020-12-22 Baidu Usa Llc Systems and methods for parallel wave generation in end-to-end text-to-speech
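CN108962217B (zh) * 2018-07-28 2021-07-16 Huawei Technologies Co., Ltd. Speech synthesis method and related device
CN109285535A (zh) * 2018-10-11 2019-01-29 Sichuan Changhong Electric Co., Ltd. Speech synthesis method based on front-end design
CN109785823B (zh) * 2019-01-22 2021-04-02 中财颐和科技发展(北京)有限公司 Speech synthesis method and system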
US11514634B2 (en) 2020-06-12 2022-11-29 Baidu Usa Llc Personalized speech-to-video with three-dimensional (3D) skeleton regularization and expressive body poses
US11587548B2 (en) * 2020-06-12 2023-02-21 Baidu Usa Llc Text-driven video synthesis with phonetic dictionary
CN121237074A (zh) * 2024-06-28 2025-12-30 Tencent Technology (Shenzhen) Co., Ltd. Audio processing method, related apparatus, and medium

Family Cites Families (23)

Publication number Priority date Publication date Assignee Title
DE69620967T2 (de) * 1995-09-19 2002-11-07 At & T Corp., New York Synthesis of speech signals in the absence of coded parameters
US6567777B1 (en) * 2000-08-02 2003-05-20 Motorola, Inc. Efficient magnitude spectrum approximation
US6970820B2 (en) * 2001-02-26 2005-11-29 Matsushita Electric Industrial Co., Ltd. Voice personalization of speech synthesizer
US6792407B2 (en) * 2001-03-30 2004-09-14 Matsushita Electric Industrial Co., Ltd. Text selection and recording by feedback and adaptation for development of personalized text-to-speech systems
GB0113570D0 (en) * 2001-06-04 2001-07-25 Hewlett Packard Co Audio-form presentation of text messages
US20030028377A1 (en) * 2001-07-31 2003-02-06 Noyes Albert W. Method and device for synthesizing and distributing voice types for voice-enabled devices
CA2365203A1 (en) * 2001-12-14 2003-06-14 Voiceage Corporation A signal modification method for efficient coding of speech signals
US7096183B2 (en) 2002-02-27 2006-08-22 Matsushita Electric Industrial Co., Ltd. Customizing the speaking style of a speech synthesizer based on semantic analysis
US7136816B1 (en) * 2002-04-05 2006-11-14 At&T Corp. System and method for predicting prosodic parameters
CN1692403A (zh) * 2002-10-04 2005-11-02 Koninklijke Philips Electronics N.V. Speech synthesis device with personalized speech segments
US6961704B1 (en) 2003-01-31 2005-11-01 Speechworks International, Inc. Linguistic prosodic model-based text to speech
US8886538B2 (en) 2003-09-26 2014-11-11 Nuance Communications, Inc. Systems and methods for text-to-speech synthesis using spoken example
US7567896B2 (en) 2004-01-16 2009-07-28 Nuance Communications, Inc. Corpus-based speech synthesis based on segment recombination
US7693719B2 (en) * 2004-10-29 2010-04-06 Microsoft Corporation Providing personalized voice font for text-to-speech applications
US20100030557A1 (en) * 2006-07-31 2010-02-04 Stephen Molloy Voice and text communication system, method and apparatus
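JP4455610B2 (ja) * 2007-03-28 2010-04-21 Toshiba Corporation Prosody pattern generation device, speech synthesis device, program, and prosody pattern generation method
JP5457706B2 (ja) * 2009-03-30 2014-04-02 Toshiba Corporation Speech model generation device, speech synthesis device, speech model generation program, speech synthesis program, speech model generation method, and speech synthesis method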
EP2507794B1 (en) * 2009-12-02 2018-10-17 Agnitio S.L. Obfuscated speech synthesis
US20120143611A1 (en) * 2010-12-07 2012-06-07 Microsoft Corporation Trajectory Tiling Approach for Text-to-Speech
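CN102651217A (zh) 2011-02-25 2012-08-29 Toshiba Corporation Method and device for synthesizing speech, and acoustic model training method for speech synthesis
CN102270449A (zh) 2011-08-10 2011-12-07 Goertek Inc. Parametric speech synthesis method and system
JP5631915B2 (ja) 2012-03-29 2014-11-26 Toshiba Corporation Speech synthesis device, speech synthesis method, speech synthesis program, and learning device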
EP3114584B1 (en) 2014-03-04 2021-06-23 Interactive Intelligence Group, Inc. Optimization of audio fingerprint search

Also Published As

Publication number Publication date
AU2015206631A1 (en) 2016-06-30
WO2015108935A1 (en) 2015-07-23
CL2016001802A1 (es) 2016-12-23
EP3095112B1 (en) 2019-10-30
ZA201604177B (en) 2018-11-28
US20180144739A1 (en) 2018-05-24
NZ721092A (en) 2021-03-26
EP3095112A4 (en) 2017-09-13
US20150199956A1 (en) 2015-07-16
EP3095112A1 (en) 2016-11-23
AU2020203559B2 (en) 2021-10-28
US10733974B2 (en) 2020-08-04
JP6614745B2 (ja) 2019-12-04
JP2017502349A (ja) 2017-01-19
US9911407B2 (en) 2018-03-06
CA2934298A1 (en) 2015-07-23
BR112016016310A2 2017-08-08
CA2934298C (en) 2023-03-07
AU2020203559A1 (en) 2020-06-18

Similar Documents

Publication Publication Date Title
BR112016016310B1 (pt) System for synthesizing speech for a provided text and method for generating parameters
US10497362B2 (en) System and method for outlier identification to remove poor alignments in speech synthesis
US20170309271A1 (en) Speaking-rate normalized prosodic parameter builder, speaking-rate dependent prosodic model builder, speaking-rate controlled prosodic-information generation device and prosodic-information generation method able to learn different languages and mimic various speakers' speaking styles
Wu et al. Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features.
AU2020205275B2 (en) System and method for outlier identification to remove poor alignments in speech synthesis
CN105474307A (zh) Quantitative F0 contour generation device and method, and model learning device and method for generating F0 contours
Pradhan et al. A syllable based statistical text to speech system
Liu et al. Modeling partial pronunciation variations for spontaneous Mandarin speech recognition
Mustafa et al. Emotional speech acoustic model for Malay: iterative versus isolated unit training
Mustafa et al. Developing an HMM-based speech synthesis system for Malay: a comparison of iterative and isolated unit training
Kubo et al. Grapheme-to-phoneme conversion based on adaptive regularization of weight vectors.
Jokisch et al. Multi-level rhythm control for speech synthesis using hybrid data driven and rule-based approaches.
Anumanchipalli et al. A style capturing approach to F0 transformation in voice conversion
Matsuda et al. Applying generation process model constraint to fundamental frequency contours generated by hidden-Markov-model-based speech synthesis
NZ721092B2 (en) System and method for synthesis of speech from provided text
Astrinaki et al. sHTS: A streaming architecture for statistical parametric speech synthesis
Kuczmarski Overview of HMM-based Speech Synthesis Methods
Wu et al. Development of HMM-based Malay text-to-speech system
Hentschel et al. Exploiting imbalanced textual and acoustic data for training prosodically-enhanced RNNLMs
Shah et al. Deterministic annealing EM algorithm for developing TTS system in Gujarati
Petrov Structured Acoustic Models for Speech Recognition
Majji Building a Tamil Text-to-Speech Synthesizer using Festival
Sazhok Phoneme Recognition Output Post-Processing for Word Sequences Decoding

Legal Events

Date Code Title Description
B06U Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]
B09A Decision: intention to grant [chapter 9.1 patent gazette]
B16A Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]

Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS FROM 14/01/2015, SUBJECT TO THE LEGAL CONDITIONS