CN1379391A - Method for generating personalized speech from text - Google Patents

Method for generating personalized speech from text

Info

Publication number
CN1379391A
Authority
CN
China
Prior art keywords
parameter
personalized
speech
text
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN01116305.4A
Other languages
English (en)
Other versions
CN1156819C (zh)
Inventor
唐道南
沈丽琴
施勤
张维
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to CNB011163054A (patent CN1156819C, zh)
Priority to JP2002085138A (patent JP2002328695A, ja)
Priority to US10/118,497 (patent US20020173962A1, en)
Publication of CN1379391A (patent CN1379391A, zh)
Application granted
Publication of CN1156819C (patent CN1156819C, zh)
Anticipated expiration
Expired - Fee Related (current legal status)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013 Adapting to target pitch
    • G10L2021/0135 Voice conversion or morphing

Abstract

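The present invention discloses a method for generating personalized speech from text, comprising the following steps: analyzing the input text and obtaining, from a standard TTS database, standard speech parameters that characterize the speech to be synthesized; transforming the standard speech parameters into personalized speech parameters using a parameter personalization model obtained through training; and synthesizing speech corresponding to the input text based on the personalized speech parameters. The method can imitate the speech of any target speaker, so that the speech produced by a standard TTS system becomes more vivid and carries personalized characteristics.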

Description

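Method for generating personalized speech from text
The present invention relates generally to text-to-speech generation technology and, more specifically, to a method for generating personalized speech from text.
Existing TTS (text-to-speech) systems usually produce monotonous speech that lacks emotion. In an existing TTS system, the standard pronunciations of all characters/words are first recorded syllable by syllable and analyzed, and the parameters describing the standard pronunciation are then stored in a dictionary at the character/word level. Speech corresponding to a text is synthesized from the individual syllable components using the standard control parameters defined in the dictionary and common smoothing techniques. Speech synthesized in this way is very monotonous and is not personalized.
To address this, the present invention proposes a method that can generate personalized speech from text.
The method according to the present invention comprises the following steps: analyzing the input text and obtaining, from a standard text-to-speech database, standard speech parameters that characterize the speech to be synthesized; transforming the standard speech parameters into personalized speech parameters using a parameter personalization model obtained through prior training; and synthesizing speech corresponding to the input text based on the personalized speech parameters.
The objects, advantages and features of the present invention will become clearer from the following detailed description of preferred embodiments, taken in conjunction with the accompanying drawings.
Fig. 1 depicts the process of generating speech from text in an existing TTS system;
Fig. 2 depicts the process of generating personalized speech from text according to the present invention;
Fig. 3 depicts the process of producing the parameter personalization model according to a preferred embodiment of the present invention;
Fig. 4 depicts the process of mapping between two sets of cepstral coefficients in order to obtain the parameter personalization model; and
Fig. 5 depicts the decision tree used in the prosody model.
As shown in Fig. 1, generating speech from text in an existing TTS system usually involves the following steps: first, the input text is analyzed and the parameters describing the standard pronunciation are obtained from a standard text-to-speech database; second, speech corresponding to the text is synthesized from the individual syllable components using standard control parameters and common smoothing techniques. Speech produced this way usually lacks emotion and is monotonous, and is therefore not personalized.
To address this, the present invention proposes a method that can generate personalized speech from text.
As shown in Fig. 2, the method for generating personalized speech from text according to the present invention comprises the following steps: first, the input text is analyzed and standard speech parameters that characterize the speech to be synthesized are obtained from a standard text-to-speech database; second, the standard speech parameters are transformed into personalized speech parameters using a parameter personalization model obtained through training; finally, speech corresponding to the input text is synthesized based on the personalized speech parameters.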
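The three steps of Fig. 2 can be read as a simple composition of stages. The following is a minimal sketch of that composition, not the patented implementation: the function names (`personalized_tts`, `toy_analyze`, `toy_personalize`, `toy_synthesize`) and the toy stage bodies are hypothetical stand-ins chosen only to make the data flow visible.

```python
from typing import Callable, List, Sequence

Params = List[float]  # hypothetical: one speech-parameter vector per unit


def personalized_tts(
    text: str,
    analyze: Callable[[str], List[Params]],
    personalize: Callable[[Params], Params],
    synthesize: Callable[[Sequence[Params]], List[float]],
) -> List[float]:
    """Run the three stages in order: analysis, transformation, synthesis."""
    standard = analyze(text)                       # step 1: standard parameters
    personal = [personalize(v) for v in standard]  # step 2: apply F[*]
    return synthesize(personal)                    # step 3: produce samples


# Toy stand-ins (assumptions for illustration only).
def toy_analyze(text: str) -> List[Params]:
    return [[float(ord(ch))] for ch in text]


def toy_personalize(v: Params) -> Params:
    return [x + 1.0 for x in v]


def toy_synthesize(params: Sequence[Params]) -> List[float]:
    return [x for v in params for x in v]


samples = personalized_tts("ab", toy_analyze, toy_personalize, toy_synthesize)
```

Because the transformation sits between analysis and synthesis, the standard analysis stage can stay unchanged, which matches the description's later remark that the method does not affect existing TTS systems.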
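The process of producing the parameter personalization model according to a preferred embodiment of the present invention is described below with reference to Fig. 3. Specifically, to obtain the parameter personalization model, the standard TTS analysis process is first used to obtain the standard speech parameters Vgeneral; at the same time, the personalized speech is measured to obtain its speech parameters Vpersonalized; and a parameter personalization model reflecting the correspondence between the standard speech parameters Vgeneral and the personalized speech parameters Vpersonalized is initially established:
Vpersonalized=F[Vgeneral];
To obtain a stable F[*], the above process of measuring the personalized speech parameters Vpersonalized is repeated many times, and the parameter personalization model F[*] is adjusted according to the measurement results until a stable model is obtained. In one specific embodiment of the present invention, F[*] is considered stable if, over n measurements, every two adjacent results satisfy |Fi[*]-Fi+1[*]|≤δ. According to a preferred embodiment, the present invention obtains the parameter personalization model F[*], which reflects the correspondence between the standard speech parameters Vgeneral and the personalized speech parameters Vpersonalized, at the following two levels:
Level 1: the acoustic level, related to cepstral parameters;
Level 2: the prosodic level, related to suprasegmental parameters. Different training methods are used for the different levels.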
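The stability criterion |Fi[*]-Fi+1[*]|≤δ over n measurements can be sketched as a loop that keeps adjusting the model until n-1 consecutive adjacent differences are small. This is only an illustration under stated assumptions: the model is represented as a plain parameter vector, the adjustment rule is a running mean, and the distance is the largest absolute component difference. None of these choices is specified by the description, and `fit_until_stable` is a hypothetical name.

```python
import numpy as np


def fit_until_stable(observations, delta, n):
    """Adjust a model estimate until n consecutive results differ by <= delta.

    Returns (model, count), where count is the number of observations used,
    or (model, None) if the estimate never became stable.
    """
    model = None
    stable_pairs = 0
    for count, obs in enumerate(observations, start=1):
        obs = np.asarray(obs, dtype=float)
        if model is None:
            model = obs.copy()
            continue
        updated = model + (obs - model) / count           # running-mean adjustment
        change = float(np.max(np.abs(updated - model)))   # |F_i - F_{i+1}|
        stable_pairs = stable_pairs + 1 if change <= delta else 0
        model = updated
        if stable_pairs >= n - 1:                         # n results -> n-1 adjacent pairs
            return model, count
    return model, None


steady_model, steady_count = fit_until_stable([[1.0]] * 4, delta=0.01, n=3)
noisy_model, noisy_count = fit_until_stable([[0.0], [10.0], [0.0], [10.0]], delta=0.01, n=3)
```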
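·Level 1: the acoustic level, related to cepstral parameters:
With the aid of speech recognition technology, we can obtain the cepstral parameter sequence of a speech signal. Given two speakers' utterances of the same text, we can obtain not only each speaker's cepstral parameter sequence but also the frame-level correspondence between the two cepstral sequences. We can therefore compare the sequences frame by frame and model the differences between them to obtain F[*] at the acoustic level related to cepstral parameters.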
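The description obtains the frame-level correspondence through speech recognition technology and does not detail the procedure. As a stand-in, the sketch below uses plain dynamic time warping (DTW), a different and simpler technique, to show what a frame-to-frame correspondence between two cepstral sequences looks like. The function name `dtw_align` and the toy one-dimensional "cepstra" are assumptions.

```python
import numpy as np


def dtw_align(a, b):
    """Return a list of (i, j) frame pairs aligning sequence a to sequence b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Euclidean distance between every frame of a and every frame of b.
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(steps, key=lambda s: s[0])
    path.reverse()
    return path


standard_frames = [[0.0], [1.0], [2.0]]
target_frames = [[0.0], [1.0], [1.0], [2.0]]
alignment = dtw_align(standard_frames, target_frames)
```

Each pair in `alignment` links one frame of the standard speech to one frame of the target speech, which is the kind of correspondence the frame-by-frame comparison relies on.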
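In this model, two sets of cepstral parameters are defined: one from the standard TTS system and the other from the speech of the person to be imitated. The mapping between the two sets is established using the intelligent VQ (vector quantization) method depicted in Fig. 4. First, initial Gaussian clustering is performed on the cepstral parameters of the standard TTS speech to quantize the vectors, yielding G1, G2, …. Second, from the strict frame-by-frame mapping between the two cepstral parameter sequences and the initial Gaussian clustering result for the standard TTS speech, we derive the initial Gaussian clustering result for the speech to be imitated. To obtain a more precise model of each Gi’, we perform Gaussian clustering again, yielding G1.1’, G1.2’, …, G2.1’, G2.2’, …. We then obtain a one-to-one mapping among the Gaussians and define F[*] as follows:
$$V_{\text{personalized}} = F[V_{\text{general}}]: \quad V_{\text{general}} \in G_{i,j}, \quad V_{\text{personalized}} = \left(V_{\text{general}} - M_{G_{i,j}}\right) \cdot \frac{D_{G'_{i,j}}}{D_{G_{i,j}}} + M_{G'_{i,j}}$$
In the above equation, MGi,j and DGi,j denote the mean and variance of Gi,j, while MGi,j’ and DGi,j’ denote the mean and variance of Gi,j’.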
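The cluster-wise mapping defined above shifts a vector by the mean of its standard cluster, rescales it by the ratio of the two D values, and re-centers it on the mean of the matching target cluster. Below is a minimal NumPy sketch of that arithmetic. It assumes that cluster membership is decided by the nearest standard-cluster mean (the description does not say how membership is tested) and takes the D values exactly as given; the dictionary layout and the name `map_vector` are hypothetical.

```python
import numpy as np


def map_vector(v, standard_clusters, target_clusters):
    """Apply (v - M) * D' / D + M' using the cluster that v falls into."""
    v = np.asarray(v, dtype=float)
    # Assumption: v belongs to the standard cluster with the nearest mean.
    idx = int(np.argmin([np.linalg.norm(v - c["mean"]) for c in standard_clusters]))
    src, tgt = standard_clusters[idx], target_clusters[idx]
    return (v - src["mean"]) * tgt["d"] / src["d"] + tgt["mean"]


# Two toy cluster pairs; clusters with the same index are mapped one-to-one.
standard_clusters = [
    {"mean": np.array([0.0, 0.0]), "d": np.array([1.0, 1.0])},
    {"mean": np.array([10.0, 10.0]), "d": np.array([2.0, 2.0])},
]
target_clusters = [
    {"mean": np.array([1.0, 1.0]), "d": np.array([2.0, 2.0])},
    {"mean": np.array([20.0, 20.0]), "d": np.array([1.0, 1.0])},
]

mapped_a = map_vector([1.0, -1.0], standard_clusters, target_clusters)
mapped_b = map_vector([12.0, 8.0], standard_clusters, target_clusters)
```

If D is read as a standard deviation, this transformation moves both the center and the spread of the standard cluster onto those of the target cluster; the description itself calls D the variance.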
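·Level 2: the prosodic level, related to suprasegmental parameters:
As far as we know, prosodic parameters are context-dependent. Context information includes the phone, stress, semantics, syntax, semantic structure, and so on. To determine the relationships among these kinds of context information, we use a decision tree to model the transformation mechanism F[*] at the prosodic level.
Prosodic parameters include fundamental frequency, duration and loudness. For each phone, we define the prosody vector as follows:
Fundamental frequency pattern: fundamental frequency values at 10 points distributed over the whole phone;
Duration: 3 values, namely the burst-part duration, the steady-part duration and the transition-part duration;
Loudness: 2 values, namely the front loudness and the rear loudness.
We thus represent the prosody of a phone with a 15-dimensional vector.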
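The 15-dimensional layout described above (10 fundamental-frequency points, 3 durations, 2 loudness values) can be written down as a small data structure. This is a sketch only: the class name `PhoneProsody`, the field names, the ordering of the components within the vector, and the units in the example are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PhoneProsody:
    f0_contour: Tuple[float, ...]          # 10 F0 values spread over the phone
    durations: Tuple[float, float, float]  # burst, steady, transition
    loudness: Tuple[float, float]          # front, rear

    def to_vector(self) -> List[float]:
        """Concatenate the three groups into one 15-dimensional vector."""
        if len(self.f0_contour) != 10:
            raise ValueError("expected 10 fundamental-frequency points")
        if len(self.durations) != 3 or len(self.loudness) != 2:
            raise ValueError("expected 3 durations and 2 loudness values")
        return [*self.f0_contour, *self.durations, *self.loudness]


example = PhoneProsody(
    f0_contour=tuple(120.0 + i for i in range(10)),  # illustrative Hz values
    durations=(0.02, 0.08, 0.03),                    # illustrative seconds
    loudness=(60.0, 55.0),                           # illustrative levels
)
prosody_vector = example.to_vector()
```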
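Assuming that the prosody vectors are Gaussian-distributed, we can use a general decision tree algorithm to cluster the prosody vectors of the standard TTS system's speech. We thus obtain the decision tree D.T. shown in Fig. 5 and the Gaussians G1, G2, G3, ….
When the speech to be imitated and its text are input, the text is first analyzed to obtain its context information, and the context information is then fed into the decision tree D.T. to obtain another set of Gaussians G1’, G2’, G3’, ….
We assume that the Gaussians G1, G2, G3, … and G1’, G2’, G3’, … are in one-to-one correspondence, and construct the following mapping function:
$$V_{\text{personalized}} = F[V_{\text{general}}]: \quad V_{\text{general}} \in G_{i,j}, \quad V_{\text{personalized}} = \left(V_{\text{general}} - M_{G_{i,j}}\right) \cdot \frac{D_{G'_{i,j}}}{D_{G_{i,j}}} + M_{G'_{i,j}}$$
In this equation, MGi,j and DGi,j denote the mean and variance of Gi,j, while MGi,j’ and DGi,j’ denote the mean and variance of Gi,j’.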
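At the prosodic level, the same decision tree is used to pick a Gaussian for the standard voice and the paired Gaussian for the target voice, and the mapping function is then applied. The sketch below hard-codes a tiny tree to illustrate this flow. The context questions (`stressed`, `phrase_final`), the leaf statistics and the function names are all hypothetical; the actual tree of Fig. 5 is learned from data and is not reproduced in this text.

```python
import numpy as np


def leaf_index(context):
    """A hand-written stand-in for the decision tree D.T."""
    if context["stressed"]:
        return 0 if context["phrase_final"] else 1
    return 2


# Per-leaf (mean, D) for the standard voice and for the target voice.
STANDARD_LEAVES = {
    0: (np.full(15, 100.0), np.full(15, 10.0)),
    1: (np.full(15, 90.0), np.full(15, 5.0)),
    2: (np.full(15, 80.0), np.full(15, 4.0)),
}
TARGET_LEAVES = {
    0: (np.full(15, 150.0), np.full(15, 20.0)),
    1: (np.full(15, 120.0), np.full(15, 5.0)),
    2: (np.full(15, 100.0), np.full(15, 8.0)),
}


def personalize_prosody(v, context):
    """Map a 15-dimensional prosody vector through the leaf chosen by context."""
    leaf = leaf_index(context)
    mean_s, d_s = STANDARD_LEAVES[leaf]
    mean_t, d_t = TARGET_LEAVES[leaf]
    return (np.asarray(v, dtype=float) - mean_s) * d_t / d_s + mean_t


result = personalize_prosody(
    np.full(15, 110.0), {"stressed": True, "phrase_final": True}
)
```

Routing both voices through one tree is what makes the one-to-one pairing of G and G’ possible: each leaf holds exactly one Gaussian per voice.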
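The method for generating personalized speech from text according to the present invention has been described above with reference to Figs. 1 to 5. The key issue is to synthesize the analog signal of a phone from the feature vectors in real time. This is essentially the inverse of the digitized feature extraction process (similar to an inverse Fourier transform). Such a process is very complex, but it can be implemented with currently available special-purpose algorithms, such as IBM's technique for reconstructing speech from cepstral features.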
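To illustrate why synthesis is described as the inverse of feature extraction, the sketch below computes a real cepstrum from one frame and then recovers the frame's log-magnitude spectrum from it with a forward FFT. This is a textbook round trip, not the IBM reconstruction technique mentioned above, and it recovers only the magnitude envelope: phase and excitation, which a real synthesizer also needs, are not modeled. The function names and the random test frame are assumptions.

```python
import numpy as np

EPS = 1e-12  # avoids log(0) for empty frequency bins


def real_cepstrum(frame):
    """Feature extraction: inverse FFT of the log-magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.rfft(frame)) + EPS)
    return np.fft.irfft(log_mag, n=len(frame))


def log_magnitude_from_cepstrum(cepstrum):
    """Inverse step: a forward FFT of the cepstrum gives back the log-magnitude."""
    return np.fft.rfft(cepstrum).real


rng = np.random.default_rng(0)
frame = rng.standard_normal(256)           # stand-in for one windowed speech frame
cepstrum = real_cepstrum(frame)
recovered = log_magnitude_from_cepstrum(cepstrum)
expected = np.log(np.abs(np.fft.rfft(frame)) + EPS)
```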
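Although personalized speech would normally be generated through real-time transformation, it is foreseeable that a complete personalized TTS database could be built for any particular target voice. Because the transformation and the generation of the analog speech components are performed in the final steps of producing personalized speech with the TTS system, the method of the present invention has no impact on existing TTS systems.
The method for generating personalized speech from text according to the present invention has been described above with reference to specific embodiments. As those of ordinary skill in the art will appreciate, many modifications and variations can be made to the present invention without departing from its spirit and substance. The present invention therefore covers all such modifications and variations, and its scope of protection is defined by the appended claims.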

Claims (6)

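1. A method for generating personalized speech from text, comprising the following steps:
analyzing the input text and obtaining, from a standard text-to-speech database, standard speech parameters that characterize the speech to be synthesized;
transforming said standard speech parameters into personalized speech parameters using a parameter personalization model obtained through prior training; and
synthesizing speech corresponding to said input text based on said personalized speech parameters.
2. The method according to claim 1, characterized in that the parameter personalization model is obtained through the following steps:
obtaining standard speech parameters using a standard text-to-speech analysis process;
measuring the personalized speech parameters in personalized speech;
initially establishing a parameter personalization model that reflects the correspondence between the standard speech parameters and the personalized speech parameters;
repeating the above process of measuring the personalized speech parameters many times and adjusting said parameter personalization model according to the measurement results until a stable parameter personalization model is obtained.
3. The method according to claim 1 or 2, wherein said parameter personalization model includes a parameter personalization model at the acoustic level related to cepstral parameters.
4. The method according to claim 3, wherein said parameter personalization model at the acoustic level related to cepstral parameters is established using an intelligent vector quantization method.
5. The method according to claim 1 or 2, wherein said parameter personalization model includes a parameter personalization model at the prosodic level related to suprasegmental parameters.
6. The method according to claim 5, wherein said parameter personalization model at the prosodic level related to suprasegmental parameters is established using a decision tree.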
CNB011163054A 2001-04-06 2001-04-06 Method for generating personalized speech from text Expired - Fee Related CN1156819C (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CNB011163054A CN1156819C (zh) 2001-04-06 2001-04-06 Method for generating personalized speech from text
JP2002085138A JP2002328695A (ja) 2001-04-06 2002-03-26 Method for generating personalized speech from text
US10/118,497 US20020173962A1 (en) 2001-04-06 2002-04-05 Method for generating pesonalized speech from text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB011163054A CN1156819C (zh) 2001-04-06 2001-04-06 Method for generating personalized speech from text

Publications (2)

Publication Number Publication Date
CN1379391A (zh) 2002-11-13
CN1156819C (zh) 2004-07-07

Family

ID=4662451

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB011163054A Expired - Fee Related CN1156819C (zh) 2001-04-06 2001-04-06 Method for generating personalized speech from text

Country Status (3)

Country Link
US (1) US20020173962A1 (zh)
JP (1) JP2002328695A (zh)
CN (1) CN1156819C (zh)

Also Published As

Publication number Publication date
US20020173962A1 (en) 2002-11-21
JP2002328695A (ja) 2002-11-15
CN1156819C (zh) 2004-07-07

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee