CA3154029A1 - Deep learning-based emotional speech synthesis method and device - Google Patents
- Publication number
- CA3154029A1
- Authority
- CA
- Canada
- Prior art keywords
- information
- model
- text information
- sample
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concepts
ID | Concept | Type | Query match | Sections | Count |
---|---|---|---|---|---|
230000002996 | emotional effect | Effects | 0.000 | title, claims, abstract, description | 64 |
238000013135 | deep learning | Methods | 0.000 | title, claims, abstract, description | 35 |
238000001308 | synthesis method | Methods | 0.000 | title, claims, abstract, description | 23 |
230000008451 | emotion | Effects | 0.000 | claims, abstract, description | 183 |
238000000034 | method | Methods | 0.000 | claims, abstract, description | 37 |
238000002372 | labelling | Methods | 0.000 | claims, abstract, description | 25 |
238000012549 | training | Methods | 0.000 | claims, description | 76 |
238000000605 | extraction | Methods | 0.000 | claims, description | 28 |
230000015572 | biosynthetic process | Effects | 0.000 | claims, description | 26 |
238000003786 | synthesis reaction | Methods | 0.000 | claims, description | 26 |
238000007781 | pre-processing | Methods | 0.000 | claims, description | 4 |
230000002194 | synthesizing effect | Effects | 0.000 | claims, description | 3 |
238000004891 | communication | Methods | 0.000 | abstract, description | 13 |
230000006870 | function | Effects | 0.000 | description | 13 |
238000010586 | diagram | Methods | 0.000 | description | 6 |
238000012545 | processing | Methods | 0.000 | description | 5 |
230000002708 | enhancing effect | Effects | 0.000 | description | 2 |
230000001737 | promoting effect | Effects | 0.000 | description | 2 |
230000008685 | targeting | Effects | 0.000 | description | 2 |
238000013459 | approach | Methods | 0.000 | description | 1 |
238000011161 | development | Methods | 0.000 | description | 1 |
238000005516 | engineering process | Methods | 0.000 | description | 1 |
230000036651 | mood | Effects | 0.000 | description | 1 |
238000011176 | pooling | Methods | 0.000 | description | 1 |
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
        - G10L13/02—Methods for producing synthetic speech; Speech synthesisers
        - G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
        - G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Child & Adolescent Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Machine Translation (AREA)
- User Interface Of Digital Computer (AREA)
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910850474.8 | 2019-09-10 | | |
CN201910850474.8A CN110675853B (zh) | 2019-09-10 | 2019-09-10 | Deep learning-based emotional speech synthesis method and device |
PCT/CN2020/096998 WO2021047233A1 (zh) | 2019-09-10 | 2020-06-19 | Deep learning-based emotional speech synthesis method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CA3154029A1 (en) | 2021-03-18 |
Family
ID=69077740
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA3154029A Pending CA3154029A1 (en) | 2019-09-10 | 2020-06-19 | Deep learning-based emotional speech synthesis method and device |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN110675853B (zh) |
CA (1) | CA3154029A1 (zh) |
WO (1) | WO2021047233A1 (zh) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
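CN110675853B (zh) * | 2019-09-10 | 2022-07-05 | 苏宁云计算有限公司 | Deep learning-based emotional speech synthesis method and device |
CN113223493A (zh) * | 2020-01-20 | 2021-08-06 | Tcl集团股份有限公司 | Voice care method, device, system, and storage medium |
CN111816212B (zh) * | 2020-06-19 | 2022-10-11 | 杭州电子科技大学 | Speech emotion recognition and evaluation method based on feature set fusion |
CN112489620B (zh) * | 2020-11-20 | 2022-09-09 | 北京有竹居网络技术有限公司 | Speech synthesis method and device, readable medium, and electronic device |
CN113192483B (zh) * | 2021-03-22 | 2024-02-27 | 联想(北京)有限公司 | Method, device, storage medium, and equipment for converting text to speech |
CN113421576B (zh) * | 2021-06-29 | 2024-05-24 | 平安科技(深圳)有限公司 | Voice conversion method, device, equipment, and storage medium |
CN114005446A (zh) * | 2021-11-01 | 2022-02-01 | 科大讯飞股份有限公司 | Emotion analysis method, related device, and readable storage medium |
CN114783406B (zh) * | 2022-06-16 | 2022-10-21 | 深圳比特微电子科技有限公司 | Speech synthesis method, device, and computer-readable storage medium |
CN116825088B (zh) * | 2023-08-25 | 2023-11-07 | 深圳市国硕宏电子有限公司 | Deep learning-based conference speech detection method and system |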
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105355193B (zh) * | 2015-10-30 | 2020-09-25 | 百度在线网络技术(北京)有限公司 | Speech synthesis method and device |
CN106599998B (zh) * | 2016-12-01 | 2019-02-01 | 竹间智能科技(上海)有限公司 | Method and system for adjusting robot responses based on emotional features |
US10424288B2 (en) * | 2017-03-31 | 2019-09-24 | Wipro Limited | System and method for rendering textual messages using customized natural voice |
CN108172209A (zh) * | 2018-01-09 | 2018-06-15 | 上海大学 | Method for constructing a voice idol |
CN109003624B (zh) * | 2018-06-29 | 2022-02-15 | 北京百度网讯科技有限公司 | Emotion recognition method and device, computer equipment, and storage medium |
CN109523989B (zh) * | 2019-01-29 | 2022-01-11 | 网易有道信息技术(北京)有限公司 | Speech synthesis method, speech synthesis device, storage medium, and electronic device |
CN110211563B (zh) * | 2019-06-19 | 2024-05-24 | 平安科技(深圳)有限公司 | Scenario- and emotion-oriented Chinese speech synthesis method, device, and storage medium |
CN110675853B (zh) * | 2019-09-10 | 2022-07-05 | 苏宁云计算有限公司 | Deep learning-based emotional speech synthesis method and device |
2019
- 2019-09-10 CN CN201910850474.8A (CN110675853B, zh): Active
2020
- 2020-06-19 WO PCT/CN2020/096998 (WO2021047233A1, zh): Application Filing
- 2020-06-19 CA CA3154029A (CA3154029A1, en): Pending
Also Published As
Publication number | Publication date |
---|---|
WO2021047233A1 (zh) | 2021-03-18 |
CN110675853A (zh) | 2020-01-10 |
CN110675853B (zh) | 2022-07-05 |
Similar Documents
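Publication | Title | Publication Date |
---|---|---|
CA3154029A1 (en) | Deep learning-based emotional speech synthesis method and device | |
CN108766414B (zh) | Method, device, equipment, and computer-readable storage medium for speech translation | |
Sotelo et al. | Char2wav: End-to-end speech synthesis | |
CN107657017B (zh) | Method and device for providing voice services | |
US7353177B2 (en) | System and method of providing conversational visual prosody for talking heads | |
US7136818B1 (en) | System and method of providing conversational visual prosody for talking heads | |
CN103218842B (zh) | Method for speech-synchronized driving of 3D face mouth shape and facial pose animation | |
CN112184858B (zh) | Text-based virtual object animation generation method and device, storage medium, and terminal | |
KR20190114150A (ko) | Video translation and lip-sync method and system | |
CN110880198A (zh) | Animation generation method and device | |
Malcangi | Text-driven avatars based on artificial neural networks and fuzzy logic | |
CN112102811B (zh) | Method and device for optimizing synthesized speech, and electronic device | |
CN111508501B (zh) | Accented speech recognition method and system for telephone robots | |
CN110853616A (zh) | Neural network-based speech synthesis method, system, and storage medium | |
CN113658577A (zh) | Speech synthesis model training method, audio generation method, device, and medium | |
CN115761075A (zh) | Face image generation method and its apparatus, device, medium, and product | |
CN115910066A (zh) | Intelligent dispatching command and operation system for regional power distribution networks | |
CN113593522B (zh) | Speech data labeling method and device | |
Huang et al. | Mongolian emotional speech synthesis based on transfer learning and emotional embedding | |
CN116631434A (zh) | Video and speech synchronization method and device based on a conversion system, and electronic device | |
CN116580691A (zh) | Speech synthesis method, speech synthesis device, electronic device, and storage medium | |
CN114512121A (zh) | Speech synthesis method, model training method, and device | |
CN113299272B (zh) | Speech synthesis model training and speech synthesis methods, device, and storage medium | |
CN113257225A (zh) | Emotional speech synthesis method and system fusing lexical and phoneme pronunciation features | |
CN113066473A (zh) | Speech synthesis method and device, storage medium, and electronic device | |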
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| EEER | Examination request | Effective date: 20220916 |