CN110675853B - Deep learning-based emotional speech synthesis method and device - Google Patents
Deep learning-based emotional speech synthesis method and device
- Publication number
- CN110675853B (application CN201910850474.8A)
- Authority
- CN
- China
- Prior art keywords
- information
- emotion
- model
- sample
- text information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Child & Adolescent Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Machine Translation (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
Description
Claims (9)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910850474.8A (CN110675853B, zh) | 2019-09-10 | 2019-09-10 | Deep learning-based emotional speech synthesis method and device |
PCT/CN2020/096998 (WO2021047233A1, zh) | 2019-09-10 | 2020-06-19 | Deep learning-based emotional speech synthesis method and device |
CA3154029A (CA3154029A1, en) | 2019-09-10 | 2020-06-19 | Deep learning-based emotional speech synthesis method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910850474.8A (CN110675853B, zh) | 2019-09-10 | 2019-09-10 | Deep learning-based emotional speech synthesis method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110675853A (zh) | 2020-01-10 |
CN110675853B (zh) | 2022-07-05 |
Family
ID=69077740
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910850474.8A (Active; CN110675853B, zh) | Deep learning-based emotional speech synthesis method and device | 2019-09-10 | 2019-09-10 |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN110675853B (zh) |
CA (1) | CA3154029A1 (zh) |
WO (1) | WO2021047233A1 (zh) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
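CN110675853B (zh) * | 2019-09-10 | 2022-07-05 | 苏宁云计算有限公司 | Deep learning-based emotional speech synthesis method and device |
CN113223493A (zh) * | 2020-01-20 | 2021-08-06 | Tcl集团股份有限公司 | Voice care method, device, system, and storage medium |
CN111816212B (zh) * | 2020-06-19 | 2022-10-11 | 杭州电子科技大学 | Speech emotion recognition and evaluation method based on feature set fusion |
CN112489620B (zh) * | 2020-11-20 | 2022-09-09 | 北京有竹居网络技术有限公司 | Speech synthesis method and device, readable medium, and electronic device |
CN113192483B (zh) * | 2021-03-22 | 2024-02-27 | 联想(北京)有限公司 | Method, device, storage medium, and equipment for converting text to speech |
CN113421576B (zh) * | 2021-06-29 | 2024-05-24 | 平安科技(深圳)有限公司 | Voice conversion method, device, equipment, and storage medium |
CN114005446A (zh) * | 2021-11-01 | 2022-02-01 | 科大讯飞股份有限公司 | Emotion analysis method, related device, and readable storage medium |
CN114783406B (zh) * | 2022-06-16 | 2022-10-21 | 深圳比特微电子科技有限公司 | Speech synthesis method, device, and computer-readable storage medium |
CN116825088B (zh) * | 2023-08-25 | 2023-11-07 | 深圳市国硕宏电子有限公司 | Deep learning-based conference speech detection method and system |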
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105355193A (zh) * | 2015-10-30 | 2016-02-24 | 百度在线网络技术(北京)有限公司 | Speech synthesis method and device |
CN109523989A (zh) * | 2019-01-29 | 2019-03-26 | 网易有道信息技术(北京)有限公司 | Speech synthesis method, speech synthesis device, storage medium, and electronic device |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106599998B (zh) * | 2016-12-01 | 2019-02-01 | 竹间智能科技(上海)有限公司 | Method and system for adjusting robot responses based on emotional features |
US10424288B2 (en) * | 2017-03-31 | 2019-09-24 | Wipro Limited | System and method for rendering textual messages using customized natural voice |
CN108172209A (zh) * | 2018-01-09 | 2018-06-15 | 上海大学 | Method for constructing a voice idol |
CN109003624B (zh) * | 2018-06-29 | 2022-02-15 | 北京百度网讯科技有限公司 | Emotion recognition method, device, computer equipment, and storage medium |
CN110211563B (zh) * | 2019-06-19 | 2024-05-24 | 平安科技(深圳)有限公司 | Scenario- and emotion-oriented Chinese speech synthesis method, device, and storage medium |
CN110675853B (zh) * | 2019-09-10 | 2022-07-05 | 苏宁云计算有限公司 | Deep learning-based emotional speech synthesis method and device |
- 2019-09-10: CN application CN201910850474.8A, publication CN110675853B (zh), status: Active
- 2020-06-19: WO application PCT/CN2020/096998, publication WO2021047233A1 (zh), status: Application Filing
- 2020-06-19: CA application CA3154029A, publication CA3154029A1 (en), status: Pending
Non-Patent Citations (1)
Title |
---|
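Research on emotional prosody modeling combining small-scale emotional data with large-scale neutral data; Shao Yanqiu et al.; Journal of Computer Research and Development (《计算机研究与发展》); 20070915 (No. 09); full text * |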
Also Published As
Publication number | Publication date |
---|---|
CA3154029A1 (en) | 2021-03-18 |
WO2021047233A1 (zh) | 2021-03-18 |
CN110675853A (zh) | 2020-01-10 |
Similar Documents
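Publication | Publication Date | Title |
---|---|---|
CN110675853B (zh) | | Deep learning-based emotional speech synthesis method and device |
CN112184858B (zh) | | Text-based virtual object animation generation method and device, storage medium, and terminal |
Nachmani et al. | | Fitting new speakers based on a short untranscribed sample |
CN105244026B (zh) | | Speech processing method and device |
CN106056207B (zh) | | Natural language-based method and device for deep robot interaction and reasoning |
CN101064104B (zh) | | Emotional speech generation method based on voice conversion |
CN110880315A (zh) | | Personalized speech and video generation system based on phoneme posterior probabilities |
CN105159870A (zh) | | Processing system and method for accurately transcribing continuous natural speech to text |
CN110691258A (zh) | | Program material production method and device, computer storage medium, and electronic device |
CN111312292A (zh) | | Speech-based emotion recognition method and device, electronic device, and storage medium |
CN111916054A (zh) | | Lip shape-based speech generation method, device, system, and storage medium |
CN103885924A (zh) | | Domain-adaptive system and method for automatically generating open-course subtitles |
CN115147521A (zh) | | Method for generating character facial expression animation based on artificial intelligence semantic analysis |
CN111259196B (zh) | | Method for converting articles to video based on video big data |
CN110111778A (zh) | | Speech processing method and device, storage medium, and electronic device |
CN114125506A (zh) | | Speech review method and device |
CN112185341A (zh) | | Dubbing method, device, equipment, and storage medium based on speech synthesis |
CN112242134A (zh) | | Speech synthesis method and device |
CN115359778A (zh) | | Adversarial and meta-learning method based on a speaker emotional speech synthesis model |
Um et al. | | Facetron: A Multi-Speaker Face-to-Speech Model Based on Cross-Modal Latent Representations |
CN111312211A (zh) | | Dialect speech recognition system based on oversampling |
CN113223513A (zh) | | Voice conversion method, device, equipment, and storage medium |
CN113299272A (zh) | | Speech synthesis model training and speech synthesis method, equipment, and storage medium |
Kadam et al. | | A Survey of Audio Synthesis and Lip-syncing for Synthetic Video Generation |
Kwon et al. | | Implementation of Python-Based Korean Speech Generation Service with Tacotron |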
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | Address after: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000; Patentee after: Jiangsu Suning cloud computing Co.,Ltd.; Country or region after: China; Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000; Patentee before: Suning Cloud Computing Co.,Ltd.; Country or region before: China |
| | TR01 | Transfer of patent right | Effective date of registration: 20240204; Address after: Room 3104, Building A5, No. 3 Gutan Avenue, Economic Development Zone, Gaochun District, Nanjing City, Jiangsu Province, 210000; Patentee after: Jiangsu Biying Technology Co.,Ltd.; Country or region after: China; Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000; Patentee before: Jiangsu Suning cloud computing Co.,Ltd.; Country or region before: China |