KR102057927B1 - Speech synthesis apparatus and method thereof - Google Patents
Speech synthesis apparatus and method thereof
- Publication number
- KR102057927B1
- Authority
- KR
- South Korea
- Prior art keywords
- emotion
- speech synthesis
- neural network
- vector
- embedding
- Prior art date
ID | Concept | Type | Query match | Sections | Count |
---|---|---|---|---|---|
230000002194 | synthesizing effect | Effects | 0.000 | title, claims, abstract, description | 23 |
238000000034 | method | Methods | 0.000 | title, claims, description | 50 |
230000015572 | biosynthetic process | Effects | 0.000 | claims, abstract, description | 192 |
238000003786 | synthesis reaction | Methods | 0.000 | claims, abstract, description | 192 |
230000008451 | emotion | Effects | 0.000 | claims, abstract, description | 185 |
238000013528 | artificial neural network | Methods | 0.000 | claims, abstract, description | 127 |
239000013598 | vector | Substances | 0.000 | claims, abstract, description | 126 |
230000002996 | emotional effect | Effects | 0.000 | claims, abstract, description | 39 |
238000007781 | pre-processing | Methods | 0.000 | claims, abstract, description | 16 |
238000012549 | training | Methods | 0.000 | claims, description | 16 |
239000012141 | concentrate | Substances | 0.000 | claims, description | 2 |
230000000306 | recurrent effect | Effects | 0.000 | claims, description | 2 |
230000000644 | propagated effect | Effects | 0.000 | claims | 1 |
230000006870 | function | Effects | 0.000 | description | 14 |
238000001308 | synthesis method | Methods | 0.000 | description | 13 |
238000010586 | diagram | Methods | 0.000 | description | 12 |
230000015654 | memory | Effects | 0.000 | description | 11 |
238000004891 | communication | Methods | 0.000 | description | 10 |
238000004590 | computer program | Methods | 0.000 | description | 8 |
230000000694 | effects | Effects | 0.000 | description | 4 |
239000012634 | fragment | Substances | 0.000 | description | 4 |
238000010801 | machine learning | Methods | 0.000 | description | 4 |
238000012545 | processing | Methods | 0.000 | description | 4 |
238000005516 | engineering process | Methods | 0.000 | description | 3 |
238000012805 | post-processing | Methods | 0.000 | description | 3 |
238000006243 | chemical reaction | Methods | 0.000 | description | 2 |
125000004122 | cyclic group | Chemical group | 0.000 | description | 2 |
230000010482 | emotional regulation | Effects | 0.000 | description | 2 |
238000000926 | separation method | Methods | 0.000 | description | 2 |
230000005236 | sound signal | Effects | 0.000 | description | 2 |
238000007792 | addition | Methods | 0.000 | description | 1 |
238000010276 | construction | Methods | 0.000 | description | 1 |
238000011161 | development | Methods | 0.000 | description | 1 |
238000013507 | mapping | Methods | 0.000 | description | 1 |
239000011159 | matrix material | Substances | 0.000 | description | 1 |
230000001151 | other effect | Effects | 0.000 | description | 1 |
230000001902 | propagating effect | Effects | 0.000 | description | 1 |
230000006403 | short-term memory | Effects | 0.000 | description | 1 |
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L13/00—Speech synthesis; Text to speech systems
        - G10L13/02—Methods for producing synthetic speech; Speech synthesisers
          - G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
        - G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
      - G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
        - G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
        - G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
          - G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
            - G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Child & Adolescent Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Telephonic Communication Services (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020190030905A KR102057927B1 (ko) | 2019-03-19 | 2019-03-19 | Speech synthesis apparatus and method thereof |
PCT/KR2020/003768 WO2020190054A1 (fr) | 2019-03-19 | 2020-03-19 | Speech synthesis apparatus and method thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020190030905A KR102057927B1 (ko) | 2019-03-19 | 2019-03-19 | Speech synthesis apparatus and method thereof |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020190167464A Division KR20200111609A (ko) | 2019-12-16 | 2019-12-16 | Speech synthesis apparatus and method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
KR102057927B1 (ko) | 2019-12-20 |
Family
ID=69062875
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020190030905A KR102057927B1 (ko) | 2019-03-19 | 2019-03-19 | Speech synthesis apparatus and method thereof |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR102057927B1 (fr) |
WO (1) | WO2020190054A1 (fr) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111402923A (zh) * | 2020-03-27 | 2020-07-10 | 中南大学 | WaveNet-based emotional voice conversion method |
CN111627420A (zh) * | 2020-04-21 | 2020-09-04 | 升智信息科技(南京)有限公司 | Method and apparatus for speaker-specific emotional speech synthesis under extremely low resources |
CN111667812A (zh) * | 2020-05-29 | 2020-09-15 | 北京声智科技有限公司 | Speech synthesis method, apparatus, device, and storage medium |
WO2020190054A1 (fr) * | 2019-03-19 | 2020-09-24 | 휴멜로 주식회사 | Speech synthesis apparatus and method thereof |
CN111973178A (zh) * | 2020-08-14 | 2020-11-24 | 中国科学院上海微系统与信息技术研究所 | Electroencephalogram signal recognition system and method |
KR102277205B1 (ko) * | 2020-03-18 | 2021-07-15 | 휴멜로 주식회사 | Audio conversion apparatus and method |
KR20220004272A (ko) * | 2020-07-03 | 2022-01-11 | 한국과학기술원 | Method and apparatus for iterative learning of speech emotion recognition and synthesis |
KR20220041448A (ko) * | 2020-09-25 | 2022-04-01 | 주식회사 딥브레인에이아이 | Text-based speech synthesis method and apparatus |
KR20220071525A (ko) * | 2020-11-24 | 2022-05-31 | 주식회사 자이냅스 | Method for evaluating spectrogram quality using attention alignment scores, and speech synthesis system |
KR20220134247A (ko) * | 2021-03-26 | 2022-10-05 | 주식회사 엔씨소프트 | Apparatus and method for training a timbre embedding model |
US11769482B2 (en) | 2020-11-11 | 2023-09-26 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus of synthesizing speech, method and apparatus of training speech synthesis model, electronic device, and storage medium |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11241574B2 (en) | 2019-09-11 | 2022-02-08 | Bose Corporation | Systems and methods for providing and coordinating vagus nerve stimulation with audio therapy |
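CN112489621B (zh) * | 2020-11-20 | 2022-07-12 | 北京有竹居网络技术有限公司 | Speech synthesis method and apparatus, readable medium, and electronic device |
CN112633364B (zh) * | 2020-12-21 | 2024-04-05 | 上海海事大学 | Multimodal emotion recognition method based on a Transformer-ESIM attention mechanism |
CN112992177B (zh) * | 2021-02-20 | 2023-10-17 | 平安科技(深圳)有限公司 | Training method, apparatus, device, and storage medium for a speech style transfer model |
CN113257218B (zh) * | 2021-05-13 | 2024-01-30 | 北京有竹居网络技术有限公司 | Speech synthesis method and apparatus, electronic device, and storage medium |
CN113421546B (zh) * | 2021-06-30 | 2024-03-01 | 平安科技(深圳)有限公司 | Cross-subject multimodal speech synthesis method and related device |
CN114299915A (zh) * | 2021-11-09 | 2022-04-08 | 腾讯科技(深圳)有限公司 | Speech synthesis method and related device |
CN117423327B (zh) * | 2023-10-12 | 2024-03-19 | 北京家瑞科技有限公司 | Speech synthesis method and apparatus based on a GPT neural network |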
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006084967A (ja) * | 2004-09-17 | 2006-03-30 | Advanced Telecommunication Research Institute International | Method for creating a prediction model and computer program |
KR101954447B1 (ko) * | 2018-03-12 | 2019-03-05 | 박기수 | Method for providing a telemarketing service based on interworking between a mobile terminal and a fixed terminal |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20130091364A (ko) * | 2011-12-26 | 2013-08-19 | 한국생산기술연구원 | Learnable emotion generation apparatus and emotion generation method for a robot |
KR102137523B1 (ko) * | 2017-08-09 | 2020-07-24 | 한국과학기술원 | Text-to-speech conversion method and system |
KR102057927B1 (ko) * | 2019-03-19 | 2019-12-20 | 휴멜로 주식회사 | Speech synthesis apparatus and method thereof |
- 2019-03-19 KR KR1020190030905A patent/KR102057927B1/ko active IP Right Grant
- 2020-03-19 WO PCT/KR2020/003768 patent/WO2020190054A1/fr active Application Filing
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020190054A1 (fr) * | 2019-03-19 | 2020-09-24 | 휴멜로 주식회사 | Speech synthesis apparatus and method thereof |
KR102277205B1 (ko) * | 2020-03-18 | 2021-07-15 | 휴멜로 주식회사 | Audio conversion apparatus and method |
CN111402923A (zh) * | 2020-03-27 | 2020-07-10 | 中南大学 | WaveNet-based emotional voice conversion method |
CN111402923B (zh) * | 2020-03-27 | 2023-11-03 | 中南大学 | WaveNet-based emotional voice conversion method |
CN111627420A (zh) * | 2020-04-21 | 2020-09-04 | 升智信息科技(南京)有限公司 | Method and apparatus for speaker-specific emotional speech synthesis under extremely low resources |
CN111627420B (zh) * | 2020-04-21 | 2023-12-08 | 升智信息科技(南京)有限公司 | Method and apparatus for speaker-specific emotional speech synthesis under extremely low resources |
CN111667812B (zh) * | 2020-05-29 | 2023-07-18 | 北京声智科技有限公司 | Speech synthesis method, apparatus, device, and storage medium |
CN111667812A (zh) * | 2020-05-29 | 2020-09-15 | 北京声智科技有限公司 | Speech synthesis method, apparatus, device, and storage medium |
KR20220004272A (ko) * | 2020-07-03 | 2022-01-11 | 한국과학기술원 | Method and apparatus for iterative learning of speech emotion recognition and synthesis |
KR102382191B1 (ko) * | 2020-07-03 | 2022-04-04 | 한국과학기술원 | Method and apparatus for iterative learning of speech emotion recognition and synthesis |
CN111973178A (zh) * | 2020-08-14 | 2020-11-24 | 中国科学院上海微系统与信息技术研究所 | Electroencephalogram signal recognition system and method |
KR102392904B1 (ko) * | 2020-09-25 | 2022-05-02 | 주식회사 딥브레인에이아이 | Text-based speech synthesis method and apparatus |
KR20220041448A (ko) * | 2020-09-25 | 2022-04-01 | 주식회사 딥브레인에이아이 | Text-based speech synthesis method and apparatus |
US11769482B2 (en) | 2020-11-11 | 2023-09-26 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method and apparatus of synthesizing speech, method and apparatus of training speech synthesis model, electronic device, and storage medium |
KR102503066B1 (ko) | 2020-11-24 | 2023-03-02 | 주식회사 자이냅스 | Method for evaluating spectrogram quality using attention alignment scores, and speech synthesis system |
KR20220071525A (ko) * | 2020-11-24 | 2022-05-31 | 주식회사 자이냅스 | Method for evaluating spectrogram quality using attention alignment scores, and speech synthesis system |
KR20220134247A (ko) * | 2021-03-26 | 2022-10-05 | 주식회사 엔씨소프트 | Apparatus and method for training a timbre embedding model |
KR102576606B1 (ko) | 2021-03-26 | 2023-09-08 | 주식회사 엔씨소프트 | Apparatus and method for training a timbre embedding model |
Also Published As
Publication number | Publication date |
---|---|
WO2020190054A1 (fr) | 2020-09-24 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
KR102057927B1 (ko) | Speech synthesis apparatus and method thereof | |
KR102057926B1 (ko) | Speech synthesis apparatus and method thereof | |
JP7204989B2 (ja) | Controlling expressivity in end-to-end speech synthesis systems | |
US11990118B2 (en) | Text-to-speech (TTS) processing | |
EP3614376B1 (fr) | Speech synthesis method, server, and storage medium | |
US20210209315A1 (en) | Direct Speech-to-Speech Translation via Machine Learning | |
KR20200143659A (ko) | Multilingual text-to-speech synthesis method | |
KR20200111609A (ko) | Speech synthesis apparatus and method thereof | |
US11763797B2 (en) | Text-to-speech (TTS) processing | |
US20200410981A1 (en) | Text-to-speech (tts) processing | |
US11289068B2 (en) | Method, device, and computer-readable storage medium for speech synthesis in parallel | |
JP7379756B2 (ja) | Predicting parametric vocoder parameters from prosodic features | |
US20230169953A1 (en) | Phrase-based end-to-end text-to-speech (tts) synthesis | |
JP2024505076A (ja) | Generating diverse and natural text-to-speech samples | |
KR20200111608A (ko) | Speech synthesis apparatus and method thereof | |
KR102277205B1 (ko) | Audio conversion apparatus and method | |
JP7504188B2 (ja) | Controlling expressivity in end-to-end speech synthesis systems | |
KR20240035548A (ko) | Two-level text-to-speech conversion system using synthetic training data | |
Oralbekova et al. | Current advances and algorithmic solutions in speech generation | |
Saleh et al. | Arabic Text-to-Speech Service with Syrian Dialect | |
Zhu et al. | Control Emotion Intensity for LSTM-Based Expressive Speech Synthesis | |
CN115346510A (zh) | Speech synthesis method and apparatus, electronic device, and storage medium | |
Legal Events
Code | Title | Description |
---|---|---|
A107 | Divisional application of patent | |
GRNT | Written decision to grant | |