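CN110600013B - Method and apparatus for training a data augmentation model for voice conversion on non-parallel corpora (非平行语料声音转换数据增强模型训练方法及装置) - Google Patents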
- Publication number: CN110600013B
- Authority: CN (China)
- Prior art keywords: data, layer, speech, model, attention
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Concepts (machine-extracted)
ID | Name | Type | Query match | Sections | Count |
---|---|---|---|---|---|
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 102 |
238000006243 | chemical reaction | Methods | 0.000 | title, claims, abstract, description | 90 |
238000012549 | training | Methods | 0.000 | title, abstract, description | 46 |
238000004590 | computer program | Methods | 0.000 | claims, description | 10 |
230000002708 | enhancing effect | Effects | 0.000 | claims, description | 8 |
230000003190 | augmentative effect | Effects | 0.000 | abstract, description | 8 |
230000000694 | effects | Effects | 0.000 | abstract, description | 6 |
238000010586 | diagram | Methods | 0.000 | description | 13 |
238000001228 | spectrum | Methods | 0.000 | description | 12 |
230000008569 | process | Effects | 0.000 | description | 10 |
230000009466 | transformation | Effects | 0.000 | description | 10 |
238000013507 | mapping | Methods | 0.000 | description | 6 |
230000006870 | function | Effects | 0.000 | description | 5 |
238000012545 | processing | Methods | 0.000 | description | 5 |
238000012360 | testing method | Methods | 0.000 | description | 5 |
238000004422 | calculation algorithm | Methods | 0.000 | description | 4 |
230000009471 | action | Effects | 0.000 | description | 3 |
238000013528 | artificial neural network | Methods | 0.000 | description | 3 |
230000008901 | benefit | Effects | 0.000 | description | 3 |
230000015572 | biosynthetic process | Effects | 0.000 | description | 3 |
238000010295 | mobile communication | Methods | 0.000 | description | 3 |
230000000306 | recurrent effect | Effects | 0.000 | description | 3 |
238000003786 | synthesis reaction | Methods | 0.000 | description | 3 |
238000013518 | transcription | Methods | 0.000 | description | 3 |
230000035897 | transcription | Effects | 0.000 | description | 3 |
238000011426 | transformation method | Methods | 0.000 | description | 3 |
230000006978 | adaptation | Effects | 0.000 | description | 2 |
230000008485 | antagonism | Effects | 0.000 | description | 2 |
230000009286 | beneficial effect | Effects | 0.000 | description | 2 |
230000002457 | bidirectional effect | Effects | 0.000 | description | 2 |
238000004891 | communication | Methods | 0.000 | description | 2 |
230000001419 | dependent effect | Effects | 0.000 | description | 2 |
238000011156 | evaluation | Methods | 0.000 | description | 2 |
238000002474 | experimental method | Methods | 0.000 | description | 2 |
238000012805 | post-processing | Methods | 0.000 | description | 2 |
230000003042 | antagonistic effect | Effects | 0.000 | description | 1 |
230000000454 | anticipatory effect | Effects | 0.000 | description | 1 |
238000013459 | approach | Methods | 0.000 | description | 1 |
238000004364 | calculation method | Methods | 0.000 | description | 1 |
230000015556 | catabolic process | Effects | 0.000 | description | 1 |
230000001143 | conditioned effect | Effects | 0.000 | description | 1 |
238000007796 | conventional method | Methods | 0.000 | description | 1 |
238000006731 | degradation reaction | Methods | 0.000 | description | 1 |
230000037433 | frameshift | Effects | 0.000 | description | 1 |
238000013383 | initial experiment | Methods | 0.000 | description | 1 |
230000003993 | interaction | Effects | 0.000 | description | 1 |
239000011159 | matrix material | Substances | 0.000 | description | 1 |
230000005055 | memory storage | Effects | 0.000 | description | 1 |
239000000203 | mixture | Substances | 0.000 | description | 1 |
238000012986 | modification | Methods | 0.000 | description | 1 |
230000004048 | modification | Effects | 0.000 | description | 1 |
230000001537 | neural effect | Effects | 0.000 | description | 1 |
230000003287 | optical effect | Effects | 0.000 | description | 1 |
238000007781 | pre-processing | Methods | 0.000 | description | 1 |
238000013138 | pruning | Methods | 0.000 | description | 1 |
238000005070 | sampling | Methods | 0.000 | description | 1 |
238000000926 | separation method | Methods | 0.000 | description | 1 |
239000007787 | solid | Substances | 0.000 | description | 1 |
238000006467 | substitution reaction | Methods | 0.000 | description | 1 |
230000007704 | transition | Effects | 0.000 | description | 1 |
238000012795 | verification | Methods | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Landscapes
- Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Telephonic Communication Services (AREA)
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910863861.5A CN110600013B (zh) | 2019-09-12 | 2019-09-12 | Method and apparatus for training a data augmentation model for voice conversion on non-parallel corpora |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910863861.5A CN110600013B (zh) | 2019-09-12 | 2019-09-12 | Method and apparatus for training a data augmentation model for voice conversion on non-parallel corpora |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110600013A (zh) | 2019-12-20 |
CN110600013B (zh) | 2021-11-02 |
Family ID: 68859258
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910863861.5A (Active, CN110600013B (zh)) | 2019-09-12 | 2019-09-12 | Method and apparatus for training a data augmentation model for voice conversion on non-parallel corpora |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110600013B (zh) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
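CN111462768B (zh) * | 2020-03-12 | 2023-04-25 | Nanjing University of Posts and Telecommunications | Voice conversion method based on multi-scale StarGAN with shared training |
CN111508519B (zh) * | 2020-04-03 | 2022-04-26 | Beijing Dajia Internet Information Technology Co., Ltd. | Method and apparatus for enhancing the human voice in an audio signal |
CN112037760B (zh) | 2020-08-24 | 2022-01-07 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Training method and apparatus for a speech spectrum generation model, and electronic device |
CN113409759B (zh) * | 2021-07-07 | 2023-04-07 | Zhejiang University of Technology | End-to-end real-time speech synthesis method |
CN114360557B (zh) * | 2021-12-22 | 2022-11-01 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Voice timbre conversion method, model training method, apparatus, device and medium |
CN114582029B (zh) * | 2022-05-06 | 2022-08-02 | Shandong University | Method and system for enhancing non-professional dance motion sequences |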
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8751239B2 (en) * | 2007-10-04 | 2014-06-10 | Core Wireless Licensing, S.a.r.l. | Method, apparatus and computer program product for providing text independent voice conversion |
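CN102982809B (zh) * | 2012-12-11 | 2014-12-10 | University of Science and Technology of China | Speaker voice conversion method |
WO2018218081A1 (en) * | 2017-05-24 | 2018-11-29 | Modulate, LLC | System and method for voice-to-voice conversion |
US10796686B2 (en) * | 2017-10-19 | 2020-10-06 | Baidu Usa Llc | Systems and methods for neural text-to-speech using convolutional sequence learning |
CN108847249B (zh) * | 2018-05-30 | 2020-06-05 | AI SPEECH Ltd. | Voice conversion optimization method and system |
CN109326283B (zh) * | 2018-11-23 | 2021-01-26 | Nanjing University of Posts and Telecommunications | Many-to-many voice conversion method based on a text encoder under non-parallel text conditions |
CN109377986B (zh) * | 2018-11-29 | 2022-02-01 | Sichuan Changhong Electric Co., Ltd. | Personalized voice conversion method for non-parallel corpora |
CN110060690B (zh) * | 2019-04-04 | 2023-03-24 | Nanjing University of Posts and Telecommunications | Many-to-many speaker conversion method based on STARGAN and ResNet |
CN109979429A (zh) * | 2019-05-29 | 2019-07-05 | Nanjing Silicon Intelligence Technology Co., Ltd. | TTS method and system |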
\* Cited by examiner, † Cited by third party
2019
- 2019-09-12: CN application CN201910863861.5A, published as CN110600013B (zh); status: Active
Also Published As
Publication number | Publication date |
---|---|
CN110600013A (zh) | 2019-12-20 |
Similar Documents
Publication | Title |
---|---|
CN110600013B (zh) | Method and apparatus for training a data augmentation model for voice conversion on non-parallel corpora |
CN110246488B (zh) | Voice conversion method and apparatus based on a semi-optimized CycleGAN model |
CN112634856B (zh) | Speech synthesis model training method and speech synthesis method |
US11450313B2 (en) | Determining phonetic relationships |
US9183367B2 (en) | Voice based biometric authentication method and apparatus |
US11810471B2 (en) | Computer implemented method and apparatus for recognition of speech patterns and feedback |
US11676572B2 (en) | Instantaneous learning in text-to-speech during dialog |
US20230230576A1 (en) | Text-to-speech synthesis method and system, and a method of training a text-to-speech synthesis system |
WO2021212954A1 (zh) | Method and apparatus for emotional speech synthesis of a specific speaker under extremely low resources |
CN112908308B (zh) | Audio processing method, apparatus, device and medium |
Kabashima et al. | DNN-based scoring of language learners' proficiency using learners' shadowings and native listeners' responsive shadowings |
Mirishkar et al. | CSTD-Telugu corpus: Crowd-sourced approach for large-scale speech data collection |
Yi | Corpus-based unit selection for natural-sounding speech synthesis |
CN114783410B (zh) | Speech synthesis method and system, electronic device, and storage medium |
CN113053409B (zh) | Audio evaluation method and apparatus |
Mann et al. | Tamil talk: What you speak is what you get! |
Gunasekara et al. | Real-time translation of discrete sinhala speech to unicode text |
JP7557085B2 (ja) | Instantaneous learning in text-to-speech during dialog |
Ming et al. | A Mandarin e‐learning system based on speech recognition and evaluation |
US20240274122A1 (en) | Speech translation with performance characteristics |
Jung et al. | Domain-adversarial training of multi-speaker TTS |
王暁芸 et al. | Phoneme set design for second language speech recognition |
Lerjebo et al. | Intelligent chatbot assistant: A study of Natural Language Processing and Artificial Intelligence |
CN117877482A (zh) | Voiceprint recognition method and apparatus based on face lip-movement speech separation |
Selouani | "Well Adjusted": Using Robust and Flexible Speech Recognition Capabilities in Clean to Noisy Mobile Environments |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 2020-06-16. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant after: AI SPEECH Ltd.; Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant before: AI SPEECH Ltd.; SHANGHAI JIAO TONG University |
TA01 | Transfer of patent application right | Effective date of registration: 2020-10-28. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant after: AI SPEECH Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant before: AI SPEECH Ltd.; Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. |
CB02 | Change of applicant information | Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province. Applicant after: Sipic Technology Co.,Ltd. Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province. Applicant before: AI SPEECH Ltd. |
GR01 | Patent grant | |