CN108417207B - A deep hybrid generation network adaptive method and system - Google Patents
A deep hybrid generation network adaptive method and system
- Publication number
- CN108417207B (application number CN201810054314.8A)
- Authority
- CN
- China
- Prior art keywords
- deep
- audio data
- generation network
- speaker
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Telephonic Communication Services (AREA)
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810054314.8A CN108417207B (zh) | 2018-01-19 | 2018-01-19 | A deep hybrid generation network adaptive method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810054314.8A CN108417207B (zh) | 2018-01-19 | 2018-01-19 | A deep hybrid generation network adaptive method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108417207A (zh) | 2018-08-17 |
CN108417207B (zh) | 2020-06-30 |
Family
ID=63125806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810054314.8A Active CN108417207B (zh) | A deep hybrid generation network adaptive method and system | 2018-01-19 | 2018-01-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108417207B (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
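CN109378014A (zh) * | 2018-10-22 | 2019-02-22 | 华中师范大学 | A mobile device source identification method and system based on a convolutional neural network |
CN109523995B (zh) * | 2018-12-26 | 2019-07-09 | 出门问问信息科技有限公司 | Speech recognition method, speech recognition apparatus, readable storage medium, and electronic device |
CN110415686B (zh) * | 2019-05-21 | 2021-08-17 | 腾讯科技(深圳)有限公司 | Speech processing method, apparatus, medium, and electronic device |
CN111243574B (zh) * | 2020-01-13 | 2023-01-03 | 苏州奇梦者网络科技有限公司 | A speech model adaptive training method, system, apparatus, and storage medium |
CN112697883A (zh) * | 2020-12-13 | 2021-04-23 | 南通卓强信息技术有限公司 | Concrete pipe pile pouring quality inspection method based on an audio vector covariance matrix |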
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
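CN102693724A (zh) * | 2011-03-22 | 2012-09-26 | 张燕 | A noise classification method using a neural-network-based Gaussian mixture model |
CN102324232A (zh) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Voiceprint recognition method and system based on a Gaussian mixture model |
CN102779510B (zh) * | 2012-07-19 | 2013-12-18 | 东南大学 | Speech emotion recognition method based on feature-space adaptive projection |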
US9177550B2 (en) * | 2013-03-06 | 2015-11-03 | Microsoft Technology Licensing, Llc | Conservatively adapting a deep neural network in a recognition system |
US9202462B2 (en) * | 2013-09-30 | 2015-12-01 | Google Inc. | Key phrase detection |
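CN103531205B (zh) * | 2013-10-09 | 2016-08-31 | 常州工学院 | Asymmetric voice conversion method based on deep neural network feature mapping |
CN103594087B (zh) * | 2013-11-08 | 2016-10-12 | 科大讯飞股份有限公司 | Method and system for improving spoken language assessment performance |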
US9721561B2 (en) * | 2013-12-05 | 2017-08-01 | Nuance Communications, Inc. | Method and apparatus for speech recognition using neural networks with speaker adaptation |
CN104751228B (zh) * | 2013-12-31 | 2018-04-27 | 科大讯飞股份有限公司 | Method and system for constructing a deep neural network for speech recognition |
US9520127B2 (en) * | 2014-04-29 | 2016-12-13 | Microsoft Technology Licensing, Llc | Shared hidden layer combination for speech recognition systems |
US20160034811A1 (en) * | 2014-07-31 | 2016-02-04 | Apple Inc. | Efficient generation of complementary acoustic models for performing automatic speech recognition system combination |
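JP6243858B2 (ja) * | 2015-02-05 | 2017-12-06 | 日本電信電話株式会社 | Speech model learning method, noise suppression method, speech model learning device, noise suppression device, speech model learning program, and noise suppression program |
KR101988222B1 (ko) * | 2015-02-12 | 2019-06-13 | 한국전자통신연구원 | Large-vocabulary continuous speech recognition apparatus and method |
KR101805976B1 (ko) * | 2015-03-02 | 2017-12-07 | 한국전자통신연구원 | Speech recognition apparatus and method |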
WO2016165120A1 (en) * | 2015-04-17 | 2016-10-20 | Microsoft Technology Licensing, Llc | Deep neural support vector machines |
CN106297773B (zh) * | 2015-05-29 | 2019-11-19 | 中国科学院声学研究所 | A neural network acoustic model training method |
US10347271B2 (en) * | 2015-12-04 | 2019-07-09 | Synaptics Incorporated | Semi-supervised system for multichannel source enhancement through configurable unsupervised adaptive transformations and supervised deep neural network |
CN105679316A (zh) * | 2015-12-29 | 2016-06-15 | 深圳微服机器人科技有限公司 | A speech keyword recognition method and apparatus based on a deep neural network |
CN105702250B (zh) * | 2016-01-06 | 2020-05-19 | 福建天晴数码有限公司 | Speech recognition method and apparatus |
CN105590625A (zh) * | 2016-03-18 | 2016-05-18 | 上海语知义信息技术有限公司 | Acoustic model adaptation method and system |
DK179415B1 (en) * | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
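CN107124141B (zh) * | 2016-08-26 | 2020-06-12 | 深圳泽惠通通讯技术有限公司 | Digital predistortion method based on complex-domain matrix numerical solution with adaptive error verification |
CN106504741B (zh) * | 2016-09-18 | 2019-10-25 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | A voice conversion method based on deep neural network phoneme information |
CN106782510B (zh) * | 2016-12-19 | 2020-06-02 | 苏州金峰物联网技术有限公司 | Place-name speech signal recognition method based on a continuous Gaussian mixture HMM model |
CN106952643A (zh) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A recording device clustering method based on Gaussian mean supervectors and spectral clustering |
CN106920544A (zh) * | 2017-03-17 | 2017-07-04 | 深圳市唯特视科技有限公司 | A speech recognition method based on deep neural network feature training |
CN107293288B (zh) * | 2017-06-09 | 2020-04-21 | 清华大学 | An acoustic modeling method using a residual long short-term memory recurrent neural network |
CN107331384B (zh) * | 2017-06-12 | 2018-05-04 | 平安科技(深圳)有限公司 | Speech recognition method, apparatus, computer device, and storage medium |
CN107240397A (zh) * | 2017-08-14 | 2017-10-10 | 广东工业大学 | A smart lock based on voiceprint recognition, and its speech recognition method and system |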
- 2018-01-19: CN application CN201810054314.8A, patent CN108417207B (zh), status Active
Also Published As
Publication number | Publication date |
---|---|
CN108417207A (zh) | 2018-08-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
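CN108417207B (zh) | A deep hybrid generation network adaptive method and system | |
CN108417217B (zh) | Speaker recognition network model training method, speaker recognition method, and system | |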
Li et al. | Developing far-field speaker system via teacher-student learning | |
CN106104674B (zh) | Mixed speech recognition | |
US10629185B2 (en) | Statistical acoustic model adaptation method, acoustic model learning method suitable for statistical acoustic model adaptation, storage medium storing parameters for building deep neural network, and computer program for adapting statistical acoustic model | |
US9454958B2 (en) | Exploiting heterogeneous data in deep neural network-based speech recognition systems | |
Drude et al. | Integration of neural networks and probabilistic spatial models for acoustic blind source separation | |
US8935167B2 (en) | Exemplar-based latent perceptual modeling for automatic speech recognition | |
Žmolíková et al. | Learning speaker representation for neural network based multichannel speaker extraction | |
CN108417224B (zh) | Training and recognition method and system for a bidirectional neural network model | |
US8515758B2 (en) | Speech recognition including removal of irrelevant information | |
Tzinis et al. | Remixit: Continual self-training of speech enhancement models via bootstrapped remixing | |
Liu et al. | An investigation into back-end advancements for speaker recognition in multi-session and noisy enrollment scenarios | |
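CN106170800A (zh) | Learning a student DNN via output distribution | |
CN109887484A (zh) | A speech recognition and speech synthesis method and apparatus based on dual learning | |
CN110400572B (zh) | Audio enhancement method and system | |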
US9984678B2 (en) | Factored transforms for separable adaptation of acoustic models | |
Saeidi et al. | Uncertain LDA: Including observation uncertainties in discriminative transforms | |
CN107910008B (zh) | A multi-acoustic-model-based speech recognition method for personal devices | |
Weninger et al. | Recognition of nonprototypical emotions in reverberated and noisy speech by nonnegative matrix factorization | |
CN111243604B (zh) | Training method for a speaker recognition neural network model supporting multiple wake words, speaker recognition method, and system | |
Kandala et al. | Speaker Adaptation for Lip-Reading Using Visual Identity Vectors. | |
Bořil et al. | GAN-based augmentation for gender classification from speech spectrograms | |
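CN114267334A (zh) | Speech recognition model training method and speech recognition method | |
CN113380268A (zh) | Model training method and apparatus, and speech signal processing method and apparatus |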
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | TR01 | Transfer of patent right | Effective date of registration: 2020-06-19. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Co-patentee after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. Patentee after: AI SPEECH Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Co-patentee before: SHANGHAI JIAO TONG University. Patentee before: AI SPEECH Co.,Ltd. |
| | TR01 | Transfer of patent right | Effective date of registration: 2020-10-27. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee after: AI SPEECH Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee before: AI SPEECH Co.,Ltd. Patentee before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. |
| | CP01 | Change in the name or title of a patent holder | Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee after: Sipic Technology Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee before: AI SPEECH Co.,Ltd. |
| | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A deep hybrid generation network adaptive method and system. Effective date of registration: 2023-07-26. Granted publication date: 2020-06-30. Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch. Pledgor: Sipic Technology Co.,Ltd. Registration number: Y2023980049433 |