CN108417207A - A deep hybrid generation network adaptive method and system - Google Patents
A deep hybrid generation network adaptive method and system
- Publication number
- CN108417207A (application CN201810054314.8A)
- Authority
- CN
- China
- Prior art keywords
- speaker
- network
- mean value
- phoneme
- adaptive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810054314.8A CN108417207B (zh) | 2018-01-19 | 2018-01-19 | A deep hybrid generation network adaptive method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810054314.8A CN108417207B (zh) | 2018-01-19 | 2018-01-19 | A deep hybrid generation network adaptive method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108417207A (zh) | 2018-08-17 |
CN108417207B (zh) | 2020-06-30 |
Family
ID=63125806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810054314.8A Active CN108417207B (zh) | A deep hybrid generation network adaptive method and system | 2018-01-19 | 2018-01-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108417207B (zh) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
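CN109378014A (zh) * | 2018-10-22 | 2019-02-22 | 华中师范大学 | A mobile device source identification method and system based on convolutional neural network |
CN109523995A (zh) * | 2018-12-26 | 2019-03-26 | 出门问问信息科技有限公司 | Speech recognition method, speech recognition device, readable storage medium and electronic device |
CN110415686A (zh) * | 2019-05-21 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Speech processing method, device, medium, and electronic device |
CN111243574A (zh) * | 2020-01-13 | 2020-06-05 | 苏州奇梦者网络科技有限公司 | A speech model adaptive training method, system, device and storage medium |
CN112697883A (zh) * | 2020-12-13 | 2021-04-23 | 南通卓强信息技术有限公司 | Concrete pipe pile pouring quality detection method based on audio vector covariance matrix |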
Citations (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102324232A (zh) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Voiceprint recognition method and system based on Gaussian mixture model |
CN102693724A (zh) * | 2011-03-22 | 2012-09-26 | 张燕 | A noise classification method using a neural-network-based Gaussian mixture model |
CN102779510A (zh) * | 2012-07-19 | 2012-11-14 | 东南大学 | Speech emotion recognition method based on feature space adaptive projection |
CN103531205A (zh) * | 2013-10-09 | 2014-01-22 | 常州工学院 | Asymmetric voice conversion method based on deep neural network feature mapping |
CN103594087A (zh) * | 2013-11-08 | 2014-02-19 | 安徽科大讯飞信息科技股份有限公司 | Method and system for improving spoken language evaluation performance |
US20150095027A1 (en) * | 2013-09-30 | 2015-04-02 | Google Inc. | Key phrase detection |
US20150161994A1 (en) * | 2013-12-05 | 2015-06-11 | Nuance Communications, Inc. | Method and Apparatus for Speech Recognition Using Neural Networks with Speaker Adaptation |
CN104751228A (zh) * | 2013-12-31 | 2015-07-01 | 安徽科大讯飞信息科技股份有限公司 | Method and system for constructing a deep neural network |
US20150310858A1 (en) * | 2014-04-29 | 2015-10-29 | Microsoft Corporation | Shared hidden layer combination for speech recognition systems |
CN105122279A (zh) * | 2013-03-06 | 2015-12-02 | 微软技术许可有限责任公司 | Conservatively adapting a deep neural network in a recognition system |
US20160034811A1 (en) * | 2014-07-31 | 2016-02-04 | Apple Inc. | Efficient generation of complementary acoustic models for performing automatic speech recognition system combination |
CN105590625A (zh) * | 2016-03-18 | 2016-05-18 | 上海语知义信息技术有限公司 | Acoustic model adaptation method and system |
CN105679316A (zh) * | 2015-12-29 | 2016-06-15 | 深圳微服机器人科技有限公司 | A speech keyword recognition method and device based on deep neural network |
CN105702250A (zh) * | 2016-01-06 | 2016-06-22 | 福建天晴数码有限公司 | Speech recognition method and device |
US20160260426A1 (en) * | 2015-03-02 | 2016-09-08 | Electronics And Telecommunications Research Institute | Speech recognition apparatus and method |
CN106297773A (zh) * | 2015-05-29 | 2017-01-04 | 中国科学院声学研究所 | A neural network acoustic model training method |
CN106504741A (zh) * | 2016-09-18 | 2017-03-15 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | A voice conversion method based on deep neural network phoneme information |
CN106782510A (zh) * | 2016-12-19 | 2017-05-31 | 苏州金峰物联网技术有限公司 | Place name speech signal recognition method based on continuous Gaussian mixture HMM model |
US20170162194A1 (en) * | 2015-12-04 | 2017-06-08 | Conexant Systems, Inc. | Semi-supervised system for multichannel source enhancement through configurable adaptive transformations and deep neural network |
CN106920544A (zh) * | 2017-03-17 | 2017-07-04 | 深圳市唯特视科技有限公司 | A speech recognition method based on deep neural network feature training |
CN106952643A (zh) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A recording device clustering method based on Gaussian mean supervectors and spectral clustering |
CN107112005A (zh) * | 2015-04-17 | 2017-08-29 | 微软技术许可有限责任公司 | Deep neural support vector machine |
CN107124141A (zh) * | 2016-08-26 | 2017-09-01 | 深圳泽惠通通讯技术有限公司 | Digital predistortion method based on adaptive error verification with complex-domain matrix numerical solution |
CN107240397A (zh) * | 2017-08-14 | 2017-10-10 | 广东工业大学 | A smart lock based on voiceprint recognition and its speech recognition method and system |
CN107293288A (zh) * | 2017-06-09 | 2017-10-24 | 清华大学 | An acoustic model modeling method using a residual long short-term memory recurrent neural network |
US9805716B2 (en) * | 2015-02-12 | 2017-10-31 | Electronics And Telecommunications Research Institute | Apparatus and method for large vocabulary continuous speech recognition |
CN107331384A (zh) * | 2017-06-12 | 2017-11-07 | 平安科技(深圳)有限公司 | Speech recognition method, device, computer equipment and storage medium |
JP6243858B2 (ja) * | 2015-02-05 | 2017-12-06 | 日本電信電話株式会社 | Speech model learning method, noise suppression method, speech model learning device, noise suppression device, speech model learning program, and noise suppression program |
WO2017213678A1 (en) * | 2016-06-11 | 2017-12-14 | Apple Inc. | Intelligent device arbitration and control |
2018
- 2018-01-19: CN application CN201810054314.8A, patent CN108417207B (zh), status Active
Patent Citations (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102693724A (zh) * | 2011-03-22 | 2012-09-26 | 张燕 | A noise classification method using a neural-network-based Gaussian mixture model |
CN102324232A (zh) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Voiceprint recognition method and system based on Gaussian mixture model |
CN102779510A (zh) * | 2012-07-19 | 2012-11-14 | 东南大学 | Speech emotion recognition method based on feature space adaptive projection |
CN105122279A (zh) * | 2013-03-06 | 2015-12-02 | 微软技术许可有限责任公司 | Conservatively adapting a deep neural network in a recognition system |
US20150095027A1 (en) * | 2013-09-30 | 2015-04-02 | Google Inc. | Key phrase detection |
CN103531205A (zh) * | 2013-10-09 | 2014-01-22 | 常州工学院 | Asymmetric voice conversion method based on deep neural network feature mapping |
CN103594087A (zh) * | 2013-11-08 | 2014-02-19 | 安徽科大讯飞信息科技股份有限公司 | Method and system for improving spoken language evaluation performance |
EP3078020A1 (en) * | 2013-12-05 | 2016-10-12 | Nuance Communications, Inc. | Method and apparatus for speech recognition using neural networks with speaker adaptation |
US20150161994A1 (en) * | 2013-12-05 | 2015-06-11 | Nuance Communications, Inc. | Method and Apparatus for Speech Recognition Using Neural Networks with Speaker Adaptation |
CN104751228A (zh) * | 2013-12-31 | 2015-07-01 | 安徽科大讯飞信息科技股份有限公司 | Method and system for constructing a deep neural network |
US20150310858A1 (en) * | 2014-04-29 | 2015-10-29 | Microsoft Corporation | Shared hidden layer combination for speech recognition systems |
US9520127B2 (en) * | 2014-04-29 | 2016-12-13 | Microsoft Technology Licensing, Llc | Shared hidden layer combination for speech recognition systems |
US20160034811A1 (en) * | 2014-07-31 | 2016-02-04 | Apple Inc. | Efficient generation of complementary acoustic models for performing automatic speech recognition system combination |
JP6243858B2 (ja) * | 2015-02-05 | 2017-12-06 | 日本電信電話株式会社 | Speech model learning method, noise suppression method, speech model learning device, noise suppression device, speech model learning program, and noise suppression program |
US9805716B2 (en) * | 2015-02-12 | 2017-10-31 | Electronics And Telecommunications Research Institute | Apparatus and method for large vocabulary continuous speech recognition |
US20160260426A1 (en) * | 2015-03-02 | 2016-09-08 | Electronics And Telecommunications Research Institute | Speech recognition apparatus and method |
KR20160106270A (ko) * | 2015-03-02 | 2016-09-12 | 한국전자통신연구원 | Speech recognition apparatus and method |
CN107112005A (zh) * | 2015-04-17 | 2017-08-29 | 微软技术许可有限责任公司 | Deep neural support vector machine |
CN106297773A (zh) * | 2015-05-29 | 2017-01-04 | 中国科学院声学研究所 | A neural network acoustic model training method |
US20170162194A1 (en) * | 2015-12-04 | 2017-06-08 | Conexant Systems, Inc. | Semi-supervised system for multichannel source enhancement through configurable adaptive transformations and deep neural network |
CN105679316A (zh) * | 2015-12-29 | 2016-06-15 | 深圳微服机器人科技有限公司 | A speech keyword recognition method and device based on deep neural network |
CN105702250A (zh) * | 2016-01-06 | 2016-06-22 | 福建天晴数码有限公司 | Speech recognition method and device |
CN105590625A (zh) * | 2016-03-18 | 2016-05-18 | 上海语知义信息技术有限公司 | Acoustic model adaptation method and system |
WO2017213678A1 (en) * | 2016-06-11 | 2017-12-14 | Apple Inc. | Intelligent device arbitration and control |
CN107124141A (zh) * | 2016-08-26 | 2017-09-01 | 深圳泽惠通通讯技术有限公司 | Digital predistortion method based on adaptive error verification with complex-domain matrix numerical solution |
CN106504741A (zh) * | 2016-09-18 | 2017-03-15 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | A voice conversion method based on deep neural network phoneme information |
CN106782510A (zh) * | 2016-12-19 | 2017-05-31 | 苏州金峰物联网技术有限公司 | Place name speech signal recognition method based on continuous Gaussian mixture HMM model |
CN106952643A (zh) * | 2017-02-24 | 2017-07-14 | 华南理工大学 | A recording device clustering method based on Gaussian mean supervectors and spectral clustering |
CN106920544A (zh) * | 2017-03-17 | 2017-07-04 | 深圳市唯特视科技有限公司 | A speech recognition method based on deep neural network feature training |
CN107293288A (zh) * | 2017-06-09 | 2017-10-24 | 清华大学 | An acoustic model modeling method using a residual long short-term memory recurrent neural network |
CN107331384A (zh) * | 2017-06-12 | 2017-11-07 | 平安科技(深圳)有限公司 | Speech recognition method, device, computer equipment and storage medium |
CN107240397A (zh) * | 2017-08-14 | 2017-10-10 | 广东工业大学 | A smart lock based on voiceprint recognition and its speech recognition method and system |
Non-Patent Citations (2)
Title |
---|
XIN LEI: "Deep neural networks with auxiliary Gaussian mixture models for real-time speech recognition", 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING * |
YANMIN QIAN: "Very deep convolutional neural networks for robust speech recognition", IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
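CN109378014A (zh) * | 2018-10-22 | 2019-02-22 | 华中师范大学 | A mobile device source identification method and system based on convolutional neural network |
CN109523995A (zh) * | 2018-12-26 | 2019-03-26 | 出门问问信息科技有限公司 | Speech recognition method, speech recognition device, readable storage medium and electronic device |
CN109523995B (zh) * | 2018-12-26 | 2019-07-09 | 出门问问信息科技有限公司 | Speech recognition method, speech recognition device, readable storage medium and electronic device |
CN110415686A (zh) * | 2019-05-21 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Speech processing method, device, medium, and electronic device |
CN110415686B (zh) * | 2019-05-21 | 2021-08-17 | 腾讯科技(深圳)有限公司 | Speech processing method, device, medium, and electronic device |
CN111243574A (zh) * | 2020-01-13 | 2020-06-05 | 苏州奇梦者网络科技有限公司 | A speech model adaptive training method, system, device and storage medium |
CN112697883A (zh) * | 2020-12-13 | 2021-04-23 | 南通卓强信息技术有限公司 | Concrete pipe pile pouring quality detection method based on audio vector covariance matrix |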
Also Published As
Publication number | Publication date |
---|---|
CN108417207B (zh) | 2020-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
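CN108417217B (zh) | | Speaker recognition network model training method, speaker recognition method and system |
CN108922518B (zh) | | Speech data augmentation method and system |
CN106104674B (zh) | | Mixed speech recognition |
CN108417207A (zh) | | A deep hybrid generation network adaptive method and system |
CN108417224B (zh) | | Training and recognition method and system for bidirectional neural network model |
CN108109613B (zh) | | Audio training and recognition method for intelligent dialogue voice platform, and electronic device |
US9401148B2 (en) | | Speaker verification using neural networks |
CN110706692B (zh) | | Training method and system for children's speech recognition model |
CN110310647B (zh) | | A voice identity feature extractor, classifier training method and related device |
CN108417201B (zh) | | Single-channel multi-speaker identity recognition method and system |
CN108962237A (zh) | | Mixed speech recognition method, device and computer-readable storage medium |
CN110211575A (zh) | | Speech noise-adding method and system for data augmentation |
CN108766445A (zh) | | Voiceprint recognition method and system |
CN109887484A (zh) | | A speech recognition and speech synthesis method and device based on dual learning |
CN105096955B (zh) | | A fast speaker recognition method and system based on model growth clustering |
Bagchi et al. | | Spectral feature mapping with mimic loss for robust speech recognition |
Lee et al. | | Ensemble of jointly trained deep neural network-based acoustic models for reverberant speech recognition |
CN108986798B (zh) | | Speech data processing method, device and equipment |
CN110211599A (zh) | | Application wake-up method, device, storage medium and electronic device |
CN105895082A (zh) | | Acoustic model training method, speech recognition method and device |
CN108091326A (zh) | | A voiceprint recognition method and system based on linear regression |
CN109637527A (zh) | | Semantic parsing method and system for dialogue sentences |
Song et al. | | Non-parallel training for voice conversion based on adaptation method |
Meyer et al. | | Anonymizing speech with generative adversarial networks to preserve speaker privacy |
Yan et al. | | Audio deepfake detection system with neural stitching for add 2022 |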
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20200619. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Co-patentee after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. Patentee after: AI SPEECH Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Co-patentee before: SHANGHAI JIAO TONG University. Patentee before: AI SPEECH Co.,Ltd. |
TR01 | Transfer of patent right | Effective date of registration: 20201027. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee after: AI SPEECH Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee before: AI SPEECH Co.,Ltd. Patentee before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. |
CP01 | Change in the name or title of a patent holder | Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee after: Sipic Technology Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Patentee before: AI SPEECH Co.,Ltd. |
PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A deep hybrid generation network adaptive method and system. Effective date of registration: 20230726. Granted publication date: 20200630. Pledgee: CITIC Bank Limited by Share Ltd. Suzhou branch. Pledgor: Sipic Technology Co.,Ltd. Registration number: Y2023980049433 |