CN110211575B - Speech noise-addition method and system for data augmentation - Google Patents
- Publication number: CN110211575B
- Application number: CN201910511890.5A
- Authority: CN (China)
- Prior art keywords: noise, vector, coding model, conditional, self
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L2015/0631—Creating reference templates; Clustering
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910511890.5A (CN110211575B, zh) | 2019-06-13 | 2019-06-13 | Speech noise-addition method and system for data augmentation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910511890.5A (CN110211575B, zh) | 2019-06-13 | 2019-06-13 | Speech noise-addition method and system for data augmentation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110211575A (zh) | 2019-09-06 |
CN110211575B (zh) | 2021-06-04 |
Family
ID=67792721
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910511890.5A (Active; CN110211575B, zh) | 2019-06-13 | 2019-06-13 | Speech noise-addition method and system for data augmentation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110211575B (zh) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
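CN110808033B (zh) * | 2019-09-25 | 2022-04-15 | Wuhan University of Science and Technology | An audio classification method based on a dual data augmentation strategy |
CN110706692B (zh) * | 2019-10-21 | 2021-12-14 | Sipic Technology Co.,Ltd. | Training method and system for a children's speech recognition model |
CN110807333B (zh) * | 2019-10-30 | 2024-02-06 | Tencent Technology (Shenzhen) Co., Ltd. | Semantic processing method, apparatus, and storage medium for a semantic understanding model |
CN111724767B (zh) * | 2019-12-09 | 2023-06-02 | Jianghan University | Spoken language understanding method based on a Dirichlet variational autoencoder, and related device |
CN111145730B (zh) * | 2019-12-30 | 2022-05-06 | Sipic Technology Co.,Ltd. | Optimization method and system for a speech recognition model |
CN111161740A (zh) * | 2019-12-31 | 2020-05-15 | China Construction Bank Corporation | Intent recognition model training method, intent recognition method, and related apparatus |
CN111341323B (zh) * | 2020-02-10 | 2022-07-01 | Xiamen Kuaishangtong Technology Co., Ltd. | Voiceprint recognition training data augmentation method, system, mobile terminal, and storage medium |
CN111564160B (zh) * | 2020-04-21 | 2022-10-18 | Chongqing University of Posts and Telecommunications | A speech denoising method based on AEWGAN |
CN111724809A (zh) * | 2020-06-15 | 2020-09-29 | Suzhou Yinengtong Information Technology Co., Ltd. | Vocoder implementation method and apparatus based on a variational autoencoder |
CN111653288B (zh) * | 2020-06-18 | 2023-05-09 | Nanjing University | Target-speaker speech enhancement method based on a conditional variational autoencoder |
CN112132225A (zh) * | 2020-09-28 | 2020-12-25 | Tianjin Tiandy Intelligent Security Technology Co., Ltd. | A data augmentation method based on deep learning |
CN112509559B (zh) * | 2021-02-03 | 2021-04-13 | Beijing Century TAL Education Technology Co., Ltd. | Audio recognition method, model training method, apparatus, device, and storage medium |
CN114609493B (zh) * | 2022-05-09 | 2022-08-12 | Hangzhou Zhaohua Electronics Co., Ltd. | A partial discharge signal recognition method with signal data augmentation |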
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
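CN108777140A (zh) * | 2018-04-27 | 2018-11-09 | Nanjing University of Posts and Telecommunications | A VAE-based voice conversion method under non-parallel corpus training |
CN108922518A (zh) * | 2018-07-18 | 2018-11-30 | AI Speech Ltd. | Speech data augmentation method and system |
US10204625B2 (en) * | 2010-06-07 | 2019-02-12 | Affectiva, Inc. | Audio analysis learning using video data |
CN109377978A (zh) * | 2018-11-12 | 2019-02-22 | Nanjing University of Posts and Telecommunications | Many-to-many speaker conversion method based on i-vectors under non-parallel text conditions |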
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6922284B2 (ja) * | 2017-03-15 | 2021-08-18 | FUJIFILM Business Innovation Corp. | Information processing apparatus and program |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10204625B2 (en) * | 2010-06-07 | 2019-02-12 | Affectiva, Inc. | Audio analysis learning using video data |
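CN108777140A (zh) * | 2018-04-27 | 2018-11-09 | Nanjing University of Posts and Telecommunications | A VAE-based voice conversion method under non-parallel corpus training |
CN108922518A (zh) * | 2018-07-18 | 2018-11-30 | AI Speech Ltd. | Speech data augmentation method and system |
CN109377978A (zh) * | 2018-11-12 | 2019-02-22 | Nanjing University of Posts and Telecommunications | Many-to-many speaker conversion method based on i-vectors under non-parallel text conditions |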
Non-Patent Citations (6)
Title |
---|
Data augmentation and feature extraction using variational autoencoder for acoustic modeling;Nishizaki H;《2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)》;20180208;1222-1227 * |
Data Augmentation using Conditional Generative Adversarial Networks for Robust Speech Recognition;P. Sheng;《2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP)》;20190506;121-125 * |
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization;W. Hsu;《ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)》;20190417;5901-5905 * |
Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation;Hsu W N;《2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)》;20180125;16-23 * |
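The new frontier of artificial intelligence research: generative adversarial networks;Lin Yilun;《Acta Automatica Sinica》;20180531;775-792 * |
Research on feature representation learning based on variational autoencoders and its applications;Li Mingyu;《China Masters' Theses Full-text Database, Information Science and Technology》;20190131;I140-97 * |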
Also Published As
Publication number | Publication date |
---|---|
CN110211575A (zh) | 2019-09-06 |
Similar Documents
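Publication | Title |
---|---|
CN110211575B (zh) | Speech noise-addition method and system for data augmentation |
CN110709924B (zh) | Audio-visual speech separation |
Zhang et al. | Deep learning for environmentally robust speech recognition: An overview of recent developments |
US10854205B2 (en) | Channel-compensated low-level features for speaker recognition |
Qian et al. | Speech Enhancement Using Bayesian Wavenet. |
CN110956957B (zh) | Training method and system for a speech enhancement model |
CN110706692B (zh) | Training method and system for a children's speech recognition model |
CN112634856B (zh) | Speech synthesis model training method and speech synthesis method |
CN108417224B (zh) | Training and recognition method and system for a bidirectional neural network model |
CN111161752A (zh) | Echo cancellation method and apparatus |
EP2410514A2 (en) | Speaker authentication |
KR20170030923A (ko) | Apparatus and method for generating an acoustic model, and apparatus and method for speech recognition |
CN111862934B (zh) | Method for improving a speech synthesis model, and speech synthesis method and apparatus |
CN111145730B (zh) | Optimization method and system for a speech recognition model |
EP3989217B1 (en) | Method for detecting an audio adversarial attack with respect to a voice input processed by an automatic speech recognition system, corresponding device, computer program product and computer-readable carrier medium |
CN110246489B (zh) | Speech recognition method and system for children |
CN112837669B (zh) | Speech synthesis method, apparatus, and server |
Hsieh et al. | Improving perceptual quality by phone-fortified perceptual loss for speech enhancement |
CN114267372A (zh) | Speech denoising method, system, electronic device, and storage medium |
JP7329393B2 (ja) | Audio signal processing apparatus, audio signal processing method, audio signal processing program, learning apparatus, learning method, and learning program |
CN106875944A (zh) | A system for voice control of a home smart terminal |
WO2020015546A1 (zh) | A far-field speech recognition method, speech recognition model training method, and server |
Han et al. | Reverberation and noise robust feature compensation based on IMM |
CN112634859B (zh) | Data augmentation method and system for text-dependent speaker recognition |
CN115762557A (zh) | Training method and system for a self-supervised training predictor for speech separation |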
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
TA01 | Transfer of patent application right | Effective date of registration: 20200616. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant after: AI SPEECH Ltd.; Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant before: AI SPEECH Ltd.; SHANGHAI JIAO TONG University |
TA01 | Transfer of patent application right | Effective date of registration: 20201026. Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant after: AI SPEECH Ltd. Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu. Applicant before: AI SPEECH Ltd.; Shanghai Jiaotong University Intellectual Property Management Co.,Ltd. |
CB02 | Change of applicant information | Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province. Applicant after: Sipic Technology Co.,Ltd. Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province. Applicant before: AI SPEECH Ltd. |
GR01 | Patent grant | |