CN106328126B - Far-field speech recognition processing method and device (远场语音识别处理方法及装置) - Google Patents
- Publication number: CN106328126B
- Application number: CN201610917557.0A
- Authority: CN (China)
- Prior art keywords: voice, far field, field voice, training, model
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Concepts
Concept ID | Name | Category | Query match | Sections | Count |
---|---|---|---|---|---|
238000003672 | processing method | Methods | 0.000 | title, claims, abstract, description | 14 |
238000012549 | training | Methods | 0.000 | claims, abstract, description | 118 |
238000013528 | artificial neural network | Methods | 0.000 | claims, abstract, description | 46 |
238000000034 | method | Methods | 0.000 | claims, abstract, description | 35 |
238000012545 | processing | Methods | 0.000 | claims, abstract, description | 30 |
230000008569 | process | Effects | 0.000 | claims, description | 20 |
238000004088 | simulation | Methods | 0.000 | claims, description | 20 |
230000006870 | function | Effects | 0.000 | claims, description | 16 |
230000004927 | fusion | Effects | 0.000 | claims, description | 8 |
238000005457 | optimization | Methods | 0.000 | abstract, description | 2 |
238000004590 | computer program | Methods | 0.000 | description | 8 |
238000010586 | diagram | Methods | 0.000 | description | 8 |
230000000694 | effects | Effects | 0.000 | description | 7 |
238000003062 | neural network model | Methods | 0.000 | description | 7 |
238000003860 | storage | Methods | 0.000 | description | 6 |
238000013507 | mapping | Methods | 0.000 | description | 4 |
238000013461 | design | Methods | 0.000 | description | 3 |
238000005516 | engineering process | Methods | 0.000 | description | 3 |
239000000203 | mixture | Substances | 0.000 | description | 2 |
210000005036 | nerve | Anatomy | 0.000 | description | 2 |
230000003287 | optical effect | Effects | 0.000 | description | 2 |
230000009286 | beneficial effect | Effects | 0.000 | description | 1 |
230000005540 | biological transmission | Effects | 0.000 | description | 1 |
210000004556 | brain | Anatomy | 0.000 | description | 1 |
238000004422 | calculation algorithm | Methods | 0.000 | description | 1 |
238000004364 | calculation method | Methods | 0.000 | description | 1 |
230000008859 | change | Effects | 0.000 | description | 1 |
238000006243 | chemical reaction | Methods | 0.000 | description | 1 |
238000011161 | development | Methods | 0.000 | description | 1 |
235000013399 | edible fruits | Nutrition | 0.000 | description | 1 |
230000005611 | electricity | Effects | 0.000 | description | 1 |
239000000835 | fiber | Substances | 0.000 | description | 1 |
230000006872 | improvement | Effects | 0.000 | description | 1 |
230000014759 | maintenance of location | Effects | 0.000 | description | 1 |
238000004519 | manufacturing process | Methods | 0.000 | description | 1 |
230000004048 | modification | Effects | 0.000 | description | 1 |
238000012986 | modification | Methods | 0.000 | description | 1 |
238000002360 | preparation method | Methods | 0.000 | description | 1 |
230000009467 | reduction | Effects | 0.000 | description | 1 |
238000006467 | substitution reaction | Methods | 0.000 | description | 1 |
230000009466 | transformation | Effects | 0.000 | description | 1 |
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/08—Speech classification or search
          - G10L15/16—Speech classification or search using artificial neural networks
        - G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
          - G10L15/063—Training
        - G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
        - G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
          - G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Signal Processing (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
- Machine Translation (AREA)
Abstract
Description
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610917557.0A (CN106328126B, zh) | 2016-10-20 | 2016-10-20 | Far-field speech recognition processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610917557.0A (CN106328126B, zh) | 2016-10-20 | 2016-10-20 | Far-field speech recognition processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106328126A (zh) | 2017-01-11 |
CN106328126B (zh) | 2019-08-16 |
Family ID: 57819200
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201610917557.0A (CN106328126B, zh) | Active | 2016-10-20 | 2016-10-20 | Far-field speech recognition processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106328126B (zh) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
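CN107680586B (zh) * | 2017-08-01 | 2020-09-29 | 百度在线网络技术(北京)有限公司 | Far-field speech acoustic model training method and system |
CN107481731B (zh) * | 2017-08-01 | 2021-01-22 | 百度在线网络技术(北京)有限公司 | Speech data augmentation method and system |
CN107452372B (zh) * | 2017-09-22 | 2020-12-11 | 百度在线网络技术(北京)有限公司 | Training method and device for a far-field speech recognition model |
CN109935226A (zh) * | 2017-12-15 | 2019-06-25 | 上海擎语信息科技有限公司 | Far-field speech recognition enhancement system and method based on a deep neural network |
CN108346433A (zh) * | 2017-12-28 | 2018-07-31 | 北京搜狗科技发展有限公司 | Audio processing method, apparatus, device and readable storage medium |
CN110047478B (zh) * | 2018-01-16 | 2021-06-08 | 中国科学院声学研究所 | Multi-channel speech recognition acoustic modeling method and device based on spatial feature compensation |
CN108269567B (zh) * | 2018-01-23 | 2021-02-05 | 北京百度网讯科技有限公司 | Method, apparatus, computing device and computer-readable storage medium for generating far-field speech data |
CN110097871B (zh) * | 2018-01-31 | 2023-05-12 | 阿里巴巴集团控股有限公司 | Speech data processing method and device |
CN108416096B (zh) * | 2018-02-01 | 2022-02-25 | 北京百度网讯科技有限公司 | Artificial-intelligence-based method and device for estimating the signal-to-noise ratio of far-field speech data |
CN108538303B (zh) * | 2018-04-23 | 2019-10-22 | 百度在线网络技术(北京)有限公司 | Method and device for generating information |
CN110930991B (zh) * | 2018-08-30 | 2023-08-25 | 阿里巴巴集团控股有限公司 | Far-field speech recognition model training method and device |
CN109036412A (zh) * | 2018-09-17 | 2018-12-18 | 苏州奇梦者网络科技有限公司 | Voice wake-up method and system |
KR20200063290A (ko) * | 2018-11-16 | 2020-06-05 | 삼성전자주식회사 | Electronic device for recognizing an audio scene and method thereof |
CN109785856A (zh) * | 2019-03-01 | 2019-05-21 | 深圳市伟文无线通讯技术有限公司 | Multi-channel far-field and near-field corpus collection method and device |
CN111785282A (zh) * | 2019-04-03 | 2020-10-16 | 阿里巴巴集团控股有限公司 | Speech recognition method and device, and smart speaker |
CN111862952B (zh) * | 2019-04-26 | 2024-04-12 | 华为技术有限公司 | Dereverberation model training method and device |
CN110580906B (zh) * | 2019-08-01 | 2022-02-11 | 安徽声讯信息技术有限公司 | Far-field audio amplification method and system based on cloud data |
CN112634877B (zh) * | 2019-10-09 | 2022-09-23 | 北京声智科技有限公司 | Far-field speech simulation method and device |
CN110827819A (zh) * | 2019-11-26 | 2020-02-21 | 珠海格力电器股份有限公司 | Household appliance control method and control system |
CN112770222A (zh) * | 2020-12-25 | 2021-05-07 | 苏州思必驰信息科技有限公司 | Audio processing method and device |
CN113257283B (zh) * | 2021-03-29 | 2023-09-26 | 北京字节跳动网络技术有限公司 | Audio signal processing method, apparatus, electronic device and storage medium |
CN113241081B (zh) * | 2021-04-25 | 2023-06-16 | 华南理工大学 | Far-field speaker verification method and system based on a gradient reversal layer |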
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
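CN102779509B (zh) * | 2011-05-11 | 2014-12-03 | 联想(北京)有限公司 | Speech processing device and speech processing method |
CN102890930B (zh) * | 2011-07-19 | 2014-06-04 | 上海上大海润信息系统有限公司 | Speech emotion recognition method based on a hybrid HMM/SOFMNN model |
CN103065629A (zh) * | 2012-11-20 | 2013-04-24 | 广东工业大学 | Speech recognition system for a humanoid robot |
CN104008751A (zh) * | 2014-06-18 | 2014-08-27 | 周婷婷 | Speaker recognition method based on a BP neural network |
CN105989839B (zh) * | 2015-06-03 | 2019-12-13 | 乐融致新电子科技(天津)有限公司 | Speech recognition method and device |
CN105355210B (zh) * | 2015-10-30 | 2020-06-23 | 百度在线网络技术(北京)有限公司 | Preprocessing method and device for far-field speech recognition |
CN105427860B (zh) * | 2015-11-11 | 2019-09-03 | 百度在线网络技术(北京)有限公司 | Far-field speech recognition method and device |
CN105448303B (zh) * | 2015-11-27 | 2020-02-04 | 百度在线网络技术(北京)有限公司 | Speech signal processing method and device |
CN105845128B (zh) * | 2016-04-06 | 2020-01-03 | 中国科学技术大学 | Speech recognition efficiency optimization method based on dynamic pruning beam-width prediction |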
2016
- 2016-10-20 CN CN201610917557.0A (CN106328126B, zh): Active
Also Published As
Publication number | Publication date |
---|---|
CN106328126A (zh) | 2017-01-11 |
Similar Documents
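Publication | Title |
---|---|
CN106328126B (zh) | Far-field speech recognition processing method and device |
JP7337953B2 (ja) | Speech recognition method and device, neural network training method and device, and computer program |
Warden | Speech commands: A dataset for limited-vocabulary speech recognition |
US10381017B2 (en) | Method and device for eliminating background sound, and terminal device |
CN110223705A (zh) | Voice conversion method, apparatus, device and readable storage medium |
US10360899B2 (en) | Method and device for processing speech based on artificial intelligence |
CN106887225A (zh) | Acoustic feature extraction method and apparatus based on a convolutional neural network, and terminal device |
US10149089B1 (en) | Remote personalization of audio |
CN108615525B (zh) | Speech recognition method and device |
CN108831437A (zh) | Singing voice generation method, apparatus, terminal and storage medium |
WO2021082823A1 (zh) | Audio processing method, apparatus, computer device and storage medium |
CN110189748A (zh) | Model construction method and device |
CN112164407B (zh) | Timbre conversion method and device |
CN109147831A (zh) | Voice connection playback method, terminal device and computer-readable storage medium |
CN110232907A (zh) | Speech synthesis method, apparatus, readable storage medium and computing device |
CN113658583B (zh) | Whispered speech conversion method, system and device based on a generative adversarial network |
CN109949821A (zh) | Method for far-field speech dereverberation using a CNN U-Net structure |
US9484044B1 (en) | Voice enhancement and/or speech features extraction on noisy audio signals using successively refined transforms |
CN109410918A (zh) | Method and device for acquiring information |
CN109616102A (zh) | Acoustic model training method, device and storage medium |
CN105047192A (zh) | Statistical speech synthesis method and device based on a hidden Markov model |
CN108986841A (zh) | Audio information processing method, device and storage medium |
CN110032355A (zh) | Voice playback method, apparatus, terminal device and computer storage medium |
CN113035169B (zh) | Speech synthesis method and system capable of training a personalized timbre library online |
CN108363765A (zh) | Audio paragraph recognition method and device |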
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | |
| | CP03 | Change of name, title or address | Address after: No. 101, 1st Floor, 1st Building, Xisanqi Building Materials City, Haidian District, Beijing, 100000; Patentee after: Yunzhisheng Intelligent Technology Co., Ltd.; Address before: 100191 Beijing, Haidian District, No. 2 Huayuan Road, Peony Technology Building, 5th Floor, A503; Patentee before: BEIJING UNISOUND INFORMATION TECHNOLOGY Co., Ltd. |
| | TR01 | Transfer of patent right | |
| | TR01 | Transfer of patent right | Effective date of registration: 2020-03-26; Address after: No. 101, 1st Floor, 1st Building, Xisanqi Building Materials City, Haidian District, Beijing, 100000; Co-patentee after: Xiamen Yunzhixin Intelligent Technology Co., Ltd.; Patentee after: Yunzhisheng Intelligent Technology Co., Ltd.; Address before: No. 101, 1st Floor, 1st Building, Xisanqi Building Materials City, Haidian District, Beijing, 100000; Patentee before: Yunzhisheng Intelligent Technology Co., Ltd. |