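CN110444223B - Speaker separation method and device based on recurrent neural network and acoustic features - Google Patents
Speaker separation method and device based on recurrent neural network and acoustic features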
- Publication number
- CN110444223B
- Authority
- CN
- China
- Prior art keywords
- speaker
- word
- recognized
- feature vector
- segmentation result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Concepts
ID | Concept | Domain | Query match | Sections | Count |
---|---|---|---|---|---|
238000000926 | separation method | Methods | 0.000 | title, claims, abstract, description | 57 |
238000013528 | artificial neural network | Methods | 0.000 | title, claims, abstract, description | 43 |
125000004122 | cyclic group | Chemical group | 0.000 | title, abstract, description | 10 |
239000013598 | vector | Substances | 0.000 | claims, abstract, description | 156 |
230000011218 | segmentation | Effects | 0.000 | claims, abstract, description | 65 |
238000006243 | chemical reaction | Methods | 0.000 | claims, abstract, description | 43 |
230000000306 | recurrent | Effects | 0.000 | claims, description | 31 |
238000004590 | computer program | Methods | 0.000 | claims, description | 14 |
230000007246 | mechanism | Effects | 0.000 | claims, description | 10 |
230000004927 | fusion | Effects | 0.000 | claims, description | 7 |
239000011159 | matrix material | Substances | 0.000 | claims, description | 5 |
230000008859 | change | Effects | 0.000 | claims, description | 3 |
238000000034 | method | Methods | 0.000 | abstract, description | 26 |
230000006870 | function | Effects | 0.000 | description | 12 |
238000010586 | diagram | Methods | 0.000 | description | 9 |
238000012545 | processing | Methods | 0.000 | description | 6 |
230000009466 | transformation | Effects | 0.000 | description | 6 |
230000008569 | process | Effects | 0.000 | description | 5 |
238000004364 | calculation method | Methods | 0.000 | description | 4 |
238000004891 | communication | Methods | 0.000 | description | 3 |
230000008878 | coupling | Effects | 0.000 | description | 3 |
238000010168 | coupling process | Methods | 0.000 | description | 3 |
238000005859 | coupling reaction | Methods | 0.000 | description | 3 |
238000013461 | design | Methods | 0.000 | description | 3 |
230000007704 | transition | Effects | 0.000 | description | 3 |
238000013519 | translation | Methods | 0.000 | description | 3 |
238000012935 | Averaging | Methods | 0.000 | description | 2 |
238000004422 | calculation algorithm | Methods | 0.000 | description | 2 |
230000000694 | effects | Effects | 0.000 | description | 2 |
238000000605 | extraction | Methods | 0.000 | description | 2 |
238000000556 | factor analysis | Methods | 0.000 | description | 2 |
238000001228 | spectrum | Methods | 0.000 | description | 2 |
238000012549 | training | Methods | 0.000 | description | 2 |
238000004458 | analytical method | Methods | 0.000 | description | 1 |
238000003491 | arrays | Methods | 0.000 | description | 1 |
230000005540 | biological transmission | Effects | 0.000 | description | 1 |
238000010276 | construction | Methods | 0.000 | description | 1 |
235000019800 | disodium phosphate | Nutrition | 0.000 | description | 1 |
238000005065 | mining | Methods | 0.000 | description | 1 |
230000003287 | optical effect | Effects | 0.000 | description | 1 |
238000007781 | pre-processing | Methods | 0.000 | description | 1 |
238000006467 | substitution reaction | Methods | 0.000 | description | 1 |
Classifications
- G—PHYSICS
  - G10—MUSICAL INSTRUMENTS; ACOUSTICS
    - G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
      - G10L15/00—Speech recognition
        - G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
        - G10L15/08—Speech classification or search
          - G10L15/16—Speech classification or search using artificial neural networks
      - G10L17/00—Speaker identification or verification techniques
        - G10L17/06—Decision making techniques; Pattern matching strategies
        - G10L17/18—Artificial neural networks; Connectionist approaches
      - G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
        - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
          - G10L21/0272—Voice signal separating
      - G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
        - G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
          - G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
        - G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
          - G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Quality & Reliability (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Machine Translation (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910561692.XA CN110444223B (zh) | 2019-06-26 | 2019-06-26 | Speaker separation method and device based on recurrent neural network and acoustic features |
PCT/CN2019/117805 WO2020258661A1 (fr) | 2019-06-26 | 2019-11-13 | Speaker separation method and apparatus based on recurrent neural network and acoustic features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910561692.XA CN110444223B (zh) | 2019-06-26 | 2019-06-26 | Speaker separation method and device based on recurrent neural network and acoustic features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110444223A (zh) | 2019-11-12 |
CN110444223B (zh) | 2023-05-23 |
Family
ID=68428733
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910561692.XA Active CN110444223B (zh) | Speaker separation method and device based on recurrent neural network and acoustic features | 2019-06-26 | 2019-06-26 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110444223B (fr) |
WO (1) | WO2020258661A1 (fr) |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
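CN110444223B (zh) * | 2019-06-26 | 2023-05-23 | 平安科技(深圳)有限公司 | Speaker separation method and device based on recurrent neural network and acoustic features |
CN112951270B (zh) * | 2019-11-26 | 2024-04-19 | 新东方教育科技集团有限公司 | Method, device, and electronic equipment for speech fluency detection |
CN110931013B (zh) * | 2019-11-29 | 2022-06-03 | 北京搜狗科技发展有限公司 | Method and device for processing speech data |
CN111128223B (zh) * | 2019-12-30 | 2022-08-05 | 科大讯飞股份有限公司 | Text-information-assisted speaker separation method and related device |
CN113112993B (zh) * | 2020-01-10 | 2024-04-02 | 阿里巴巴集团控股有限公司 | Audio information processing method and device, electronic equipment, and storage medium |
CN111261186B (zh) * | 2020-01-16 | 2023-05-30 | 南京理工大学 | Audio source separation method based on improved self-attention mechanism and cross-band features |
CN111276131B (zh) * | 2020-01-22 | 2021-01-12 | 厦门大学 | Method and system for integrating multiple types of acoustic features based on deep neural network |
CN111461173B (zh) * | 2020-03-06 | 2023-06-20 | 华南理工大学 | Multi-speaker clustering system and method based on attention mechanism |
CN111223476B (zh) * | 2020-04-23 | 2020-08-04 | 深圳市友杰智新科技有限公司 | Speech feature vector extraction method and device, computer equipment, and storage medium |
CN111524527B (zh) * | 2020-04-30 | 2023-08-22 | 合肥讯飞数码科技有限公司 | Speaker separation method and device, electronic equipment, and storage medium |
CN111640450A (zh) * | 2020-05-13 | 2020-09-08 | 广州国音智能科技有限公司 | Multi-voice audio processing method, device, equipment, and readable storage medium |
CN111640456B (zh) * | 2020-06-04 | 2023-08-22 | 合肥讯飞数码科技有限公司 | Overlapping speech detection method, device, and equipment |
CN111883165B (zh) * | 2020-07-02 | 2024-06-18 | 中移(杭州)信息技术有限公司 | Speaker speech segmentation method and device, electronic equipment, and storage medium |
CN112201275B (zh) * | 2020-10-09 | 2024-05-07 | 深圳前海微众银行股份有限公司 | Voiceprint segmentation method, device, equipment, and readable storage medium |
CN112233668B (zh) * | 2020-10-21 | 2023-04-07 | 中国人民解放军海军工程大学 | Neural-network-based voice command and identity recognition method |
CN112992175B (zh) * | 2021-02-04 | 2023-08-11 | 深圳壹秘科技有限公司 | Speech distinguishing method and speech recording device thereof |
CN113642422B (zh) * | 2021-07-27 | 2024-05-24 | 东北电力大学 | Continuous Chinese sign language recognition method |
CN113555034B (zh) * | 2021-08-03 | 2024-03-01 | 京东科技信息技术有限公司 | Compressed audio recognition method, device, and storage medium |
CN113707130B (zh) * | 2021-08-16 | 2024-06-14 | 北京搜狗科技发展有限公司 | Speech recognition method and device, and device for speech recognition |
CN113822276B (zh) * | 2021-09-30 | 2024-06-14 | 中国平安人寿保险股份有限公司 | Neural-network-based picture correction method, device, equipment, and medium |
CN114330474B (zh) * | 2021-10-20 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Data processing method and device, computer equipment, and storage medium |
CN114927124A (zh) * | 2022-03-04 | 2022-08-19 | 上海交通大学 | Laboratory speech monitoring system based on speech recognition and natural language processing |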
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106683661A (zh) * | 2015-11-05 | 2017-05-17 | 阿里巴巴集团控股有限公司 | Speech-based role separation method and device |
CN107731233A (zh) * | 2017-11-03 | 2018-02-23 | 王华锋 | RNN-based voiceprint recognition method |
CN108766440A (zh) * | 2018-05-28 | 2018-11-06 | 平安科技(深圳)有限公司 | Speaker separation model training method, two-speaker separation method, and related equipment |
CN109147758A (zh) * | 2018-09-12 | 2019-01-04 | 科大讯飞股份有限公司 | Speaker voice conversion method and device |
CN109584903A (zh) * | 2018-12-29 | 2019-04-05 | 中国科学院声学研究所 | Deep-learning-based multi-speaker speech separation method |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6895376B2 (en) * | 2001-05-04 | 2005-05-17 | Matsushita Electric Industrial Co., Ltd. | Eigenvoice re-estimation technique of acoustic models for speech recognition, speaker identification and speaker verification |
CN105427858B (zh) * | 2015-11-06 | 2019-09-03 | 科大讯飞股份有限公司 | Method and system for automatic speech classification |
US9818431B2 (en) * | 2015-12-21 | 2017-11-14 | Microsoft Technology Licensing, LLC | Multi-speaker speech separation |
US11373672B2 (en) * | 2016-06-14 | 2022-06-28 | The Trustees Of Columbia University In The City Of New York | Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments |
CN108320732A (zh) * | 2017-01-13 | 2018-07-24 | 阿里巴巴集团控股有限公司 | Method and device for generating a target-speaker speech recognition computation model |
KR102486395B1 (ko) * | 2017-11-23 | 2023-01-10 | 삼성전자주식회사 | Neural network device for speaker recognition, and operating method thereof |
CN109036454A (zh) * | 2018-06-06 | 2018-12-18 | 安徽继远软件有限公司 | Method and system for DNN-based speaker-independent single-channel recording separation |
CN110444223B (zh) * | 2019-06-26 | 2023-05-23 | 平安科技(深圳)有限公司 | Speaker separation method and device based on recurrent neural network and acoustic features |
2019
- 2019-06-26: CN application CN201910561692.XA filed; patent CN110444223B (zh), status Active
- 2019-11-13: WO application PCT/CN2019/117805 filed; publication WO2020258661A1 (fr), status Application Filing
Also Published As
Publication number | Publication date |
---|---|
WO2020258661A1 (fr) | 2020-12-30 |
CN110444223A (zh) | 2019-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
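CN110444223B (zh) | | Speaker separation method and device based on recurrent neural network and acoustic features |
US10762305B2 (en) | | Method for generating chatting data based on artificial intelligence, computer device and computer-readable storage medium |
CN110288980A (zh) | | Speech recognition method, model training method, device, equipment, and storage medium |
CN1760974B (zh) | | Method for identifying at least one speech unit |
CN111128137A (zh) | | Acoustic model training method and device, computer equipment, and storage medium |
CN116884391B (zh) | | Diffusion-model-based multimodal fusion audio generation method and device |
CN113936643B (zh) | | Speech recognition method, speech recognition model, electronic equipment, and storage medium |
CN116250038A (zh) | | Transformer transducer: one model unifying streaming and non-streaming speech recognition |
CN113300813B (zh) | | Attention-based joint source-channel method for text |
Karita et al. | | Sequence training of encoder-decoder model using policy gradient for end-to-end speech recognition |
KR20230175258A (ko) | | End-to-end speaker diarization via iterative speaker embedding |
CN109885811B (zh) | | Article style conversion method and device, computer equipment, and storage medium |
Kim et al. | | Sequential labeling for tracking dynamic dialog states |
JP2024508196A (ja) | | Artificial intelligence system for capturing context by dilated self-attention |
CN114360502A (zh) | | Speech recognition model processing method, and speech recognition method and device |
JP7329393B2 (ja) | | Speech signal processing device, speech signal processing method, speech signal processing program, learning device, learning method, and learning program |
CN113948090B (zh) | | Speech detection method, conversation recording product, and computer storage medium |
CN113793599B (zh) | | Speech recognition model training method, and speech recognition method and device |
CN113362858B (zh) | | Speech emotion classification method, device, equipment, and medium |
CN115273862A (zh) | | Speech processing method and device, electronic equipment, and medium |
CN115691510A (zh) | | Voiceprint recognition method based on random masking training, and computer equipment |
CN112735392B (zh) | | Speech processing method, device, equipment, and storage medium |
JP7291099B2 (ja) | | Speech recognition method and device |
CN117581233A (zh) | | Artificial intelligence system for sequence-to-sequence processing with dual causal and non-causal restricted self-attention adapted for streaming applications |
CN114333772A (zh) | | Speech recognition method, device, equipment, readable storage medium, and product |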
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||