CN113066481B - Bird sound recognition method based on hybrid feature selection and a GWO-KELM model
- Publication number
- CN113066481B
- Authority
- CN
- China
- Prior art keywords
- kelm
- feature
- model
- bird sound
- bird
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
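The invention relates to a bird sound recognition method based on hybrid feature selection and a GWO-KELM model, and belongs to the technical field of bird call classification and recognition. The method comprises the following steps: first, the ComParE feature set is extracted from bird sound data; next, a hybrid feature selection algorithm based on KELM and Fscore is applied to the ComParE feature set to obtain a feature subset suited to bird sound recognition; then, the ten-fold cross-validation accuracy of the feature subset on the KELM model is used as the fitness of the grey wolf optimization algorithm, which iteratively searches for the optimal regularization parameter c and kernel function parameter σ; finally, the KELM model is trained with these parameters to obtain the recognition result. The invention uses the large-scale acoustic feature set ComParE, which weakens the influence of noise on the recognition result; the hybrid feature selection algorithm based on KELM and Fscore reduces the redundancy of the feature set and improves recognition accuracy; and optimizing the KELM classification model with GWO finds the best parameters, so that the KELM model performs to its full potential.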
Description
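Technical Field
The invention relates to a bird sound recognition method based on hybrid feature selection and a GWO-KELM (kernel extreme learning machine optimized by the grey wolf algorithm) model, and belongs to the technical field of bird call classification and recognition.
Background
Birds are an important part of the ecosystem, and monitoring their activity and distribution provides important evidence for understanding changes in a region's biodiversity and climate; monitoring, classifying, and identifying birds is therefore of great significance. Bird calls and morphological features are the key features for distinguishing birds and are the approaches most widely used for bird species identification today; in field monitoring, bird calls are easier to monitor than morphological features. Identifying bird species is also of great significance for protecting rare wild bird species.
There is currently relatively little research on bird sound recognition in China or abroad. A survey of that work shows that existing bird sound recognition techniques mainly improve bird sound extraction algorithms, extract various bird sound features, and then build a classifier with a machine learning algorithm for recognition. However, the bird sound features extracted by current methods are rather limited in variety, and recognition performance is easily affected by environmental noise.
Summary of the Invention
To address problems such as the limited variety of extracted features and the low classification accuracy of bird sound recognition algorithms, the invention proposes a bird sound recognition method based on hybrid feature selection and a GWO-KELM model. It introduces ComParE (Computational Paralinguistics ChallengE, the feature set published for the InterSpeech challenge), a large-scale acoustic feature set widely used in speech emotion recognition, into the field of bird sound recognition, and uses the grey wolf optimization algorithm (GWO) to search for the globally optimal values of the kernel extreme learning machine (KELM) model parameters, improving accuracy.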
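To solve this technical problem, the invention adopts the following technical solution:
A bird sound recognition method based on hybrid feature selection and a GWO-KELM model, comprising the following steps:
(1) extracting the ComParE feature set from bird sound data;
(2) applying a hybrid feature selection algorithm based on KELM and Fscore to the ComParE feature set to obtain a feature subset suited to bird sound recognition;
(3) using the ten-fold cross-validation accuracy of the feature subset on the KELM model as the fitness of the grey wolf optimization algorithm, and iteratively searching for the optimal regularization parameter c and kernel function parameter σ;
(4) finally, training the KELM model with these parameters to obtain the recognition result.
The specific procedure of step (1) is as follows:
The bird sound data are first converted to a uniform format of mono, 44.1 kHz sampling rate, 32-bit WAV audio, and the ComParE feature set is then extracted with OpenSmile.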
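The format requirement in step (1) can be checked before feature extraction. The following is a minimal sketch using only Python's standard `wave` module; the function name `matches_target_format` is hypothetical, and the sketch assumes integer PCM files, since `wave` does not read floating-point WAV data. It only verifies the format and does not perform the conversion or the feature extraction itself.

```python
import wave


def matches_target_format(path, channels=1, rate=44100, sample_bytes=4):
    """Return True if the WAV file is mono, 44.1 kHz, 32-bit (4 bytes per sample)."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == channels
            and wav.getframerate() == rate
            and wav.getsampwidth() == sample_bytes
        )
```

Files that fail the check would need to be resampled or remixed with an audio tool of choice before being passed to the feature extractor.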
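The beneficial effects of the invention are as follows:
Using the large-scale acoustic feature set ComParE weakens the influence of noise on the recognition result; the hybrid feature selection algorithm based on KELM and Fscore (the F-score of a feature, a feature evaluation criterion) reduces the redundancy of the feature set and improves recognition accuracy; and optimizing the KELM classification model with GWO finds the best parameters, so that the KELM model performs to its full potential.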
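The patent text does not spell out the F-score formula or how it is combined with KELM in the hybrid selection. As an illustration only, the sketch below computes one common multi-class form of the F-score, the between-class scatter of a feature divided by the sum of its within-class sample variances. The function name `f_scores` and this particular formula are assumptions, not the patent's stated definition.

```python
from collections import defaultdict


def f_scores(X, y):
    """Per-feature F-score: between-class scatter over summed within-class variance.

    X is a list of feature rows, y the matching class labels.
    Each class needs at least two samples for the sample variance.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for i in range(len(X[0])):
        overall_mean = sum(row[i] for row in X) / len(X)
        between = 0.0
        within = 0.0
        for rows in by_class.values():
            values = [row[i] for row in rows]
            mean = sum(values) / len(values)
            between += (mean - overall_mean) ** 2
            within += sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        # A feature with zero within-class variance gets score 0 here to avoid dividing by zero.
        scores.append(between / within if within > 0 else 0.0)
    return scores
```

Features with a higher score separate the classes more cleanly, so a ranking by this score gives a candidate order in which a classifier-based step could then evaluate subsets.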
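Brief Description of the Drawings
Figure 1 is a block diagram of the bird sound recognition system.
Figure 2 shows the iteration results of GWO-KELM.
Detailed Description
The invention is described in further detail below with reference to the drawings.
The invention provides a bird sound recognition method based on hybrid feature selection and a GWO-KELM model; the flow of the method is shown in Figure 1. The bird sound data are first converted to a uniform format of mono, 44.1 kHz sampling rate, 32-bit WAV audio, and the ComParE feature set is extracted with OpenSmile (open Speech and Music Interpretation by Large Space Extraction, an open-source audio feature extraction tool). To select suitable features, a hybrid feature selection algorithm based on KELM and Fscore is then applied to the ComParE feature set to obtain the final feature subset suited to bird sound recognition (the penalty parameter λ is set to 0.001). Next, the ten-fold cross-validation accuracy of the feature subset on the KELM model is used as the fitness of the grey wolf optimization algorithm, which iteratively searches for the optimal regularization parameter c and kernel function parameter σ. Finally, the KELM model is trained with these parameters to obtain the recognition result.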
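The patent text names the KELM parameters c and σ but does not give the model equations. The sketch below assumes the commonly used closed-form KELM, in which the output weights solve (K + I/c) β = T for a Gaussian kernel matrix K and one-hot targets T. The kernel form exp(-||a - b||² / σ) and all function names are assumptions made for illustration, and the pure-Python linear solver is only practical for small examples.

```python
import math


def rbf_kernel(a, b, sigma):
    """Gaussian kernel exp(-||a - b||^2 / sigma); this exact form is an assumption."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / sigma)


def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, len(M[r])):
                M[r][k] -= factor * M[col][k]
    X = [None] * n
    for i in reversed(range(n)):  # back substitution, one column of B at a time
        X[i] = [
            (M[i][n + j] - sum(M[i][k] * X[k][j] for k in range(i + 1, n))) / M[i][i]
            for j in range(len(B[0]))
        ]
    return X


def kelm_fit(X, y, c, sigma):
    """Train a KELM: beta = (K + I/c)^-1 T with one-hot targets T."""
    classes = sorted(set(y))
    n = len(X)
    A = [
        [rbf_kernel(X[i], X[j], sigma) + (1.0 / c if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    T = [[1.0 if label == cls else 0.0 for cls in classes] for label in y]
    return {"X": X, "beta": solve(A, T), "classes": classes, "sigma": sigma}


def kelm_predict(model, x):
    """Return the class whose output score k(x)^T beta is largest."""
    k = [rbf_kernel(x, xi, model["sigma"]) for xi in model["X"]]
    scores = [
        sum(k[i] * model["beta"][i][j] for i in range(len(k)))
        for j in range(len(model["classes"]))
    ]
    return model["classes"][scores.index(max(scores))]
```

Because training reduces to one linear solve, evaluating a candidate (c, σ) pair is cheap, which is what makes it feasible to use cross-validated accuracy as a fitness function inside an iterative optimizer.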
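The bird sound data used in the experiments come from the Berlin Natural History Museum in Germany; the database consists of bird call recordings collected by professional ornithologists in natural field environments. To ensure sufficient training and test data, bird species with fewer than 25 audio files in the database were removed, leaving 4468 bird sound audio files from 60 species.
The experiments were run on MATLAB 2018b with ten-fold cross-validation as the experimental protocol, and accuracy and F1-score (an evaluation metric for classification models) were used to evaluate the classifiers. The experiments fall into three parts. The first compares the ComParE feature set across different classifiers; the second compares the recognition accuracy of the selected feature subset with that of the original ComParE feature set across different classifiers; the third compares, on 60-class bird sound recognition, the parameters found by grid search with those found by GWO's stochastic search.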
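The evaluation protocol can be sketched as follows. The text does not say whether the folds were shuffled or stratified, nor how the F1-score was averaged over the 60 classes, so the strided fold assignment and the macro averaging below are assumptions, as are the function names.

```python
def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs; fold i takes every k-th sample."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging is an assumption)."""
    per_class = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)
```

In a ten-fold run, a classifier is trained on each train split, predictions on the matching test split are collected, and the two metrics are computed over the pooled or averaged predictions.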
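Table 1: Performance of the ComParE feature set on different classifiers
Table 1 shows the performance of the ComParE feature set on different classifiers. The KELM classifier achieves ten-fold cross-validation accuracies of 96.67%, 93.77%, and 93.23% on 10-class, 30-class, and 60-class bird sound recognition, respectively, higher than every other classifier tested. The results indicate that KELM has an advantage over the other algorithms when classifying high-dimensional bird sound features.
Table 2: Recognition results on classifiers for the 60-class feature sets before and after feature selection
Table 2 shows the recognition results on classifiers for the 60-class feature sets before and after feature selection. The selected feature subset yields higher recognition accuracy and F1-score than the original feature set on all four classifiers, with gains of roughly 2% to 5%. The results indicate that the feature selection algorithm based on Fscore and KELM removes redundant features and that the selected feature set has good classification ability.
Figure 2 shows the iteration results of the GWO-KELM model. After 100 iterations, the optimal parameters finally selected are c = 316 and σ = 6112.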
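The parameter search can be illustrated with the standard grey wolf optimizer update, in which each wolf moves toward the average of positions guided by the three best solutions found so far while a control coefficient shrinks linearly from 2 to 0. This is a minimal sketch, not the patent's implementation: the function name `gwo_maximize`, the population size, the bounds, and the seed are all assumptions. In the method described here, `fitness` would be the ten-fold cross-validation accuracy of KELM at a given (c, σ).

```python
import random


def gwo_maximize(fitness, bounds, n_wolves=20, n_iter=100, seed=0):
    """Maximize fitness(position) inside box bounds with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    dim = len(bounds)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # up to three best (fitness, position) pairs seen so far

    for it in range(n_iter):
        scored = [(fitness(w), list(w)) for w in wolves]
        leaders = sorted(leaders + scored, key=lambda s: s[0], reverse=True)[:3]
        a = 2.0 * (1 - it / n_iter)  # decreases linearly from 2 toward 0
        for w in wolves:
            for d in range(dim):
                guided = []
                for _, leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    guided.append(leader[d] - A * abs(C * leader[d] - w[d]))
                lo, hi = bounds[d]
                # New coordinate: mean of the leader-guided moves, clipped to the bounds.
                w[d] = min(max(sum(guided) / len(guided), lo), hi)

    final = [(fitness(w), list(w)) for w in wolves]
    best_fit, best_pos = max(leaders + final, key=lambda s: s[0])
    return best_pos, best_fit
```

A call such as `gwo_maximize(cv_accuracy, [(c_min, c_max), (s_min, s_max)])`, where `cv_accuracy` and the bound names are placeholders, would return the best (c, σ) pair found and its fitness.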
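Table 3: Results of 60-class bird sound recognition under different parameter search methods
Table 3 shows the results of 60-class bird sound recognition under different parameter search methods. On the feature subset, the KELM model tuned by grid search (c = 2048, σ = 4096) reaches a recognition accuracy of 93.89%, while GWO-KELM (c = 316, σ = 6112) reaches 94.45%, about 0.5 percentage points higher than grid search, which demonstrates the effectiveness of the GWO-KELM model.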
Claims (2)
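1. A bird sound recognition method based on hybrid feature selection and a GWO-KELM model, characterized by comprising the following steps:
(1) extracting the ComParE feature set from bird sound data;
(2) applying a hybrid feature selection algorithm based on KELM and Fscore to the ComParE feature set to obtain a feature subset suited to bird sound recognition;
(3) using the ten-fold cross-validation accuracy of the feature subset on the KELM model as the fitness of the grey wolf optimization algorithm, and iteratively searching for the optimal regularization parameter c and kernel function parameter σ;
(4) finally, training the KELM model with these parameters to obtain the recognition result.
2. The bird sound recognition method based on hybrid feature selection and a GWO-KELM model according to claim 1, characterized in that the specific procedure of step (1) is as follows:
The bird sound data are first converted to a uniform format of mono, 44.1 kHz sampling rate, 32-bit WAV audio, and the ComParE feature set is then extracted with OpenSmile.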
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110347388.2A CN113066481B (zh) | 2021-03-31 | 2021-03-31 | Bird sound recognition method based on hybrid feature selection and a GWO-KELM model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113066481A (zh) | 2021-07-02 |
CN113066481B (zh) | 2023-05-09 |
Family
ID=76565018
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110347388.2A Active CN113066481B (zh) | Bird sound recognition method based on hybrid feature selection and a GWO-KELM model | 2021-03-31 | 2021-03-31 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113066481B (zh) |
Also Published As
Publication number | Publication date |
---|---|
CN113066481A (zh) | 2021-07-02 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||