CN113066481B - Bird sound recognition method based on hybrid feature selection and GWO-KELM model - Google Patents

Bird sound recognition method based on hybrid feature selection and GWO-KELM model

Info

Publication number
CN113066481B
Authority
CN
China
Prior art keywords
kelm
feature
model
bird sound
bird
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110347388.2A
Other languages
English (en)
Other versions
CN113066481A (zh)
Inventor
周晓彦
李大鹏
徐华南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology
Priority to CN202110347388.2A
Publication of CN113066481A
Application granted
Publication of CN113066481B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

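The invention relates to a bird sound recognition method based on hybrid feature selection and a GWO-KELM model, and belongs to the technical field of bird song classification and recognition. The method comprises the following steps: first, the ComParE feature set is extracted from bird sound data; next, a hybrid feature selection algorithm based on KELM and Fscore selects from the ComParE feature set a feature subset suited to bird sound recognition; then the ten-fold cross-validation accuracy of the KELM model on the feature subset is used as the fitness of the grey wolf optimization algorithm, which iteratively searches for the optimal regularization parameter c and kernel function parameter σ; finally, the KELM model is trained with these parameters to obtain the recognition result. By using the large-scale acoustic feature set ComParE, the invention reduces the influence of noise on the recognition result; the hybrid feature selection algorithm based on KELM and Fscore reduces the redundancy of the feature set and improves recognition accuracy; and optimizing the KELM classification model with GWO finds the best parameters, so that the KELM model performs to its full potential.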

Description

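Bird sound recognition method based on hybrid feature selection and GWO-KELM model
Technical Field
The invention relates to a bird sound recognition method based on hybrid feature selection and a GWO-KELM (kernel extreme learning machine optimized by the grey wolf algorithm) model, and belongs to the technical field of bird song classification and recognition.
Background
Birds are an important part of ecosystems, and monitoring their activity and distribution provides important evidence for understanding changes in a region's biodiversity and climate; monitoring, classifying, and identifying birds is therefore of great significance. Bird song and morphological features are the key characteristics for distinguishing birds and are the means most commonly used for bird species identification today; in practical monitoring, bird song is easier to monitor than morphological features. Identifying bird species is also of great significance for protecting rare wild bird species.
There has been relatively little research on bird sound recognition, in China or abroad. A survey of that research shows that current bird sound recognition techniques mainly improve bird sound extraction algorithms, extract various bird sound features, and then build a classifier with machine learning algorithms for recognition. However, the bird sound features extracted by current methods are limited in variety, and recognition performance is easily affected by environmental noise.
Summary of the Invention
To address the limited feature variety and low classification accuracy of bird sound recognition algorithms, the invention proposes a bird sound recognition method based on hybrid feature selection and a GWO-KELM model. It introduces into bird sound recognition the large-scale acoustic feature set ComParE (Computational Paralinguistics ChallengE, the feature set published for the InterSpeech challenge), which is widely used in speech emotion recognition, and uses the grey wolf optimization algorithm (GWO) to search for the global optimum of the kernel extreme learning machine (KELM) model parameters, thereby improving accuracy.
To solve its technical problem, the invention adopts the following technical solution:
A bird sound recognition method based on hybrid feature selection and a GWO-KELM model, comprising the following steps:
(1) extracting the ComParE feature set from bird sound data;
(2) applying a hybrid feature selection algorithm based on KELM and Fscore to the ComParE feature set to obtain a feature subset suited to bird sound recognition;
(3) using the ten-fold cross-validation accuracy of the KELM model on the feature subset as the fitness of the grey wolf optimization algorithm, and iteratively searching for the optimal regularization parameter c and kernel function parameter σ;
(4) finally, training the KELM model with these parameters to obtain the recognition result.
The specific procedure of step (1) is as follows:
First, the bird sound data are converted to a uniform format of mono, 44.1 kHz sampling rate, 32-bit WAV audio, and the ComParE feature set is extracted with openSMILE.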
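Before feature extraction, it can help to confirm that each file already has the uniform format described in step (1). The following is a minimal sketch using only the Python standard library; it assumes that "32-bit" means 32-bit integer PCM (the `wave` module does not read floating-point WAV data), and the helper names `wav_format`, `matches_target`, and `make_silent_wav` are hypothetical and not part of the patent. The openSMILE extraction itself is not shown.

```python
import io
import wave

# Target format from step (1): mono, 44.1 kHz, 32-bit (4 bytes per sample).
TARGET = {"channels": 1, "sample_rate": 44100, "sample_width_bytes": 4}

def wav_format(fileobj):
    """Read channel count, sample rate, and sample width from a PCM WAV stream."""
    with wave.open(fileobj, "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_rate": w.getframerate(),
            "sample_width_bytes": w.getsampwidth(),
        }

def matches_target(fileobj):
    """True if the WAV stream already has the uniform format."""
    return wav_format(fileobj) == TARGET

def make_silent_wav(channels, sample_rate, sample_width_bytes, n_frames=100):
    """Build a small in-memory WAV file of silence (for demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width_bytes)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00" * (n_frames * channels * sample_width_bytes))
    buf.seek(0)
    return buf
```

Files that fail the check would need to be converted (downmixed, resampled, or re-quantized) with an audio tool before extraction; that conversion is outside this sketch.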
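The beneficial effects of the invention are as follows:
Using the large-scale acoustic feature set ComParE reduces the influence of noise on the recognition result; the hybrid feature selection algorithm based on KELM and Fscore (the F-score of a feature, a feature evaluation criterion) reduces the redundancy of the feature set and improves recognition accuracy; and optimizing the KELM classification model with GWO finds the best parameters, so that the KELM model performs to its full potential.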
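The text does not give the Fscore formula, so the following is only a sketch under an assumption: it uses a common multi-class form of the F-score, in which the score of a feature is the sum of squared distances between each class mean and the overall mean, divided by the sum of the within-class sample variances. The function names are hypothetical, and the KELM-based part of the hybrid selection (including how the penalty parameter is used) is not specified in the text and is not reproduced here.

```python
from collections import defaultdict

def f_scores(X, y):
    """Assumed multi-class F-score of every feature column (higher = more discriminative)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    scores = []
    for j in range(len(X[0])):
        overall_mean = sum(row[j] for row in X) / len(X)
        between, within = 0.0, 0.0
        for rows in groups.values():
            col = [r[j] for r in rows]
            class_mean = sum(col) / len(col)
            between += (class_mean - overall_mean) ** 2
            if len(col) > 1:  # sample variance needs at least two values
                within += sum((v - class_mean) ** 2 for v in col) / (len(col) - 1)
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(0.0 if between == 0 else float("inf"))
    return scores

def rank_features(X, y):
    """Feature indices sorted from highest to lowest F-score."""
    scores = f_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

A filter ranking like this is cheap to compute; a wrapper step would then evaluate candidate subsets with the classifier itself.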
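Brief Description of the Drawings
Figure 1 is a block diagram of the bird sound recognition system.
Figure 2 shows the GWO-KELM iteration results.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The invention provides a bird sound recognition method based on hybrid feature selection and a GWO-KELM model; the method flow is shown in Figure 1. First, the bird sound data are converted to a uniform format of mono, 44.1 kHz sampling rate, 32-bit WAV audio, and the ComParE feature set is extracted with openSMILE (open Speech and Music Interpretation by Large Space Extraction, an open-source audio feature extraction tool). To select suitable features, a hybrid feature selection algorithm based on KELM and Fscore is then applied to the ComParE feature set to obtain the final feature subset suited to bird sound recognition (the penalty parameter λ is set to 0.001). Next, the ten-fold cross-validation accuracy of the KELM model on the feature subset is used as the fitness of the grey wolf optimization algorithm, which iteratively searches for the optimal regularization parameter c and kernel function parameter σ. Finally, the KELM model is trained with these parameters to obtain the recognition result.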
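The parameter search in the flow above can be sketched with a basic grey wolf optimizer. This is a minimal illustration of the standard GWO update (three leading wolves, coefficient a decreasing linearly from 2 toward 0), not the patent's implementation; the population size, the bounds, and the function name `gwo_maximize` are assumptions. In the method described here the fitness would be the ten-fold cross-validation accuracy of KELM for a given (c, σ); any callable that scores a position can be passed in.

```python
import random

def gwo_maximize(fitness, bounds, n_wolves=20, n_iter=100, seed=0):
    """Maximize fitness(position) inside box bounds with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best_pos, best_fit = None, float("-inf")
    for t in range(n_iter + 1):
        ranked = sorted(wolves, key=fitness, reverse=True)
        top_fit = fitness(ranked[0])
        if top_fit > best_fit:  # remember the best position seen so far
            best_pos, best_fit = list(ranked[0]), top_fit
        if t == n_iter:
            break  # the last pass only scores the final positions
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta, delta
        a = 2.0 * (1.0 - t / n_iter)  # shrinks from 2 toward 0
        for w in wolves:
            for d, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    estimate += leader[d] - A * abs(C * leader[d] - w[d])
                # average the three leader-guided estimates and clip to the bounds
                w[d] = min(hi, max(lo, estimate / 3.0))
    return best_pos, best_fit
```

Because each fitness call would involve a full cross-validation run, a practical version would cache fitness values rather than re-evaluating them during sorting as this sketch does.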
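The bird sound data used in the experiments come from the Natural History Museum in Berlin, Germany; the database consists of bird song recordings collected by professional ornithologists in natural field environments. To ensure sufficient training and test data, bird species with fewer than 25 audio files in the database were removed, leaving 4468 audio files covering 60 bird species.
The experiments were run on MATLAB 2018b with ten-fold cross-validation as the experimental protocol, using accuracy and F1-score (a standard evaluation metric for classification models) as the evaluation metrics. The experiments have three parts: first, the ComParE feature set is compared across different classifiers; second, the recognition accuracy of the selected feature subset is compared with that of the original ComParE feature set across different classifiers; finally, the 60-class bird sound recognition results are compared for parameters found by grid search and by the stochastic GWO search.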
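The evaluation protocol can be sketched as follows. The text does not say how the folds were drawn or how the F1-score was averaged across the 60 classes, so the shuffled split and the macro average below are assumptions, and the function names are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    total = 0.0
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        total += 2 * tp / denom if denom else 0.0
    return total / len(labels)
```

With unevenly represented species, a stratified split that preserves class proportions in each fold may be preferable to the plain shuffle used here.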
Table 1: Performance of the ComParE feature set on the classifiers
Figure BDA0003001184190000031
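Table 1 shows the performance of the ComParE feature set on different classifiers. The KELM classifier achieves ten-fold cross-validation accuracies of 96.67%, 93.77%, and 93.23% on 10-class, 30-class, and 60-class bird sound recognition respectively, higher than every other classifier in each case. The results indicate that KELM has an advantage over the other algorithms in classifying high-dimensional bird sound features.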
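For readers unfamiliar with KELM, the classifier can be sketched in a few lines: training solves the linear system (K + I/c) β = T, where K is the kernel matrix of the training samples and T holds one-hot class targets, and prediction picks the class with the largest kernel-weighted score. The sketch below assumes an RBF kernel of the form exp(-||u - v||² / σ); the patent does not state the exact kernel form, and the function names are hypothetical. It uses a small pure-Python solver, so it is only suitable for toy data.

```python
import math

def rbf(u, v, sigma):
    """Assumed RBF kernel: exp(-squared distance / sigma)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, v)) / sigma)

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def kelm_fit(X, y, c, sigma):
    """Train KELM: beta = (K + I/c)^-1 T with one-hot targets T."""
    classes = sorted(set(y))
    n = len(X)
    A = [[rbf(X[i], X[j], sigma) + (1.0 / c if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    T = [[1.0 if label == k else 0.0 for k in classes] for label in y]
    return {"X": X, "beta": solve(A, T), "classes": classes, "sigma": sigma}

def kelm_predict(model, x):
    """Return the class whose output score is largest for sample x."""
    k = [rbf(x, xi, model["sigma"]) for xi in model["X"]]
    n_classes = len(model["classes"])
    scores = [sum(kv * b[j] for kv, b in zip(k, model["beta"])) for j in range(n_classes)]
    return model["classes"][scores.index(max(scores))]
```

The regularization parameter c and the kernel parameter σ are the two values the grey wolf optimizer tunes in the method described here.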
Table 2: Recognition results on the classifiers for the 60-class bird sound feature sets before and after feature selection
Figure BDA0003001184190000041
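Table 2 shows the recognition results on the classifiers for the 60-class bird sound feature sets before and after feature selection. On all four classifiers, the selected feature subset achieves higher recognition accuracy and F1-score than the original feature set, with an improvement of roughly 2% to 5%. The results indicate that the feature selection algorithm based on Fscore and KELM removes redundant features and that the selected feature set has good classification ability.
Figure 2 shows the iteration results of the GWO-KELM model; after 100 iterations, the optimal parameters finally selected are c = 316 and σ = 6112.
Table 3: Results of 60-class bird sound recognition under different parameter search methods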
Figure BDA0003001184190000042
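Table 3 shows the results of 60-class bird sound recognition under different parameter search methods. The feature subset achieves a recognition accuracy of 93.89% with the KELM model tuned by grid search (c = 2048, σ = 4096) and 94.45% with GWO-KELM (c = 316, σ = 6112), an improvement of 0.56 percentage points over grid search, which demonstrates the effectiveness of the GWO-KELM model.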
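The difference between the two search methods can be illustrated with a small grid search sketch. The patent does not describe its grid; the powers-of-two grid below is an assumption suggested by the reported grid-search values (2048 = 2^11, 4096 = 2^12), and the function name `grid_search` is hypothetical. The `score` callable stands in for the ten-fold cross-validation accuracy of KELM.

```python
from itertools import product

def grid_search(score, exponents=range(-5, 16)):
    """Try every (c, sigma) = (2**i, 2**j) on the grid and return (best_score, c, sigma)."""
    best = None
    for ec, es in product(exponents, repeat=2):
        c, sigma = 2.0 ** ec, 2.0 ** es
        s = score(c, sigma)
        if best is None or s > best[0]:
            best = (s, c, sigma)
    return best
```

A grid can only return values that lie on it, whereas GWO searches a continuous range. For example, with a hypothetical toy score that peaks at c = 316 and σ = 6112 in log scale (chosen purely for illustration, not a measured accuracy surface), this grid returns the nearest grid point, c = 256 and σ = 8192.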

Claims (2)

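1. A bird sound recognition method based on hybrid feature selection and a GWO-KELM model, characterized by comprising the following steps:
(1) extracting the ComParE feature set from bird sound data;
(2) applying a hybrid feature selection algorithm based on KELM and Fscore to the ComParE feature set to obtain a feature subset suited to bird sound recognition;
(3) using the ten-fold cross-validation accuracy of the KELM model on the feature subset as the fitness of the grey wolf optimization algorithm, and iteratively searching for the optimal regularization parameter c and kernel function parameter σ;
(4) finally, training the KELM model with these parameters to obtain the recognition result.
2. The bird sound recognition method based on hybrid feature selection and a GWO-KELM model according to claim 1, characterized in that the specific procedure of step (1) is as follows:
First, the bird sound data are converted to a uniform format of mono, 44.1 kHz sampling rate, 32-bit WAV audio, and the ComParE feature set is extracted with openSMILE.
CN202110347388.2A 2021-03-31 2021-03-31 Bird sound recognition method based on hybrid feature selection and GWO-KELM model Active CN113066481B (zh)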

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110347388.2A CN113066481B (zh) 2021-03-31 2021-03-31 Bird sound recognition method based on hybrid feature selection and GWO-KELM model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110347388.2A CN113066481B (zh) 2021-03-31 2021-03-31 Bird sound recognition method based on hybrid feature selection and GWO-KELM model

Publications (2)

Publication Number Publication Date
CN113066481A CN113066481A (zh) 2021-07-02
CN113066481B CN113066481B (zh) 2023-05-09

Family

ID=76565018

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110347388.2A Active CN113066481B (zh) 2021-03-31 2021-03-31 Bird sound recognition method based on hybrid feature selection and GWO-KELM model

Country Status (1)

Country Link
CN (1) CN113066481B (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
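CN107818779A (zh) * 2017-09-15 2018-03-20 Beijing Institute of Technology Infant crying detection method, apparatus, device, and medium
CN108694953A (zh) * 2017-04-07 2018-10-23 Nanjing University of Science and Technology Automatic bird song recognition method based on Mel sub-band parameterized features
CN110120224A (zh) * 2019-05-10 2019-08-13 Ping An Technology (Shenzhen) Co., Ltd. Construction method and apparatus for a bird sound recognition model, computer device, and storage medium
CN113724712A (zh) * 2021-08-10 2021-11-30 Nanjing University of Information Science and Technology Bird sound recognition method based on multi-feature fusion and a combined model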

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7377233B2 (en) * 2005-01-11 2008-05-27 Pariff Llc Method and apparatus for the automatic identification of birds by their vocalizations
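KR102195897B1 (ko) * 2013-06-05 2020-12-28 Samsung Electronics Co., Ltd. Acoustic event detection apparatus, operating method thereof, and computer-readable recording medium storing a program for executing the operating method on a computer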

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
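Jancovic P, et al. Automatic detection and recognition of tonal bird sounds in noisy environments. EURASIP Journal on Advances in Signal Processing. 2011, full text. *
Kun Qian, et al. Bird sounds classification by large scale acoustic features and extreme learning machine. 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP). 2016, full text. *
Kun Qian. Active learning for bird sound classification via a kernel-based extreme learning machine. The Journal of the Acoustical Society of America. 2017, full text. *
冯郁茜. Bird species recognition algorithm based on deep learning and bimodal feature fusion (基于深度学习的双模态特征融合鸟类物种识别算法). China Master's Theses Full-text Database. 2020, full text. *
李大鹏, et al. Bird sound recognition algorithm based on feature selection and GWO-KELM (基于特征选择和GWO-KELM的鸟声识别算法). Technical Acoustics (声学技术). 2022, full text. *
李大鹏. Bird song recognition algorithms in natural scenes (自然场景下鸟鸣声识别算法研). China Master's Theses Full-text Database. 2023, full text. *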

Also Published As

Publication number Publication date
CN113066481A (zh) 2021-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant