CN109741759A

CN109741759A - An acoustic automatic detection method for specific bird species

Info

Publication number: CN109741759A
Application number: CN201811566250.6A
Authority: CN
Inventors: Zhao Zhao (赵兆); Zeng Ruiwen (曾瑞文); Xu Zhiyong (许志勇)
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2019-05-10
Anticipated expiration: 2038-12-21
Also published as: CN109741759B

Abstract

The invention discloses an automatic acoustic detection method for specific bird species. The method first extracts potential song segments of the target species with an acoustic event detection procedure based on a Gaussian mixture model. It then applies post-processing based on the energy, song duration and frequency distribution of the candidate sound events, which achieves robust detection and automatic segmentation. Next, adaptive signal denoising is applied to each sound segment, and Mel-frequency cepstral coefficient (MFCC) features are extracted from the enhanced segments to form a feature set of potential song segments. Finally, this feature set is combined with a support vector machine to perform acoustic detection of the target species. The method is simple to implement. In addition, adaptive signal enhancement based on a microphone array markedly improves the signal-to-noise ratio of sound events and thus the accuracy of species identification. This enables acoustic detection of specific bird species in natural field environments and is of great significance for the ecological protection of rare wild bird species.

Description

An acoustic automatic detection method for specific bird species

Technical Field

The invention belongs to the field of ecological monitoring and acoustic signal recognition. It particularly relates to an automatic acoustic detection method for specific bird species.

Background Art

Birds are widely distributed vocal animals. Compared with other animal communities, their habits are easier for humans to observe, and they respond sensitively to small changes in the ecological environment. Most ecologists therefore regard birds as ideal species for monitoring ecological change.

In recent years, however, continuing urbanization and ongoing environmental destruction by human activity have caused large-scale declines in both the number and the diversity of birds. This reduces global species diversity. Because birds are indicator species of woodland vegetation communities, their sharp decline also unbalances the ecosystem. Bird protection is therefore a focus of attention in ecology, and the monitoring of rare birds is the top priority. Traditional point counts are costly and intrude on habitats. To assess bird activity more quickly and non-invasively, there is an increasingly urgent need for effective automatic bird detection.

Extracting features from bird sounds recorded in the field is the foundation for later large-scale data analysis and for building bird sound recognition models. These features are derived from each species' vocalizations using acoustic signal analysis. In real, complex environments, feature-based species identification and classification methods are highly sensitive to environmental noise and other interfering sources, so many researchers are exploring methods for recognizing bird calls in noise.

Chinese patent CN102930870A discloses a bird sound recognition method using anti-noise power-normalized cepstral coefficients. Multi-band spectral subtraction first denoises the sound power spectrum, and anti-noise power-normalized cepstral coefficients are then extracted from the denoised spectrum. Finally, a Support Vector Machine (SVM) recognizes the sounds of 34 bird species under different environments and signal-to-noise ratios.

Chinese patent CN103489446A discloses a bird song recognition method for complex environments based on adaptive energy detection. The sound signal first passes through energy detection. Anti-noise Mel-scale Wavelet Packet decomposition Subband Cepstral Coefficient (WPSCC) features are then extracted from the frames selected by the detector, and an SVM classifies 15 types of bird song in noisy environments.

Liu Zhao et al. introduced a bird sound recognition algorithm for noisy environments based on random forests and large-scale acoustic feature extraction (Liu Zhao, Zhang Yuchen, Hu Hailong. Simulation of bird sound recognition in noisy environments with random forests and large-scale acoustic features [J]. System Simulation Technology, 2017(4): 359-362.). Zhang Saihua et al. first extracted potential bird song segments with an acoustic event detection procedure based on a Gaussian Mixture Model (GMM). They then extracted Mel-subband parameterized features from each segment and used an SVM to classify 11 types of bird song recorded in the wild (Zhang Saihua, Zhao Zhao, Xu Zhiyong, Zhang Yi. Automatic bird song recognition based on Mel-subband parameterized features [J]. Journal of Computer Applications, 2017, 37(4): 1111-1115.).

Most of the studies above either add only one kind of noise in their experiments or do not suppress the noise at all. The field acoustic environment, however, is highly complex and contains many sources of environmental interference. Single-channel audio enhancement at the front end alone therefore cannot meet the requirements of acoustic monitoring in real, complex field conditions.

SUMMARY OF THE INVENTION

The technical problem solved by the present invention is to provide an automatic acoustic detection method for specific bird species.

The technical solution that achieves the object of the present invention is an automatic acoustic detection method for specific bird species, comprising the following steps:

Step 1. Collect continuous field bird sound monitoring signals, segment them automatically, and extract potential song segments of the specific bird species;

Step 2. Apply adaptive signal denoising and enhancement to each potential song segment obtained in Step 1;

Step 3. Extract feature parameters from each potential song segment denoised and enhanced in Step 2, and construct a feature set of potential song segments;

Step 4. Complete the acoustic detection of the specific bird species by combining the feature set obtained in Step 3 with a machine-learning recognition algorithm.

Compared with the prior art, the present invention has the following significant advantages:

1) A multi-element stereo microphone array collects continuous bird sound monitoring data. The collected data contain rich temporal and spatial information, so bird species can be monitored over large spatial and temporal scales.

2) Potential song events of the specific species are detected with a Gaussian mixture model. Combined with post-processing based on the energy, song duration and frequency distribution of candidate sound events, this achieves robust detection and automatic segmentation.

3) Adaptive signal denoising and enhancement based on the microphone array improves the signal-to-noise ratio of sound events, and thereby the identification accuracy for the specific species.

4) The method is convenient and easy to implement.

The present invention is described in further detail below with reference to the accompanying drawings.

Description of Drawings

FIG. 1 is a flow chart of the automatic acoustic detection method for specific bird species according to the present invention.

FIG. 2 is a flow chart of the detection of potential song events of a specific bird species according to the present invention.

FIG. 3 is a schematic diagram of adaptive delay estimation with a P-element stereo microphone array.

FIG. 4 is a block diagram of the generalized sidelobe canceller for a P-element stereo microphone array.

FIG. 5 is a schematic diagram of adaptive delay estimation with a 4-element stereo microphone array.

FIG. 6 is a block diagram of the generalized sidelobe canceller for a 4-element stereo microphone array.

Detailed Description of the Embodiments

With reference to FIG. 1, the automatic acoustic detection method for specific bird species of the present invention comprises the following steps:

Step 1. Collect continuous field bird sound monitoring signals, segment them automatically, and extract potential song segments of the specific bird species.

Further, with reference to FIG. 2, Step 1 is specifically as follows:

Step 1-1. Use a multi-element stereo microphone array to collect multi-channel continuous field bird sound monitoring signals. Apply pre-emphasis to the collected signals to compensate for the excessive attenuation of high-frequency components while suppressing low-frequency wind noise;
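Step 1-1 does not specify the pre-emphasis filter. The following is a minimal sketch under two assumptions: the common first-order form y(n) = x(n) - αx(n-1), and α = 0.97. It uses a hypothetical helper name, `pre_emphasis`, and is applied independently to each array channel.

```python
def pre_emphasis(x, alpha=0.97):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, conventional value; the patent does not give it.
    The first sample is passed through unchanged.
    """
    if not x:
        return []
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(x[n] - alpha * x[n - 1])
    return y


# Each channel of the microphone array is filtered independently.
channels = [[0.0, 1.0, 1.0, 1.0], [1.0, 0.0, -1.0, 0.0]]
emphasized = [pre_emphasis(ch) for ch in channels]
```

The filter boosts high frequencies by roughly 6 dB per octave and attenuates slowly varying, low-frequency content such as wind rumble.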

Step 1-2. Apply framing, windowing and the fast Fourier transform to the signal processed in Step 1-1 to obtain a power spectrogram;
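Step 1-2 can be sketched as follows. The frame length, hop size and Hann window are assumptions, since the patent gives none. A direct DFT stands in for the FFT only to keep the example dependency-free. `power_spectrogram` is a hypothetical name.

```python
import cmath
import math


def hann(n_len):
    """Periodic Hann window of length n_len."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_len) for n in range(n_len)]


def power_spectrogram(x, frame_len=256, hop=128):
    """Return |S(i, l)|^2 as a list of frames, each holding frame_len // 2 + 1 bins.

    Uses a direct O(N^2) DFT for clarity; a real implementation would use an FFT.
    frame_len and hop are assumed values, not taken from the patent.
    """
    win = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * win[n] for n in range(frame_len)]
        spec = []
        for i in range(n_bins):
            s = sum(seg[n] * cmath.exp(-2j * math.pi * i * n / frame_len)
                    for n in range(frame_len))
            spec.append(abs(s) ** 2)
        frames.append(spec)
    return frames
```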

Step 1-3. Set the lower and upper frequency limits to f_L and f_H respectively, and compute the short-time logarithmic energy le(l) of each frame as:

$$le(l) = \log_{10} e(l)$$

where

$$e(l) = \sum_{i=N_L}^{N_H} |S(i,l)|^2$$

Here l is the frame index and i is the frequency-bin index. S(i,l) is the short-time Fourier transform at time-frequency point (i,l), and N_L and N_H are the bin indices corresponding to f_L and f_H. e(l) is the short-time energy of frame l;
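The band-limited log energy of Step 1-3 reduces to a sum over bins N_L to N_H of the power spectrogram. In this sketch, the small `eps` guard against log10(0) and the nearest-bin mapping in the hypothetical helper `hz_to_bin` are assumptions, not part of the patent.

```python
import math


def band_log_energy(power_frames, n_lo, n_hi, eps=1e-12):
    """le(l) = log10(e(l)), with e(l) = sum over i = N_L..N_H of |S(i, l)|^2.

    power_frames[l][i] already holds |S(i, l)|^2. eps (an assumed guard)
    avoids log10(0) on silent frames.
    """
    return [math.log10(sum(frame[n_lo:n_hi + 1]) + eps) for frame in power_frames]


def hz_to_bin(f, fs, frame_len):
    """Nearest DFT bin index for frequency f in Hz (assumed mapping)."""
    return int(round(f * frame_len / fs))
```

With a sampling rate `fs` and DFT size `frame_len`, N_L and N_H would be `hz_to_bin(f_L, fs, frame_len)` and `hz_to_bin(f_H, fs, frame_len)`.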

Step 1-4. Model the distribution of frame log energies with a Gaussian mixture model containing two Gaussian components. The two components represent the probability density functions of the set of potential song-event frames and of the set of environmental-noise frames, respectively;
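Step 1-4 does not say how the mixture is fitted. The sketch below assumes expectation-maximization (EM) on the scalar frame log energies, with a quartile-based initialization; both are assumptions. `fit_gmm2` is a hypothetical name.

```python
import math


def fit_gmm2(values, iters=100, var_floor=1e-6):
    """Fit a two-component 1-D Gaussian mixture to frame log energies by EM.

    Returns (weights, means, variances); component 0 is the lower-mean (noise)
    component and component 1 the higher-mean (song) component.
    """
    n = len(values)
    srt = sorted(values)
    # Initialise the means from the lower and upper quartiles (assumed heuristic).
    means = [srt[n // 4], srt[(3 * n) // 4]]
    mu_all = sum(values) / n
    var_all = max(sum((v - mu_all) ** 2 for v in values) / n, var_floor)
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each frame.
        resp = []
        for v in values:
            p = [weights[j] / math.sqrt(2 * math.pi * variances[j])
                 * math.exp(-(v - means[j]) ** 2 / (2 * variances[j]))
                 for j in range(2)]
            tot = p[0] + p[1] or 1e-300
            resp.append((p[0] / tot, p[1] / tot))
        # M-step: re-estimate weights, means and variances.
        for j in range(2):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            variances[j] = max(sum(r[j] * (v - means[j]) ** 2
                                   for r, v in zip(resp, values)) / nj, var_floor)
    order = sorted(range(2), key=lambda j: means[j])
    return ([weights[j] for j in order], [means[j] for j in order],
            [variances[j] for j in order])
```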

Step 1-5. For each frame, compare posterior probabilities to decide whether it belongs to a potential song segment or to an environmental-noise segment, and obtain a number of potential song segments. Specifically:

For each frame, compare the posterior probability that it belongs to the song-event frame set with the posterior probability that it belongs to the noise frame set. If the former is larger, the frame belongs to a potential song segment. Every frame that is temporally contiguous with it and satisfies the same condition belongs to the same segment. This yields a number of potential song segments, which form the set D = {AE_1, AE_2, ..., AE_K}, where K is the number of potential song segments;
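The posterior comparison and frame grouping of Step 1-5 can be sketched as follows. The sketch assumes the mixture parameters come from a two-component fit as in Step 1-4, with component 0 as noise and component 1 as song. Segments are returned as inclusive (first_frame, last_frame) pairs; the function names are hypothetical.

```python
import math


def gaussian_pdf(v, mean, var):
    return math.exp(-(v - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def song_segments(log_energies, weights, means, variances):
    """Return (first_frame, last_frame) runs where P(song | le) > P(noise | le).

    Component 0 is taken as noise and component 1 as song (assumed ordering).
    Comparing posteriors is equivalent to comparing weight * likelihood,
    because both posteriors share the same evidence term.
    """
    is_song = [weights[1] * gaussian_pdf(v, means[1], variances[1])
               > weights[0] * gaussian_pdf(v, means[0], variances[0])
               for v in log_energies]
    segments, start = [], None
    for l, flag in enumerate(is_song):
        if flag and start is None:
            start = l                      # a new candidate segment AE_k begins
        elif not flag and start is not None:
            segments.append((start, l - 1))
            start = None
    if start is not None:                  # segment running to the last frame
        segments.append((start, len(is_song) - 1))
    return segments
```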

Step 1-6. Compute the logarithmic energy EAE_k of each potential song segment AE_k obtained in Step 1-5, and take the largest of these values, ME:

$$ME = \max_{1 \le k \le K} EAE_k$$

If ME - EAE_k ≥ q for the k-th potential song segment, the segment is considered too weak to be of ecological research value and is discarded. Here q is a threshold, in dB, preset according to actual conditions;
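The formula for EAE_k was not preserved in this text. For illustration only, the sketch assumes that EAE_k is 10·log10 of the summed frame energies e(l) over the segment, which keeps the threshold q in dB. Only the ME - EAE_k ≥ q rejection rule comes from the patent, and both function names are hypothetical.

```python
import math


def segment_energy_db(frame_energies, seg, eps=1e-12):
    """ASSUMED definition: EAE_k = 10*log10(sum of e(l) over the segment's frames)."""
    first, last = seg
    return 10.0 * math.log10(sum(frame_energies[first:last + 1]) + eps)


def prune_weak_segments(segments, energies_db, q_db=20.0):
    """Discard segments with ME - EAE_k >= q, where ME is the largest EAE_k."""
    if not energies_db:
        return []
    me = max(energies_db)
    return [seg for seg, e in zip(segments, energies_db) if me - e < q_db]
```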

Step 1-7. From an existing song database of the specific bird species, obtain the upper and lower song-duration thresholds by statistical analysis: the longest song duration t_H and the shortest song duration t_L. Using the signal sampling rate f_s, convert them into the maximum song length n_H and the minimum song length n_L, in samples:

$$n_H = f_s \times t_H$$

$$n_L = f_s \times t_L$$

For each potential song segment remaining after Step 1-6, compute its length T as:

T = frame length (in samples) × number of frames in the potential song segment

Discard every potential song segment whose length T is less than n_L or greater than n_H;
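The duration check of Step 1-7 is sketched below. The sketch assumes segments are stored as inclusive frame-index pairs and that `frame_len` is in samples; `duration_filter` is a hypothetical name. The length T follows the patent's "frame length × frame count" formula. With overlapping frames this overstates the true duration, so a hop-based length may be preferable in practice.

```python
def duration_filter(segments, fs, t_lo, t_hi, frame_len):
    """Keep segments with n_L <= T <= n_H, all lengths expressed in samples.

    n_H = fs * t_H and n_L = fs * t_L; T = frame length * frame count (Step 1-7).
    Segments are (first_frame, last_frame) pairs.
    """
    n_lo = fs * t_lo
    n_hi = fs * t_hi
    kept = []
    for first, last in segments:
        T = frame_len * (last - first + 1)
        if n_lo <= T <= n_hi:
            kept.append((first, last))
    return kept
```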

Step 1-8. From the existing song database of the specific bird species, obtain the frequency range of its song segments by statistical analysis. For the potential song segments from Step 1-7, set to zero all spectral data that fall outside this range.
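Step 1-8 amounts to a band mask on each segment's spectrogram. The sketch assumes the species' frequency range [f_min, f_max] has already been derived from the song database; `mask_outside_band` is a hypothetical name.

```python
def mask_outside_band(power_frames, fs, frame_len, f_min, f_max):
    """Zero every bin whose centre frequency (Hz) lies outside [f_min, f_max]."""
    masked = []
    for frame in power_frames:
        masked.append([p if f_min <= i * fs / frame_len <= f_max else 0.0
                       for i, p in enumerate(frame)])
    return masked
```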

Further, q in Step 1-6 is set to 20 dB.

Step 2. Apply adaptive signal denoising and enhancement to each potential song segment obtained in Step 1.

Further, Step 2 is specifically as follows:

Assume the multi-element stereo microphone array is a P-element stereo microphone array whose channels are numbered 1, 2, 3, ..., P in a fixed order;

Step 2-1. With reference to FIG. 3, estimate the sound source direction for each potential song segment by an adaptive filtering method, specifically:

Step 2-1-1. For the P-channel data of one potential song segment, denote the P channel signals by m_1(n), m_2(n), m_3(n), ..., m_P(n), n = 1, 2, 3, ..., L_m, where L_m is the signal length. Taking the channel 1 signal as the reference, construct the snapshots x_k of the channel 2 signal:

$$x_k = [m_2(k), m_2(k+1), \ldots, m_2(k+L-1)]^T$$

where the subscript k = 1, 2, ..., L_m - L + 1 indexes the k-th snapshot, L is the filter length, and the superscript T denotes transposition;

Step 2-1-2. Compute the autocorrelation matrix R_xx as:

$$R_{xx} = \frac{1}{K} \sum_{k=1}^{K} x_k x_k^T$$

where K = L_m - L + 1 is the number of snapshots;

Step 2-1-3. Compute the cross-correlation vector r_xd as:

$$r_{xd} = \frac{1}{K} \sum_{k=1}^{K} x_k \, m_1(k+D)$$

where D is the filter center point;

Step 2-1-4. Compute the weight vector w_1 as:

$$w_1 = R_{xx}^{-1} r_{xd}$$

Step 2-1-5. Perform peak detection on the weight vector w_1 obtained in Step 2-1-4. If z is the abscissa (tap index) of the peak, the delay between the channel 1 and channel 2 signals, in samples, is d_1 = z - D;

Step 2-1-6. Repeat Steps 2-1-1 to 2-1-5 to obtain the delay d_c, in samples, between the channel 1 signal and the channel (c+1) signal, c = 1, 2, ..., P-1;
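Steps 2-1-1 to 2-1-5 solve the Wiener-Hopf equation w_1 = R_xx^{-1} r_xd and read the delay off the peak tap. The sketch makes several assumptions:

- D = L // 2 is the center tap.
- The desired signal is ref(k + D), matching the r_xd formula above.
- The peak is picked on |w|.
- Indexing is 0-based.

`solve` and `estimate_delay` are hypothetical helpers written without external libraries.

```python
def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def estimate_delay(ref, other, L=16):
    """Adaptive-filter delay estimate d = z - D (in samples) between ref and other.

    Snapshots x_k = [other(k), ..., other(k+L-1)] are filtered to predict the
    desired sample ref(k+D); D = L // 2 is an assumed choice of centre tap.
    """
    D = L // 2
    K = len(other) - L + 1
    R = [[0.0] * L for _ in range(L)]
    r = [0.0] * L
    for k in range(K):
        x = other[k:k + L]
        desired = ref[k + D]
        for i in range(L):
            r[i] += x[i] * desired / K
            for j in range(L):
                R[i][j] += x[i] * x[j] / K
    w = solve(R, r)                                   # w_1 = R_xx^{-1} r_xd
    z = max(range(L), key=lambda i: abs(w[i]))        # peak detection on |w|
    return z - D
```

With this indexing, a positive result means `other` lags `ref`. Whether m(k) in Step 2-2-1 should then use k - d or k + d depends on that sign convention.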

Step 2-2. With reference to FIG. 4, adaptively enhance the potential song segment with a generalized sidelobe canceller, specifically:

Step 2-2-1. Compute the main-channel signal d(k):

$$d(k) = w_c^T m(k)$$

where w_c = [w_c1, w_c2, ..., w_cP]^T is the static weight vector. Its entries w_c1, w_c2, ..., w_cP are the weights of the individual channels, with w_c1 + w_c2 + ... + w_cP = 1. The aligned snapshot is m(k) = [m_1(k), m_2(k - d_1), ..., m_P(k - d_(P-1))]^T, k = 1, 2, ..., L_m;

Step 2-2-2. Compute the auxiliary-channel signal e(k):

$$e(k) = W_S^T m(k)$$

where W_S is a blocking matrix of dimension P × (P-1);

Step 2-2-3. Compute the enhanced, clean potential song segment signal y(k):

$$y(k) = d(k) - v^T(k) e(k)$$

where v(k) is the dynamic weight vector of the adaptive interference canceller.
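Below is a minimal sketch of Steps 2-2-1 to 2-2-3, applied to channels that have already been time-aligned as in m(k). The patent fixes neither W_S nor the update rule for v(k), so the sketch makes three assumptions:

- The static weights are equal, w_c = 1/P.
- W_S is the Griffiths-Jim difference blocking matrix. Its columns sum to zero, so a perfectly aligned target is blocked from e(k).
- v(k) is updated with normalized LMS.

All function names are hypothetical.

```python
def griffiths_jim_blocking(P):
    """P x (P-1) blocking matrix with columns e_c - e_(c+1); each column sums to zero.

    This Griffiths-Jim form is an assumed choice; the patent only states W_S is P x (P-1).
    """
    W = [[0.0] * (P - 1) for _ in range(P)]
    for c in range(P - 1):
        W[c][c] = 1.0
        W[c + 1][c] = -1.0
    return W


def gsc(aligned, mu=0.1, eps=1e-8):
    """Generalized sidelobe canceller on time-aligned channels (a list of P lists).

    d(k) = w_c^T m(k) with equal weights w_c = 1/P (assumed); e(k) = W_S^T m(k);
    y(k) = d(k) - v^T(k) e(k), with v(k) adapted by normalized LMS (assumed rule).
    """
    P = len(aligned)
    W = griffiths_jim_blocking(P)
    v = [0.0] * (P - 1)
    y = []
    for k in range(len(aligned[0])):
        m = [ch[k] for ch in aligned]
        d = sum(m) / P                                        # main channel
        e = [sum(W[p][c] * m[p] for p in range(P)) for c in range(P - 1)]
        out = d - sum(vi * ei for vi, ei in zip(v, e))        # canceller output
        norm = eps + sum(ei * ei for ei in e)
        v = [vi + mu * out * ei / norm for vi, ei in zip(v, e)]
        y.append(out)
    return y
```

Because the target is blocked from e(k), v(k) adapts only to the residual interference, which is then subtracted from the main channel.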

For the case in which the multi-element stereo microphone array is a 4-element (P = 4) stereo microphone array, number its channels 1, 2, 3, 4 in a fixed order;

Step 2-1. With reference to FIG. 5, estimate the sound source direction for each potential song segment by an adaptive filtering method, specifically:

Step 2-1-1. For the 4-channel data of one potential song segment, denote the channel signals by m_1(n), m_2(n), m_3(n), m_4(n), n = 1, 2, 3, ..., L_m, where L_m is the signal length. Taking the channel 4 signal as the reference, construct the snapshots x_k of the channel 1 signal:

$$x_k = [m_1(k), m_1(k+1), \ldots, m_1(k+L-1)]^T$$

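Step 2-1-2. Compute the autocorrelation matrix R_xx:

$$R_{xx} = \frac{1}{K} \sum_{k=1}^{K} x_k x_k^T$$

where K = L_m - L + 1 is the number of snapshots;

Step 2-1-3. Compute the cross-correlation vector r_xd, with the channel 4 signal as the desired signal:

$$r_{xd} = \frac{1}{K} \sum_{k=1}^{K} x_k \, m_4(k+D)$$

where D is the filter center point;

Step 2-1-4. Compute the weight vector w_1:

$$w_1 = R_{xx}^{-1} r_{xd}$$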

Step 2-1-5. Perform peak detection on the weight vector w_1 obtained in Step 2-1-4. If z is the abscissa of the peak, the delay between the channel 4 and channel 1 signals is d_1 = z - D;

Step 2-1-6. Repeat Steps 2-1-1 to 2-1-5 to obtain the delays between the channel 4 signal and the channel 2 and channel 3 signals, d_2 and d_3 respectively;

Step 2-2. With reference to FIG. 6, adaptively enhance the potential song segment with a generalized sidelobe canceller, specifically:

Step 2-2-1. Compute the main-channel signal d(k):

$$d(k) = w_c^T m(k)$$

where w_c = [w_c4, w_c1, w_c2, w_c3]^T is the static weight vector, and w_c1, ..., w_c4 are the weights of the individual channels, with w_c1 + w_c2 + w_c3 + w_c4 = 1. The aligned snapshot is m(k) = [m_4(k), m_1(k - d_1), m_2(k - d_2), m_3(k - d_3)]^T, k = 1, 2, ..., L_m;

Step 2-2-2. Compute the auxiliary-channel signal e(k) = W_S^T m(k), where W_S is a blocking matrix of dimension 4 × 3;

Step 2-2-3. Compute the enhanced potential song segment signal y(k):

$$y(k) = d(k) - v^T(k) e(k)$$

where v(k) is the weight vector of the adaptive interference canceller.
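For the 4-element array, the structural change is that channel 4 is the reference and comes first in m(k). The sketch below assembles m(k) and d(k) exactly as the formula is written, under three assumptions: 0-based sample indices, zero padding outside the recording, and equal static weights of 0.25. The function names are hypothetical.

```python
def steered_snapshot(m, k, delays):
    """Form m(k) = [m4(k), m1(k-d1), m2(k-d2), m3(k-d3)]^T for a 4-element array.

    m = [m1, m2, m3, m4] (0-based sample lists); delays = [d1, d2, d3] in samples.
    Samples outside the recording are taken as 0 (an assumed edge rule).
    The sign k - d_c follows the formula as written in the patent.
    """
    def sample(ch, n):
        return ch[n] if 0 <= n < len(ch) else 0.0

    m1, m2, m3, m4 = m
    return [sample(m4, k), sample(m1, k - delays[0]),
            sample(m2, k - delays[1]), sample(m3, k - delays[2])]


def main_channel(m, k, delays, w_c=(0.25, 0.25, 0.25, 0.25)):
    """d(k) = w_c^T m(k), w_c ordered as [w_c4, w_c1, w_c2, w_c3] and summing to 1."""
    return sum(w * x for w, x in zip(w_c, steered_snapshot(m, k, delays)))
```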

Step 3. Extract feature parameters from each potential song segment denoised and enhanced in Step 2, and construct the feature set of potential song segments.

Further, Step 3 is specifically as follows:

Step 3-1. Compute the power spectrogram of each denoised and enhanced potential song segment as in Step 1-2;

Step 3-2. Set up a bank of Mel band-pass filters within the frequency range of the specific species' song segments. Pass the power spectrum of each potential song segment through the filter bank and compute the output of each filter;

Step 3-3. Take the logarithm of the filter outputs and apply the discrete cosine transform to obtain the Mel-frequency cepstral coefficient (MFCC) features;

Step 3-4. Combine the MFCC features of all potential song segments into the feature set of potential song segments.
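Steps 3-2 to 3-4 follow the standard MFCC recipe, restricted to the species' frequency band. The sketch assumes:

- triangular filters equally spaced on the 2595·log10(1 + f/700) Mel scale;
- a natural logarithm;
- a DCT-II;
- frame averaging to obtain one vector per segment.

The filter count and cepstrum length are illustrative, not taken from the patent, and all function names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mfcc(power, fs, frame_len, f_min, f_max, n_filters=20, n_ceps=12, eps=1e-12):
    """MFCCs of one power-spectrum frame using triangular Mel filters in [f_min, f_max]."""
    lo_mel, hi_mel = hz_to_mel(f_min), hz_to_mel(f_max)
    hz_pts = [mel_to_hz(lo_mel + (hi_mel - lo_mel) * j / (n_filters + 1))
              for j in range(n_filters + 2)]
    bin_freqs = [i * fs / frame_len for i in range(len(power))]
    log_out = []
    for j in range(1, n_filters + 1):                 # Step 3-2: filter-bank outputs
        lo, c, hi = hz_pts[j - 1], hz_pts[j], hz_pts[j + 1]
        acc = 0.0
        for f, p in zip(bin_freqs, power):
            if lo < f <= c:
                acc += p * (f - lo) / (c - lo)
            elif c < f < hi:
                acc += p * (hi - f) / (hi - c)
        log_out.append(math.log(acc + eps))           # Step 3-3: logarithm
    # Step 3-3: DCT-II of the log filter-bank energies
    return [sum(log_out[j] * math.cos(math.pi * n * (j + 0.5) / n_filters)
                for j in range(n_filters)) for n in range(n_ceps)]


def segment_feature(power_frames, **kw):
    """Step 3-4: one vector per segment via frame-averaged MFCCs (assumed pooling)."""
    ceps = [mfcc(fr, **kw) for fr in power_frames]
    return [sum(col) / len(ceps) for col in zip(*ceps)]
```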

Further, the recognition algorithm in Step 4 is a support vector machine.

Further, Step 4 is specifically as follows:

The MFCC features in an existing song feature database of the specific bird species serve as training samples, and the feature set of potential song segments serves as the input samples of the support vector machine. The decisions of the support vector machine then automatically detect the specific bird species.
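The patent specifies a support vector machine but not its kernel or solver. As a self-contained stand-in, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method, which is a swapped-in technique. It uses labeled feature vectors, with +1 for the target species and -1 otherwise; the function names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM trained with the Pegasos stochastic sub-gradient method.

    X: feature vectors (e.g. segment MFCC vectors); y: labels in {+1, -1}.
    A constant 1.0 is appended to each vector so the bias is learned (and
    regularized) together with the weights.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)                  # Pegasos step size
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:                          # hinge-loss sub-gradient
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (target species) or -1 for a feature vector x."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

For real MFCC features, which are rarely linearly separable, a kernel SVM from an established library would normally replace this sketch.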

In summary, the automatic acoustic detection method for specific bird species of the present invention uses GMM-based detection of potential song events of the target species. Combined with post-processing based on the energy, duration and frequency distribution of candidate sound events, this achieves robust detection and automatic segmentation. In addition, adaptive denoising and enhancement based on a microphone array markedly improves the signal-to-noise ratio of sound events and thus the species identification accuracy. This enables acoustic detection of specific bird species in natural field environments and is of great significance for the ecological protection of rare wild bird species and for related ecological research.

Claims

1. An automatic acoustic detection method for specific bird species, characterized in that it comprises the following steps:

Step 1. Collect continuous field bird sound monitoring signals, segment them automatically, and extract potential song segments of the specific bird species;

Step 2. Apply adaptive signal denoising and enhancement to each potential song segment obtained in Step 1;

Step 3. Extract feature parameters from each potential song segment denoised and enhanced in Step 2, and construct a feature set of potential song segments;

Step 4. Complete the acoustic detection of the specific bird species by combining the feature set of potential song segments obtained in Step 3 with a machine-learning recognition algorithm.

2. The automatic acoustic detection method for specific bird species according to claim 1, characterized in that Step 1 comprises the following steps. Step 1 consists of collecting continuous field bird sound monitoring signals, segmenting them automatically and extracting potential song segments of the specific bird species.

Step 1-1. Use a multi-element stereo microphone array to collect multi-channel continuous field bird sound monitoring signals, and apply pre-emphasis to the collected signals;

Step 1-2. Apply framing, windowing and the fast Fourier transform to the signal processed in Step 1-1 to obtain a power spectrogram;

Step 1-3. Set the lower and upper frequency limits to f_L and f_H respectively, and compute the short-time logarithmic energy le(l) of each frame as:

$$le(l) = \log_{10} e(l)$$

where

$$e(l) = \sum_{i=N_L}^{N_H} |S(i,l)|^2$$

Here l is the frame index and i is the frequency-bin index. S(i,l) is the short-time Fourier transform at time-frequency point (i,l), and N_L and N_H are the bin indices corresponding to f_L and f_H. e(l) is the short-time energy of frame l;

Step 1-4. Model the distribution of frame log energies with a Gaussian mixture model containing two Gaussian components. The two components represent the probability density functions of the set of potential song-event frames and of the set of environmental-noise frames, respectively;

Step 1-5. For each frame, compare posterior probabilities to decide whether it belongs to a potential song segment or to an environmental-noise segment, and obtain a number of potential song segments. Specifically:

For each frame, compare the posterior probability that it belongs to the song-event frame set with the posterior probability that it belongs to the noise frame set. If the former is larger, the frame belongs to a potential song segment. Every frame that is temporally contiguous with it and satisfies the same condition belongs to the same segment. This yields a number of potential song segments, which form the set D = {AE_1, AE_2, ..., AE_K}, where K is the number of potential song segments;

Step 1-6. Compute the logarithmic energy EAE_k of each potential song segment AE_k obtained in Step 1-5, and take the largest of these values, ME:

$$ME = \max_{1 \le k \le K} EAE_k$$

For the k-th potential song segment, if ME - EAE_k ≥ q, the segment is discarded. Here q is a threshold, in dB, preset according to actual conditions;

Step 1-7. From an existing song database of the specific bird species, obtain the upper and lower song-duration thresholds by statistical analysis: the longest song duration t_H and the shortest song duration t_L. Using the signal sampling rate f_s, convert them into the maximum song length n_H and the minimum song length n_L, in samples:

$$n_H = f_s \times t_H$$

$$n_L = f_s \times t_L$$

For each potential song segment remaining after Step 1-6, compute its length T as:

T = frame length (in samples) × number of frames in the potential song segment

Discard every potential song segment whose length T is less than n_L or greater than n_H;

Step 1-8. From the existing song database of the specific bird species, obtain the frequency range of its song segments by statistical analysis. For the potential song segments from Step 1-7, set to zero all spectral data that fall outside this range.

3. The automatic acoustic detection method for specific bird species according to claim 2, wherein q in Step 1-6 is set to 20 dB.

4. The automatic acoustic detection method for specific bird species according to claim 1 or 2, wherein Step 2 specifically comprises the following steps. Step 2 consists of applying adaptive signal denoising and enhancement to each potential song segment obtained in Step 1.

Assume the multi-element stereo microphone array is a P-element stereo microphone array whose channels are numbered 1, 2, 3, ..., P in a fixed order;

Step 2-1. Estimate the sound source direction for each potential song segment by an adaptive filtering method, specifically:

Step 2-1-1. For the P-channel data of one potential song segment, denote the P channel signals by m_1(n), m_2(n), m_3(n), ..., m_P(n), n = 1, 2, 3, ..., L_m, where L_m is the signal length. Taking the channel 1 signal as the reference, construct the snapshots x_k of the channel 2 signal:

$$x_k = [m_2(k), m_2(k+1), \ldots, m_2(k+L-1)]^T$$

where the subscript k = 1, 2, ..., L_m - L + 1 indexes the k-th snapshot, L is the filter length, and the superscript T denotes transposition;

Step 2-1-2. Compute the autocorrelation matrix R_xx as:

$$R_{xx} = \frac{1}{K} \sum_{k=1}^{K} x_k x_k^T$$

where K = L_m - L + 1 is the number of snapshots;

Step 2-1-3. Compute the cross-correlation vector r_xd as:

$$r_{xd} = \frac{1}{K} \sum_{k=1}^{K} x_k \, m_1(k+D)$$

where D is the filter center point;

Step 2-1-4. Compute the weight vector w_1 as:

$$w_1 = R_{xx}^{-1} r_{xd}$$

Step 2-1-5. Perform peak detection on the weight vector w_1 obtained in Step 2-1-4. If z is the abscissa of the peak, the delay between the channel 1 and channel 2 signals, in samples, is d_1 = z - D;

Step 2-1-6. Repeat Steps 2-1-1 to 2-1-5 to obtain the delay d_c, in samples, between the channel 1 signal and the channel (c+1) signal, c = 1, 2, ..., P-1;

Step 2-2. Adaptively enhance the potential song segment with a generalized sidelobe canceller, specifically:

Step 2-2-1. Find the main channel signal d(k):

d(k)=w _c ^T m(k);

In the formula, w _c =[w _c1 ,w _c2 ,..,w _cP ] ^T is the static weight vector, w _c1 , w _c2 , .., w _cP are the corresponding weights of each channel and w _c1 +w _c2 + ...+w _cP =1; m(k)=[m ₁ (k),m ₂ (kd ₁ ),...,m _P (kd _P-1 )] ^T , k=1,2,. ..,L _m ;

Step 2-2-2. Compute the auxiliary-channel signal $e(k)$:

$e(k) = W_S^T m(k)$;

where $W_S$ is the blocking matrix of dimension $P \times (P-1)$;
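The text does not specify the entries of $W_S$. A common choice in generalized sidelobe cancellers is the adjacent-difference (Griffiths-Jim) blocking matrix, whose columns each sum to zero, so a signal that is identical on all aligned channels is removed from $e(k)$. The sketch below assumes that choice; `difference_blocking_matrix` and `auxiliary_channels` are hypothetical names.

```python
import numpy as np


def difference_blocking_matrix(P):
    """P x (P-1) blocking matrix whose columns are differences of adjacent channels.

    Every column sums to zero, so W_S^T @ ones(P) = 0: a signal that is
    identical on all aligned channels is blocked from the auxiliary path.
    """
    W_S = np.zeros((P, P - 1))
    for p in range(P - 1):
        W_S[p, p] = 1.0
        W_S[p + 1, p] = -1.0
    return W_S


def auxiliary_channels(M, W_S):
    """e(k) = W_S^T m(k) for every column m(k) of the aligned matrix M (P x L_m)."""
    return W_S.T @ M          # shape (P-1, L_m); column k is e(k)
```

Any $P \times (P-1)$ matrix of full column rank whose columns are orthogonal to the all-ones vector would serve the same purpose; the difference form is simply the easiest to build.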

Step 2-2-3. Compute the enhanced (clean) potential song segment signal $y(k)$:

$y(k) = d(k) - v^T(k) e(k)$;

where $v(k)$ is the dynamic weight vector of the adaptive interference canceller.
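The update rule for $v(k)$ is not given in the text. A normalized LMS (NLMS) update, which adapts $v(k)$ to minimize the output power of $y(k)$, is a standard choice for the adaptive interference canceller and is assumed in this sketch; the step size `mu`, the regularizer `eps` and the name `adaptive_canceller` are hypothetical.

```python
import numpy as np


def adaptive_canceller(d, E, mu=0.1, eps=1e-8):
    """y(k) = d(k) - v(k)^T e(k), with v(k) adapted by normalized LMS (assumed).

    d : main-channel signal, length L_m
    E : auxiliary signals, shape (P-1, L_m); column k is e(k)
    Returns the enhanced signal y and the final weight vector v.
    """
    d = np.asarray(d, dtype=float)
    E = np.asarray(E, dtype=float)
    v = np.zeros(E.shape[0])                          # dynamic weight vector v(k)
    y = np.zeros_like(d)
    for k in range(len(d)):
        e_k = E[:, k]
        y[k] = d[k] - v @ e_k                         # enhanced output sample
        v = v + mu * y[k] * e_k / (eps + e_k @ e_k)   # NLMS step toward lower |y|^2
    return y, v
```

Because the blocking matrix removes the aligned song from $e(k)$, minimizing the power of $y(k)$ mainly cancels the interference that leaks into $d(k)$ rather than the song itself.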

5. The acoustic automatic detection method for specific bird species according to claim 4, wherein in step 2 the adaptive signal noise-reduction enhancement of each potential song segment obtained in step 1 specifically comprises the following steps:

Assume that the multi-element stereo microphone array is a P = 4-element array whose channels are numbered 1, 2, 3, 4 in a fixed order;

Step 2-1-1. For the 4-channel signal data of a potential song segment, denote the four channel signals by $m_1(n), m_2(n), m_3(n), m_4(n)$, $n = 1, 2, 3, \ldots, L_m$, where $L_m$ is the signal length. Taking the channel 4 signal as the reference signal, construct the snapshot $x_k$ of the channel 1 signal:

$x_k = [m_1(k), m_1(k+1), \ldots, m_1(k+L-1)]^T$;

Step 2-1-2. Obtain the autocorrelation matrix $R_{xx}$ using the formula:

$R_{xx} = \frac{1}{K} \sum_{k=1}^{K} x_k x_k^T$;

where $K = L_m - L + 1$ is the number of snapshots;

Step 2-1-3. Obtain the cross-correlation vector $r_{xd}$; the formula used is:

In the formula, $D$ is the filter center point;

Step 2-1-4. Obtain the weight vector $w_1$ using the formula:

$w_1 = R_{xx}^{-1} r_{xd}$;

Step 2-1-5. Perform peak detection on the weight vector $w_1$ obtained in step 2-1-4 and denote the abscissa (index) of the peak by $z$; the delay, in samples, between the channel 4 signal and the channel 1 signal is then $d_1 = z - D$;

Step 2-1-6. Repeat steps 2-1-1 to 2-1-5 to obtain the delays $d_2$ and $d_3$ between the channel 4 signal and the channel 2 and channel 3 signals, respectively;

Step 2-2-1. Compute the main-channel signal $d(k)$:

$d(k) = w_c^T m(k)$;

where $w_c = [w_{c4}, w_{c1}, w_{c2}, w_{c3}]^T$ is the static weight vector, whose entries $w_{c1}, \ldots, w_{c4}$ are the weights of the individual channels and satisfy $w_{c1} + w_{c2} + w_{c3} + w_{c4} = 1$; $m(k) = [m_4(k), m_1(k - d_1), m_2(k - d_2), m_3(k - d_3)]^T$, $k = 1, 2, \ldots, L_m$;
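For the four-element case, the only structural change from the general claim is that channel 4 is the reference and is stacked first, unshifted. A small sketch of building $m(k)$ in that order, with the hypothetical helper `stack_four_channel`; it uses `np.roll`, so shifted samples wrap around the segment edges, a simplification the text does not specify (zero-filling is an equally valid choice).

```python
import numpy as np


def stack_four_channel(m1, m2, m3, m4, d1, d2, d3):
    """Rows give m(k) = [m4(k), m1(k - d1), m2(k - d2), m3(k - d3)]^T.

    Channel 4 is the reference, so it comes first and is not shifted.
    np.roll(a, d)[k] == a[(k - d) % len(a)], so edges wrap around (assumption).
    """
    rows = [np.asarray(m4, dtype=float)]
    for m, d in ((m1, d1), (m2, d2), (m3, d3)):
        rows.append(np.roll(np.asarray(m, dtype=float), d))   # row[k] = m[k - d]
    return np.vstack(rows)                                     # shape (4, L_m)
```

With the weights ordered as $[w_{c4}, w_{c1}, w_{c2}, w_{c3}]$, the main channel is then `w_c @ stack_four_channel(...)`.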

Step 2-2-2. Compute the auxiliary-channel signal $e(k)$:

$e(k) = W_S^T m(k)$;

where $W_S$ is the blocking matrix of dimension $4 \times 3$;

Step 2-2-3. Compute the enhanced potential song segment signal $y(k)$:

$y(k) = d(k) - v^T(k) e(k)$;

where $v(k)$ is the weight vector of the adaptive interference canceller.

6. The acoustic automatic detection method for specific bird species according to claim 4 or 5, wherein in step 3 feature parameters are extracted from the potential song segments denoised and enhanced in step 2, and a feature set of potential song segments is constructed, comprising the following steps:

Step 3-1. Following step 1-2, calculate the power spectrum of each denoised and enhanced potential song segment;
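Step 1-2 is not part of this section, so the analysis parameters below (Hann window, 512-sample frames, a hop of 256 samples, and $|X|^2 / N_{\text{FFT}}$ scaling) are assumptions chosen only for illustration. The sketch computes a short-time power spectrum of one enhanced segment with NumPy; `power_spectrum` is a hypothetical name.

```python
import numpy as np


def power_spectrum(y, frame_len=512, hop=256, n_fft=512):
    """Short-time power spectrum of an enhanced segment, shape (n_frames, n_fft//2 + 1).

    Frame length, hop, Hann window and the 1/n_fft scaling are assumptions;
    step 1-2 defines the actual analysis parameters.
    """
    y = np.asarray(y, dtype=float)
    if len(y) < frame_len:
        y = np.pad(y, (0, frame_len - len(y)))       # guarantee at least one frame
    n_frames = 1 + (len(y) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([y[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)  # one-sided spectrum per frame
    return np.abs(spectrum) ** 2 / n_fft
```

For example, a 2 kHz tone sampled at 16 kHz falls exactly on bin 64 of a 512-point FFT, so every frame peaks there.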

Step 3-2. Set up a bank of Mel band-pass filters covering the frequency range of the song of the specific bird species, and pass the power spectrum of each potential song segment through the filter bank to obtain the output of each filter;
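A sketch of a Mel filter bank restricted to the species' song band, using the common Mel scale $2595 \log_{10}(1 + f/700)$ and triangular filters. The band limits `f_lo` / `f_hi`, the number of filters and the function names are hypothetical; the claims only state that the filters lie within the song's frequency range.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs, f_lo, f_hi):
    """Triangular Mel filters spanning only [f_lo, f_hi] (the species' song band).

    Returns an array of shape (n_filters, n_fft//2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2)
    hz_points = mel_to_hz(mel_points)                 # filter edges/centres in Hz
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)   # frequency of each FFT bin
    fbank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, centre, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)  # triangle, peak 1
    return fbank


def filterbank_energies(power_spec, fbank):
    """Output of each filter for each frame: (n_frames, n_bins) -> (n_frames, n_filters)."""
    return power_spec @ fbank.T
```

Restricting the bank to the song band spends all filters where the target species actually sings, instead of on low-frequency wind or traffic noise.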

Step 3-3. Take the logarithm of the filter outputs and apply a discrete cosine transform to obtain the Mel-frequency cepstral coefficient (MFCC) feature parameters;
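Step 3-3 can be sketched as a logarithm followed by an orthonormal DCT-II, built here directly with NumPy so no other library is needed. Keeping the first 13 coefficients and flooring the energies at `1e-10` before the logarithm are common conventions assumed for illustration, not values from the claims; the function names are hypothetical.

```python
import numpy as np


def dct2_matrix(M):
    """Orthonormal DCT-II matrix (M x M); matches scipy's dct(type=2, norm='ortho')."""
    n = np.arange(M)[:, None]                 # output (cepstral) index
    m = np.arange(M)[None, :]                 # input (filter) index
    C = np.sqrt(2.0 / M) * np.cos(np.pi * n * (m + 0.5) / M)
    C[0] /= np.sqrt(2.0)                      # DC row scaled to sqrt(1/M)
    return C


def mfcc_from_energies(energies, n_ceps=13, floor=1e-10):
    """Log of the filter outputs followed by a DCT; keeps the first n_ceps coefficients."""
    log_e = np.log(np.maximum(energies, floor))      # floor avoids log(0)
    C = dct2_matrix(log_e.shape[-1])
    return (log_e @ C.T)[..., :n_ceps]
```

A flat filter-bank output carries no spectral shape, so all of its energy lands in the zeroth coefficient and the higher cepstral coefficients are zero.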

Step 3-4. Combine the MFCC feature parameters of all potential song segments to construct the feature set of potential song segments.
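Segments have different numbers of frames, so some pooling is needed before the MFCCs can be stacked into one feature set. The claims do not say how this is done; averaging the frame-level MFCCs of each segment is assumed here as the simplest option, and `build_feature_set` is a hypothetical name.

```python
import numpy as np


def build_feature_set(segment_mfccs):
    """Stack one feature vector per potential song segment into (n_segments, n_ceps).

    Each element of segment_mfccs is a (n_frames, n_ceps) MFCC matrix; averaging
    over frames to obtain a fixed-length vector is an assumption.
    """
    return np.vstack([np.asarray(m, dtype=float).mean(axis=0) for m in segment_mfccs])
```

Richer pooling (for example appending per-coefficient standard deviations) keeps more temporal information at the cost of a longer feature vector.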

7. The acoustic automatic detection method for specific bird species according to claim 6, wherein the identification algorithm in step 4 is a support vector machine identification algorithm.

8. The acoustic automatic detection method for specific bird species according to claim 6, wherein in step 4 the acoustic detection of the specific bird species is completed by combining the feature set of potential song segments obtained in step 3 with the machine-learning identification algorithm, specifically:

The MFCC features in the existing song feature database of the specific bird species are taken as training samples, and the feature set of potential song segments is taken as the input samples of the support vector machine; the decision of the support vector machine then determines whether each potential song segment belongs to the specific bird species.
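The claims do not state which kernel or training procedure the support vector machine uses. To stay dependency-free, the sketch below swaps in a linear soft-margin SVM trained with the Pegasos sub-gradient method; a library SVM (possibly with a nonlinear kernel) could equally be used. The labels assume the training database also holds negative examples (non-target sounds), which the claims do not describe, and `train_linear_svm` / `svm_decide` as well as `lam` and `epochs` are hypothetical.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM trained with the Pegasos sub-gradient method.

    X : (n_samples, n_features) training MFCC features
    y : labels in {+1 (target species), -1 (other sounds)}
    A constant 1 is appended to each sample so the bias is learned as a weight.
    """
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)                         # Pegasos step size
            if y[i] * (X[i] @ w) < 1:                     # hinge loss is active
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w


def svm_decide(w, X):
    """+1 if a potential song segment is classified as the target species, else -1."""
    X = np.hstack([np.asarray(X, dtype=float), np.ones((len(X), 1))])
    return np.where(X @ w >= 0, 1, -1)
```

In the method's terms, `X` for training would come from the song feature database and the feature set of potential song segments would be passed to `svm_decide`.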