CN107369451B

CN107369451B - Bird voice recognition method for assisting phenological study of bird breeding period

Info

Publication number: CN107369451B
Application number: CN201710583313.8A
Authority: CN
Inventors: 刘丰; 李晟; 申小莉
Original assignee: BEIJING COMPUTING CENTER
Current assignee: BEIJING COMPUTING CENTER
Priority date: 2017-07-18
Filing date: 2017-07-18
Publication date: 2020-12-22
Anticipated expiration: 2037-07-18
Also published as: CN107369451A

Abstract

A bird voice recognition method for assisting phenological study of the bird breeding season. Field recording segments, each containing multiple segments of bird song, are read. A recognition algorithm then identifies the bird species to which each song segment in the recording belongs and gives a recognition confidence, and the actual recording date of each segment is recorded. Finally, the number of singing birds across all recordings in the region, i.e., the number of birds that have entered the breeding season, is calculated from the recognition results of the algorithm. When, after some time, this number exceeds a preset threshold, the birds in the region are considered to have entered the breeding season from that moment; conversely, when the number later falls back past the threshold, the birds are considered to have ended the breeding season.

Description

Bird voice recognition method for assisting phenological study of bird breeding period

Technical Field

The invention relates to the technical field of bird voice recognition, in particular to a bird voice recognition method for assisting the phenological study of bird breeding season.

Background

In biology, bird vocalizations are divided into calls (bird call) and songs (bird song). Bird song refers to the vocalizations that birds produce during the breeding season. The song pattern of a given species is very stable, while the songs of different species often differ greatly. Bird song can therefore be used as a means of identifying bird species.

Phenology is the discipline that studies the relationship between animals and periodic changes in the environment. One of its branches studies the relationship between the reproductive stage of birds and cyclic changes in the environment. Because the reproductive stage of birds can be inferred by recognizing their vocalizations, bird sound recognition can assist phenological research on the bird breeding period.

Disclosure of Invention

The invention aims to provide a bird voice recognition method for assisting the phenological study of the breeding period of birds.

To solve the above technical problem, the following technical scheme is adopted. In a bird voice recognition method for assisting phenological study of the bird breeding season, field recording segments containing multiple bird song segments are read; a recognition algorithm then identifies the species of bird to which each song segment in the recording belongs and gives a recognition confidence; the time at which each song segment occurs within the recording segment is recorded; and finally the number of birds producing song, i.e., the number of birds that have entered the breeding season, is calculated in combination with the recording time. When, after some time, this number exceeds a preset threshold, the birds in the region are considered to have entered the breeding season from that point; conversely, when the number later falls back past the threshold, the birds are considered to have ended the breeding season.

The specific steps of the recognition algorithm are as follows: 1) source separation: semi-supervised non-negative matrix factorization (NMF) is used for source separation; 2) preprocessing: the signal is passed through a low-pass filter and frequency compensation is then performed; 3) segmentation: short-time energy is used to find the transition points from silence to calls, by first calculating the short-time energy of the recording:

and then finding the sound segments according to a threshold; 4) feature extraction: overlapping windows are first applied to each sound segment, each window forming a frame; time-domain features and frequency-domain features are extracted from the values in each window, most of the frequency-domain features being based on the short-time Fourier transform (STFT); the time-domain and frequency-domain features are then combined into one vector that serves as the feature vector of the frame; 5) dimensionality reduction and noise reduction: PCA is used as the means of dimensionality reduction; 6) modeling and recognition: a hidden Markov model (HMM) is built for each bird song; the model is first initialized with segmental k-means and the HMM is then trained with the forward-backward algorithm. Once the HMMs have been built, a new recording to be processed undergoes source separation, preprocessing, segmentation, feature extraction and PCA, and the resulting feature sequence is compared with each trained HMM, i.e., it is decoded with the Viterbi algorithm to obtain a confidence. The model with the highest confidence is selected as the recognition result.

Drawings

FIG. 1 is a schematic diagram of the technical route of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

First, a field recording segment containing multiple segments of bird song is read. A recognition algorithm then identifies the species of bird to which each song segment in the recording belongs and gives a recognition confidence. The time at which each song segment occurs within the recording segment is recorded, and finally the number of birds producing song, i.e., the number of birds that have entered the breeding period, is calculated in combination with the recording time.
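The counting and threshold decision described in the disclosure can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `min_confidence` cutoff and the rule that each confident detection counts as one singing bird are assumptions made for the example, and dates without any detection are simply skipped.

```python
from collections import defaultdict
from datetime import date

def daily_singer_counts(detections, min_confidence=0.5):
    """Count confident song detections per (species, recording date).

    detections: iterable of (recording_date, species, confidence) tuples.
    Assumption: one confident detection is treated as one singing bird.
    """
    counts = defaultdict(int)
    for rec_date, species, confidence in detections:
        if confidence >= min_confidence:
            counts[(species, rec_date)] += 1
    return counts

def breeding_season(counts, species, threshold):
    """Return (start, end) for the first span whose daily count exceeds threshold.

    start: first date on which the count exceeds the threshold.
    end:   first later date on which the count is no longer above it
           (None if the count never drops back in the data).
    """
    days = sorted(d for (s, d) in counts if s == species)
    start = end = None
    for d in days:
        above = counts[(species, d)] > threshold
        if start is None and above:
            start = d
        elif start is not None and not above:
            end = d
            break
    return start, end
```

Calling `breeding_season(daily_singer_counts(detections), "species_name", threshold)` on the recognition results of one region gives an estimated start and end date for that species.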

The identification algorithm comprises the following specific steps:

1) Semi-supervised NMF

Semi-supervised NMF is used for source separation. The sound captured by a recorder is a mixture of several sounds that sometimes overlap; source separation is the technique used to separate these different sounds.

NMF stands for non-negative matrix factorization. It is a widely used and effective method for source separation. It decomposes a sound into a weighted combination of bases, and the resulting set of bases with their corresponding weights constitutes the result of source separation.

Semi-supervised NMF means first training on known data of a specific class to obtain the bases corresponding to that class, and then applying the NMF algorithm to the data to be processed using those bases together with additional initial vectors. The pre-trained bases of the known class and their weights give the separated result, which is passed on to subsequent processing.

Semi-supervised NMF achieves a good separation effect and, in addition, effectively suppresses noise. In some environments this method may outperform other noise reduction methods. Conventional noise reduction requires knowledge of the properties of the noise, but the conditions under which noise arises are highly uncertain, so its properties cannot be described accurately in advance and traditional methods perform poorly. A method based on semi-supervised NMF does not need to know the properties of the noise in advance, and its noise reduction effect is therefore better in such cases.
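As an illustration of the idea, the sketch below keeps pre-trained bird bases fixed while a few extra "free" bases absorb whatever else is in the mixture. It assumes magnitude spectrograms as input and uses the standard multiplicative updates for the Euclidean cost; the patent does not state the cost function, the number of bases or the reconstruction rule, so `nmf`, `separate_bird`, `n_free` and the soft mask are choices made for the example only.

```python
import numpy as np

def nmf(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Factor V ~= W @ H with multiplicative updates (Euclidean cost).

    If W_fixed is given, its columns are placed first in W and never updated
    (these are the pre-trained bases of the known class).
    """
    rng = np.random.default_rng(seed)
    eps = 1e-9
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W = rng.random((V.shape[0], rank)) + eps
    if n_fixed:
        W[:, :n_fixed] = W_fixed
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
        W[:, n_fixed:] = W_new[:, n_fixed:]  # only the free bases are adapted
    return W, H

def separate_bird(V_mix, W_bird, n_free=4):
    """Estimate the bird part of a mixture magnitude spectrogram."""
    k = W_bird.shape[1]
    W, H = nmf(V_mix, k + n_free, W_fixed=W_bird)
    bird = W[:, :k] @ H[:k]          # contribution of the known-class bases
    total = W @ H + 1e-9             # full reconstruction
    return V_mix * bird / total      # soft mask applied to the mixture
```

In use, `W_bird` would come from `nmf(V_train, k)[0]` on spectrograms of clean recordings of one species, and `separate_bird(V_mix, W_bird)` would then be applied to each field recording.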

2) Preprocessing

Preprocessing consists of two parts: the signal is first passed through a low-pass filter, and frequency compensation is then performed.
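A possible form of these two steps is sketched below. The patent gives neither the filter design nor the compensation method, so both are assumptions here: the low-pass filter is a windowed-sinc FIR filter, and "frequency compensation" is interpreted as first-order pre-emphasis, a common way to boost high frequencies in audio front ends. The cutoff, tap count and `alpha` are illustrative values.

```python
import numpy as np

def lowpass_fir(x, cutoff_hz, sample_rate, n_taps=101):
    """Windowed-sinc FIR low-pass filter (Hamming window, odd n_taps)."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    fc = cutoff_hz / sample_rate                 # cutoff in cycles per sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(n_taps)
    h /= h.sum()                                 # unity gain at 0 Hz
    return np.convolve(x, h, mode="same")        # centered, so no net delay

def pre_emphasis(x, alpha=0.97):
    """Assumed compensation step: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def preprocess(x, sample_rate, cutoff_hz):
    """Low-pass filtering followed by the assumed frequency compensation."""
    return pre_emphasis(lowpass_fir(x, cutoff_hz, sample_rate))
```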

3) Segmentation

Recordings are long and contain both silence and calls. The silent parts must be removed first so that only the parts containing calls remain, which means the sound needs to be segmented. Short-time energy is used to find the transition points (endpoints) between silence and calls.

First, the short-time energy of the recording is calculated; the sound segments are then found according to a threshold.
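The segmentation step can be sketched as below, taking the short-time energy of a frame to be the sum of its squared samples. The frame length, hop size and the choice of a threshold relative to the maximum frame energy are assumptions for the example; the patent only states that a threshold is used.

```python
import numpy as np

def short_time_energy(x, frame_len=512, hop=256):
    """Sum of squared samples in each overlapping frame."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.array([np.sum(x[i * hop: i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def find_segments(x, frame_len=512, hop=256, rel_threshold=0.1):
    """Return (start_sample, end_sample) pairs of runs of high-energy frames."""
    energy = short_time_energy(x, frame_len, hop)
    active = energy > rel_threshold * energy.max()
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                                   # silence -> call
        elif not is_active and start is not None:
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None                                # call -> silence
    if start is not None:                               # call runs to the end
        segments.append((start * hop, min(len(x), (len(active) - 1) * hop + frame_len)))
    return segments
```

Because the frames overlap, the returned boundaries are accurate only to within about one frame length.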

4) Feature extraction

For each call segment obtained, features need to be extracted. Overlapping windows are first applied to the sound segment, each window being referred to as a frame, and time-domain features and frequency-domain features are extracted from the values within each window. Most frequency-domain features are based on the short-time Fourier transform (STFT). The time-domain and frequency-domain features are then combined into one vector that serves as the feature vector of the frame.

The time-domain features are: zero-crossing rate, short-time energy, entropy of energy.

The frequency-domain features are: MFCC, spectral centroid, spectral spread, spectral entropy, spectral flux, spectral rolloff.
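The following sketch shows how a per-frame feature vector of this kind might be assembled. It covers only a subset of the listed features (zero-crossing rate, short-time energy, spectral centroid, spectral spread and spectral entropy); MFCC, spectral flux, spectral rolloff and entropy of energy are omitted for brevity. The Hann window, frame sizes and exact feature definitions are assumptions, since the patent does not define them.

```python
import numpy as np

def frame_signal(x, frame_len=512, hop=256):
    """Split x into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def frame_features(frame, sample_rate):
    """Feature vector: [zcr, energy, centroid, spread, spectral entropy]."""
    # Time-domain features
    positive = frame >= 0
    zcr = np.mean(positive[1:] != positive[:-1])
    energy = np.mean(frame ** 2)
    # Frequency-domain features from the magnitude spectrum of the frame
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    p = mag / (mag.sum() + 1e-12)                # spectrum as a distribution
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    entropy = -np.sum(p * np.log2(p + 1e-12))
    return np.array([zcr, energy, centroid, spread, entropy])

def extract_features(x, sample_rate, frame_len=512, hop=256):
    """Feature sequence of a sound segment: one row per frame."""
    frames = frame_signal(x, frame_len, hop)
    return np.stack([frame_features(f, sample_rate) for f in frames])
```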

5) PCA

Because the feature vectors obtained have a high dimension, operating on them directly is computationally very expensive, and they also contain some noise. The data are therefore reduced in dimension, with PCA used as the means of dimensionality reduction.

PCA stands for principal component analysis. It is an effective means of reducing data dimension and therefore the amount of computation. It can also remove much of the noise, thereby improving system performance.
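A minimal sketch of this step is given below, computing PCA from the singular value decomposition of the mean-centered feature matrix. The function names and the choice of the number of components are illustrative; the projection learned on training features would be reused unchanged on new recordings.

```python
import numpy as np

def fit_pca(X, n_components):
    """Learn a PCA projection from X (one feature vector per row).

    Returns the feature mean, the principal directions (one per row) and
    the fraction of variance explained by each kept component.
    """
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    return mean, Vt[:n_components], explained[:n_components]

def apply_pca(X, mean, components):
    """Project feature vectors onto the learned principal directions."""
    return (X - mean) @ components.T
```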

6) HMM

HMM stands for hidden Markov model, a well-known mathematical model for time-series modeling. Compared with other methods, the HMM offers higher recognition efficiency and robustness.

An HMM is built for each bird's song. The model is first initialized with segmental k-means, and the HMM is then trained using the forward-backward algorithm.

After training, the new feature sequence to be identified, obtained after PCA processing, is decoded against each HMM using the Viterbi algorithm. The Viterbi algorithm yields a probability, and the HMM or HMMs with the highest probability can be selected as the result, as required.

The HMM stage outputs the bird species and the associated confidence.
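The recognition step can be sketched as follows: each trained model scores the feature sequence with the Viterbi algorithm in the log domain, and the best-scoring species is returned. Training (segmental k-means initialization and forward-backward re-estimation) is not shown. The model layout, a dictionary with keys `pi`, `A`, `means` and `vars`, and the diagonal-Gaussian emissions are assumptions made for this example; the patent does not specify the emission distribution.

```python
import numpy as np

def log_gauss(X, means, variances):
    """Log density of every frame under every state's diagonal Gaussian.

    X: (T, D) features; means, variances: (S, D). Returns a (T, S) array.
    """
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * np.sum(diff ** 2 / variances + np.log(2 * np.pi * variances), axis=2)

def viterbi_score(X, model):
    """Log probability of the single best state path for sequence X."""
    log_pi = np.log(model["pi"] + 1e-300)
    log_A = np.log(model["A"] + 1e-300)
    log_b = log_gauss(X, model["means"], model["vars"])
    delta = log_pi + log_b[0]
    for t in range(1, len(X)):
        # best predecessor for each state, then emit frame t
        delta = np.max(delta[:, None] + log_A, axis=0) + log_b[t]
    return float(np.max(delta))

def classify(X, models):
    """Score X against every species model; return (best_species, all_scores)."""
    scores = {name: viterbi_score(X, m) for name, m in models.items()}
    return max(scores, key=scores.get), scores
```

The returned log scores play the role of the confidence here; sorting `scores` gives the several most likely species when more than one candidate is wanted.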

The above embodiment merely illustrates the principles and effects of the present invention. It will be apparent to those skilled in the art that various changes and modifications may be made without departing from the inventive concept of the present invention, and all such variations fall within the scope of protection of the present invention.

Claims

1. A bird voice recognition method for assisting phenological study of the bird breeding season, characterized in that field recording segments containing multiple bird song segments are read; a recognition algorithm then identifies the species of bird to which each song segment in the recording belongs and gives a recognition confidence; the actual recording date of each segment is recorded; finally the number of singing birds across all recordings in the region, i.e., the number of birds that have entered the breeding season, is calculated from the recognition results of the algorithm; when, after some time, this number exceeds a preset threshold, the birds in the region are considered to have entered the breeding season from that moment; conversely, when the number later falls back past the threshold, the birds are considered to have ended the breeding season; and the recognition algorithm comprises the following specific steps: 1) source separation using semi-supervised non-negative matrix factorization; 2) passing the signal through a low-pass filter and then performing frequency compensation; 3) segmenting the sound: using short-time energy to find the transition points from silence to calls, by first calculating the short-time energy of the recording:

and then finding the sound segments according to a threshold; 4) feature extraction: first applying overlapping windows to each sound segment, each window forming a frame, extracting time-domain features and frequency-domain features from the values in each window, most of the frequency-domain features being based on the short-time Fourier transform (STFT), and then combining the time-domain and frequency-domain features into one vector that serves as the feature vector of the frame; 5) dimensionality reduction and noise reduction: using PCA as the means of dimensionality reduction; 6) building a mathematical model for each bird song using a hidden Markov model (HMM), the model first being initialized with segmental k-means and the HMM then being trained with the forward-backward algorithm; after the HMMs have been built, a new recording to be processed undergoes source separation, preprocessing, segmentation, feature extraction and PCA, and the resulting feature sequence is compared with each trained HMM, i.e., it is decoded with the Viterbi algorithm to obtain a confidence, and the model with the highest confidence is selected as the recognition result.