CN109886214A - A birdsong feature enhancement method based on image processing

Info

Publication number: CN109886214A
Application number: CN201910139801.9A
Authority: CN
Inventors: 杨春勇; 祁宏达; 侯金; 陈少平
Original assignee: South Central University for Nationalities
Current assignee: South Central Minzu University
Priority date: 2019-02-26
Filing date: 2019-02-26
Publication date: 2019-06-14

Abstract

The invention discloses a birdsong feature enhancement method based on image processing, relating to image processing, deep learning, and bird identification technology. The method is: 1. preprocess the birdsong data set, including resampling and normalization, to obtain spectrograms of the birdsong; 2. perform signal-noise separation on the spectrograms to obtain signal spectrograms and noise spectrograms; 3. divide all spectrograms into chunks; 4. apply data augmentation to the chunked signal spectrograms; 5. apply pseudo-color processing to the augmented grayscale images through Jet mapping to obtain three-channel RGB color images; 6. obtain the recognition result through transfer learning. The beneficial effects of the invention are: 1. it supports efficient automatic bird identification; 2. it converts the one-dimensional time-domain birdsong signal into two-dimensional time-frequency bioacoustic spectrum information; 3. the image processing comprises birdsong signal-noise separation, specific data augmentation, and visual perception enhancement, a further exploration of deep learning in the field of automatic bird identification.

Description

A birdsong feature enhancement method based on image processing

Technical field

The present invention relates to the technical fields of image processing, deep learning, and bird identification, and more particularly to a birdsong feature enhancement method based on image processing.

Background art

Biodiversity is a condition for human survival, a strategic resource for sustainable social development, and an important guarantee of ecological security and food security. Because birds are widely distributed, well documented in research, and sensitive to their environment, they are an important indicator group for biodiversity. Knowing the status of bird populations and how they change is of great significance for protecting and assessing ecosystems. Bird vocalizations are varied; they carry important behavioral meaning, species specificity, and rich biological information, and they are also the main means of identifying bird species. The taxonomic significance of birdsong has become a hot topic at the intersection of bird sound research and avian systematics.

As an important component of soundscape ecology, bioacoustic spectrum analysis is an important way to monitor, study, and analyze ecosystem biodiversity. Analyzed with an appropriate method, the spectrum fully shows the time-frequency characteristics of a sound and so efficiently reveals differences between organisms. Birdsong spectrograms are an important way to study the species attributes of birds, and spectral analysis is essential for monitoring avian diversity.

Deep learning is the most active branch of machine learning today and the technology with the broadest application prospects in artificial intelligence. With its continued breakthroughs in computer vision, using it for image recognition tasks has become an efficient and specialized technique. Against this technical background, the present invention processes birdsong spectrograms in a targeted way based on the characteristics of birdsong.

Summary of the invention

The present invention aims to provide a birdsong feature enhancement method based on image processing, to solve the problems in the prior art that, during recognition, song features are not prominent, noise interference is severe, and bioacoustic spectrum information is incomplete.

To achieve the above object, the technical solution of the present invention is as follows:

Specifically, a birdsong feature enhancement method based on image processing comprises the following steps:

1. Preprocess the birdsong data set, including resampling and normalization, to obtain spectrograms of the birdsong;

2. Perform signal-noise separation on the spectrograms to obtain signal spectrograms and noise spectrograms; the signal spectrogram contains the song and call parts, and the noise spectrogram contains the noise and silent parts; the signal spectrograms serve as the original training samples, and the noise spectrograms serve as one means of background-noise augmentation;

3. Divide all spectrograms obtained from the signal-noise separation in step 2 into chunks, and resize each chunk to the input dimensions of the neural network used for training;

4. Apply data augmentation to the chunked signal spectrograms from step 3. Spectrograms differ from conventional images, and this difference limits the direct application of general-purpose image processing techniques. Taking the characteristics of birdsong and of spectrograms into account, the specific augmentation methods here are frequency-domain transformation, noise addition, and same-class sample mixing;

5. To enhance the visual perceptibility of the grayscale images, and to make transfer learning to different neural networks convenient, apply pseudo-color processing to the augmented grayscale images through Jet mapping to obtain three-channel RGB color images; divide these images into a training set and a test set, with the training set accounting for 80% and the test set for 20%;

6. Select a suitable neural network model through transfer learning, fine-tune it, train it on the training set from step 5, and finally verify the model accuracy on the validation set to obtain the recognition result.

Compared with the prior art, the beneficial effects of the present invention are:

1. For species classification from birdsong in complex natural acoustic environments, a birdsong feature enhancement method based on image processing is proposed. It alleviates problems in birdsong recognition such as high background noise, non-prominent song features, imbalanced song data, and incomplete bioacoustic spectrum information, and supports efficient automatic bird identification;

2. The one-dimensional time-domain birdsong signal is converted into two-dimensional time-frequency bioacoustic spectrum information; a series of image processing methods is then applied flexibly to highlight the birdsong features in the spectrogram and to enhance the visual presentation of the spectrum information, and the result can be verified by deep learning;

3. The image processing comprises birdsong signal-noise separation, specific data augmentation, and visual perception enhancement. Unlike image classification methods for general visual tasks, the present invention focuses throughout on how birdsong features are presented at each stage of recognition and adopts a strongly targeted image processing strategy, a further exploration of deep learning in the field of automatic bird identification.

Brief description of the drawings

Fig. 1 is the flow chart of this method.

Detailed description of the embodiments

I. Method

1. Birdsong signal-noise separation

Step 2 performs signal-noise separation on the spectrogram to obtain a signal spectrogram and a noise spectrogram;

The signal spectrogram is separated as follows: set a threshold N; if a pixel value in the spectrogram is higher than N times the median of its row and of its column, set it to 1; otherwise set it to 0;

The noise spectrogram is separated as follows: set a threshold n (n < N); if a pixel value is higher than n times the median of its row and of its column, set it to 0; otherwise set it to 1.
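The two thresholding rules can be sketched with NumPy as follows. This is a minimal illustration and not the patented implementation: the function name `separate_signal_noise` and the values `N=3.0` and `n=2.5` are assumptions (the text only requires n < N), and the rule is read here as requiring a pixel to exceed both its row median and its column median scaled by the threshold.

```python
import numpy as np

def separate_signal_noise(spec, N=3.0, n=2.5):
    """Return (signal_mask, noise_mask) for a 2-D spectrogram (frequency x time).

    N and n are illustrative values only; the source just requires n < N.
    """
    row_med = np.median(spec, axis=1, keepdims=True)  # one median per frequency row
    col_med = np.median(spec, axis=0, keepdims=True)  # one median per time column

    # Assumption: a pixel counts as signal only if it exceeds BOTH scaled medians.
    signal_mask = ((spec > N * row_med) & (spec > N * col_med)).astype(np.uint8)

    # Noise mask: 0 where the pixel exceeds n times both medians, 1 elsewhere.
    above_n = (spec > n * row_med) & (spec > n * col_med)
    noise_mask = (~above_n).astype(np.uint8)
    return signal_mask, noise_mask
```

A pixel whose value lies between n and N times the medians ends up in neither mask's "kept" set for signal and is excluded from noise, which matches the buffer zone that the two different thresholds are meant to create.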

2. Specific data augmentation

Step 4 applies data augmentation to the chunked signal spectrograms, including frequency-domain transformation, noise addition, and same-class sample mixing;

A. Frequency-domain transformation

a. Randomly change the pitch of the original birdsong audio by no more than 5%, then repeat steps 1, 2, and 3;

b. Randomly change the volume of the original birdsong audio by no more than 5%, then repeat steps 1, 2, and 3;

B. Noise addition

a. Add random Gaussian noise to the signal spectrogram from step 2, and re-normalize the resulting image;

b. Randomly add the noise part from step 2 to the signal spectrogram from step 2, as a training sample;

C. Same-class sample mixing

Randomly mix the signal spectrograms obtained, after signal-noise separation, from different recordings of the same bird species.
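The spectrogram-level augmentations in B and C can be sketched as below. This is an illustration under stated assumptions: the function names, the noise standard deviation `sigma`, the cap `max_weight` on the noise contribution, and the uniform random mixing weight are all hypothetical choices, since the text does not specify them. Re-normalization is taken to mean min-max rescaling to [0, 1].

```python
import numpy as np

def renormalize(img):
    """Rescale an image to [0, 1]; a constant image maps to all zeros."""
    lo, hi = img.min(), img.max()
    return np.zeros_like(img) if hi == lo else (img - lo) / (hi - lo)

def add_gaussian_noise(signal, rng, sigma=0.05):
    """B(a): add random Gaussian noise, then re-normalize (sigma is assumed)."""
    return renormalize(signal + rng.normal(0.0, sigma, signal.shape))

def add_noise_sample(signal, noise, rng, max_weight=0.5):
    """B(b): add a separated noise chunk with a random weight (cap is assumed)."""
    weight = rng.uniform(0.0, max_weight)
    return renormalize(signal + weight * noise)

def mix_same_species(a, b, rng):
    """C: blend two signal chunks of the same species with a random weight."""
    weight = rng.uniform(0.0, 1.0)
    return renormalize(weight * a + (1.0 - weight) * b)
```

Because both inputs to `mix_same_species` carry the same species label, the mixed sample keeps that label unchanged.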

3. Visual perception enhancement

To further enhance the visual perceptibility of the grayscale image, and considering the constraint that transfer learning places on the data dimensions at the model input, pseudo-color processing is applied through Jet mapping, which increases the contrast between regions of different intensity and so improves recognition performance. Different regions are mapped to red, green, and blue monochrome images, corresponding to high-, medium-, and low-power spectral information respectively; red represents the highest-energy sound, which approximately corresponds to the song/call features.
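A Jet-style pseudo-color mapping can be sketched as follows. The function `jet_pseudocolor` is a hypothetical helper that uses a common piecewise-linear approximation of the Jet colormap; it is not necessarily the exact lookup table the authors used. It takes a grayscale image with values in [0, 1] and returns a three-channel RGB array.

```python
import numpy as np

def jet_pseudocolor(gray):
    """Map a grayscale image in [0, 1] to RGB with an approximate Jet colormap.

    Low values come out blue, mid values green, high values red.
    """
    x = np.clip(np.asarray(gray, dtype=np.float64), 0.0, 1.0)
    r = np.clip(1.5 - np.abs(4.0 * x - 3.0), 0.0, 1.0)  # peaks near x = 0.75
    g = np.clip(1.5 - np.abs(4.0 * x - 2.0), 0.0, 1.0)  # peaks near x = 0.5
    b = np.clip(1.5 - np.abs(4.0 * x - 1.0), 0.0, 1.0)  # peaks near x = 0.25
    return np.stack([r, g, b], axis=-1)                 # shape (..., 3)
```

The output has the three-channel layout expected by networks pre-trained on RGB images, which is the second purpose of this step.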

With the above image processing of birdsong spectrograms, the hybrid model SE-ResNeXt-50 is chosen for transfer learning, and accurate, efficient recognition results can be obtained from a large number of birdsong spectrograms.

II. Embodiment

This method converts the one-dimensional time-domain birdsong signal into two-dimensional time-frequency bioacoustic spectrum information and then flexibly applies a series of image processing methods to highlight the birdsong features in the spectrogram. Compared with general-purpose image recognition methods, the present invention is more targeted and recognizes more efficiently.

The experimental data come from the Xeno-Canto database. Most audio files in the database are 44.1 kHz, 16-bit, mono, and this is also adopted as the unified format for the preliminary data.

1. Preprocess the known birdsong data set: resample to a 44.1 kHz sampling rate, compute the birdsong spectrogram using the short-time Fourier transform (STFT) with a Hanning window function, and apply maximum-value normalization so that the dynamic range of the spectral information maps into [0, 1]; then convert the spectrogram to a grayscale image.
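The spectrogram computation in this step can be sketched with NumPy alone. This is a minimal illustration: the function name `birdsong_spectrogram`, the frame length `n_fft=512`, and the hop size `hop=128` are assumptions not given in the text, and the audio is assumed to be already resampled to 44.1 kHz and at least `n_fft` samples long.

```python
import numpy as np

def birdsong_spectrogram(audio, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window, max-normalized to [0, 1].

    Returns an array of shape (n_fft // 2 + 1, n_frames): frequency x time.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    # Slice the audio into overlapping frames and apply the window to each.
    frames = np.stack(
        [audio[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    peak = mag.max()
    return mag / peak if peak > 0 else mag  # maximum-value normalization
```

The normalized array can be treated directly as a grayscale image with intensities in [0, 1].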

2. Perform signal-noise separation on the spectrogram: the signal spectrogram contains the song and call parts, and the noise spectrogram contains the noise and silent parts. In most birdsong recordings, the amplitude of the foreground birdsong signal is higher than that of the background noise. We use this regularity to reduce background noise and isolate the signal spectrogram: set a threshold N; if a pixel value in the spectrogram is higher than N times the median of its row and of its column, set it to 1; otherwise set it to 0. This approximately highlights all the important birdsong signals in the spectrogram, because high amplitude generally corresponds to bird songs or calls. At the same time, the noise levels in different frequency regions are compensated and reduced, and broadband distortion caused by uncontrollable background noise is attenuated;

For the background noise produced by this step, binary erosion and dilation filters are applied to remove noise and connect segments, or this is combined with some morphological image processing;

For the noise spectrogram we follow similar steps: set a threshold n (n < N); if a pixel value is higher than n times the median of its row and of its column, set it to 0; otherwise set it to 1. A different threshold from the signal separation step is used because the threshold N has already been chosen somewhat aggressively in order to highlight the signal part; the intent is to leave a safety buffer. Signal that falls in this buffer zone neither has clear song features nor affects the information content of the noise part later used for data augmentation;

In summary, any content not selected into either the signal or the noise spectrogram provides almost no useful information to the subsequent neural network.
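The binary erosion and dilation mentioned in this step can be sketched as follows. This is an illustration only: the 3 × 3 square structuring element, the zero padding at the borders, and the order (erosion followed by dilation) are assumptions, and the helper names are hypothetical. The text does not state the filter size or order.

```python
import numpy as np

def _neighbourhoods(mask):
    """Stack the nine 3x3-shifted copies of a binary mask (zero-padded)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=0)
    h, w = mask.shape
    return np.stack(
        [padded[i : i + h, j : j + w] for i in range(3) for j in range(3)]
    )

def erode(mask):
    """A pixel survives only if its whole 3x3 neighbourhood is 1."""
    return _neighbourhoods(mask).min(axis=0)

def dilate(mask):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return _neighbourhoods(mask).max(axis=0)

def clean_mask(mask):
    """Erosion removes isolated specks; dilation restores surviving regions."""
    return dilate(erode(mask))
```

Applying `dilate` a second time would additionally bridge small gaps between neighbouring segments, at the cost of thickening them.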

3. Divide the signal spectrograms and noise spectrograms obtained from signal-noise separation into chunks; in view of the neural network model used in the subsequent transfer learning, each chunk is cropped to 299 × 299 pixels;
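One way to produce fixed-size chunks is sketched below. The text does not say how long each chunk is in time or how the 299 × 299 size is reached, so both choices here are assumptions: the spectrogram is cut into consecutive windows of `chunk_frames` time frames (a hypothetical parameter), an incomplete final window is dropped, and each window is brought to 299 × 299 by nearest-neighbour resizing.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array to (out_h, out_w)."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows[:, None], cols[None, :]]

def split_into_chunks(spec, chunk_frames, size=299):
    """Cut a spectrogram along time into chunks and resize each to size x size."""
    n_chunks = spec.shape[1] // chunk_frames  # any incomplete tail is dropped
    return [
        resize_nearest(spec[:, k * chunk_frames : (k + 1) * chunk_frames], size, size)
        for k in range(n_chunks)
    ]
```

In practice an interpolating resize from an image library would give smoother results; nearest-neighbour is used here only to keep the sketch dependency-free.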

4. Apply data augmentation to the chunked grayscale spectrograms. Data augmentation can alleviate two problems common in such data sets: birdsong data are scarce for some species, and data are severely imbalanced across species. By enriching the training set it also reduces overfitting during training and improves the model's generalization. Unlike the common augmentation methods for ordinary images, and in view of the time-frequency characteristics of birdsong spectrograms, the present invention uses the following techniques:

(1) Frequency-domain transformation: randomly change the pitch and volume of the input original birdsong audio, by no more than 5%;

(2) Noise addition: the noise includes noise samples and random Gaussian noise. In step 2 the birdsong is separated into a signal spectrogram and a noise spectrogram, so samples of the noise part can be selected at random and added to the training samples of the signal spectrogram; this step can improve classification results and speed up the whole training process. Random Gaussian noise can likewise help the neural network bring out image features; it approximates real background noise, helps the model learn the characteristics of noise, and can even help it resist real-world noise sources;

(3) Same-class sample mixing: in natural environments several birds often sing or call at the same time. To simulate this, spectrograms from different recordings of the same bird species are added together in random combinations. This step does not affect the distribution of sample labels, and it can speed up model convergence and increase recognition accuracy.
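The audio-level perturbation in technique (1) can be sketched as follows. The function names are hypothetical, and the pitch change is approximated here by simple resampling with linear interpolation, which is an assumption: resampling also changes the duration by the same factor, whereas a dedicated pitch shifter would keep the duration fixed. The text does not say which approach was used.

```python
import numpy as np

def random_volume(audio, rng, max_change=0.05):
    """Scale the amplitude by a random gain within +/- max_change."""
    gain = 1.0 + rng.uniform(-max_change, max_change)
    return audio * gain

def random_pitch(audio, rng, max_change=0.05):
    """Shift pitch by resampling with a random factor within +/- max_change.

    Assumption: plain resampling, so duration changes along with pitch.
    """
    factor = 1.0 + rng.uniform(-max_change, max_change)
    n_out = int(round(len(audio) / factor))
    src_pos = np.arange(n_out) * factor  # read positions in the original audio
    return np.interp(src_pos, np.arange(len(audio)), audio)
```

The perturbed audio is then passed through the same spectrogram, separation, and chunking steps as the original recording.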

5. To further enhance the visual perceptibility of the grayscale image of the signal part, and to increase the contrast between regions of different intensity and so improve recognition performance, pseudo-color processing is applied through Jet mapping. The dynamic range of the spectrogram is quantized into different regions, which are mapped to red, green, and blue monochrome images, corresponding to high-, medium-, and low-power spectral information respectively; red represents the highest-energy sound, that is, the song/call features. Another main purpose of this step is to convert the grayscale spectrogram into a three-channel RGB image as the input of the subsequent neural network. Finally, the resulting three-channel RGB sub-images are divided into a training set and a test set in a ratio of 4:1;
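The 4:1 division can be sketched as a shuffled split. The helper `split_train_test` and its fixed `seed` are assumptions for illustration; the text does not say whether the split is random, stratified by species, or grouped by recording.

```python
import random

def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]
```

For example, `split_train_test(sub_images)` on a list of 100 sub-images yields 80 for training and 20 for testing.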

6. The above is the main content of the birdsong recognition method based on image processing. To demonstrate its advantage over the prior art, in particular over conventional image classification methods, the present invention does not heavily customize the subsequent neural network and instead adopts the transfer learning approach common in visual tasks. It selects Squeeze-and-Excitation Networks (SENet), the image recognition architecture that won the ILSVRC 2017 competition. SENet models the correlation between feature channels, strengthening the features of important channels and weakening those of unimportant ones; the present invention considers this consistent with the idea of "highlighting birdsong and suppressing noise" when processing noisy birdsong data;

Therefore, the hybrid model SE-ResNeXt-50 is finally chosen and fine-tuned: the weights of the first several layers of the network are frozen, and the fully connected layer is redefined according to the number of bird species to be identified, giving the pre-trained model. 80% of the three-channel RGB sub-images used in the experiment are input into the pre-trained model for training, and the network parameters are saved to obtain the recognition model; the remaining 20% of the three-channel RGB sub-images are then used to evaluate the model accuracy and obtain the recognition result.

The above is only a preferred embodiment of the present invention and does not limit the present invention in any form. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make minor changes or modifications resulting in equivalent embodiments; any simple modification, equivalent change, or modification made to the above embodiment according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, still falls within the scope of the technical solution of the present invention.

Claims

1. A birdsong feature enhancement method based on image processing, characterized by comprising the following steps:

2. The birdsong feature enhancement method according to claim 1, characterized in that, in step 2:

3. The birdsong feature enhancement method according to claim 1, characterized in that step 4:

comprises frequency-domain transformation, noise addition, and same-class sample mixing;

A. Frequency-domain transformation

B. Noise addition

C. Same-class sample mixing