CN103531199A - Ecological sound identification method based on fast sparse decomposition and deep learning - Google Patents


Info

Publication number: CN103531199A (granted as CN103531199B)
Application number: CN201310472330.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: Li Ying (李应), Ouyang Zhen (欧阳桢)
Original and current assignee: Fuzhou University
Application filed by Fuzhou University
Legal status: Granted; Expired - Fee Related

Abstract

The invention relates to an ecological sound identification method based on fast sparse decomposition and deep learning, comprising the following steps: S01, performing OMP (Orthogonal Matching Pursuit) sparse decomposition on the clean sounds and the noisy test sounds respectively, and outputting the corresponding reconstructed signals and OMP features; S02, extracting from the clean sounds and the noisy test sounds respectively composite features that include the OMP features; S03, training a DBN (Deep Belief Network) model on the composite features extracted from the reconstructed clean sounds; and S04, classifying with the trained DBN model the composite features extracted from the reconstructed noisy test sounds, and outputting the ecological sound category to which each noisy test sound belongs. The method markedly improves the noise resistance and robustness of the system.

Description

Ecological sound identification method based on fast sparse decomposition and deep learning
Technical field
The present invention relates to an ecological sound identification method based on fast sparse decomposition and deep learning.
Background technology
In recent years, habitat protection has received increasingly wide attention, and large-scale monitoring deployments have been set up in some areas to collect real-time information. Analyzing and identifying the audio information contained in an ecological environment can provide data support for applications such as intrusion detection and species surveys. In real environments, complex and changing background noise is ubiquitous, so ecological sound recognition under noisy conditions has important practical significance.
At present there is substantial research on speech and music classification, but relatively little on environmental sounds. The audio information contained in different environments differs greatly: noisy environments such as restaurants and squares are dominated by speech, collisions or vehicle sounds, whereas the audio in an ecological environment consists mainly of animal and natural sounds. Many existing methods are recognizers tailored to a single class of sound, such as bird songs or frog calls, and their range of application is rather limited. For example, Chen et al. proposed the frequency-domain Multi-Stage Average Spectrum (MSAS) feature and, combined with syllable length, performed a two-pass classification of 18 kinds of frog calls; the recognition result was better than using the MSAS feature alone, but for overlapping animal calls the syllable-length classification was clearly ineffective. Lee et al. modeled spectral shape features with Gaussian Mixture Models (GMM) to classify continuous bird songs. There is also some research on multi-class ecological sound recognition: Raju et al. extracted pitch, formant and short-time energy features and used a Support Vector Machine (SVM) to classify 19 kinds of animal sounds including cats, dogs and lions; Zhang et al. extracted improved Mel-Frequency Cepstral Coefficients (MFCCs) as features and used a GMM to identify various insect sound classes.
The above methods all have shortcomings. GMM and Hidden Markov Models (HMM) are widely applied to structured audio such as speech, but ecological sounds are more random and not all structured, so these generative models are unstable here. Discriminative models such as SVM and some traditional neural networks can model nonlinearly separable classes well, but when the feature dimensionality and the number of categories are large, their classification performance falls short of GMM or HMM.
Summary of the invention
In view of this, the object of the present invention is to provide an ecological sound identification method based on fast sparse decomposition and deep learning.
The present invention adopts the following scheme: an ecological sound identification method based on fast sparse decomposition and deep learning, characterized in that it comprises the following steps:
S01: perform OMP sparse decomposition on the clean sounds and the noisy test sounds respectively, and output the corresponding reconstructed signals and OMP features;
S02: extract from the clean sounds and the noisy test sounds respectively the composite features that include the OMP features;
S03: train a DBN model on the composite features extracted from the reconstructed clean sounds;
S04: classify with the DBN model trained on the clean sounds the composite features extracted from the reconstructed noisy test sounds, and output the ecological sound category to which each noisy test sound belongs.
In an embodiment of the present invention, let f be the signal to be decomposed, of length N. Before the sparse decomposition, an overcomplete atom dictionary D = (g_γ), γ ∈ Γ, is constructed. Each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w): the shift factor u locates the center of the atom, while the scale factor s, the frequency factor v and the phase factor w define its waveform. The discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^(-j)·Δv, i·Δw), where 0 < j ≤ log2 N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6. Step S01 specifically comprises:
S011: initialize the signal residual R^0 f = f, the iteration count k = 1, and the maximum number of iterations L;
S012: select from the overcomplete atom dictionary D the atom g_γk most correlated with the k-th signal residual, i.e. |⟨R^k f, g_γk⟩| ≥ α · sup_{γ∈Γ} |⟨R^k f, g_γ⟩|, with 0 < α ≤ 1;
S013: test whether ||R^k f|| < ε holds, where ε > 0 is the preset residual-signal threshold; if it holds, go to step S016 and stop the decomposition; otherwise continue decomposing;
S014: use the Gram-Schmidt method to orthogonalize g_γk against the set of already selected atoms g_γp (0 < p ≤ k), obtain the projection P_k, and compute the new approximate reconstructed signal f = P_k f + R^k f and the new residual R^k f;
S015: if the maximum number of iterations has not been reached, set k = k + 1 and return to step S012 to continue the iteration; otherwise go to step S016;
S016: output the approximate L-term atomic expansion f ≈ Σ_{k=1..L} ⟨R^k f, g_γk⟩ g_γk obtained from the successive decompositions.
In an embodiment of the present invention, step S012 uses GSO to search for the optimal atom; the concrete steps comprise:
S001: initialize the glowworm population size n, the luciferin values l_i, the decision-range radius r_0 and the maximum number of iterations t_max, and randomly generate the glowworms;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |⟨R^k f, g_γ(x_i(t))⟩|, compute the objective value f(x_i(t)) of glowworm i at its current position x_i(t) in the t-th iteration, and convert it into the luciferin value l_i(t) according to l_i(t) = (1 - ρ) · l_i(t-1) + η · f(x_i(t)), where ρ ∈ (0, 1) is the luciferin decay rate and η ∈ (0, 1) is the luciferin update rate;
S003: each glowworm i searches within its dynamic decision range r_d^i(t) for the individuals whose luciferin is greater than its own, forming the neighborhood set N_i(t) = { j : d_ij(t) < r_d^i(t), l_i(t) < l_j(t) }, with 0 < r_d^i(t) ≤ r_s, where r_s is the maximum decision range of a glowworm;
S004: compute the probability P_ij with which glowworm i moves toward any individual j of its neighborhood set N_i(t), P_ij = (l_j(t) - l_i(t)) / Σ_{k ∈ N_i(t)} (l_k(t) - l_i(t));
S005: use roulette-wheel selection to choose the individual j with the highest probability as the movement target, and update the position as x_i(t+1) = x_i(t) + s · (x_j(t) - x_i(t)) / ||x_j(t) - x_i(t)||, where s is the moving step length;
S006: update the dynamic decision-range radius as r_d^i(t+1) = min{ r_s, max{ 0, r_d^i(t) + β · (n_t - |N_i(t)|) } }, where β is the proportionality constant controlling the variation of the neighborhood range, n_t is the parameter controlling the number of glowworms in a neighborhood, and |N_i(t)| is the number of glowworms in the neighborhood;
S007: if the maximum number of iterations t_max has been reached, save the decomposition result and output the atom time-frequency parameters; otherwise return to step S002.
In an embodiment of the present invention, step S02 is specifically: extract composite features comprising the OMP feature, the MFCCs feature and the pitch feature. The OMP feature is extracted by decomposing each frame of the sound signal with OMP and taking, over the first L atom time-frequency parameter groups of the support set representing that frame, the mean and the standard deviation of the scale factor s and of the frequency factor v, forming the 4-dimensional OMP feature O(λ) = (μ_s(λ), σ_s(λ), μ_v(λ), σ_v(λ)), where λ is the frame index of the signal, i indexes the atoms representing the frame, and L is the number of atoms.
In an embodiment of the present invention, MFCCs are chosen to supplement the OMP feature: a 24-band Mel filter bank is applied to the reconstructed signal after the discrete Fourier transform to obtain the 12-dimensional static MFCC feature, and the log energy is added as its 13th dimension.
In an embodiment of the present invention, the pitch (PITCH) is chosen to supplement the OMP feature: the circular Average Magnitude Difference Function (AMDF) method is used to obtain the 1-dimensional pitch feature of each frame.
In an embodiment of the present invention, the DBN model training comprises two steps: the first step is unsupervised layer-by-layer greedy pre-training, in which the labeled ecological sound features initialize the state values of the visible-layer nodes at the bottom of the DBN, so that the features are gradually abstracted; the second step uses the correct label information in a supervised BP network and propagates the error information top-down to fine-tune each RBM layer.
In an embodiment of the present invention, the RBM network uses the Contrastive Divergence criterion as its self-training strategy. Each layer consists of a visible layer V and a hidden layer H; several RBMs are stacked through bottom-up inter-layer weights, the output of one RBM's hidden units serving as the input of the visible layer of the RBM above, thereby building a DBN. An RBM has three parameters: the weights W between the visible and hidden layers, and the bias vectors b and c. Training the DBN classifier is thus converted into solving for the RBM parameters. Let v_i and h_j be the node values of the visible and hidden layers. The probability that a visible node takes the value 1 is P(v_i = 1) = σ(b_i + Σ_j w_ij·h_j), and likewise the probability that a hidden node takes the value 1 is P(h_j = 1) = σ(c_j + Σ_i w_ij·v_i), where σ is the logistic sigmoid. The update rule of the weights W is Δw_ij ∝ ⟨v_i h_j⟩_data - ⟨v_i h_j⟩_reconstruct, where ⟨v_i h_j⟩_data is the expectation of the joint distribution of the known sample set's visible nodes v_i and the unknown hidden nodes h_j, and ⟨v_i h_j⟩_reconstruct is the same expectation after the hidden units have been updated from the known samples and the visible units reconstructed from them.
The present invention markedly improves the noise immunity and robustness of the system: the sparsity-based OMP denoising improves the anti-noise performance of ecological sound recognition and, in several scenarios, also has an advantage over spectral subtraction and wavelet denoising; the strategy of searching for the optimal atom with the GSO algorithm effectively reduces the computational complexity of the OMP decomposition.
To make the object, technical scheme and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments and the accompanying drawings.
Description of the drawings
Fig. 1 is the flow chart of the OMP-based ecological sound classification of the present invention.
Fig. 2 is the atom time-frequency parameter variation diagram of the present invention.
Fig. 3 is the structural diagram of the DBN network of the present invention.
Embodiment
The present invention performs ecological sound recognition by fast sparse decomposition and deep learning. First, the sound signal is reconstructed by a limited number of Orthogonal Matching Pursuit (OMP) sparse decomposition steps based on Glowworm Swarm Optimization (GSO), retaining the highly correlated components and filtering out the weakly correlated noise. Second, composite anti-noise features are extracted from the atom time-frequency information and the frequency-domain information. Finally, a Deep Belief Network (DBN) classifies the ecological sounds under different environments and signal-to-noise ratios. Experiments show that the OMP sparse denoising outperforms spectral subtraction and wavelet denoising; compared with the commonly used MFCCs and SVM methods, the present method improves the recognition of ecological sounds under different signal-to-noise ratios to varying degrees and has better noise immunity, and it is especially suitable for use under low-SNR noisy conditions.
The OMP algorithm is a greedy reconstruction algorithm in Compressed Sensing (CS), proposed on the basis of the Matching Pursuit (MP) algorithm. Its improvement is that each atom picked out of the dictionary during the decomposition, called the optimal atom, is first orthogonalized against the set of already selected atoms by the Gram-Schmidt method; this guarantees the optimality of each iteration and thus reduces the number of iterations. Under the same precision requirement, a signal reconstructed with OMP is sparser and converges faster. Using OMP to denoise ecological sounds exploits the sparsity of the signal: the useful information is extracted as the sparse component, and the noise is treated as the residual left after the sparse component is removed. Noise has a certain randomness and, since the dictionary contains no random atoms, its correlation with the dictionary is low. According to CS theory, when a noisy sound signal is projected into a low dimension and the number of observations is sufficient to contain the useful information, the noise has no sparse representation; the noise content of the residual cannot be recovered at reconstruction time, which achieves the denoising. The sound signal is mapped onto the atom dictionary and decomposed: each round of decomposition yields the atom with the largest inner product with the original signal, i.e. the most correlated atom; the more atoms are extracted by the iteration, the smaller the signal residual, and the weighted combination of the atoms finally gives the best reconstruction of the original signal.
Obtaining the sparse representation of a signal over an overcomplete dictionary is an NP-hard problem, because the overcompleteness of the dictionary determines the computational complexity of the decomposition as well as the sparsity and reconstruction accuracy of the final result. To guarantee the quality of the reconstructed signal, the number of atoms in the dictionary must be much larger than the signal length, and the resulting amount of computation is enormous. Searching for the optimal atom is the most expensive part of the decomposition and is an optimization problem. The present invention therefore proposes to search for the optimal atom with the Glowworm Swarm Optimization (GSO) algorithm, improving the search efficiency while guaranteeing the solution precision, and realizing a fast sparse decomposition of the sound signal.
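As a rough illustration of this scale, the number of atoms in the discretized Gabor dictionary described in this patent (parameter ranges 0 < j ≤ log2 N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12) can be tallied with a short sketch; the function name is illustrative:

```python
import math

def gabor_dictionary_size(N):
    """Count the discretized Gabor parameter combinations for a length-N
    signal, using the parameter ranges given in this patent."""
    total = 0
    for j in range(1, int(math.log2(N)) + 1):
        n_p = int(N * 2 ** (-j + 1)) + 1   # shifts: 0 <= p <= N * 2^(1-j)
        n_k = 2 ** (j + 1)                 # frequencies: 0 <= k < 2^(j+1)
        n_i = 13                           # phases: 0 <= i <= 12
        total += n_p * n_k * n_i
    return total
```

For N = 256 this already gives 119756 atoms, several hundred times the signal length, which is why an exhaustive per-iteration inner-product search is so costly.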
As shown in Fig. 1, the present invention provides an ecological sound identification method based on fast sparse decomposition and deep learning, comprising the following steps:
S01: perform OMP sparse decomposition on the clean sounds and the noisy test sounds respectively, and output the corresponding reconstructed signals and OMP features;
S02: extract from the clean sounds and the noisy test sounds respectively the composite features that include the OMP features;
S03: train a DBN model on the composite features extracted from the reconstructed clean sounds;
S04: classify with the DBN model trained on the clean sounds the composite features extracted from the reconstructed noisy test sounds, and output the ecological sound category to which each noisy test sound belongs.
Let f be the signal to be decomposed, of length N. Before the sparse decomposition, an overcomplete atom dictionary D = (g_γ), γ ∈ Γ, is constructed. Each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w): the shift factor u locates the center of the atom, while the scale factor s, the frequency factor v and the phase factor w define its waveform. The discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^(-j)·Δv, i·Δw), where 0 < j ≤ log2 N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6. Step S01 specifically comprises:
S011: initialize the signal residual R^0 f = f, the iteration count k = 1, and the maximum number of iterations L;
S012: select from the overcomplete atom dictionary D the atom g_γk most correlated with the k-th signal residual, i.e. |⟨R^k f, g_γk⟩| ≥ α · sup_{γ∈Γ} |⟨R^k f, g_γ⟩|, with 0 < α ≤ 1;
S013: test whether ||R^k f|| < ε holds, where ε > 0 is the preset residual-signal threshold; if it holds, go to step S016 and stop the decomposition; otherwise continue decomposing;
S014: use the Gram-Schmidt method to orthogonalize g_γk against the set of already selected atoms g_γp (0 < p ≤ k), obtain the projection P_k, and compute the new approximate reconstructed signal f = P_k f + R^k f and the new residual R^k f;
S015: if the maximum number of iterations has not been reached, set k = k + 1 and return to step S012 to continue the iteration; otherwise go to step S016;
S016: output the approximate L-term atomic expansion f ≈ Σ_{k=1..L} ⟨R^k f, g_γk⟩ g_γk obtained from the successive decompositions.
As the decomposition iterates, the residual signal energy decays continuously and finally approaches 0.
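As a minimal sketch of steps S011 to S016, the following code runs OMP over a generic dictionary matrix; the Gram-Schmidt update is replaced by an equivalent least-squares projection onto the selected atoms, and the random dictionary is a stand-in for the patent's Gabor dictionary:

```python
import numpy as np

def omp(f, D, L, eps=1e-6):
    """OMP sketch over a dictionary matrix D (atoms as unit-norm columns).

    f : (N,) signal to decompose; L : maximum number of iterations (S011);
    eps : residual threshold epsilon (S013).  Returns the reconstruction
    and the indices of the selected atoms.
    """
    residual = f.copy()
    support = []                              # indices of selected atoms
    recon = np.zeros_like(f)
    for _ in range(L):
        corr = D.T @ residual                 # S012: correlate every atom
        support.append(int(np.argmax(np.abs(corr))))
        # S014: least-squares projection onto the selected atoms, which is
        # equivalent to the Gram-Schmidt orthogonalization in the text
        A = D[:, support]
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        recon = A @ coef
        residual = f - recon
        if np.linalg.norm(residual) < eps:    # S013: stop on small residual
            break
    return recon, support                     # S016: L-term approximation
```

A signal that is an exact combination of two atoms is recovered to numerical precision, illustrating the residual-energy decay noted above.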
According to the description of the OMP algorithm, searching the whole dictionary space for the optimal atom is a typical optimization problem. To search for the optimal atom with GSO, the atom parameter group γ_k = (s, u, v, w) is taken as the parameter group to be optimized, corresponding to the position x_i(t) of glowworm i at the t-th iteration, and the inner product |⟨R^k f, g_γk⟩| of the atom and the residual signal is taken as the objective function, corresponding to the objective value f(x_i(t)) determined by the glowworm's position, from which the luciferin value l_i(t) is further computed. Through the movement and gathering of the glowworms, the position with the maximum luciferin can be found, whose physical meaning is the optimal atom parameters.
The concrete steps of searching for the optimal atom with GSO comprise:
S001: initialize the glowworm population size n, the luciferin values l_i, the decision-range radius r_0 and the maximum number of iterations t_max, and randomly generate the glowworms;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |⟨R^k f, g_γ(x_i(t))⟩|, compute the objective value f(x_i(t)) of glowworm i at its current position x_i(t) in the t-th iteration, and convert it into the luciferin value l_i(t) according to l_i(t) = (1 - ρ) · l_i(t-1) + η · f(x_i(t)), where ρ ∈ (0, 1) is the luciferin decay rate and η ∈ (0, 1) is the luciferin update rate;
S003: each glowworm i searches within its dynamic decision range r_d^i(t) for the individuals whose luciferin is greater than its own, forming the neighborhood set N_i(t) = { j : d_ij(t) < r_d^i(t), l_i(t) < l_j(t) }, with 0 < r_d^i(t) ≤ r_s, where r_s is the maximum decision range of a glowworm;
S004: compute the probability P_ij with which glowworm i moves toward any individual j of its neighborhood set N_i(t), P_ij = (l_j(t) - l_i(t)) / Σ_{k ∈ N_i(t)} (l_k(t) - l_i(t));
S005: use roulette-wheel selection to choose the individual j with the highest probability as the movement target, and update the position as x_i(t+1) = x_i(t) + s · (x_j(t) - x_i(t)) / ||x_j(t) - x_i(t)||, where s is the moving step length;
S006: update the dynamic decision-range radius as r_d^i(t+1) = min{ r_s, max{ 0, r_d^i(t) + β · (n_t - |N_i(t)|) } }, where β is the proportionality constant controlling the variation of the neighborhood range, n_t is the parameter controlling the number of glowworms in a neighborhood, and |N_i(t)| is the number of glowworms in the neighborhood;
S007: if the maximum number of iterations t_max has been reached, save the decomposition result and output the atom time-frequency parameters; otherwise return to step S002.
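One iteration of steps S002 to S006 can be sketched as follows; all numeric defaults are illustrative, not values taken from the patent's experiments:

```python
import numpy as np

def gso_step(positions, luciferin, r_d, objective,
             rho=0.4, eta=0.6, beta=0.08, n_t=5, s=0.03, r_s=1.0):
    """One GSO iteration over an (n, d) array of glowworm positions.

    In the patent's setting a position is a candidate parameter group
    gamma = (s, u, v, w) and objective(x) = |<R^k f, g_gamma(x)>|; here
    objective is any callable mapping a position to a scalar.
    """
    # S002: luciferin update from the objective value at each position
    values = np.array([objective(p) for p in positions])
    luciferin = (1.0 - rho) * luciferin + eta * values
    new_pos = positions.copy()
    for i in range(len(positions)):
        d = np.linalg.norm(positions - positions[i], axis=1)
        # S003: neighbors are brighter glowworms inside the decision range
        nbrs = np.where((d < r_d[i]) & (luciferin > luciferin[i]))[0]
        if len(nbrs):
            # S004/S005: roulette-wheel move toward a brighter neighbor,
            # with probability proportional to the luciferin difference
            p = luciferin[nbrs] - luciferin[i]
            j = int(np.random.choice(nbrs, p=p / p.sum()))
            step = positions[j] - positions[i]
            norm = np.linalg.norm(step)
            if norm > 0.0:
                new_pos[i] = positions[i] + s * step / norm
        # S006: adapt the decision range toward n_t neighbors
        r_d[i] = min(r_s, max(0.0, r_d[i] + beta * (n_t - len(nbrs))))
    return new_pos, luciferin, r_d
```

Iterating this step makes the swarm gather around the maximizer of the objective, which in the patent corresponds to the optimal atom parameters.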
As shown in Fig. 2, during a single search for the optimal atom, the frequency factor v and the phase factor w of the atom parameter groups corresponding to the glowworms gather toward the position where a higher objective value is obtained.
The Gabor atoms obtained by the OMP decomposition are formed from a modulated Gaussian function; since Gaussian functions are localized in both the time and frequency domains, this local property guarantees that the atom time-frequency parameters can well characterize the non-stationary, time-varying properties of the signal.
Step S02 is specifically: extract composite features comprising the OMP feature, the MFCCs feature and the pitch feature. The OMP feature is extracted by decomposing each frame of the sound signal with OMP and taking, over the first L atom time-frequency parameter groups of the support set representing that frame, the mean and the standard deviation of the scale factor s and of the frequency factor v, forming the 4-dimensional OMP feature O(λ) = (μ_s(λ), σ_s(λ), μ_v(λ), σ_v(λ)), where λ is the frame index of the signal, i indexes the atoms representing the frame, and L is the number of atoms.
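Given the per-frame parameter groups returned by the decomposition, the 4-dimensional OMP feature reduces to a few statistics; the (L, 4) array layout assumed here is hypothetical:

```python
import numpy as np

def omp_feature(atom_params):
    """4-dimensional per-frame OMP feature: mean and standard deviation of
    the scale factor s and the frequency factor v over the top-L atoms.

    atom_params : (L, 4) array of parameter groups (s, u, v, w); this
    layout is an assumption for illustration.
    """
    s = atom_params[:, 0]
    v = atom_params[:, 2]
    return np.array([s.mean(), s.std(), v.mean(), v.std()])
```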
For a sound signal, the reconstruction accuracy keeps improving as the number of atoms in the support set grows. But too high a sparsity level brings a new problem: weakly correlated noise is also reconstructed in the later stages, so the recognition rate of the system does not grow in proportion to the number of atoms. Under the premise of guaranteeing the reconstruction accuracy, the present invention determined by experiment that reconstructing with the first 20 atoms of the sparse decomposition gives the best results. Because different sounds and noises have different sparsity levels, reconstructing all sounds with a fixed sparsity level has certain drawbacks, so the recognition result of the OMP time-frequency feature used alone is unsatisfactory. To overcome this, the present invention chooses MFCCs and the fundamental frequency (PITCH) to supplement the OMP feature.
MFCCs are chosen to supplement the OMP feature: a 24-band Mel filter bank is applied to the reconstructed signal after the discrete Fourier transform to obtain the 12-dimensional static MFCC feature, and the log energy is added as its 13th dimension.
The pitch (PITCH) is chosen to supplement the OMP feature: the circular Average Magnitude Difference Function (AMDF) method is used to obtain the 1-dimensional pitch feature of each frame. The composite anti-noise time-frequency feature constructed from these three kinds of features is used jointly to characterize ecological sounds.
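A minimal sketch of the circular-AMDF pitch feature; the lag search range (50 to 2000 Hz) is an assumption, not a value given in the source:

```python
import numpy as np

def pitch_camdf(frame, fs, fmin=50.0, fmax=2000.0):
    """1-dimensional pitch feature of one frame via the circular Average
    Magnitude Difference Function (AMDF).

    The pitch lag is taken as the deepest AMDF valley; fmin/fmax bound
    the lag search and are assumed values.
    """
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    # circular AMDF: D(k) = sum_n |x(n) - x((n - k) mod N)|
    amdf = np.array([np.abs(frame - np.roll(frame, k)).sum() for k in lags])
    return fs / lags[np.argmin(amdf)]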
A DBN is composed of several layers of unsupervised Restricted Boltzmann Machines (RBM) and one layer of supervised back-propagation (BP) feed-forward network; its structure is shown in Fig. 3.
The DBN model training comprises two steps: the first step is unsupervised layer-by-layer greedy pre-training, in which the labeled ecological sound features initialize the state values of the visible-layer nodes at the bottom of the DBN, so that the features are gradually abstracted; the second step uses the correct label information in a supervised BP network and propagates the error information top-down to fine-tune each RBM layer.
The RBM network uses the Contrastive Divergence criterion as its self-training strategy. Each layer consists of a visible layer V and a hidden layer H; several RBMs are stacked through bottom-up inter-layer weights, the output of one RBM's hidden units serving as the input of the visible layer of the RBM above, thereby building a DBN. An RBM has three parameters: the weights W between the visible and hidden layers, and the bias vectors b and c. Training the DBN classifier is thus converted into solving for the RBM parameters. Let v_i and h_j be the node values of the visible and hidden layers. The probability that a visible node takes the value 1 is P(v_i = 1) = σ(b_i + Σ_j w_ij·h_j), and likewise the probability that a hidden node takes the value 1 is P(h_j = 1) = σ(c_j + Σ_i w_ij·v_i), where σ is the logistic sigmoid. The update rule of the weights W is Δw_ij ∝ ⟨v_i h_j⟩_data - ⟨v_i h_j⟩_reconstruct, where ⟨v_i h_j⟩_data is the expectation of the joint distribution of the known sample set's visible nodes v_i and the unknown hidden nodes h_j, and ⟨v_i h_j⟩_reconstruct is the same expectation after the hidden units have been updated from the known samples and the visible units reconstructed from them.
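The Contrastive Divergence update described above can be sketched for a single binary RBM as follows (CD-1, with an illustrative learning rate):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM with visible biases b, hidden
    biases c and weight matrix W of shape (n_visible, n_hidden)."""
    if rng is None:
        rng = np.random.default_rng()
    # P(h_j = 1 | v) = sigmoid(c_j + sum_i w_ij v_i)
    ph0 = sigmoid(c + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden units
    # reconstruction: P(v_i = 1 | h) = sigmoid(b_i + sum_j w_ij h_j)
    pv1 = sigmoid(b + h0 @ W.T)
    ph1 = sigmoid(c + pv1 @ W)
    # delta w_ij proportional to <v_i h_j>_data - <v_i h_j>_reconstruct
    W = W + lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b = b + lr * (v0 - pv1)
    c = c + lr * (ph0 - ph1)
    return W, b, c
```

Repeated updates on a training vector drive the reconstruction toward that vector, which is the sense in which each RBM layer models its input.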
The preferred embodiments listed above further describe the object, technical scheme and advantages of the present invention. It should be understood that the foregoing are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in its scope of protection.

Claims (8)

1. An ecological sound identification method based on rapid sparse decomposition and deep learning, characterized by comprising the following steps:
S01: performing OMP (Orthogonal Matching Pursuit) sparse decomposition on the clean sound and on the noisy test sound respectively, and outputting the corresponding reconstructed signals and OMP features of the clean sound and the noisy test sound;
S02: extracting from the clean sound and from the noisy test sound respectively a composite feature that includes the OMP feature;
S03: performing DBN model training on the composite features extracted from the reconstructed clean sound;
S04: performing DBN model classification on the composite features extracted from the reconstructed noisy test sound against the trained clean-sound model, and outputting the ecological sound category to which the noisy test sound belongs.
2. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that: given a signal f to be decomposed, of length N, an over-complete atom dictionary D = (g_γ)_{γ∈Γ} is first constructed before the sparse decomposition; each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w), the shift factor u defining the centre of an atom g_γ and the scale factor s, frequency factor v and phase factor w defining its waveform; the discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^{−j}·Δv, i·Δw), where 0 < j ≤ log₂N, 0 ≤ p ≤ N·2^{−j+1}, 0 ≤ k < 2^{j+1}, 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6; said step S01 specifically comprises:
S011: initializing the signal residual R^0 f = f, the iteration count k = 1, and the maximum iteration count L;
S012: selecting from the over-complete atom dictionary D the atom g_γk most correlated with the signal residual of the k-th iteration, |⟨R^k f, g_γk⟩| ≥ α·sup_{γ∈Γ} |⟨R^k f, g_γ⟩|, 0 < α ≤ 1;
S013: judging whether ||R^k f|| < ε holds, ε > 0 being the preset residual threshold; if ||R^k f|| < ε holds, going to step S016 to end the decomposition, otherwise continuing the decomposition;
S014: orthogonalizing g_γk with respect to the set of previously selected atoms g_γp (0 < p ≤ k) by the Gram-Schmidt method to obtain the projection P_k, and computing the new approximate reconstructed signal f = P_k f + R_k f and the residual R_k f respectively;
S015: if the maximum iteration count has not been reached, setting k = k + 1 and returning to step S012 to continue the iteration, otherwise going to step S016;
S016: outputting, from the series of atoms obtained by the successive decompositions, the approximate atom expansion after the L-th iteration, f ≈ Σ_{k=1}^{L} ⟨R^k f, g_γk⟩ g_γk.
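The decomposition loop of steps S011–S016 can be sketched as follows. For brevity a random unit-norm dictionary stands in for the Gabor dictionary of the claim, and a least-squares projection plays the role of the Gram-Schmidt orthogonalization; the signal and dictionary sizes are assumptions of this example.

```python
import numpy as np

def omp(f, D, L=10, eps=1e-6):
    """Orthogonal Matching Pursuit sketch following steps S011-S016.

    f : (N,) signal, D : (N, M) dictionary of unit-norm atoms.
    Returns the approximate reconstruction and the chosen atom indices.
    """
    residual = f.copy()            # S011: R0 f = f
    support = []
    approx = np.zeros_like(f)
    for k in range(L):             # up to the maximum iteration count L
        # S012: atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # S014: project f onto the span of the selected atoms
        # (least squares is equivalent to Gram-Schmidt orthogonalization here)
        A = D[:, support]
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        approx = A @ coef          # Pk f
        residual = f - approx      # Rk f
        if np.linalg.norm(residual) < eps:   # S013: ||Rk f|| < eps
            break                  # S016: stop decomposing
    return approx, support

rng = np.random.default_rng(1)
N, M = 64, 256
D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
true_idx = [3, 100, 200]
f = D[:, true_idx] @ np.array([1.5, -2.0, 0.7])   # 3-sparse test signal
approx, support = omp(f, D)
err = np.linalg.norm(f - approx) / np.linalg.norm(f)
```

On such an incoherent dictionary OMP recovers the sparse support within a few iterations, after which the residual norm falls below the threshold ε and the loop stops early.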
3. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 2, characterized in that said step S012 searches for the optimal atom with GSO (glowworm swarm optimization), specifically comprising:
S001: initializing the firefly population size n, the luciferin l_i, the decision-domain radius r_0 and the maximum iteration count t_max, and randomly generating the fireflies;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |⟨R^k f, g_γ(x_i(t))⟩|, computing the objective value f(x_i(t)) of firefly i at its current position x_i(t) in the t-th iteration, and converting it into the luciferin value l_i(t) according to l_i(t) = (1 − ρ)·l_i(t − 1) + η·f(x_i(t)), where ρ ∈ (0, 1) is the luciferin decay rate and η ∈ (0, 1) is the luciferin update rate;
S003: each firefly i searching within its dynamic decision domain r_d^i(t) for the individuals whose luciferin is larger than its own, forming the neighbourhood set N_i(t) = { j : d_ij(t) < r_d^i(t), l_i(t) < l_j(t) }, 0 < r_d^i(t) ≤ r_s, where r_s is the maximum decision-domain radius of a firefly;
S004: computing the probability P_ij that firefly i moves towards any individual j in the neighbourhood set N_i(t), P_ij = (l_j(t) − l_i(t)) / Σ_{k∈N_i(t)} (l_k(t) − l_i(t));
S005: choosing by roulette-wheel selection the individual j with the highest probability as the moving target, and updating the position x_i(t + 1) = x_i(t) + s·(x_j(t) − x_i(t)) / ||x_j(t) − x_i(t)||, where s is the moving step length;
S006: updating the dynamic decision-domain radius r_d^i(t + 1) = min{ r_s, max{ 0, r_d^i(t) + β·(n_t − |N_i(t)|) } }, where β is the proportionality constant controlling the variation range of the neighbourhood, n_t is the parameter controlling the number of fireflies in a neighbourhood, and |N_i(t)| is the number of fireflies in the neighbourhood;
S007: if the maximum iteration count t_max is reached, saving the decomposition result and outputting the time-frequency parameters of the atom, otherwise returning to step S002.
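A minimal sketch of the GSO loop of steps S001–S007, run on a toy objective in place of the atom-correlation objective |⟨R^k f, g_γ⟩|; all parameter values here are illustrative assumptions, not the patent's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def gso(objective, dim, n=30, t_max=60, rho=0.4, eta=0.6,
        beta=0.08, n_t=5, r_s=3.0, step=0.03, l0=5.0):
    """Minimal glowworm swarm optimization following steps S001-S007."""
    x = rng.uniform(-3.0, 3.0, size=(n, dim))   # S001: random fireflies
    l = np.full(n, l0)                          # initial luciferin
    r = np.full(n, r_s)                         # decision-domain radii
    for _ in range(t_max):
        # S002: luciferin update l_i(t) = (1 - rho) l_i(t-1) + eta f(x_i(t))
        l = (1.0 - rho) * l + eta * np.array([objective(xi) for xi in x])
        for i in range(n):
            d = np.linalg.norm(x - x[i], axis=1)
            # S003: neighbours closer than r_i and brighter than firefly i
            nbrs = np.where((d < r[i]) & (l > l[i]))[0]
            if nbrs.size:
                # S004: move probability proportional to luciferin difference
                p = (l[nbrs] - l[i]) / (l[nbrs] - l[i]).sum()
                j = rng.choice(nbrs, p=p)       # S005: roulette selection
                diff = x[j] - x[i]
                x[i] = x[i] + step * diff / (np.linalg.norm(diff) + 1e-12)
            # S006: dynamic decision-domain update
            r[i] = min(r_s, max(0.0, r[i] + beta * (n_t - nbrs.size)))
    return x[np.argmax(l)]                      # S007: brightest position

# Toy objective maximized at the origin (stands in for |<R^k f, g>| search)
best = gso(lambda z: -np.sum(z * z), dim=2)
```

In the patent's setting the search space would be the discretized Gabor parameter group (s, u, v, w) rather than this continuous toy domain.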
4. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that said step S02 specifically is: extracting a composite feature comprising the OMP feature, the MFCCs feature and the pitch feature; the method of extracting the OMP feature specifically is: decomposing each frame of the sound signal with OMP and taking, over the time-frequency parameter groups of the first L atoms of the support set representing the frame signal, the mean and the standard deviation of the scale factor s and of the frequency factor v, forming the 4-dimensional OMP feature
OMP(λ) = (mean_i(s_{λ,i}), std_i(s_{λ,i}), mean_i(v_{λ,i}), std_i(v_{λ,i})),
where λ is the frame index of the signal, i indexes the atoms representing the frame signal, and L is the number of atoms.
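The 4-dimensional OMP feature of this claim can be sketched as follows, assuming the per-atom parameter groups (s, u, v, w) of a frame's support set are already available from the decomposition; the numeric values are made up for the example.

```python
import numpy as np

def omp_feature(atom_params):
    """4-dimensional OMP feature of one frame (claim 4 sketch).

    atom_params : (L, 4) array of per-atom Gabor parameters (s, u, v, w)
    from the frame's support set; the feature keeps the mean and the
    standard deviation of the scale factor s and the frequency factor v.
    """
    s = atom_params[:, 0]          # contraction-expansion (scale) factors
    v = atom_params[:, 2]          # frequency factors
    return np.array([s.mean(), s.std(), v.mean(), v.std()])

# Hypothetical support set of L = 5 atoms for one frame
params = np.array([[2.0, 0.5, 3.1, 0.0],
                   [4.0, 1.0, 3.5, 0.5],
                   [2.0, 0.0, 2.9, 1.0],
                   [8.0, 0.5, 3.3, 0.0],
                   [4.0, 1.5, 3.2, 0.5]])
feat = omp_feature(params)
```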
5. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 4, characterized in that: the MFCCs are chosen to supplement the OMP feature; a 24th-order Mel filter bank is first applied, the 12-dimensional static MFCCs feature is obtained from the reconstructed signal after the discrete Fourier transform, and the logarithmic energy is added as its 13th dimension.
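A sketch of the 13-dimensional MFCCs feature of the preceding claim (24 mel filters, 12 static coefficients plus log energy). The FFT size, frame length, sampling rate and DCT convention are assumptions of this example, not values fixed by the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_filt=24, n_ceps=12, n_fft=512):
    """13-dim MFCC sketch for one frame: 24 mel filters, 12 static
    cepstral coefficients plus log energy as the 13th dimension."""
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2         # power spectrum
    # Triangular mel filter bank between 0 Hz and sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):
            fbank[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    logmel = np.log(fbank @ spec + 1e-10)
    # DCT-II of the log filter-bank energies; keep coefficients 1..12
    n = np.arange(n_filt)
    ceps = np.array([np.sum(logmel * np.cos(np.pi * k * (n + 0.5) / n_filt))
                     for k in range(1, n_ceps + 1)])
    log_energy = np.log(np.sum(frame ** 2) + 1e-10)       # 13th dimension
    return np.concatenate([ceps, [log_energy]])

sr = 16000
t = np.arange(400) / sr
frame = np.sin(2 * np.pi * 440.0 * t)                     # 25 ms test tone
feat = mfcc_frame(frame, sr)
```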
6. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 4, characterized in that: the pitch (PITCH) is chosen to supplement the OMP feature, the circular AMDF (average magnitude difference function) method being used to obtain the 1-dimensional PITCH feature corresponding to each frame.
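The circular-AMDF pitch feature can be sketched as follows; the lag search range (50–500 Hz) and the test tone are assumptions of this example.

```python
import numpy as np

def camdf_pitch(frame, sr, f_lo=50.0, f_hi=500.0):
    """Pitch estimate of one frame via the circular AMDF (claim 6 sketch).

    D(tau) = sum_n |x(n) - x((n + tau) mod N)|; the pitch period is the
    lag with the deepest valley inside the plausible lag range.
    """
    n = len(frame)
    lo, hi = int(sr / f_hi), int(sr / f_lo)
    lags = np.arange(lo, min(hi, n - 1) + 1)
    d = np.array([np.abs(frame - np.roll(frame, -tau)).sum() for tau in lags])
    return sr / lags[np.argmin(d)]       # 1-dimensional PITCH feature

sr = 8000
t = np.arange(320) / sr                  # 40 ms frame
frame = np.sin(2 * np.pi * 200.0 * t)    # 200 Hz tone, exactly periodic
pitch = camdf_pitch(frame, sr)
```

Because the frame holds an integer number of periods, the circular difference at the true lag is exactly zero and the estimate is exact here; on real ecological sound the valley is merely the deepest, not zero.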
7. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that: said DBN model training comprises two steps; the first step adopts an unsupervised greedy layer-wise strategy for pre-training, the state values of the visible-layer nodes of the bottom DBN layer being initialized with the labelled ecological sound features, so that specific features are gradually abstracted; the second step uses the correct label information in a supervised BP network, the error information being transmitted top-down to fine-tune each RBM layer.
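The first, unsupervised step of the preceding claim (greedy layer-wise pre-training, each RBM's hidden output feeding the next RBM's visible layer) can be sketched as follows; layer sizes, epochs and the random input batch are assumptions, and the supervised BP fine-tuning is only indicated by the final comment.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Unsupervised CD-1 pretraining of one RBM layer."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b, c = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + c)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + b)            # reconstruct visible layer
        ph1 = sigmoid(pv1 @ W + c)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        b += lr * (data - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, c, sigmoid(data @ W + c)   # weights, hidden bias, layer output

# Greedy layer-wise stacking: each layer's hidden output feeds the next RBM
features = (rng.random((50, 18)) < 0.5).astype(float)  # stand-in for composite features
layers = []
x = features
for n_hid in (32, 16):
    W, c, x = pretrain_rbm(x, n_hid)
    layers.append((W, c))
# A supervised BP pass would now fine-tune `layers` top-down using the labels.
```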
8. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 7, characterized in that: the RBM network uses the Contrastive Divergence criterion as its self-training strategy; each layer consists of a visible layer V and a hidden layer H, and a plurality of RBMs are combined through bottom-up inter-layer weighted connections, the output of the hidden units serving as the input of the visible layer of the RBM above, thereby building a DBN framework; an RBM has three parameters, namely the weight matrix W between the visible layer and the hidden layer and their respective bias vectors b and c, so that the training of the DBN classifier is converted into solving for the RBM parameters; supposing the node values of the visible layer and the hidden layer are v_i and h_j respectively, the probability that a node of the visible layer V takes the value 1 is P(v_i = 1 | h) = σ(b_i + Σ_j w_ij·h_j), and likewise the probability that a node of the hidden layer H takes the value 1 is P(h_j = 1 | v) = σ(c_j + Σ_i w_ij·v_i), where σ(x) = 1/(1 + e^(−x)); the update rule of the weights W is Δw_ij ∝ ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_reconstruct, where ⟨v_i h_j⟩_data denotes the expectation of the joint distribution of the visible-layer node v_i and the unknown hidden node h_j over the known sample set, and ⟨v_i h_j⟩_reconstruct denotes the expectation of ⟨v_i h_j⟩ after the hidden units have been updated from the known samples and the visible-layer units reconstructed.
CN201310472330.6A 2013-10-11 2013-10-11 Ecological sound identification method based on rapid sparse decomposition and deep learning Expired - Fee Related CN103531199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310472330.6A CN103531199B (en) Ecological sound identification method based on rapid sparse decomposition and deep learning


Publications (2)

Publication Number Publication Date
CN103531199A true CN103531199A (en) 2014-01-22
CN103531199B CN103531199B (en) 2016-03-09

Family

ID=49933152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310472330.6A Expired - Fee Related CN103531199B (en) Ecological sound identification method based on rapid sparse decomposition and deep learning

Country Status (1)

Country Link
CN (1) CN103531199B (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101232707B1 (en) * 2012-02-07 2013-02-13 고려대학교 산학협력단 Apparatus and method for reconstructing signal using compressive sensing algorithm
CN102592593A (en) * 2012-03-31 2012-07-18 山东大学 Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李雨昕 (Li Yuxin): "Fast Algorithm for MP Sparse Decomposition of Speech Signals and Its Preliminary Application in Speech Recognition", China Master's Theses Full-text Database, 30 October 2009 (2009-10-30) *
邵君 (Shao Jun): "Research on Signal Sparse Decomposition Algorithms Based on MP", China Master's Theses Full-text Database, 5 March 2007 (2007-03-05) *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104113789A (en) * 2014-07-10 2014-10-22 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN104113789B (en) * 2014-07-10 2017-04-12 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN104157290A (en) * 2014-08-19 2014-11-19 大连理工大学 Speaker recognition method based on depth learning
CN104157290B (en) * 2014-08-19 2017-10-24 大连理工大学 Speaker recognition method based on deep learning
CN104464727A (en) * 2014-12-11 2015-03-25 福州大学 Single-channel music singing separation method based on deep belief network
CN104850837B (en) * 2015-05-18 2017-12-05 西南交通大学 Handwritten character recognition method
CN104850837A (en) * 2015-05-18 2015-08-19 西南交通大学 Handwritten character recognition method
WO2017076211A1 (en) * 2015-11-05 2017-05-11 阿里巴巴集团控股有限公司 Voice-based role separation method and device
CN105551503A (en) * 2015-12-24 2016-05-04 武汉大学 Audio matching tracking method based on atom pre-selection and system thereof
CN105551503B (en) * 2015-12-24 2019-03-01 武汉大学 Audio matching pursuit method and system based on atom preselection
CN105654964A (en) * 2016-01-20 2016-06-08 司法部司法鉴定科学技术研究所 Recording audio device source determination method and device
CN107464556A (en) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 Audio scene recognition method based on sparse coding
CN106059971A (en) * 2016-07-07 2016-10-26 西北工业大学 Sparse reconstruction based correlation detection method under signal correlation attenuation condition
CN107039036A (en) * 2017-02-17 2017-08-11 南京邮电大学 High-quality speaker recognition method based on automatic coding depth confidence network
CN107039036B (en) * 2017-02-17 2020-06-16 南京邮电大学 High-quality speaker recognition method based on automatic coding depth confidence network
CN107293301A (en) * 2017-05-27 2017-10-24 深圳大学 Recognition methods and system based on dental articulation sound
CN107729381B (en) * 2017-09-15 2020-05-08 广州嘉影软件有限公司 Interactive multimedia resource aggregation method and system based on multi-dimensional feature recognition
CN107729381A (en) * 2017-09-15 2018-02-23 广州嘉影软件有限公司 Interactive multimedia resource aggregation method and system based on multi-dimensional feature recognition
CN109682892A (en) * 2018-12-26 2019-04-26 西安科技大学 Signal denoising method based on time-frequency analysis
CN109682892B (en) * 2018-12-26 2021-07-09 西安科技大学 Signal denoising method based on time-frequency analysis
CN109862518A (en) * 2019-01-11 2019-06-07 福州大学 Equipment-free positioning method based on common sparse analysis model
CN109862518B (en) * 2019-01-11 2021-05-18 福州大学 Equipment-free positioning method based on common sparse analysis model
CN111507321A (en) * 2020-07-01 2020-08-07 中国地质大学(武汉) Training method, classification method and device of multi-output land cover classification model
CN112885357A (en) * 2021-01-13 2021-06-01 上海英粤汽车科技有限公司 Method for recognizing animal category through voice
CN113238189A (en) * 2021-05-24 2021-08-10 清华大学 Sound source identification method and system based on array measurement and sparse prior information
CN113238189B (en) * 2021-05-24 2023-03-10 清华大学 Sound source identification method and system based on array measurement and sparse prior information
CN113470654A (en) * 2021-06-02 2021-10-01 国网浙江省电力有限公司绍兴供电公司 Voiceprint automatic identification system and method

Also Published As

Publication number Publication date
CN103531199B (en) 2016-03-09

Similar Documents

Publication Publication Date Title
CN103531199B (en) Ecological sound identification method based on rapid sparse decomposition and deep learning
CN103474066B (en) Ecological sound identification method based on multi-band signal reconstruction
CN109767759A (en) End-to-end speech recognition method based on modified CLDNN structure
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN108053836B (en) Audio automatic labeling method based on deep learning
Mitra et al. Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition
CN106228977B (en) Multi-mode fusion song emotion recognition method based on deep learning
CN100411011C (en) Pronunciation quality evaluating method for language learning machine
CN107680582A (en) Acoustic training model method, audio recognition method, device, equipment and medium
Tan et al. Cluster adaptive training for deep neural network
CN108694951B (en) Speaker identification method based on multi-stream hierarchical fusion transformation characteristics and long-and-short time memory network
CN104035996B (en) Field concept abstracting method based on Deep Learning
CN104751228A (en) Method and system for constructing deep neural network
CN110349597B (en) Voice detection method and device
CN106782511A (en) Speech recognition method based on rectified-linear deep autoencoder network
CN110289002B (en) End-to-end speaker clustering method and system
CN109448749A (en) Speech extraction method, system and device based on supervised-learning auditory attention
CN104424943A (en) A speech processing system and method
CN110490230A (en) Acoustic target recognition method based on deep convolutional generative adversarial network
CN108229659A (en) Piano single-key sound recognition method based on deep learning
US20200312336A1 (en) Method and apparatus for implementing speaker identification neural network
Mitra et al. Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks
CN106548775A (en) Speech recognition method and system
Bacchiani et al. Context dependent state tying for speech recognition using deep neural network acoustic models
Zhao et al. Speech recognition system based on integrating feature and HMM

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160309

Termination date: 20191011

CF01 Termination of patent right due to non-payment of annual fee