CN103531199A - Ecological sound identification method based on fast sparse decomposition and deep learning - Google Patents


Info

Publication number: CN103531199A (granted as CN103531199B)
Application number: CN201310472330.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: Li Ying (李应), Ouyang Zhen (欧阳桢)
Original and current assignee: Fuzhou University
Application filed by Fuzhou University
Legal status: Granted; Expired - Fee Related

Abstract

The invention relates to an ecological sound identification method based on fast sparse decomposition and deep learning, comprising the following steps: S01, performing OMP (Orthogonal Matching Pursuit) sparse decomposition on the clean sounds and the noisy test sounds respectively, and outputting the corresponding reconstructed signals and OMP features; S02, extracting from the clean sounds and the noisy test sounds respectively composite features that include the OMP features; S03, training a DBN (Deep Belief Network) model on the composite features extracted from the reconstructed clean sounds; and S04, classifying with the trained DBN model the composite features extracted from the reconstructed noisy test sounds, and outputting the ecological sound category to which each noisy test sound belongs. The method markedly improves the noise resistance and robustness of the system.

Description

Ecological sound identification method based on fast sparse decomposition and deep learning
Technical field
The present invention relates to an ecological sound identification method based on fast sparse decomposition and deep learning.
Background technology
In recent years, habitat protection has received increasingly wide attention, and large-scale monitoring deployments have been set up in some areas to collect real-time information. Analyzing and identifying the audio information contained in an ecological environment can provide data support for applications such as intrusion detection and species surveys. In real environments, complex and changing background noise is ubiquitous, so ecological sound recognition under noisy conditions has important practical significance.
At present there is substantial research on speech and music classification, but relatively little on environmental sounds. The audio information contained in different environments differs greatly: noisy environments such as restaurants and squares are dominated by speech, collisions or vehicle sounds, whereas the audio in an ecological environment consists mainly of animal and natural sounds. Many existing methods are recognizers tailored to a single class of sound, such as bird songs or frog calls, and their range of application is rather limited. For example, Chen et al. proposed the frequency-domain Multi-Stage Average Spectrum (MSAS) feature and, combined with syllable length, performed a two-pass classification of 18 kinds of frog calls; the recognition result was better than using the MSAS feature alone, but for overlapping animal calls the syllable-length classification was clearly ineffective. Lee et al. modeled spectral shape features with Gaussian Mixture Models (GMM) to classify continuous bird songs. There is also some research on multi-class ecological sound recognition: Raju et al. extracted pitch, formant and short-time energy features and used a Support Vector Machine (SVM) to classify 19 kinds of animal sounds including cats, dogs and lions; Zhang et al. extracted improved Mel-Frequency Cepstral Coefficients (MFCCs) as features and used a GMM to identify various insect sound classes.
The above methods all have shortcomings. GMM and Hidden Markov Models (HMM) are widely applied to structured audio such as speech, but ecological sounds are more random and not all structured, so these generative models are unstable here. Discriminative models such as SVM and some traditional neural networks can model nonlinearly separable classes well, but when the feature dimensionality and the number of categories are large, their classification performance falls short of GMM or HMM.
Summary of the invention
In view of this, the object of the present invention is to provide an ecological sound identification method based on fast sparse decomposition and deep learning.
The present invention adopts the following scheme: an ecological sound identification method based on fast sparse decomposition and deep learning, characterized in that it comprises the following steps:
S01: perform OMP sparse decomposition on the clean sounds and the noisy test sounds respectively, and output the corresponding reconstructed signals and OMP features;
S02: extract from the clean sounds and the noisy test sounds respectively the composite features that include the OMP features;
S03: train a DBN model on the composite features extracted from the reconstructed clean sounds;
S04: classify with the DBN model trained on the clean sounds the composite features extracted from the reconstructed noisy test sounds, and output the ecological sound category to which each noisy test sound belongs.
In an embodiment of the present invention, let f be the signal to be decomposed, of length N. Before the sparse decomposition, an overcomplete atom dictionary D = (g_γ), γ ∈ Γ, is constructed. Each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w): the shift factor u locates the center of the atom, while the scale factor s, the frequency factor v and the phase factor w define its waveform. The discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^(-j)·Δv, i·Δw), where 0 < j ≤ log2 N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6. Step S01 specifically comprises:
S011: initialize the signal residual R^0 f = f, the iteration count k = 1, and the maximum number of iterations L;
S012: select from the overcomplete atom dictionary D the atom g_γk most correlated with the k-th signal residual, i.e. |⟨R^k f, g_γk⟩| ≥ α · sup_{γ∈Γ} |⟨R^k f, g_γ⟩|, with 0 < α ≤ 1;
S013: test whether ||R^k f|| < ε holds, where ε > 0 is the preset residual-signal threshold; if it holds, go to step S016 and stop the decomposition; otherwise continue decomposing;
S014: use the Gram-Schmidt method to orthogonalize g_γk against the set of already selected atoms g_γp (0 < p ≤ k), obtain the projection P_k, and compute the new approximate reconstructed signal f = P_k f + R^k f and the new residual R^k f;
S015: if the maximum number of iterations has not been reached, set k = k + 1 and return to step S012 to continue the iteration; otherwise go to step S016;
S016: output the approximate L-term atomic expansion f ≈ Σ_{k=1..L} ⟨R^k f, g_γk⟩ g_γk obtained from the successive decompositions.
In an embodiment of the present invention, step S012 uses GSO to search for the optimal atom; the concrete steps comprise:
S001: initialize the glowworm population size n, the luciferin values l_i, the decision-range radius r_0 and the maximum number of iterations t_max, and randomly generate the glowworms;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |⟨R^k f, g_γ(x_i(t))⟩|, compute the objective value f(x_i(t)) of glowworm i at its current position x_i(t) in the t-th iteration, and convert it into the luciferin value l_i(t) according to l_i(t) = (1 - ρ) · l_i(t-1) + η · f(x_i(t)), where ρ ∈ (0, 1) is the luciferin decay rate and η ∈ (0, 1) is the luciferin update rate;
S003: each glowworm i searches within its dynamic decision range r_d^i(t) for the individuals whose luciferin is greater than its own, forming the neighborhood set N_i(t) = { j : d_ij(t) < r_d^i(t), l_i(t) < l_j(t) }, with 0 < r_d^i(t) ≤ r_s, where r_s is the maximum decision range of a glowworm;
S004: compute the probability P_ij with which glowworm i moves toward any individual j of its neighborhood set N_i(t), P_ij = (l_j(t) - l_i(t)) / Σ_{k ∈ N_i(t)} (l_k(t) - l_i(t));
S005: use roulette-wheel selection to choose the individual j with the highest probability as the movement target, and update the position as x_i(t+1) = x_i(t) + s · (x_j(t) - x_i(t)) / ||x_j(t) - x_i(t)||, where s is the moving step length;
S006: update the dynamic decision-range radius as r_d^i(t+1) = min{ r_s, max{ 0, r_d^i(t) + β · (n_t - |N_i(t)|) } }, where β is the proportionality constant controlling the variation of the neighborhood range, n_t is the parameter controlling the number of glowworms in a neighborhood, and |N_i(t)| is the number of glowworms in the neighborhood;
S007: if the maximum number of iterations t_max has been reached, save the decomposition result and output the atom time-frequency parameters; otherwise return to step S002.
In an embodiment of the present invention, step S02 is specifically: extract composite features comprising the OMP feature, the MFCCs feature and the pitch feature. The OMP feature is extracted by decomposing each frame of the sound signal with OMP and taking, over the first L atom time-frequency parameter groups of the support set representing that frame, the mean and the standard deviation of the scale factor s and of the frequency factor v, forming the 4-dimensional OMP feature O(λ) = (μ_s(λ), σ_s(λ), μ_v(λ), σ_v(λ)), where λ is the frame index of the signal, i indexes the atoms representing the frame, and L is the number of atoms.
In an embodiment of the present invention, MFCCs are chosen to supplement the OMP feature: a 24-band Mel filter bank is applied to the reconstructed signal after the discrete Fourier transform to obtain the 12-dimensional static MFCC feature, and the log energy is added as its 13th dimension.
In an embodiment of the present invention, the pitch (PITCH) is chosen to supplement the OMP feature: the circular Average Magnitude Difference Function (AMDF) method is used to obtain the 1-dimensional pitch feature of each frame.
In an embodiment of the present invention, the DBN model training comprises two steps: the first step is unsupervised layer-by-layer greedy pre-training, in which the labeled ecological sound features initialize the state values of the visible-layer nodes at the bottom of the DBN, so that the features are gradually abstracted; the second step uses the correct label information in a supervised BP network and propagates the error information top-down to fine-tune each RBM layer.
In an embodiment of the present invention, the RBM network uses the Contrastive Divergence criterion as its self-training strategy. Each layer consists of a visible layer V and a hidden layer H; several RBMs are stacked through bottom-up inter-layer weights, the output of one RBM's hidden units serving as the input of the visible layer of the RBM above, thereby building a DBN. An RBM has three parameters: the weights W between the visible and hidden layers, and the bias vectors b and c. Training the DBN classifier is thus converted into solving for the RBM parameters. Let v_i and h_j be the node values of the visible and hidden layers. The probability that a visible node takes the value 1 is P(v_i = 1) = σ(b_i + Σ_j w_ij·h_j), and likewise the probability that a hidden node takes the value 1 is P(h_j = 1) = σ(c_j + Σ_i w_ij·v_i), where σ is the logistic sigmoid. The update rule of the weights W is Δw_ij ∝ ⟨v_i h_j⟩_data - ⟨v_i h_j⟩_reconstruct, where ⟨v_i h_j⟩_data is the expectation of the joint distribution of the known sample set's visible nodes v_i and the unknown hidden nodes h_j, and ⟨v_i h_j⟩_reconstruct is the same expectation after the hidden units have been updated from the known samples and the visible units reconstructed from them.
The present invention markedly improves the noise immunity and robustness of the system: the sparsity-based OMP denoising improves the anti-noise performance of ecological sound recognition and, in several scenarios, also has an advantage over spectral subtraction and wavelet denoising; the strategy of searching for the optimal atom with the GSO algorithm effectively reduces the computational complexity of the OMP decomposition.
To make the object, technical scheme and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments and the accompanying drawings.
Description of the drawings
Fig. 1 is the flow chart of the OMP-based ecological sound classification of the present invention.
Fig. 2 is the atom time-frequency parameter variation diagram of the present invention.
Fig. 3 is the structural diagram of the DBN network of the present invention.
Embodiment
The present invention performs ecological sound recognition by fast sparse decomposition and deep learning. First, the sound signal is reconstructed by a limited number of Orthogonal Matching Pursuit (OMP) sparse decomposition steps based on Glowworm Swarm Optimization (GSO), retaining the highly correlated components and filtering out the weakly correlated noise. Second, composite anti-noise features are extracted from the atom time-frequency information and the frequency-domain information. Finally, a Deep Belief Network (DBN) classifies the ecological sounds under different environments and signal-to-noise ratios. Experiments show that the OMP sparse denoising outperforms spectral subtraction and wavelet denoising; compared with the commonly used MFCCs and SVM methods, the present method improves the recognition of ecological sounds under different signal-to-noise ratios to varying degrees and has better noise immunity, and it is especially suitable for use under low-SNR noisy conditions.
The OMP algorithm is a greedy reconstruction algorithm in Compressed Sensing (CS), proposed on the basis of the Matching Pursuit (MP) algorithm. Its improvement is that each atom picked out of the dictionary during the decomposition, called the optimal atom, is first orthogonalized against the set of already selected atoms by the Gram-Schmidt method; this guarantees the optimality of each iteration and thus reduces the number of iterations. Under the same precision requirement, a signal reconstructed with OMP is sparser and converges faster. Using OMP to denoise ecological sounds exploits the sparsity of the signal: the useful information is extracted as the sparse component, and the noise is treated as the residual left after the sparse component is removed. Noise has a certain randomness and, since the dictionary contains no random atoms, its correlation with the dictionary is low. According to CS theory, when a noisy sound signal is projected into a low dimension and the number of observations is sufficient to contain the useful information, the noise has no sparse representation; the noise content of the residual cannot be recovered at reconstruction time, which achieves the denoising. The sound signal is mapped onto the atom dictionary and decomposed: each round of decomposition yields the atom with the largest inner product with the original signal, i.e. the most correlated atom; the more atoms are extracted by the iteration, the smaller the signal residual, and the weighted combination of the atoms finally gives the best reconstruction of the original signal.
Obtaining the sparse representation of a signal over an overcomplete dictionary is an NP-hard problem, because the overcompleteness of the dictionary determines the computational complexity of the decomposition as well as the sparsity and reconstruction accuracy of the final result. To guarantee the quality of the reconstructed signal, the number of atoms in the dictionary must be much larger than the signal length, and the resulting amount of computation is enormous. Searching for the optimal atom is the most expensive part of the decomposition and is an optimization problem. The present invention therefore proposes to search for the optimal atom with the Glowworm Swarm Optimization (GSO) algorithm, improving the search efficiency while guaranteeing the solution precision, and realizing a fast sparse decomposition of the sound signal.
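As a rough illustration of this scale, the number of atoms in the discretized Gabor dictionary described in this patent (parameter ranges 0 < j ≤ log2 N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12) can be tallied with a short sketch; the function name is illustrative:

```python
import math

def gabor_dictionary_size(N):
    """Count the discretized Gabor parameter combinations for a length-N
    signal, using the parameter ranges given in this patent."""
    total = 0
    for j in range(1, int(math.log2(N)) + 1):
        n_p = int(N * 2 ** (-j + 1)) + 1   # shifts: 0 <= p <= N * 2^(1-j)
        n_k = 2 ** (j + 1)                 # frequencies: 0 <= k < 2^(j+1)
        n_i = 13                           # phases: 0 <= i <= 12
        total += n_p * n_k * n_i
    return total
```

For N = 256 this already gives 119756 atoms, several hundred times the signal length, which is why an exhaustive per-iteration inner-product search is so costly.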
As shown in Fig. 1, the present invention provides an ecological sound identification method based on fast sparse decomposition and deep learning, comprising the following steps:
S01: perform OMP sparse decomposition on the clean sounds and the noisy test sounds respectively, and output the corresponding reconstructed signals and OMP features;
S02: extract from the clean sounds and the noisy test sounds respectively the composite features that include the OMP features;
S03: train a DBN model on the composite features extracted from the reconstructed clean sounds;
S04: classify with the DBN model trained on the clean sounds the composite features extracted from the reconstructed noisy test sounds, and output the ecological sound category to which each noisy test sound belongs.
Let f be the signal to be decomposed, of length N. Before the sparse decomposition, an overcomplete atom dictionary D = (g_γ), γ ∈ Γ, is constructed. Each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w): the shift factor u locates the center of the atom, while the scale factor s, the frequency factor v and the phase factor w define its waveform. The discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^(-j)·Δv, i·Δw), where 0 < j ≤ log2 N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6. Step S01 specifically comprises:
S011: initialize the signal residual R^0 f = f, the iteration count k = 1, and the maximum number of iterations L;
S012: select from the overcomplete atom dictionary D the atom g_γk most correlated with the k-th signal residual, i.e. |⟨R^k f, g_γk⟩| ≥ α · sup_{γ∈Γ} |⟨R^k f, g_γ⟩|, with 0 < α ≤ 1;
S013: test whether ||R^k f|| < ε holds, where ε > 0 is the preset residual-signal threshold; if it holds, go to step S016 and stop the decomposition; otherwise continue decomposing;
S014: use the Gram-Schmidt method to orthogonalize g_γk against the set of already selected atoms g_γp (0 < p ≤ k), obtain the projection P_k, and compute the new approximate reconstructed signal f = P_k f + R^k f and the new residual R^k f;
S015: if the maximum number of iterations has not been reached, set k = k + 1 and return to step S012 to continue the iteration; otherwise go to step S016;
S016: output the approximate L-term atomic expansion f ≈ Σ_{k=1..L} ⟨R^k f, g_γk⟩ g_γk obtained from the successive decompositions.
As the decomposition iterates, the residual signal energy decays continuously and finally approaches 0.
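As a minimal sketch of steps S011 to S016, the following code runs OMP over a generic dictionary matrix; the Gram-Schmidt update is replaced by an equivalent least-squares projection onto the selected atoms, and the random dictionary is a stand-in for the patent's Gabor dictionary:

```python
import numpy as np

def omp(f, D, L, eps=1e-6):
    """OMP sketch over a dictionary matrix D (atoms as unit-norm columns).

    f : (N,) signal to decompose; L : maximum number of iterations (S011);
    eps : residual threshold epsilon (S013).  Returns the reconstruction
    and the indices of the selected atoms.
    """
    residual = f.copy()
    support = []                              # indices of selected atoms
    recon = np.zeros_like(f)
    for _ in range(L):
        corr = D.T @ residual                 # S012: correlate every atom
        support.append(int(np.argmax(np.abs(corr))))
        # S014: least-squares projection onto the selected atoms, which is
        # equivalent to the Gram-Schmidt orthogonalization in the text
        A = D[:, support]
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        recon = A @ coef
        residual = f - recon
        if np.linalg.norm(residual) < eps:    # S013: stop on small residual
            break
    return recon, support                     # S016: L-term approximation
```

A signal that is an exact combination of two atoms is recovered to numerical precision, illustrating the residual-energy decay noted above.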
According to the description of the OMP algorithm, searching the whole dictionary space for the optimal atom is a typical optimization problem. To search for the optimal atom with GSO, the atom parameter group γ_k = (s, u, v, w) is taken as the parameter group to be optimized, corresponding to the position x_i(t) of glowworm i at the t-th iteration, and the inner product |⟨R^k f, g_γk⟩| of the atom and the residual signal is taken as the objective function, corresponding to the objective value f(x_i(t)) determined by the glowworm's position, from which the luciferin value l_i(t) is further computed. Through the movement and gathering of the glowworms, the position with the maximum luciferin can be found, whose physical meaning is the optimal atom parameters.
The concrete steps of searching for the optimal atom with GSO comprise:
S001: initialize the glowworm population size n, the luciferin values l_i, the decision-range radius r_0 and the maximum number of iterations t_max, and randomly generate the glowworms;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |⟨R^k f, g_γ(x_i(t))⟩|, compute the objective value f(x_i(t)) of glowworm i at its current position x_i(t) in the t-th iteration, and convert it into the luciferin value l_i(t) according to l_i(t) = (1 - ρ) · l_i(t-1) + η · f(x_i(t)), where ρ ∈ (0, 1) is the luciferin decay rate and η ∈ (0, 1) is the luciferin update rate;
S003: each glowworm i searches within its dynamic decision range r_d^i(t) for the individuals whose luciferin is greater than its own, forming the neighborhood set N_i(t) = { j : d_ij(t) < r_d^i(t), l_i(t) < l_j(t) }, with 0 < r_d^i(t) ≤ r_s, where r_s is the maximum decision range of a glowworm;
S004: compute the probability P_ij with which glowworm i moves toward any individual j of its neighborhood set N_i(t), P_ij = (l_j(t) - l_i(t)) / Σ_{k ∈ N_i(t)} (l_k(t) - l_i(t));
S005: use roulette-wheel selection to choose the individual j with the highest probability as the movement target, and update the position as x_i(t+1) = x_i(t) + s · (x_j(t) - x_i(t)) / ||x_j(t) - x_i(t)||, where s is the moving step length;
S006: update the dynamic decision-range radius as r_d^i(t+1) = min{ r_s, max{ 0, r_d^i(t) + β · (n_t - |N_i(t)|) } }, where β is the proportionality constant controlling the variation of the neighborhood range, n_t is the parameter controlling the number of glowworms in a neighborhood, and |N_i(t)| is the number of glowworms in the neighborhood;
S007: if the maximum number of iterations t_max has been reached, save the decomposition result and output the atom time-frequency parameters; otherwise return to step S002.
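One iteration of steps S002 to S006 can be sketched as follows; all numeric defaults are illustrative, not values taken from the patent's experiments:

```python
import numpy as np

def gso_step(positions, luciferin, r_d, objective,
             rho=0.4, eta=0.6, beta=0.08, n_t=5, s=0.03, r_s=1.0):
    """One GSO iteration over an (n, d) array of glowworm positions.

    In the patent's setting a position is a candidate parameter group
    gamma = (s, u, v, w) and objective(x) = |<R^k f, g_gamma(x)>|; here
    objective is any callable mapping a position to a scalar.
    """
    # S002: luciferin update from the objective value at each position
    values = np.array([objective(p) for p in positions])
    luciferin = (1.0 - rho) * luciferin + eta * values
    new_pos = positions.copy()
    for i in range(len(positions)):
        d = np.linalg.norm(positions - positions[i], axis=1)
        # S003: neighbors are brighter glowworms inside the decision range
        nbrs = np.where((d < r_d[i]) & (luciferin > luciferin[i]))[0]
        if len(nbrs):
            # S004/S005: roulette-wheel move toward a brighter neighbor,
            # with probability proportional to the luciferin difference
            p = luciferin[nbrs] - luciferin[i]
            j = int(np.random.choice(nbrs, p=p / p.sum()))
            step = positions[j] - positions[i]
            norm = np.linalg.norm(step)
            if norm > 0.0:
                new_pos[i] = positions[i] + s * step / norm
        # S006: adapt the decision range toward n_t neighbors
        r_d[i] = min(r_s, max(0.0, r_d[i] + beta * (n_t - len(nbrs))))
    return new_pos, luciferin, r_d
```

Iterating this step makes the swarm gather around the maximizer of the objective, which in the patent corresponds to the optimal atom parameters.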
As shown in Fig. 2, during a single search for the optimal atom, the frequency factor v and the phase factor w of the atom parameter groups corresponding to the glowworms gather toward the position where a higher objective value is obtained.
The Gabor atoms obtained by the OMP decomposition are formed from a modulated Gaussian function; since Gaussian functions are localized in both the time and frequency domains, this local property guarantees that the atom time-frequency parameters can well characterize the non-stationary, time-varying properties of the signal.
Step S02 is specifically: extract composite features comprising the OMP feature, the MFCCs feature and the pitch feature. The OMP feature is extracted by decomposing each frame of the sound signal with OMP and taking, over the first L atom time-frequency parameter groups of the support set representing that frame, the mean and the standard deviation of the scale factor s and of the frequency factor v, forming the 4-dimensional OMP feature O(λ) = (μ_s(λ), σ_s(λ), μ_v(λ), σ_v(λ)), where λ is the frame index of the signal, i indexes the atoms representing the frame, and L is the number of atoms.
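Given the per-frame parameter groups returned by the decomposition, the 4-dimensional OMP feature reduces to a few statistics; the (L, 4) array layout assumed here is hypothetical:

```python
import numpy as np

def omp_feature(atom_params):
    """4-dimensional per-frame OMP feature: mean and standard deviation of
    the scale factor s and the frequency factor v over the top-L atoms.

    atom_params : (L, 4) array of parameter groups (s, u, v, w); this
    layout is an assumption for illustration.
    """
    s = atom_params[:, 0]
    v = atom_params[:, 2]
    return np.array([s.mean(), s.std(), v.mean(), v.std()])
```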
For a sound signal, the reconstruction accuracy keeps improving as the number of atoms in the support set grows. But too high a sparsity level brings a new problem: weakly correlated noise is also reconstructed in the later stages, so the recognition rate of the system does not grow in proportion to the number of atoms. Under the premise of guaranteeing the reconstruction accuracy, the present invention determined by experiment that reconstructing with the first 20 atoms of the sparse decomposition gives the best results. Because different sounds and noises have different sparsity levels, reconstructing all sounds with a fixed sparsity level has certain drawbacks, so the recognition result of the OMP time-frequency feature used alone is unsatisfactory. To overcome this, the present invention chooses MFCCs and the fundamental frequency (PITCH) to supplement the OMP feature.
MFCCs are chosen to supplement the OMP feature: a 24-band Mel filter bank is applied to the reconstructed signal after the discrete Fourier transform to obtain the 12-dimensional static MFCC feature, and the log energy is added as its 13th dimension.
The pitch (PITCH) is chosen to supplement the OMP feature: the circular Average Magnitude Difference Function (AMDF) method is used to obtain the 1-dimensional pitch feature of each frame. The composite anti-noise time-frequency feature constructed from these three kinds of features is used jointly to characterize ecological sounds.
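A minimal sketch of the circular-AMDF pitch feature; the lag search range (50 to 2000 Hz) is an assumption, not a value given in the source:

```python
import numpy as np

def pitch_camdf(frame, fs, fmin=50.0, fmax=2000.0):
    """1-dimensional pitch feature of one frame via the circular Average
    Magnitude Difference Function (AMDF).

    The pitch lag is taken as the deepest AMDF valley; fmin/fmax bound
    the lag search and are assumed values.
    """
    lags = np.arange(int(fs / fmax), int(fs / fmin) + 1)
    # circular AMDF: D(k) = sum_n |x(n) - x((n - k) mod N)|
    amdf = np.array([np.abs(frame - np.roll(frame, k)).sum() for k in lags])
    return fs / lags[np.argmin(amdf)]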
A DBN is composed of several layers of unsupervised Restricted Boltzmann Machines (RBM) and one layer of supervised back-propagation (BP) feed-forward network; its structure is shown in Fig. 3.
The DBN model training comprises two steps: the first step is unsupervised layer-by-layer greedy pre-training, in which the labeled ecological sound features initialize the state values of the visible-layer nodes at the bottom of the DBN, so that the features are gradually abstracted; the second step uses the correct label information in a supervised BP network and propagates the error information top-down to fine-tune each RBM layer.
The RBM network uses the Contrastive Divergence criterion as its self-training strategy. Each layer consists of a visible layer V and a hidden layer H; several RBMs are stacked through bottom-up inter-layer weights, the output of one RBM's hidden units serving as the input of the visible layer of the RBM above, thereby building a DBN. An RBM has three parameters: the weights W between the visible and hidden layers, and the bias vectors b and c. Training the DBN classifier is thus converted into solving for the RBM parameters. Let v_i and h_j be the node values of the visible and hidden layers. The probability that a visible node takes the value 1 is P(v_i = 1) = σ(b_i + Σ_j w_ij·h_j), and likewise the probability that a hidden node takes the value 1 is P(h_j = 1) = σ(c_j + Σ_i w_ij·v_i), where σ is the logistic sigmoid. The update rule of the weights W is Δw_ij ∝ ⟨v_i h_j⟩_data - ⟨v_i h_j⟩_reconstruct, where ⟨v_i h_j⟩_data is the expectation of the joint distribution of the known sample set's visible nodes v_i and the unknown hidden nodes h_j, and ⟨v_i h_j⟩_reconstruct is the same expectation after the hidden units have been updated from the known samples and the visible units reconstructed from them.
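The Contrastive Divergence update described above can be sketched for a single binary RBM as follows (CD-1, with an illustrative learning rate):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM with visible biases b, hidden
    biases c and weight matrix W of shape (n_visible, n_hidden)."""
    if rng is None:
        rng = np.random.default_rng()
    # P(h_j = 1 | v) = sigmoid(c_j + sum_i w_ij v_i)
    ph0 = sigmoid(c + v0 @ W)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden units
    # reconstruction: P(v_i = 1 | h) = sigmoid(b_i + sum_j w_ij h_j)
    pv1 = sigmoid(b + h0 @ W.T)
    ph1 = sigmoid(c + pv1 @ W)
    # delta w_ij proportional to <v_i h_j>_data - <v_i h_j>_reconstruct
    W = W + lr * (np.outer(v0, ph0) - np.outer(pv1, ph1))
    b = b + lr * (v0 - pv1)
    c = c + lr * (ph0 - ph1)
    return W, b, c
```

Repeated updates on a training vector drive the reconstruction toward that vector, which is the sense in which each RBM layer models its input.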
The preferred embodiments listed above further describe the object, technical scheme and advantages of the present invention. It should be understood that the foregoing are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included in its scope of protection.

Claims (8)

1. An ecological sound identification method based on rapid sparse decomposition and deep learning, characterized by comprising the following steps:
S01: performing OMP (Orthogonal Matching Pursuit) sparse decomposition on the clean sound and on the noisy test sound respectively, and outputting the corresponding reconstructed signals and OMP features of the clean sound and the noisy test sound;
S02: extracting from the clean sound and from the noisy test sound respectively a composite feature that includes the OMP feature;
S03: performing DBN model training on the composite features extracted from the reconstructed clean sound;
S04: performing DBN model classification on the composite features extracted from the reconstructed noisy test sound against the trained clean-sound model, and outputting the ecological sound category to which the noisy test sound belongs.
2. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that: given a signal f to be decomposed, of length N, an over-complete atom dictionary D = (g_γ)_{γ∈Γ} is first constructed before the sparse decomposition; each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w), the shift factor u defining the centre of an atom g_γ and the scale factor s, frequency factor v and phase factor w defining its waveform; the discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^{−j}·Δv, i·Δw), where 0 < j ≤ log₂N, 0 ≤ p ≤ N·2^{−j+1}, 0 ≤ k < 2^{j+1}, 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6; said step S01 specifically comprises:
S011: initializing the signal residual R^0 f = f, the iteration count k = 1, and the maximum iteration count L;
S012: selecting from the over-complete atom dictionary D the atom g_γk most correlated with the signal residual of the k-th iteration, |⟨R^k f, g_γk⟩| ≥ α·sup_{γ∈Γ} |⟨R^k f, g_γ⟩|, 0 < α ≤ 1;
S013: judging whether ||R^k f|| < ε holds, ε > 0 being the preset residual threshold; if ||R^k f|| < ε holds, going to step S016 to end the decomposition, otherwise continuing the decomposition;
S014: orthogonalizing g_γk with respect to the set of previously selected atoms g_γp (0 < p ≤ k) by the Gram-Schmidt method to obtain the projection P_k, and computing the new approximate reconstructed signal f = P_k f + R_k f and the residual R_k f respectively;
S015: if the maximum iteration count has not been reached, setting k = k + 1 and returning to step S012 to continue the iteration, otherwise going to step S016;
S016: outputting, from the series of atoms obtained by the successive decompositions, the approximate atom expansion after the L-th iteration, f ≈ Σ_{k=1}^{L} ⟨R^k f, g_γk⟩ g_γk.
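The decomposition loop of steps S011–S016 can be sketched as follows. For brevity a random unit-norm dictionary stands in for the Gabor dictionary of the claim, and a least-squares projection plays the role of the Gram-Schmidt orthogonalization; the signal and dictionary sizes are assumptions of this example.

```python
import numpy as np

def omp(f, D, L=10, eps=1e-6):
    """Orthogonal Matching Pursuit sketch following steps S011-S016.

    f : (N,) signal, D : (N, M) dictionary of unit-norm atoms.
    Returns the approximate reconstruction and the chosen atom indices.
    """
    residual = f.copy()            # S011: R0 f = f
    support = []
    approx = np.zeros_like(f)
    for k in range(L):             # up to the maximum iteration count L
        # S012: atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # S014: project f onto the span of the selected atoms
        # (least squares is equivalent to Gram-Schmidt orthogonalization here)
        A = D[:, support]
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)
        approx = A @ coef          # Pk f
        residual = f - approx      # Rk f
        if np.linalg.norm(residual) < eps:   # S013: ||Rk f|| < eps
            break                  # S016: stop decomposing
    return approx, support

rng = np.random.default_rng(1)
N, M = 64, 256
D = rng.standard_normal((N, M))
D /= np.linalg.norm(D, axis=0)          # unit-norm atoms
true_idx = [3, 100, 200]
f = D[:, true_idx] @ np.array([1.5, -2.0, 0.7])   # 3-sparse test signal
approx, support = omp(f, D)
err = np.linalg.norm(f - approx) / np.linalg.norm(f)
```

On such an incoherent dictionary OMP recovers the sparse support within a few iterations, after which the residual norm falls below the threshold ε and the loop stops early.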
3. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 2, characterized in that said step S012 searches for the optimal atom with GSO (glowworm swarm optimization), specifically comprising:
S001: initializing the firefly population size n, the luciferin l_i, the decision-domain radius r_0 and the maximum iteration count t_max, and randomly generating the fireflies;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |⟨R^k f, g_γ(x_i(t))⟩|, computing the objective value f(x_i(t)) of firefly i at its current position x_i(t) in the t-th iteration, and converting it into the luciferin value l_i(t) according to l_i(t) = (1 − ρ)·l_i(t − 1) + η·f(x_i(t)), where ρ ∈ (0, 1) is the luciferin decay rate and η ∈ (0, 1) is the luciferin update rate;
S003: each firefly i searching within its dynamic decision domain r_d^i(t) for the individuals whose luciferin is larger than its own, forming the neighbourhood set N_i(t) = { j : d_ij(t) < r_d^i(t), l_i(t) < l_j(t) }, 0 < r_d^i(t) ≤ r_s, where r_s is the maximum decision-domain radius of a firefly;
S004: computing the probability P_ij that firefly i moves towards any individual j in the neighbourhood set N_i(t), P_ij = (l_j(t) − l_i(t)) / Σ_{k∈N_i(t)} (l_k(t) − l_i(t));
S005: choosing by roulette-wheel selection the individual j with the highest probability as the moving target, and updating the position x_i(t + 1) = x_i(t) + s·(x_j(t) − x_i(t)) / ||x_j(t) − x_i(t)||, where s is the moving step length;
S006: updating the dynamic decision-domain radius r_d^i(t + 1) = min{ r_s, max{ 0, r_d^i(t) + β·(n_t − |N_i(t)|) } }, where β is the proportionality constant controlling the variation range of the neighbourhood, n_t is the parameter controlling the number of fireflies in a neighbourhood, and |N_i(t)| is the number of fireflies in the neighbourhood;
S007: if the maximum iteration count t_max is reached, saving the decomposition result and outputting the time-frequency parameters of the atom, otherwise returning to step S002.
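A minimal sketch of the GSO loop of steps S001–S007, run on a toy objective in place of the atom-correlation objective |⟨R^k f, g_γ⟩|; all parameter values here are illustrative assumptions, not the patent's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def gso(objective, dim, n=30, t_max=60, rho=0.4, eta=0.6,
        beta=0.08, n_t=5, r_s=3.0, step=0.03, l0=5.0):
    """Minimal glowworm swarm optimization following steps S001-S007."""
    x = rng.uniform(-3.0, 3.0, size=(n, dim))   # S001: random fireflies
    l = np.full(n, l0)                          # initial luciferin
    r = np.full(n, r_s)                         # decision-domain radii
    for _ in range(t_max):
        # S002: luciferin update l_i(t) = (1 - rho) l_i(t-1) + eta f(x_i(t))
        l = (1.0 - rho) * l + eta * np.array([objective(xi) for xi in x])
        for i in range(n):
            d = np.linalg.norm(x - x[i], axis=1)
            # S003: neighbours closer than r_i and brighter than firefly i
            nbrs = np.where((d < r[i]) & (l > l[i]))[0]
            if nbrs.size:
                # S004: move probability proportional to luciferin difference
                p = (l[nbrs] - l[i]) / (l[nbrs] - l[i]).sum()
                j = rng.choice(nbrs, p=p)       # S005: roulette selection
                diff = x[j] - x[i]
                x[i] = x[i] + step * diff / (np.linalg.norm(diff) + 1e-12)
            # S006: dynamic decision-domain update
            r[i] = min(r_s, max(0.0, r[i] + beta * (n_t - nbrs.size)))
    return x[np.argmax(l)]                      # S007: brightest position

# Toy objective maximized at the origin (stands in for |<R^k f, g>| search)
best = gso(lambda z: -np.sum(z * z), dim=2)
```

In the patent's setting the search space would be the discretized Gabor parameter group (s, u, v, w) rather than this continuous toy domain.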
4. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that said step S02 specifically is: extracting a composite feature comprising the OMP feature, the MFCCs feature and the pitch feature; the method of extracting the OMP feature specifically is: decomposing each frame of the sound signal with OMP and taking, over the time-frequency parameter groups of the first L atoms of the support set representing the frame signal, the mean and the standard deviation of the scale factor s and of the frequency factor v, forming the 4-dimensional OMP feature
OMP(λ) = (mean_i(s_{λ,i}), std_i(s_{λ,i}), mean_i(v_{λ,i}), std_i(v_{λ,i})),
where λ is the frame index of the signal, i indexes the atoms representing the frame signal, and L is the number of atoms.
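The 4-dimensional OMP feature of this claim can be sketched as follows, assuming the per-atom parameter groups (s, u, v, w) of a frame's support set are already available from the decomposition; the numeric values are made up for the example.

```python
import numpy as np

def omp_feature(atom_params):
    """4-dimensional OMP feature of one frame (claim 4 sketch).

    atom_params : (L, 4) array of per-atom Gabor parameters (s, u, v, w)
    from the frame's support set; the feature keeps the mean and the
    standard deviation of the scale factor s and the frequency factor v.
    """
    s = atom_params[:, 0]          # contraction-expansion (scale) factors
    v = atom_params[:, 2]          # frequency factors
    return np.array([s.mean(), s.std(), v.mean(), v.std()])

# Hypothetical support set of L = 5 atoms for one frame
params = np.array([[2.0, 0.5, 3.1, 0.0],
                   [4.0, 1.0, 3.5, 0.5],
                   [2.0, 0.0, 2.9, 1.0],
                   [8.0, 0.5, 3.3, 0.0],
                   [4.0, 1.5, 3.2, 0.5]])
feat = omp_feature(params)
```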
5. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 4, characterized in that: the MFCCs are chosen to supplement the OMP feature; a 24th-order Mel filter bank is first applied, the 12-dimensional static MFCCs feature is obtained from the reconstructed signal after the discrete Fourier transform, and the logarithmic energy is added as its 13th dimension.
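A sketch of the 13-dimensional MFCCs feature of the preceding claim (24 mel filters, 12 static coefficients plus log energy). The FFT size, frame length, sampling rate and DCT convention are assumptions of this example, not values fixed by the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sr, n_filt=24, n_ceps=12, n_fft=512):
    """13-dim MFCC sketch for one frame: 24 mel filters, 12 static
    cepstral coefficients plus log energy as the 13th dimension."""
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2         # power spectrum
    # Triangular mel filter bank between 0 Hz and sr/2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filt + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for m in range(1, n_filt + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):
            fbank[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    logmel = np.log(fbank @ spec + 1e-10)
    # DCT-II of the log filter-bank energies; keep coefficients 1..12
    n = np.arange(n_filt)
    ceps = np.array([np.sum(logmel * np.cos(np.pi * k * (n + 0.5) / n_filt))
                     for k in range(1, n_ceps + 1)])
    log_energy = np.log(np.sum(frame ** 2) + 1e-10)       # 13th dimension
    return np.concatenate([ceps, [log_energy]])

sr = 16000
t = np.arange(400) / sr
frame = np.sin(2 * np.pi * 440.0 * t)                     # 25 ms test tone
feat = mfcc_frame(frame, sr)
```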
6. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 4, characterized in that: the pitch (PITCH) is chosen to supplement the OMP feature, the circular AMDF (average magnitude difference function) method being used to obtain the 1-dimensional PITCH feature corresponding to each frame.
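The circular-AMDF pitch feature can be sketched as follows; the lag search range (50–500 Hz) and the test tone are assumptions of this example.

```python
import numpy as np

def camdf_pitch(frame, sr, f_lo=50.0, f_hi=500.0):
    """Pitch estimate of one frame via the circular AMDF (claim 6 sketch).

    D(tau) = sum_n |x(n) - x((n + tau) mod N)|; the pitch period is the
    lag with the deepest valley inside the plausible lag range.
    """
    n = len(frame)
    lo, hi = int(sr / f_hi), int(sr / f_lo)
    lags = np.arange(lo, min(hi, n - 1) + 1)
    d = np.array([np.abs(frame - np.roll(frame, -tau)).sum() for tau in lags])
    return sr / lags[np.argmin(d)]       # 1-dimensional PITCH feature

sr = 8000
t = np.arange(320) / sr                  # 40 ms frame
frame = np.sin(2 * np.pi * 200.0 * t)    # 200 Hz tone, exactly periodic
pitch = camdf_pitch(frame, sr)
```

Because the frame holds an integer number of periods, the circular difference at the true lag is exactly zero and the estimate is exact here; on real ecological sound the valley is merely the deepest, not zero.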
7. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that: said DBN model training comprises two steps; the first step adopts an unsupervised greedy layer-wise strategy for pre-training, the state values of the visible-layer nodes of the bottom DBN layer being initialized with the labelled ecological sound features, so that specific features are gradually abstracted; the second step uses the correct label information in a supervised BP network, the error information being transmitted top-down to fine-tune each RBM layer.
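The first, unsupervised step of the preceding claim (greedy layer-wise pre-training, each RBM's hidden output feeding the next RBM's visible layer) can be sketched as follows; layer sizes, epochs and the random input batch are assumptions, and the supervised BP fine-tuning is only indicated by the final comment.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Unsupervised CD-1 pretraining of one RBM layer."""
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b, c = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        ph0 = sigmoid(data @ W + c)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        pv1 = sigmoid(h0 @ W.T + b)            # reconstruct visible layer
        ph1 = sigmoid(pv1 @ W + c)
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / len(data)
        b += lr * (data - pv1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, c, sigmoid(data @ W + c)   # weights, hidden bias, layer output

# Greedy layer-wise stacking: each layer's hidden output feeds the next RBM
features = (rng.random((50, 18)) < 0.5).astype(float)  # stand-in for composite features
layers = []
x = features
for n_hid in (32, 16):
    W, c, x = pretrain_rbm(x, n_hid)
    layers.append((W, c))
# A supervised BP pass would now fine-tune `layers` top-down using the labels.
```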
8. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 7, characterized in that: the RBM network uses the Contrastive Divergence criterion as its self-training strategy; each layer consists of a visible layer V and a hidden layer H, and a plurality of RBMs are combined through bottom-up inter-layer weighted connections, the output of the hidden units serving as the input of the visible layer of the RBM above, thereby building a DBN framework; an RBM has three parameters, namely the weight matrix W between the visible layer and the hidden layer and their respective bias vectors b and c, so that the training of the DBN classifier is converted into solving for the RBM parameters; supposing the node values of the visible layer and the hidden layer are v_i and h_j respectively, the probability that a node of the visible layer V takes the value 1 is P(v_i = 1 | h) = σ(b_i + Σ_j w_ij·h_j), and likewise the probability that a node of the hidden layer H takes the value 1 is P(h_j = 1 | v) = σ(c_j + Σ_i w_ij·v_i), where σ(x) = 1/(1 + e^(−x)); the update rule of the weights W is Δw_ij ∝ ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_reconstruct, where ⟨v_i h_j⟩_data denotes the expectation of the joint distribution of the visible-layer node v_i and the unknown hidden node h_j over the known sample set, and ⟨v_i h_j⟩_reconstruct denotes the expectation of ⟨v_i h_j⟩ after the hidden units have been updated from the known samples and the visible-layer units reconstructed.
CN201310472330.6A 2013-10-11 2013-10-11 Ecological sound identification method based on rapid sparse decomposition and deep learning Expired - Fee Related CN103531199B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310472330.6A CN103531199B (en) Ecological sound identification method based on rapid sparse decomposition and deep learning


Publications (2)

Publication Number Publication Date
CN103531199A true CN103531199A (en) 2014-01-22
CN103531199B CN103531199B (en) 2016-03-09

Family

ID=49933152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310472330.6A Expired - Fee Related CN103531199B (en) Ecological sound identification method based on rapid sparse decomposition and deep learning

Country Status (1)

Country Link
CN (1) CN103531199B (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101232707B1 (en) * 2012-02-07 2013-02-13 고려대학교 산학협력단 Apparatus and method for reconstructing signal using compressive sensing algorithm
CN102592593A (en) * 2012-03-31 2012-07-18 山东大学 Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李雨昕 (Li Yuxin): "Fast Algorithm for MP Sparse Decomposition of Speech Signals and Its Preliminary Application in Speech Recognition", China Master's Theses Full-text Database, 30 October 2009 (2009-10-30) *
邵君 (Shao Jun): "Research on Signal Sparse Decomposition Algorithms Based on MP", China Master's Theses Full-text Database, 5 March 2007 (2007-03-05) *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104113789A (en) * 2014-07-10 2014-10-22 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN104113789B (en) * 2014-07-10 2017-04-12 杭州电子科技大学 On-line video abstraction generation method based on depth learning
CN104157290A (en) * 2014-08-19 2014-11-19 大连理工大学 Speaker recognition method based on depth learning
CN104157290B (en) * 2014-08-19 2017-10-24 大连理工大学 Speaker recognition method based on deep learning
CN104464727A (en) * 2014-12-11 2015-03-25 福州大学 Single-channel music singing separation method based on deep belief network
CN104850837B (en) * 2015-05-18 2017-12-05 西南交通大学 Handwritten character recognition method
CN104850837A (en) * 2015-05-18 2015-08-19 西南交通大学 Handwritten character recognition method
WO2017076211A1 (en) * 2015-11-05 2017-05-11 阿里巴巴集团控股有限公司 Voice-based role separation method and device
CN105551503A (en) * 2015-12-24 2016-05-04 武汉大学 Audio matching tracking method based on atom pre-selection and system thereof
CN105551503B (en) * 2015-12-24 2019-03-01 武汉大学 Audio matching pursuit method and system based on atom preselection
CN105654964A (en) * 2016-01-20 2016-06-08 司法部司法鉴定科学技术研究所 Recording audio device source determination method and device
CN107464556A (en) * 2016-06-02 2017-12-12 国家计算机网络与信息安全管理中心 Audio scene recognition method based on sparse coding
CN106059971A (en) * 2016-07-07 2016-10-26 西北工业大学 Sparse reconstruction based correlation detection method under signal correlation attenuation condition
CN107039036A (en) * 2017-02-17 2017-08-11 南京邮电大学 High-quality speaker recognition method based on automatic coding depth confidence network
CN107039036B (en) * 2017-02-17 2020-06-16 南京邮电大学 High-quality speaker recognition method based on automatic coding depth confidence network
CN107293301A (en) * 2017-05-27 2017-10-24 深圳大学 Recognition methods and system based on dental articulation sound
CN107729381B (en) * 2017-09-15 2020-05-08 广州嘉影软件有限公司 Interactive multimedia resource aggregation method and system based on multi-dimensional feature recognition
CN107729381A (en) * 2017-09-15 2018-02-23 广州嘉影软件有限公司 Interactive multimedia resource aggregation method and system based on multi-dimensional feature recognition
CN109682892A (en) * 2018-12-26 2019-04-26 西安科技大学 Signal denoising method based on time-frequency analysis
CN109682892B (en) * 2018-12-26 2021-07-09 西安科技大学 Signal denoising method based on time-frequency analysis
CN109862518A (en) * 2019-01-11 2019-06-07 福州大学 Equipment-free positioning method based on common sparse analysis model
CN109862518B (en) * 2019-01-11 2021-05-18 福州大学 Equipment-free positioning method based on common sparse analysis model
CN111507321A (en) * 2020-07-01 2020-08-07 中国地质大学(武汉) Training method, classification method and device of multi-output land cover classification model
CN112885357A (en) * 2021-01-13 2021-06-01 上海英粤汽车科技有限公司 Method for recognizing animal category through voice
CN113238189A (en) * 2021-05-24 2021-08-10 清华大学 Sound source identification method and system based on array measurement and sparse prior information
CN113238189B (en) * 2021-05-24 2023-03-10 清华大学 Sound source identification method and system based on array measurement and sparse prior information
CN113470654A (en) * 2021-06-02 2021-10-01 国网浙江省电力有限公司绍兴供电公司 Voiceprint automatic identification system and method

Also Published As

Publication number Publication date
CN103531199B (en) 2016-03-09

Similar Documents

Publication Publication Date Title
CN103531199B (en) Ecological sound identification method based on rapid sparse decomposition and deep learning
CN103474066B (en) Ecological sound identification method based on multi-band signal reconstruction
CN109767759A (en) End-to-end speech recognition method based on modified CLDNN structure
CN112364779B (en) Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion
CN108053836B (en) Audio automatic labeling method based on deep learning
Mitra et al. Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition
CN106228977B (en) Multi-mode fusion song emotion recognition method based on deep learning
CN100411011C (en) Pronunciation quality evaluating method for language learning machine
CN107680582A (en) Acoustic training model method, audio recognition method, device, equipment and medium
Tan et al. Cluster adaptive training for deep neural network
CN108694951B (en) Speaker identification method based on multi-stream hierarchical fusion transformation characteristics and long-and-short time memory network
CN104035996B (en) Field concept abstracting method based on Deep Learning
CN104751228A (en) Method and system for constructing deep neural network
CN110349597B (en) Voice detection method and device
CN106782511A (en) Speech recognition method based on rectified-linear deep autoencoder network
CN110289002B (en) End-to-end speaker clustering method and system
CN109448749A (en) Speech extraction method, system and device based on supervised-learning auditory attention
CN104424943A (en) A speech processing system and method
CN110490230A (en) Acoustic target recognition method based on deep convolutional generative adversarial network
CN108229659A (en) Piano single-key sound recognition method based on deep learning
US20200312336A1 (en) Method and apparatus for implementing speaker identification neural network
Mitra et al. Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks
CN106548775A (en) Speech recognition method and system
Bacchiani et al. Context dependent state tying for speech recognition using deep neural network acoustic models
Zhao et al. Speech recognition system based on integrating feature and HMM

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160309

Termination date: 20191011

CF01 Termination of patent right due to non-payment of annual fee