CN103531199A - Ecological sound identification method on basis of rapid sparse decomposition and deep learning - Google Patents
Abstract
The invention relates to an ecological sound identification method based on rapid sparse decomposition and deep learning, characterized by comprising the following steps: S01, performing OMP (Orthogonal Matching Pursuit) sparse decomposition on the clean sounds and the noisy test sounds respectively, and outputting the corresponding reconstructed signals and OMP features; S02, extracting from the clean sounds and the noisy test sounds composite features that include the OMP features; S03, training a DBN (Deep Belief Network) model on the composite features extracted from the reconstructed clean sounds; and S04, classifying with the trained DBN model the composite features extracted from the reconstructed noisy test sounds, and outputting the ecological sound category to which each noisy test sound belongs. The method markedly improves the noise immunity and robustness of the system.
Description
Technical field
The present invention relates to an ecological sound identification method based on rapid sparse decomposition and deep learning.
Background art
In recent years, habitat protection has received increasingly wide attention, and real-time monitoring systems have been deployed on a large scale in some areas. By analyzing and identifying the audio information contained in an ecological environment, data support can be provided for applications such as intrusion detection and species surveys. In real environments, complex and changing background noise is ubiquitous, so ecological sound recognition under noisy conditions has important practical significance.
Speech and music classification techniques are now relatively mature, while research on environmental sounds remains comparatively limited. The audio information contained in different environments varies greatly: noisy scenes such as restaurants and squares are dominated by speech, collisions or traffic sounds, whereas audio in ecological environments consists mainly of sounds produced by animals and nature. Many existing methods are recognition algorithms tuned to a single class of sound, such as bird or frog calls, and their range of application is rather limited. For example, Chen et al. proposed the frequency-domain Multi-Stage Average Spectrum (MSAS) feature and, combined with syllable length, performed a two-pass classification of 18 kinds of frog calls; the results were better than using the MSAS feature alone, but for overlapping animal calls, classification by syllable length is clearly ineffective. Lee et al. used Gaussian mixture models (GMM) to model spectral shape features and classified continuous bird calls. There is also some research on multi-class ecological sound recognition: Raju et al. extracted pitch, formant and short-time energy features and used a support vector machine (SVM) to classify 19 kinds of animal sounds including cats, dogs and lions; Zhang et al. extracted improved Mel-Frequency Cepstral Coefficients (MFCCs) as features and used a GMM to recognize various insect sound classes.
All of the above methods have shortcomings. GMMs and hidden Markov models (HMMs) are widely applied to structured audio such as speech, but ecological sounds are highly random and not all structured, so these generative models are unstable here. Discriminative models such as SVMs and some traditional neural networks can model nonlinearly separable classes well, but when the features are high-dimensional and the number of categories is large, their classification performance falls short of GMMs or HMMs.
Summary of the invention
In view of this, the object of the present invention is to provide an ecological sound identification method based on rapid sparse decomposition and deep learning.
The present invention adopts the following scheme: an ecological sound identification method based on rapid sparse decomposition and deep learning, characterized by comprising the following steps:
S01: perform OMP sparse decomposition on the clean sounds and the noisy test sounds respectively, and output the corresponding reconstructed signals and OMP features;
S02: extract from the clean sounds and the noisy test sounds composite features that include the OMP features;
S03: train a DBN model on the composite features extracted from the reconstructed clean sounds;
S04: use the DBN model trained on the reconstructed clean sounds to classify the composite features extracted from the reconstructed noisy test sounds, and output the ecological sound category to which each noisy test sound belongs.
In an embodiment of the present invention, let f be the signal to be decomposed, of length N. Before sparse decomposition, an overcomplete atom dictionary D = (g_γ)_{γ∈Γ} is first constructed. Each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w): the shift factor u locates the centre of the atom, while the scale factor s, frequency factor v and phase factor w define its waveform. The discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^(-j)·Δv, i·Δw), where 0 < j ≤ log₂N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6. Step S01 specifically comprises:
S011: initializing signal residual error R
0f=f, iterations k=1, maximum iteration time L;
S012: select the iteration atom g the most relevant to signal residual error the k time from cross complete atom dictionary D
γ k,
S013: judgement || R<sub TranNum="81">k</sub>f||<ε, (ε>0) whether set up, the residue signal threshold value of ε for setting, if || R<sub TranNum="82">k</sub>f||<ε sets up, and goes to step S016 and finishes to decompose, if be false, continues to decompose;
S014: utilize Gram-Schmidt method by g<sub TranNum="84">γ k</sub>about selecting former subset g<sub TranNum="85">γ p</sub>(0<p≤k) orthogonalization obtains projection P<sub TranNum="86">k</sub>and calculate respectively new approximate reconstruction signal f=P<sub TranNum="87">k</sub>f+R<sub TranNum="88">k</sub>f and residual error R<sub TranNum="89">k</sub>f;
S015: if also do not reach maximum iteration time, k=k+1 is set, returns to step S012 and continue iteration, otherwise go to step S016;
S016: by successively decomposing and obtain a series of atoms, export approximate atom expansion the L time
In an embodiment of the present invention, step S012 uses GSO to search for the optimal atom; the specific steps comprise:
S001: initialize the glowworm population size n, the luciferin values l_i, the decision-domain radius r_0 and the maximum number of iterations t_max, and randomly generate the glowworms;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |<R_k f, g_γ(x_i(t))>|, compute the objective value f(x_i(t)) of glowworm i at its current position x_i(t) in the t-th iteration, and convert it into the luciferin value l_i(t) according to l_i(t) = (1 − ρ)·l_i(t−1) + η·f(x_i(t)), where ρ ∈ (0,1) is the luciferin decay rate and η ∈ (0,1) is the luciferin enhancement rate;
S003: each glowworm i searches within its dynamic decision domain for the individuals whose luciferin exceeds its own, forming the neighbourhood set N_i(t), where r_s is the maximum decision-domain radius;
S004: compute the probability p_ij that glowworm i moves towards any individual j in its neighbourhood set N_i(t);
S005: use roulette-wheel selection to choose the individual j with the highest probability as the target and update the position, where s is the step length;
S006: update the dynamic decision-domain radius, where β is the proportionality constant controlling the variation of the neighbourhood range, n_t is the parameter controlling the number of glowworms in the neighbourhood, and |N_i(t)| denotes the number of glowworms in the neighbourhood;
S007: if the maximum number of iterations t_max has been reached, save the decomposition result and output the atom time-frequency parameters; otherwise return to step S002.
In an embodiment of the present invention, step S02 specifically consists of extracting composite features comprising OMP features, MFCC features and pitch features. The OMP features are extracted as follows: each frame of the sound signal is decomposed with OMP, and over the first L atom time-frequency parameter groups of the support set representing that frame, the mean and standard deviation of the scale factor s and of the frequency factor v are computed, forming a 4-dimensional OMP feature, where λ is the frame index of the signal, i indexes the atoms representing the frame, and L is the number of atoms.
In an embodiment of the present invention, MFCCs are chosen to complement the OMP features: a 24-band Mel filter bank is applied to the reconstructed signal after the discrete Fourier transform to obtain 12 static MFCC dimensions, with the logarithmic energy added as the 13th dimension.
In an embodiment of the present invention, pitch (PITCH) is chosen to complement the OMP features: the circular average magnitude difference function (AMDF) method is used to obtain a 1-dimensional PITCH feature for each frame.
In an embodiment of the present invention, the DBN model training comprises two steps. The first step pre-trains the network with an unsupervised greedy layer-by-layer strategy, initializing the state values of the visible-layer nodes at the bottom of the DBN with the labelled ecological sound features, so that specific features are abstracted gradually. The second step uses the correct label information in a supervised BP network and propagates the error information top-down to fine-tune each RBM layer.
In an embodiment of the present invention, the RBM network uses the Contrastive Divergence criterion as its self-training strategy. Each layer consists of a visible layer V and a hidden layer H; multiple RBMs are combined through bottom-up inter-layer weighted connections, the output of one hidden layer serving as the input to the visible layer of the RBM above, thereby building a DBN architecture. An RBM has three parameters: the weights W between the visible and hidden layers, and the respective bias vectors b and c, so training the DBN classifier reduces to solving for the RBM parameters. Let the node values of the visible and hidden layers be v_i and h_j respectively. Each node of the visible layer V takes the value 1 with probability P(v_i = 1) = σ(b_i + Σ_j w_ij·h_j), and likewise each node of the hidden layer H takes the value 1 with probability P(h_j = 1) = σ(c_j + Σ_i w_ij·v_i), where σ is the logistic sigmoid. The update rule for the weights W is Δw_ij ∝ <v_i h_j>_data − <v_i h_j>_reconstruct, where <v_i h_j>_data denotes the expectation of the joint distribution of the visible-layer nodes v_i of the known sample set and the unknown hidden nodes h_j, and <v_i h_j>_reconstruct denotes the expectation of the joint distribution <v_i h_j> after the hidden units are updated from the known samples and the visible units are reconstructed.
The present invention markedly improves the noise immunity and robustness of the system. The OMP denoising based on sparsity improves the anti-noise performance of ecological sound recognition, and in several scenarios also shows an advantage over spectral subtraction and wavelet denoising; the strategy of searching for the optimal atom with the GSO algorithm effectively reduces the computational complexity of the OMP decomposition.
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below through specific embodiments and the accompanying drawings.
Brief description of the drawings
Fig. 1 is the OMP-based ecological sound classification flow chart of the present invention.
Fig. 2 is the atom time-frequency parameter variation diagram of the present invention.
Fig. 3 is the structural diagram of the DBN network of the present invention.
Description of the embodiments
The present invention performs ecological sound recognition using rapid sparse decomposition and deep learning. First, an orthogonal matching pursuit (OMP) based on glowworm swarm optimization (GSO) performs a limited number of sparse decomposition iterations to reconstruct the sound signal, retaining the highly correlated components and filtering out the weakly correlated noise. Second, composite anti-noise features are extracted from the atom time-frequency information and the frequency-domain information. Finally, a deep belief network (DBN) classifies the ecological sounds under different environments and signal-to-noise ratios. Experiments show that the OMP sparse denoising outperforms spectral subtraction and wavelet denoising; compared with the commonly used MFCC-and-SVM approach, the method improves recognition of ecological sounds to varying degrees under different signal-to-noise ratios and has better noise immunity, making it especially suitable for low-SNR noisy conditions.
The OMP algorithm is a greedy reconstruction algorithm in compressed sensing (CS), proposed as an improvement on the matching pursuit (MP) algorithm. At each decomposition step it picks one atom from the dictionary, called the optimal atom, and first orthogonalizes it against the set of already selected atoms using the Gram-Schmidt method to guarantee the optimality of the iteration, thereby reducing the number of iterations. Under the same precision requirement, the signal reconstructed with OMP is sparser and converges faster. Using OMP to denoise ecological sounds exploits the sparsity of the signal: the useful information is extracted as the sparse component, and the noise is treated as the residual left after removing the sparse component. Noise has a certain randomness, and since the dictionary contains no random atoms, its correlation with them is low. According to CS theory, when a noisy sound signal is projected to low dimension and the observation dimension suffices to capture the useful information, the noise has no sparse representation; the noise content of the residual cannot be recovered during reconstruction, which achieves the denoising. The sound signal is mapped onto the atom dictionary and decomposed; each round of decomposition yields the atom with the largest inner product with the original signal, i.e. the most correlated atom. The more atoms the iteration extracts, the smaller the signal residual, and finally the weighted combination of atoms gives the best reconstruction of the original signal.
Obtaining the sparse representation of a signal from an overcomplete dictionary is an NP-hard problem, because the overcompleteness of the dictionary determines both the computational complexity of the decomposition and the sparsity and reconstruction accuracy of the final result. To guarantee the quality of the reconstructed signal, the number of atoms in the dictionary must be much larger than the signal length, and the resulting amount of computation is enormous. Searching for the optimal atom is the most computationally expensive part of the decomposition and is an optimization problem. The present invention therefore proposes to search for the optimal atom with the glowworm swarm optimization (GSO) algorithm, improving search efficiency while guaranteeing solution precision and realizing rapid sparse decomposition of the sound signal.
As shown in Fig. 1, the invention provides an ecological sound identification method based on rapid sparse decomposition and deep learning, comprising the following steps:
S01: perform OMP sparse decomposition on the clean sounds and the noisy test sounds respectively, and output the corresponding reconstructed signals and OMP features;
S02: extract from the clean sounds and the noisy test sounds composite features that include the OMP features;
S03: train a DBN model on the composite features extracted from the reconstructed clean sounds;
S04: use the DBN model trained on the reconstructed clean sounds to classify the composite features extracted from the reconstructed noisy test sounds, and output the ecological sound category to which each noisy test sound belongs.
Let f be the signal to be decomposed, of length N. Before sparse decomposition, an overcomplete atom dictionary D = (g_γ)_{γ∈Γ} is first constructed. Each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w): the shift factor u locates the centre of the atom, while the scale factor s, frequency factor v and phase factor w define its waveform. The discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^(-j)·Δv, i·Δw), where 0 < j ≤ log₂N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6. Step S01 specifically comprises:
S011: initializing signal residual error R
0f=f, iterations k=1, maximum iteration time L;
S012: select the iteration atom g the most relevant to signal residual error the k time from cross complete atom dictionary D
γ k,
S013: judgement || R<sub TranNum="194">k</sub>f||<ε, (ε>0) whether set up, the residue signal threshold value of ε for setting, if || R<sub TranNum="195">k</sub>f||<ε sets up, and goes to step S016 and finishes to decompose, if be false, continues to decompose;
S014: utilize Gram-Schmidt method by g<sub TranNum="197">γ k</sub>about selecting former subset g<sub TranNum="198">γ p</sub>(0<p≤k) orthogonalization obtains projection P<sub TranNum="199">k</sub>and calculate respectively new approximate reconstruction signal f=P<sub TranNum="200">k</sub>f+R<sub TranNum="201">k</sub>f and residual error R<sub TranNum="202">k</sub>f;
S015: if also do not reach maximum iteration time, k=k+1 is set, returns to step S012 and continue iteration, otherwise go to step S016;
S016: by successively decomposing and obtain a series of atoms, export approximate atom expansion the L time
As the decomposition iterates, the residual signal energy decays continuously and finally approaches 0.
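The decomposition loop of steps S011-S016 can be sketched in Python. This is an illustrative NumPy implementation that uses a plain exhaustive argmax in S012 rather than the GSO search described later, and replaces the explicit Gram-Schmidt orthogonalization of S014 with an equivalent least-squares projection onto the selected atoms; dictionary shape and the toy example are assumptions, not the patent's Gabor dictionary.

```python
import numpy as np

def omp(f, D, L, eps=1e-6):
    """OMP loop sketch (steps S011-S016): D is an (N x M) dictionary whose
    columns are unit-norm atoms; L is the maximum number of iterations."""
    residual = f.copy()                 # S011: R0 f = f
    support = []                        # indices of the chosen atoms
    for _ in range(L):                  # S015: stop after L iterations
        # S012: atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        support.append(idx)
        # S014: orthogonal projection onto the selected atoms
        # (least squares is equivalent to Gram-Schmidt orthogonalization)
        coef, *_ = np.linalg.lstsq(D[:, support], f, rcond=None)
        approx = D[:, support] @ coef
        residual = f - approx
        if np.linalg.norm(residual) < eps:  # S013: residual small enough
            break
    return approx, support              # S016: approximate expansion

# toy check: a signal built from 2 atoms of an orthonormal dictionary
rng = np.random.default_rng(0)
D = np.linalg.qr(rng.standard_normal((64, 64)))[0]
f = 3.0 * D[:, 5] - 2.0 * D[:, 17]
approx, support = omp(f, D, L=20)
assert np.allclose(approx, f, atol=1e-8)
```

With an orthonormal dictionary the loop recovers the two generating atoms exactly; with the overcomplete Gabor dictionary of the text, the residual instead decays gradually as described above.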
According to the description of the OMP algorithm, searching the whole dictionary space for the optimal atom is a typical optimization problem. When GSO is used for this search, the atom parameter group γ_k = (s, u, v, w) serves as the parameter group to be optimized, corresponding to the position x_i(t) of glowworm i in the t-th iteration, and the inner product of the atom and the residual signal, |<R_k f, g_γk>|, serves as the objective function, corresponding to the objective value f(x_i(t)) determined by the glowworm's position, from which the luciferin value l_i(t) is further computed. Through the movement and aggregation of the glowworms, the position with the maximum luciferin can be found, whose physical meaning is the optimal atom parameters.
The specific steps of searching for the optimal atom with GSO are:
S001: initialize the glowworm population size n, the luciferin values l_i, the decision-domain radius r_0 and the maximum number of iterations t_max, and randomly generate the glowworms;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |<R_k f, g_γ(x_i(t))>|, compute the objective value f(x_i(t)) of glowworm i at its current position x_i(t) in the t-th iteration, and convert it into the luciferin value l_i(t) according to l_i(t) = (1 − ρ)·l_i(t−1) + η·f(x_i(t)), where ρ ∈ (0,1) is the luciferin decay rate and η ∈ (0,1) is the luciferin enhancement rate;
S003: each glowworm i searches within its dynamic decision domain for the individuals whose luciferin exceeds its own, forming the neighbourhood set N_i(t), where r_s is the maximum decision-domain radius;
S004: compute the probability p_ij that glowworm i moves towards any individual j in its neighbourhood set N_i(t);
S005: use roulette-wheel selection to choose the individual j with the highest probability as the target and update the position, where s is the step length;
S006: update the dynamic decision-domain radius, where β is the proportionality constant controlling the variation of the neighbourhood range, n_t is the parameter controlling the number of glowworms in the neighbourhood, and |N_i(t)| denotes the number of glowworms in the neighbourhood;
S007: if the maximum number of iterations t_max has been reached, save the decomposition result and output the atom time-frequency parameters; otherwise return to step S002.
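Steps S001-S007 can be sketched as a minimal glowworm-swarm search over a generic objective. This is an illustrative sketch, not the patent's atom-parameter search: all numeric constants are assumptions, and the movement, probability and radius-update formulas follow the standard GSO formulation (the formulas themselves are not reproduced in the text above).

```python
import numpy as np

def gso(objective, dim, n=30, t_max=60, rho=0.4, eta=0.6,
        l0=5.0, r0=1.0, rs=2.0, beta=0.08, nt=5, step=0.03, seed=0):
    """Minimal GSO sketch of steps S001-S007. rho/eta: luciferin decay and
    enhancement rates; rs: max decision radius; beta/nt: radius-update
    constants; step: move length. objective maps (n, dim) -> (n,)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n, dim))   # S001: random glowworms
    l = np.full(n, l0)                          # luciferin values
    rd = np.full(n, r0)                         # decision-domain radii
    for _ in range(t_max):                      # S007: stop at t_max
        l = (1 - rho) * l + eta * objective(x)  # S002: luciferin update
        for i in range(n):
            d = np.linalg.norm(x - x[i], axis=1)
            nbrs = np.where((d < rd[i]) & (l > l[i]))[0]       # S003
            if nbrs.size:
                p = (l[nbrs] - l[i]) / (l[nbrs] - l[i]).sum()  # S004
                j = rng.choice(nbrs, p=p)       # S005: roulette selection
                x[i] = x[i] + step * (x[j] - x[i]) / (d[j] + 1e-12)
            # S006: dynamic decision-domain radius update
            rd[i] = min(rs, max(0.0, rd[i] + beta * (nt - nbrs.size)))
    return x[np.argmax(objective(x))]

# toy objective peaked at the origin: the swarm gathers near the maximum
best = gso(lambda x: np.exp(-np.sum(x**2, axis=1)), dim=2)
```

In the patent's setting, `objective` would evaluate |<R_k f, g_γ(x_i(t))>| for the Gabor atom parameterized by each glowworm's position.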
As shown in Fig. 2, during a single search for the optimal atom, the frequency factor v and phase factor w of the atom parameter groups corresponding to the glowworms aggregate towards the position that yields a higher objective value.
The Gabor atoms obtained by the OMP decomposition are formed from a modulated Gaussian function; because Gaussian functions are localized in both the time and frequency domains, this locality guarantees that the atom time-frequency parameters can portray the non-stationary, time-varying characteristics of the signal well.
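Such a modulated-Gaussian atom with parameter group γ = (s, u, v, w) can be sketched as follows; this assumes a common discretization (the exact window constant and real-valued cosine modulation are illustrative choices, not taken from the patent).

```python
import numpy as np

def gabor_atom(N, s, u, v, w):
    """Discrete Gabor atom g_gamma with parameters gamma = (s, u, v, w):
    a Gaussian window of scale s centred at shift u, modulated at
    frequency v with phase w, normalised to unit energy."""
    t = np.arange(N)
    g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(v * t + w)
    return g / np.linalg.norm(g)

# one atom of the dyadic grid: s = 2^4, centred mid-signal
atom = gabor_atom(N=256, s=2**4, u=128, v=np.pi / 4, w=0.0)
```

The Gaussian envelope localizes the atom around sample u in time, while the cosine term localizes it around frequency v, which is the time-frequency locality discussed above.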
Step S02 specifically consists of extracting composite features comprising OMP features, MFCC features and pitch features. The OMP features are extracted as follows: each frame of the sound signal is decomposed with OMP, and over the first L atom time-frequency parameter groups of the support set representing that frame, the mean and standard deviation of the scale factor s and of the frequency factor v are computed, forming a 4-dimensional OMP feature, where λ is the frame index of the signal, i indexes the atoms representing the frame, and L is the number of atoms.
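The 4-dimensional OMP feature described here can be sketched as follows, assuming the decomposition returns per-atom (s, u, v, w) parameter groups for each frame; the toy values are illustrative.

```python
import numpy as np

def omp_feature(atom_params):
    """4-dimensional OMP feature of one frame: the mean and standard
    deviation of the scale factor s and of the frequency factor v over
    the first L atoms of the frame's support set. atom_params is a list
    of (s, u, v, w) parameter groups from the sparse decomposition."""
    p = np.asarray(atom_params, dtype=float)   # shape (L, 4)
    s, v = p[:, 0], p[:, 2]
    return np.array([s.mean(), s.std(), v.mean(), v.std()])

# toy support set of L = 4 atoms (illustrative values)
params = [(16, 10, 0.5, 0.0), (32, 40, 0.7, 0.1),
          (16, 80, 0.6, 0.2), (64, 20, 0.8, 0.3)]
feat = omp_feature(params)
assert np.isclose(feat[0], 32.0)   # mean of s = (16+32+16+64)/4
```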
For a sound signal, the reconstruction accuracy keeps improving as the number of atoms in the support set increases. But too high a sparsity level brings a new problem: weakly correlated noise is also reconstructed in the later stages, so the system recognition rate does not grow in proportion to the number of atoms. Under the premise of guaranteeing reconstruction accuracy, the present invention determined through experiments that reconstruction with the first 20 atoms of the sparse decomposition gives the best results. Because different sounds and noises have different sparsity levels, reconstructing all sounds with a fixed sparsity has drawbacks, and the recognition performance of the OMP time-frequency features alone is unsatisfactory. To overcome this, the present invention chooses MFCCs and the fundamental frequency (PITCH) to complement the OMP features.
The MFCCs complementing the OMP features are obtained by applying a 24-band Mel filter bank to the reconstructed signal after the discrete Fourier transform, yielding 12 static MFCC dimensions, with the logarithmic energy added as the 13th dimension.
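A bare-bones sketch of this 13-dimensional vector (24 mel filters, 12 cepstral coefficients plus log energy) is given below; the filterbank construction and DCT follow common practice, and every constant beyond those stated in the text (sample rate, frame length, mel formula) is an assumption. A real front end would also add windowing, pre-emphasis and liftering.

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_mel=24, n_ceps=12):
    """13-dim static MFCC vector for one frame: 24 mel filters,
    12 cepstral coefficients, plus log energy as the 13th dimension."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2          # DFT power spectrum
    mel = lambda f: 2595 * np.log10(1 + f / 700)     # Hz -> mel
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)    # mel -> Hz
    edges = imel(np.linspace(0, mel(sr / 2), n_mel + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mel, len(power)))            # triangular filters
    for i in range(n_mel):
        lo, ce, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)
        fbank[i, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)
    logmel = np.log(fbank @ power + 1e-10)
    # DCT-II of the log filterbank energies, keep coefficients 1..12
    k = np.arange(n_mel)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), 2 * k + 1)
                 / (2 * n_mel))
    ceps = dct @ logmel
    return np.concatenate([ceps, [np.log(power.sum() + 1e-10)]])

# one 32 ms frame of a 440 Hz tone at 16 kHz
mfcc = mfcc_frame(np.sin(2 * np.pi * 440 * np.arange(512) / 16000))
```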
The pitch feature complementing the OMP features is obtained with the circular AMDF method, giving a 1-dimensional PITCH feature per frame. The composite anti-noise time-frequency features built from these three kinds of features together portray the ecological sounds.
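The circular AMDF pitch estimate can be sketched as follows: the lag minimizing the mean circular difference |x(n) − x((n+k) mod N)| is taken as the pitch period. The search range bounds are illustrative assumptions, not from the patent.

```python
import numpy as np

def pitch_camdf(frame, sr=16000, f_lo=60, f_hi=1000):
    """1-dim pitch feature per frame via the circular average magnitude
    difference function (CAMDF): find the lag minimising the mean
    circular difference within a plausible pitch range, return sr/lag."""
    lags = np.arange(sr // f_hi, sr // f_lo)
    camdf = [np.abs(frame - np.roll(frame, -k)).mean() for k in lags]
    return sr / lags[int(np.argmin(camdf))]

# a 200 Hz sine at 16 kHz; the search range here excludes subharmonics
sr = 16000
x = np.sin(2 * np.pi * 200 * np.arange(800) / sr)
est = pitch_camdf(x, sr, f_lo=120)
```

For a pure tone the CAMDF dips to (near) zero at every multiple of the period, so in practice the search range or a peak-picking rule is needed to select the fundamental, as done here by restricting `f_lo`.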
A DBN is composed of several layers of unsupervised restricted Boltzmann machines (RBMs) and one layer of supervised back-propagation (BP) feed-forward network; its structure is shown in Fig. 3.
The DBN model training comprises two steps. The first step pre-trains the network with an unsupervised greedy layer-by-layer strategy, initializing the state values of the visible-layer nodes at the bottom of the DBN with the labelled ecological sound features, so that specific features are abstracted gradually. The second step uses the correct label information in a supervised BP network and propagates the error information top-down to fine-tune each RBM layer.
The RBM network uses the Contrastive Divergence criterion as its self-training strategy. Each layer consists of a visible layer V and a hidden layer H; multiple RBMs are combined through bottom-up inter-layer weighted connections, the output of one hidden layer serving as the input to the visible layer of the RBM above, thereby building a DBN architecture. An RBM has three parameters: the weights W between the visible and hidden layers, and the respective bias vectors b and c, so training the DBN classifier reduces to solving for the RBM parameters. Let the node values of the visible and hidden layers be v_i and h_j respectively. Each node of the visible layer V takes the value 1 with probability P(v_i = 1) = σ(b_i + Σ_j w_ij·h_j), and likewise each node of the hidden layer H takes the value 1 with probability P(h_j = 1) = σ(c_j + Σ_i w_ij·v_i), where σ is the logistic sigmoid. The update rule for the weights W is Δw_ij ∝ <v_i h_j>_data − <v_i h_j>_reconstruct, where <v_i h_j>_data denotes the expectation of the joint distribution of the visible-layer nodes v_i of the known sample set and the unknown hidden nodes h_j, and <v_i h_j>_reconstruct denotes the expectation of the joint distribution <v_i h_j> after the hidden units are updated from the known samples and the visible units are reconstructed.
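The contrastive-divergence update described here can be sketched as a single CD-1 step for a binary RBM; the learning rate, initialization and batch handling are illustrative assumptions, not values from the patent.

```python
import numpy as np

def cd1_update(v0, W, b, c, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM, matching the text: P(h_j=1|v) and
    P(v_i=1|h) are logistic sigmoids, and Delta w_ij is proportional to
    <v_i h_j>_data - <v_i h_j>_reconstruct.
    Shapes: v0 (batch, n_vis), W (n_vis, n_hid), b (n_vis,), c (n_hid,)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    ph0 = sigmoid(v0 @ W + c)                  # P(h=1|v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b)                # reconstruction P(v=1|h0)
    ph1 = sigmoid(pv1 @ W + c)                 # P(h=1|reconstruction)
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n   # <vh>_data - <vh>_recon
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# toy run on random binary data
rng = np.random.default_rng(1)
v = (rng.random((20, 6)) < 0.5) * 1.0
W = 0.01 * rng.standard_normal((6, 4))
b, c = np.zeros(6), np.zeros(4)
for _ in range(50):
    W, b, c = cd1_update(v, W, b, c)
```

Stacking such RBMs, with each trained hidden layer feeding the visible layer of the next, gives the DBN pre-training of the first step; the BP fine-tuning of the second step then adjusts all layers with the label information.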
The preferred embodiments listed above further describe the objects, technical solutions and advantages of the present invention. It should be understood that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.
Claims (8)
1. An ecological sound identification method based on rapid sparse decomposition and deep learning, characterized by comprising the following steps:
S01: perform OMP sparse decomposition on the clean sounds and the noisy test sounds respectively, and output the corresponding reconstructed signals and OMP features;
S02: extract from the clean sounds and the noisy test sounds composite features that include the OMP features;
S03: train a DBN model on the composite features extracted from the reconstructed clean sounds;
S04: use the DBN model trained on the reconstructed clean sounds to classify the composite features extracted from the reconstructed noisy test sounds, and output the ecological sound category to which each noisy test sound belongs.
2. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that: let f be the signal to be decomposed, of length N; before sparse decomposition, an overcomplete atom dictionary D = (g_γ)_{γ∈Γ} is first constructed; each time-frequency atom g_γ is a Gabor atom defined by the parameter group γ = (s, u, v, w), where the shift factor u locates the centre of the atom, and the scale factor s, frequency factor v and phase factor w define its waveform; the discretized time-frequency parameters are γ = (s, u, v, w) = (a^j, p·a^j·Δu, k·a^(-j)·Δv, i·Δw), where 0 < j ≤ log₂N, 0 ≤ p ≤ N·2^(-j+1), 0 ≤ k < 2^(j+1), 0 ≤ i ≤ 12, a = 2, Δu = 1/2, Δv = π, Δw = π/6; step S01 specifically comprises:
S011: initializing signal residual error R
0f=f, iterations k=1, maximum iteration time L;
S012: select the iteration atom g the most relevant to signal residual error the k time from cross complete atom dictionary D
γ k,
S013: judgement || R<sub TranNum="312">k</sub>f||<ε, (ε>0) whether set up, the residue signal threshold value of ε for setting, if || R<sub TranNum="313">k</sub>f||<ε sets up, and goes to step S016 and finishes to decompose, if be false, continues to decompose;
S014: utilize Gram-Schmidt method by g<sub TranNum="315">γ k</sub>about selecting former subset g<sub TranNum="316">γ p</sub>(0<p≤k) orthogonalization obtains projection P<sub TranNum="317">k</sub>and calculate respectively new approximate reconstruction signal f=P<sub TranNum="318">k</sub>f+R<sub TranNum="319">k</sub>f and residual error R<sub TranNum="320">k</sub>f;
S015: if also do not reach maximum iteration time, k=k+1 is set, returns to step S012 and continue iteration, otherwise go to step S016;
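The OMP loop of steps S011–S015 can be sketched in a few lines of NumPy. This is a generic illustration, not the patented implementation: a plain unit-norm dictionary matrix stands in for the Gabor atom dictionary D, and a least-squares re-projection onto the selected atoms replaces the explicit Gram-Schmidt orthogonalization (the two are mathematically equivalent).

```python
import numpy as np

def omp(f, D, max_iter=10, eps=1e-6):
    """Orthogonal Matching Pursuit: greedily select dictionary atoms
    (columns of D, assumed unit-norm) and re-project the signal onto
    the whole selected set at every step (steps S011-S015)."""
    residual = f.copy()                 # S011: R^0 f = f
    support = []
    coeffs = np.zeros(0)
    for _ in range(max_iter):
        if np.linalg.norm(residual) < eps:          # S013: residual small enough
            break
        # S012: atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # S014: least-squares projection onto all selected atoms
        # (equivalent to the Gram-Schmidt orthogonalization in the claim)
        A = D[:, support]
        coeffs, *_ = np.linalg.lstsq(A, f, rcond=None)
        residual = f - A @ coeffs                   # new residual R^k f
    rec = D[:, support] @ coeffs if support else np.zeros_like(f)
    return support, coeffs, rec
```

On a well-conditioned dictionary this recovers a sparse signal exactly: composing f from two atoms and running `omp` returns both atoms in the support and a near-zero residual.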
3. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 2, characterized in that step S012 uses GSO (glowworm swarm optimization) to search for the optimal atom, specifically comprising:
S001: initialize the glowworm population size n, the luciferin values l_i, the decision-domain radius r_0 and the maximum number of iterations t_max, and generate the glowworms at random;
S002: according to x_i(t) = (s_i(t), u_i(t), v_i(t), w_i(t)) and f(x_i(t)) = |⟨R^k f, g_γ(x_i(t))⟩|, compute the objective value f(x_i(t)) of glowworm i at its current position x_i(t) in the t-th iteration, and convert it into the luciferin value l_i(t) = (1 − ρ)·l_i(t − 1) + η·f(x_i(t)), where ρ ∈ (0, 1) is the luciferin decay rate and η ∈ (0, 1) is the luciferin enhancement rate;
S003: each glowworm i searches, within its dynamic decision domain r_d^i(t) (0 < r_d^i(t) ≤ r_s), for the individuals whose luciferin exceeds its own, forming the neighbourhood set N_i(t), where r_s is the maximum decision-domain radius of a glowworm;
S004: compute the probability P_ij = (l_j(t) − l_i(t)) / Σ_{m∈N_i(t)} (l_m(t) − l_i(t)) with which glowworm i moves towards any individual j in its neighbourhood set N_i(t);
S005: use roulette-wheel selection to choose the individual j with the highest probability as the moving target, and update the position x_i(t + 1) = x_i(t) + s·(x_j(t) − x_i(t)) / ‖x_j(t) − x_i(t)‖, where s is the moving step length;
S006: update the dynamic decision-domain radius r_d^i(t + 1) = min{r_s, max{0, r_d^i(t) + β·(n_t − |N_i(t)|)}}, where β is the proportionality constant controlling the variation range of the neighbourhood, n_t is the parameter controlling the number of glowworms in the neighbourhood, and |N_i(t)| is the number of glowworms in the neighbourhood;
S007: if the maximum number of iterations t_max is reached, save the decomposition result and output the atom's time-frequency parameters; otherwise return to step S002.
4. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that step S02 specifically is: extract a composite feature set comprising OMP features, MFCCs features and pitch features; the OMP features are extracted by decomposing each frame of the sound signal with OMP, and taking the mean and standard deviation of the scale factor s and of the frequency factor v over the first L atom time-frequency parameter groups of the support set representing that frame, forming a 4-dimensional OMP feature,
where λ is the frame index of the signal, i indexes the atoms representing the frame, and L is the number of atoms.
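The 4-dimensional OMP feature of claim 4 reduces to per-frame statistics over the atom parameters. A minimal sketch, assuming each atom of a frame's support set is given as a (s, u, v, w) row:

```python
import numpy as np

def omp_feature(atom_params):
    """4-dimensional OMP feature of one frame (claim 4): mean and
    standard deviation of the scale factor s and the frequency factor v
    over the first L atoms of the frame's support set.
    atom_params: array-like of shape (L, 4) holding (s, u, v, w) per atom."""
    p = np.asarray(atom_params, dtype=float)
    s, v = p[:, 0], p[:, 2]
    return np.array([s.mean(), s.std(), v.mean(), v.std()])
```

For example, two atoms with scale factors 2 and 4 and frequency factors 1 and 3 yield the feature vector (3, 1, 2, 1).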
5. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 4, characterized in that: MFCCs are chosen to complement the OMP features; a 24-band Mel filter bank is first applied to the discrete Fourier transform of the reconstructed signal to obtain the 12-dimensional static MFCCs, and the logarithmic energy is appended as the 13th feature dimension.
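The 13-dimensional static MFCC vector of claim 5 (24 Mel filters, 12 cepstral coefficients plus log energy) can be sketched in NumPy as below. The frame length, FFT size and sample rate are illustrative assumptions; production code would typically also apply pre-emphasis and windowing, which the claim does not specify.

```python
import numpy as np

def mel_filterbank(n_filters=24, n_fft=512, sr=16000):
    """Triangular Mel filter bank (24 filters, as in claim 5)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc_frame(frame, sr=16000, n_fft=512, n_ceps=12):
    """13-dimensional static MFCC vector of one frame: 12 DCT
    coefficients of the log Mel energies plus the log frame energy."""
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2        # power spectrum
    mel_e = np.log(mel_filterbank(24, n_fft, sr) @ spec + 1e-10)
    n = mel_e.size
    # type-II DCT, keeping coefficients 1..12
    k = np.arange(1, n_ceps + 1)[:, None]
    dct = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    log_energy = np.log(np.sum(frame ** 2) + 1e-10)      # 13th dimension
    return np.append(dct @ mel_e, log_energy)
```

Feeding a 512-sample sinusoid frame returns a finite 13-dimensional vector, matching the feature layout described in the claim.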
6. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 4, characterized in that: pitch (PITCH) is chosen to complement the OMP features; the circular AMDF (average magnitude difference function) method is used to obtain the 1-dimensional pitch feature of each frame.
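The circular AMDF pitch feature of claim 6 amounts to finding the circular lag that minimizes the mean absolute difference between a frame and its shifted copy. A minimal sketch; the 60–1000 Hz search range is an illustrative assumption, not taken from the patent:

```python
import numpy as np

def pitch_camdf(frame, sr=16000, f_lo=60, f_hi=1000):
    """1-dimensional pitch estimate per frame via the circular average
    magnitude difference function (claim 6): the lag minimizing the
    mean absolute difference between the frame and its circular shift."""
    n = len(frame)
    lags = np.arange(int(sr / f_hi), min(int(sr / f_lo), n - 1))
    amdf = np.array([np.mean(np.abs(frame - np.roll(frame, k))) for k in lags])
    return sr / lags[int(np.argmin(amdf))]
```

On a 200 Hz sinusoid sampled at 16 kHz (period of exactly 80 samples), the minimum falls at lag 80 and the estimate is 200 Hz.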
7. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 1, characterized in that: the DBN model training comprises two steps; the first step uses unsupervised greedy layer-wise pre-training, initializing the state values of the visible-layer nodes of the bottom DBN layer with the labelled ecological sound features, so that specific features become gradually more abstract layer by layer; the second step uses a supervised BP network with the correct label information, propagating the error information top-down to fine-tune every RBM layer.
8. The ecological sound identification method based on rapid sparse decomposition and deep learning according to claim 7, characterized in that: the RBM network uses the Contrastive Divergence criterion as its self-training strategy; each layer consists of a visible layer V and a hidden layer H, and several RBMs are combined through bottom-up inter-layer weighted connections, the output of one RBM's hidden units serving as the input of the visible layer of the RBM above, thereby building a DBN framework. An RBM has three parameters: the weights W between the visible and hidden layers, and their respective bias vectors b and c; training the DBN classifier therefore reduces to solving for the RBM parameters. Let v_i and h_j be the node values of the visible and hidden layer respectively; each node of the visible layer V takes the value 1 with probability P(v_i = 1) = σ(b_i + Σ_j w_ij·h_j), and likewise each node of the hidden layer H takes the value 1 with probability P(h_j = 1) = σ(c_j + Σ_i w_ij·v_i), σ(·) being the sigmoid function. The update rule of the weights W is Δw_ij ∝ ⟨v_i h_j⟩_data − ⟨v_i h_j⟩_reconstruct, where ⟨v_i h_j⟩_data is the expectation of the joint probability distribution of the visible-layer nodes v_i of the known sample set and the unknown hidden nodes h_j, and ⟨v_i h_j⟩_reconstruct is the expectation of the joint probability distribution of v_i and h_j after the hidden units have been updated from the known samples and the visible-layer units reconstructed from them.
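The Contrastive Divergence training of a single RBM described in claim 8 can be sketched as a CD-1 update step in NumPy. This is a textbook illustration using the claim's notation (weights W, biases b and c), not the patent's own code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b, c, rng, lr=0.1):
    """One Contrastive-Divergence-1 update of an RBM (claim 8):
    delta w_ij is proportional to <v_i h_j>_data - <v_i h_j>_reconstruct.
    v0: batch of visible vectors; W: (n_vis, n_hid); b, c: visible/hidden biases."""
    ph0 = sigmoid(v0 @ W + c)              # P(h_j=1|v) = sigma(c_j + sum_i w_ij v_i)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + b)            # P(v_i=1|h) = sigma(b_i + sum_j w_ij h_j)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)   # reconstructed visibles
    ph1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n    # <v h>_data - <v h>_reconstruct
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```

Running a few hundred CD-1 steps on a toy set of two complementary binary patterns drives the mean-field reconstruction of the training data well below the untrained baseline of 0.5 per unit.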
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310472330.6A CN103531199B (en) | 2013-10-11 | 2013-10-11 | Ecological sound identification method based on rapid sparse decomposition and deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310472330.6A CN103531199B (en) | 2013-10-11 | 2013-10-11 | Ecological sound identification method based on rapid sparse decomposition and deep learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103531199A true CN103531199A (en) | 2014-01-22 |
CN103531199B CN103531199B (en) | 2016-03-09 |
Family
ID=49933152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310472330.6A Expired - Fee Related CN103531199B (en) | 2013-10-11 | 2013-10-11 | Ecological sound identification method based on rapid sparse decomposition and deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103531199B (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104113789A (en) * | 2014-07-10 | 2014-10-22 | 杭州电子科技大学 | On-line video abstraction generation method based on depth learning |
CN104157290A (en) * | 2014-08-19 | 2014-11-19 | 大连理工大学 | Speaker recognition method based on depth learning |
CN104464727A (en) * | 2014-12-11 | 2015-03-25 | 福州大学 | Single-channel music singing separation method based on deep belief network |
CN104850837A (en) * | 2015-05-18 | 2015-08-19 | 西南交通大学 | Handwritten character recognition method |
CN105551503A (en) * | 2015-12-24 | 2016-05-04 | 武汉大学 | Audio matching tracking method based on atom pre-selection and system thereof |
CN105654964A (en) * | 2016-01-20 | 2016-06-08 | 司法部司法鉴定科学技术研究所 | Recording audio device source determination method and device |
CN106059971A (en) * | 2016-07-07 | 2016-10-26 | 西北工业大学 | Sparse reconstruction based correlation detection method under signal correlation attenuation condition |
WO2017076211A1 (en) * | 2015-11-05 | 2017-05-11 | 阿里巴巴集团控股有限公司 | Voice-based role separation method and device |
CN107039036A (en) * | 2017-02-17 | 2017-08-11 | 南京邮电大学 | A kind of high-quality method for distinguishing speek person based on autocoding depth confidence network |
CN107293301A (en) * | 2017-05-27 | 2017-10-24 | 深圳大学 | Recognition methods and system based on dental articulation sound |
CN107464556A (en) * | 2016-06-02 | 2017-12-12 | 国家计算机网络与信息安全管理中心 | A kind of audio scene recognition method based on sparse coding |
CN107729381A (en) * | 2017-09-15 | 2018-02-23 | 广州嘉影软件有限公司 | Interactive multimedia resource polymerization method and system based on multidimensional characteristic identification |
CN109682892A (en) * | 2018-12-26 | 2019-04-26 | 西安科技大学 | A kind of signal based on time frequency analysis removes drying method |
CN109862518A (en) * | 2019-01-11 | 2019-06-07 | 福州大学 | It is a kind of that equipment localization method is exempted from based on sparse analytic modell analytical model altogether |
CN111507321A (en) * | 2020-07-01 | 2020-08-07 | 中国地质大学(武汉) | Training method, classification method and device of multi-output land cover classification model |
CN112885357A (en) * | 2021-01-13 | 2021-06-01 | 上海英粤汽车科技有限公司 | Method for recognizing animal category through voice |
CN113238189A (en) * | 2021-05-24 | 2021-08-10 | 清华大学 | Sound source identification method and system based on array measurement and sparse prior information |
CN113470654A (en) * | 2021-06-02 | 2021-10-01 | 国网浙江省电力有限公司绍兴供电公司 | Voiceprint automatic identification system and method |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102592593A (en) * | 2012-03-31 | 2012-07-18 | 山东大学 | Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech |
KR101232707B1 (en) * | 2012-02-07 | 2013-02-13 | 고려대학교 산학협력단 | Apparatus and method for reconstructing signal using compressive sensing algorithm |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101232707B1 (en) * | 2012-02-07 | 2013-02-13 | 고려대학교 산학협력단 | Apparatus and method for reconstructing signal using compressive sensing algorithm |
CN102592593A (en) * | 2012-03-31 | 2012-07-18 | 山东大学 | Emotional-characteristic extraction method implemented through considering sparsity of multilinear group in speech |
Non-Patent Citations (2)
Title |
---|
LI, Yuxin (李雨昕): "Fast MP sparse decomposition algorithm for speech signals and its preliminary application in speech recognition" (《语音信号MP稀疏分解快速算法及在语音识别中的初步应用》), China Master's Theses Full-text Database, 30 October 2009 (2009-10-30) * |
SHAO, Jun (邵君): "Research on MP-based signal sparse decomposition algorithms" (《基于MP的信号稀疏分解算法研究》), China Master's Theses Full-text Database, 5 March 2007 (2007-03-05) * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104113789A (en) * | 2014-07-10 | 2014-10-22 | 杭州电子科技大学 | On-line video abstraction generation method based on depth learning |
CN104113789B (en) * | 2014-07-10 | 2017-04-12 | 杭州电子科技大学 | On-line video abstraction generation method based on depth learning |
CN104157290A (en) * | 2014-08-19 | 2014-11-19 | 大连理工大学 | Speaker recognition method based on depth learning |
CN104157290B (en) * | 2014-08-19 | 2017-10-24 | 大连理工大学 | A kind of method for distinguishing speek person based on deep learning |
CN104464727A (en) * | 2014-12-11 | 2015-03-25 | 福州大学 | Single-channel music singing separation method based on deep belief network |
CN104850837B (en) * | 2015-05-18 | 2017-12-05 | 西南交通大学 | The recognition methods of handwriting |
CN104850837A (en) * | 2015-05-18 | 2015-08-19 | 西南交通大学 | Handwritten character recognition method |
WO2017076211A1 (en) * | 2015-11-05 | 2017-05-11 | 阿里巴巴集团控股有限公司 | Voice-based role separation method and device |
CN105551503A (en) * | 2015-12-24 | 2016-05-04 | 武汉大学 | Audio matching tracking method based on atom pre-selection and system thereof |
CN105551503B (en) * | 2015-12-24 | 2019-03-01 | 武汉大学 | Based on the preselected Audio Matching method for tracing of atom and system |
CN105654964A (en) * | 2016-01-20 | 2016-06-08 | 司法部司法鉴定科学技术研究所 | Recording audio device source determination method and device |
CN107464556A (en) * | 2016-06-02 | 2017-12-12 | 国家计算机网络与信息安全管理中心 | A kind of audio scene recognition method based on sparse coding |
CN106059971A (en) * | 2016-07-07 | 2016-10-26 | 西北工业大学 | Sparse reconstruction based correlation detection method under signal correlation attenuation condition |
CN107039036A (en) * | 2017-02-17 | 2017-08-11 | 南京邮电大学 | A kind of high-quality method for distinguishing speek person based on autocoding depth confidence network |
CN107039036B (en) * | 2017-02-17 | 2020-06-16 | 南京邮电大学 | High-quality speaker recognition method based on automatic coding depth confidence network |
CN107293301A (en) * | 2017-05-27 | 2017-10-24 | 深圳大学 | Recognition methods and system based on dental articulation sound |
CN107729381B (en) * | 2017-09-15 | 2020-05-08 | 广州嘉影软件有限公司 | Interactive multimedia resource aggregation method and system based on multi-dimensional feature recognition |
CN107729381A (en) * | 2017-09-15 | 2018-02-23 | 广州嘉影软件有限公司 | Interactive multimedia resource polymerization method and system based on multidimensional characteristic identification |
CN109682892A (en) * | 2018-12-26 | 2019-04-26 | 西安科技大学 | A kind of signal based on time frequency analysis removes drying method |
CN109682892B (en) * | 2018-12-26 | 2021-07-09 | 西安科技大学 | Signal denoising method based on time-frequency analysis |
CN109862518A (en) * | 2019-01-11 | 2019-06-07 | 福州大学 | It is a kind of that equipment localization method is exempted from based on sparse analytic modell analytical model altogether |
CN109862518B (en) * | 2019-01-11 | 2021-05-18 | 福州大学 | Equipment-free positioning method based on common sparse analysis model |
CN111507321A (en) * | 2020-07-01 | 2020-08-07 | 中国地质大学(武汉) | Training method, classification method and device of multi-output land cover classification model |
CN112885357A (en) * | 2021-01-13 | 2021-06-01 | 上海英粤汽车科技有限公司 | Method for recognizing animal category through voice |
CN113238189A (en) * | 2021-05-24 | 2021-08-10 | 清华大学 | Sound source identification method and system based on array measurement and sparse prior information |
CN113238189B (en) * | 2021-05-24 | 2023-03-10 | 清华大学 | Sound source identification method and system based on array measurement and sparse prior information |
CN113470654A (en) * | 2021-06-02 | 2021-10-01 | 国网浙江省电力有限公司绍兴供电公司 | Voiceprint automatic identification system and method |
Also Published As
Publication number | Publication date |
---|---|
CN103531199B (en) | 2016-03-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103531199B (en) | Ecological sound identification method based on rapid sparse decomposition and deep learning | |
CN103474066B (en) | Ecological sound identification method based on multi-band signal reconstruction | |
CN109767759A (en) | End-to-end speech recognition methods based on modified CLDNN structure | |
CN112364779B (en) | Underwater sound target identification method based on signal processing and deep-shallow network multi-model fusion | |
CN108053836B (en) | Audio automatic labeling method based on deep learning | |
Mitra et al. | Hybrid convolutional neural networks for articulatory and acoustic information based speech recognition | |
CN106228977B (en) | Multi-mode fusion song emotion recognition method based on deep learning | |
CN100411011C (en) | Pronunciation quality evaluating method for language learning machine | |
CN107680582A (en) | Acoustic training model method, audio recognition method, device, equipment and medium | |
Tan et al. | Cluster adaptive training for deep neural network | |
CN108694951B (en) | Speaker identification method based on multi-stream hierarchical fusion transformation characteristics and long-and-short time memory network | |
CN104035996B (en) | Field concept abstracting method based on Deep Learning | |
CN104751228A (en) | Method and system for constructing deep neural network | |
CN110349597B (en) | Voice detection method and device | |
CN106782511A (en) | Amendment linear depth autoencoder network audio recognition method | |
CN110289002B (en) | End-to-end speaker clustering method and system | |
CN109448749A (en) | Voice extraction method, the system, device paid attention to based on the supervised learning sense of hearing | |
CN104424943A (en) | A speech processing system and method | |
CN110490230A (en) | The Acoustic Object recognition methods of confrontation network is generated based on depth convolution | |
CN108229659A (en) | Piano singly-bound voice recognition method based on deep learning | |
US20200312336A1 (en) | Method and apparatus for implementing speaker identification neural network | |
Mitra et al. | Joint modeling of articulatory and acoustic spaces for continuous speech recognition tasks | |
CN106548775A (en) | A kind of audio recognition method and system | |
Bacchiani et al. | Context dependent state tying for speech recognition using deep neural network acoustic models | |
Zhao et al. | Speech recognition system based on integrating feature and HMM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20160309 Termination date: 20191011 |
|
CF01 | Termination of patent right due to non-payment of annual fee |