CN103474066A - Ecological voice recognition method based on multiband signal reconstruction - Google Patents

Ecological voice recognition method based on multiband signal reconstruction

Info

Publication number
CN103474066A
Authority
CN
China
Prior art keywords
signal
noise
omp
reconstruct
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013104723429A
Other languages
Chinese (zh)
Other versions
CN103474066B (en)
Inventor
李应
欧阳桢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201310472342.9A
Publication of CN103474066A
Application granted
Publication of CN103474066B
Legal status: Expired - Fee Related
Anticipated expiration

Landscapes

  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention relates to an ecological sound recognition method based on multi-band signal reconstruction. First, OMP (Orthogonal Matching Pursuit) sparse decomposition is used as a first-stage reconstruction that retains the main structure of the foreground sound. Second, the residual components left by the first stage are divided by frequency band, and the reconstructed signal is adaptively compensated according to the frequency distributions of the foreground sound and the background noise, completing a second-stage reconstruction. Finally, composite noise-robust features are extracted from the time-frequency information of the atoms in the support set together with frequency-domain information, and a deep belief network (DBN) classifies and recognizes ecological sounds under different environments and signal-to-noise ratios. The two-stage reconstruction both suppresses noise and improves the reconstruction accuracy of the foreground sound, giving better noise robustness in natural environments.

Description

Ecological sound recognition method based on multi-band signal reconstruction
Technical field
The present invention relates to an ecological sound recognition method based on multi-band signal reconstruction.
Background art
Ecological sound recognition extracts features from the various sound signals in a natural environment and identifies them. Analysing and recognising the audio information in an environment can support applications such as intrusion detection and species surveys. In real environments, large amounts of non-stationary noise interfere with sound recognition, so noise-robust ecological sound recognition is of considerable practical importance.
In current audio signal processing, speech and speaker recognition have been studied relatively extensively, whereas ecological environmental sounds have received less attention. Commonly used features are the frequency-domain Mel-frequency cepstral coefficients (MFCCs) and time-frequency representations such as the short-time Fourier transform and the wavelet transform, combined with a Gaussian mixture model (GMM) or hidden Markov model (HMM) for classification. Because ecological sounds are highly random and not always structured, these methods are not necessarily effective for them. To address this, some new work has been proposed: Khunarsal et al. recognise short environmental sounds by spectrogram pattern matching combined with a KNN classifier; Zhang et al. use improved MFCCs as features with a GMM to classify insect sounds; Lee et al. model spectral shape features to classify continuous bird calls; Raju et al. extract pitch, formant and short-time energy features and combine them with a support vector machine (SVM) to classify 19 kinds of animal sounds, including cats, dogs and lions.
A common problem when recognising ecological sounds with the above methods is that, faced with sound signals of uncertain structure, it is difficult to design a suitable classifier. Discriminative models such as the SVM and traditional neural networks can model non-linearly separable classes well, but with high-dimensional features and many classes their classification performance falls short of GMM or HMM. In addition, recognition performance drops sharply in noisy environments, especially at low signal-to-noise ratios. Common denoising methods include spectral subtraction and Wiener filtering. Spectral subtraction easily introduces musical noise and thus distorts the signal. Wiener filtering achieves optimal filtering provided the statistics of the signal and the noise are known, but noise in natural environments is complex and variable, and such prior information is often unavailable, so its range of application is limited.
Denoising by signal reconstruction based on matching pursuit (MP) exploits the sparsity of sound: the signal is decomposed and reconstructed as an adaptive representation, without prior knowledge of the statistics of the signal under test or of the noise, so it can be applied to many kinds of signals in different scenes. In practice, however, signal and noise overlap, and reducing noise as far as possible comes at the cost of increased signal distortion, so a denoising algorithm must trade off noise reduction against signal distortion. Plain MP sparse denoising also has limitations. During MP decomposition, searching the over-complete dictionary for the optimal atom is computationally expensive. Existing approaches either restrict the dictionary size, or use intelligent optimisation algorithms to find atoms highly correlated with the original signal while keeping the number of decompositions as small as possible. However, the residual component left after reconstruction is not entirely noise; it also contains part of the effective sound. Simply increasing the number of decompositions to improve reconstruction accuracy adds computation and still fails to suppress noise, so subsequent recognition performs poorly.
Summary of the invention
In view of this, the object of the present invention is to provide an ecological sound recognition method based on multi-band signal reconstruction.
The present invention is implemented by the following scheme: an ecological sound recognition method based on multi-band signal reconstruction, characterised by comprising the following steps:
S01: perform OMP sparse decomposition on the clean sound and on the noisy test sound respectively, and output the reconstructed signal and the OMP features of each;
S02: extract from the clean sound a composite feature that includes the OMP features, and train a DBN model;
S03: extract the power spectrum of the residual signal left after OMP sparse decomposition of the noisy test sound, and apply multi-band compensation to it;
S04: extract the power spectrum of the reconstructed signal obtained by OMP sparse decomposition of the noisy test sound, and combine it with the multi-band-compensated residual power spectrum from step S03 to perform a second reconstruction;
S05: extract from the signal after the second reconstruction in step S04 a composite feature that includes the OMP features;
S06: classify the composite feature extracted in step S05 with the DBN model trained in step S02, and output the ecological sound class to which the noisy test sound belongs.
In an embodiment of the present invention, let the noisy sound signal to be decomposed be f, of length N. Before sparse decomposition, an over-complete atom dictionary $D = (g_\gamma)_{\gamma \in \Gamma}$ is first constructed. The time-frequency atoms $g_\gamma$ are Gabor atoms defined by the parameter group $\gamma = (s, u, v, w)$: the shift factor u defines the centre of the atom $g_\gamma$, and the scale factor s, frequency factor v and phase factor w define its waveform. The discretised time-frequency parameters are $\gamma = (s, u, v, w) = (a^j,\ p a^j \Delta u,\ k a^{-j} \Delta v,\ i \Delta w)$, where $0 < j \le \log_2 N$, $0 \le p \le N 2^{-j+1}$, $0 \le k < 2^{j+1}$, $0 \le i \le 12$, $a = 2$, $\Delta u = 1/2$, $\Delta v = \pi$, $\Delta w = \pi/6$. Step S01 specifically comprises:
S011: initialise the signal residual $R_0 y' = f$, the iteration counter $k = 1$ and the maximum number of iterations L;
S012: from the over-complete atom dictionary D, select the atom $g_{\gamma_k}$ most correlated with the signal residual at the k-th iteration, satisfying $|\langle R_k y', g_{\gamma_k} \rangle| \ge \alpha \sup_{\gamma \in \Gamma} |\langle R_k y', g_\gamma \rangle|$, $0 < \alpha \le 1$;
S013: test whether $\|R_k y'\| < \varepsilon$, $\varepsilon > 0$, holds, where ε is the preset residual threshold; if it holds, go to step S016 and end the decomposition; otherwise continue;
S014: use the Gram-Schmidt method to orthogonalise $g_{\gamma_k}$ against the set of selected atoms $g_{\gamma_p}$, $0 < p \le k$, obtaining the projection $P_k$, and compute the new approximate reconstruction $y' = P_k f$ and the residual $R_{k+1} y' = f - y'$;
S015: if the maximum number of iterations has not been reached, set $k = k + 1$ and return to step S012; otherwise go to step S016;
S016: the successive decompositions yield a series of atoms; output the approximate atom expansion after the L-th decomposition,
$$y'(t) \approx \sum_{n=1}^{L} P_n\, g_{\gamma_n}(t).$$
In an embodiment of the present invention, the composite feature that includes the OMP features is extracted as follows: a composite feature comprising OMP features, MFCC features and a pitch feature is extracted. To extract the OMP features, each frame of the sound signal is decomposed by OMP; from the time-frequency parameter groups of the first L atoms of the support set representing the frame, the mean and standard deviation of the scale factor s and of the frequency factor v are computed, forming a 4-dimensional OMP feature,
Figure BDA0000393844730000032
where λ is the frame index of the signal, i is the index of an atom representing the frame, and L is the number of atoms.
In an embodiment of the present invention, MFCCs are chosen to supplement the OMP features: a 24-order Mel filter bank is applied to the discrete Fourier transform of the reconstructed signal to obtain 12-dimensional static MFCCs, and the log energy is added as the 13th dimension.
In an embodiment of the present invention, pitch (PITCH) is chosen to supplement the OMP features: the circular AMDF (average magnitude difference function) method is used to obtain a 1-dimensional PITCH feature for each frame.
In an embodiment of the present invention, DBN model training comprises two steps. In the first step, the network is pre-trained with an unsupervised greedy layer-wise strategy; the states of the visible-layer nodes at the bottom of the DBN are initialised with the labelled ecological sound features, so that specific features become progressively more abstract. In the second step, a supervised BP (back-propagation) network uses the correct labels, and the error corrections are propagated top-down to each RBM layer for fine-tuning.
In an embodiment of the present invention, each RBM is self-trained using the Contrastive Divergence criterion. Each RBM consists of a visible layer V and a hidden layer H. Several RBMs are stacked bottom-up through weighted inter-layer connections, the output of the hidden units of one RBM serving as the input to the visible layer of the RBM above, thereby building a DBN architecture. An RBM has three parameters: the weights W between the visible and hidden layers and their respective biases b and c; training the DBN classifier therefore reduces to solving for the RBM parameters. Let the node values of the visible and hidden layers be $v_i$ and $h_j$ respectively. The probability that a node of the visible layer V is set to 1 is $P(v_i = 1)$,
Figure BDA0000393844730000033
and similarly the probability that a node of the hidden layer H is set to 1 is $P(h_j = 1)$,
Figure BDA0000393844730000041
The update rule for the weights W is $\Delta w_{ij} \propto \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{reconstruct}$, where $\langle v_i h_j \rangle_{data}$ is the expectation, under the joint probability distribution, of the visible nodes $v_i$ of the known sample set and the unknown hidden nodes $h_j$, and $\langle v_i h_j \rangle_{reconstruct}$ is the expectation of $v_i h_j$ under the joint probability distribution after the hidden units have been updated from the known samples and the visible units reconstructed from them.
In an embodiment of the present invention, the foreground sound is not uniformly distributed over the spectrum. To determine its dominant frequency structure, the power spectrum $|Y'(\lambda, j)|^2$ obtained by the first reconstruction is divided evenly into M linear sub-bands, and for each sound frame λ the energy proportion in band i is computed,
Figure BDA0000393844730000042
where K is the order (number) of FFT coefficients and $FFT_{\lambda,p}$ is the p-th FFT coefficient of frame λ.
In an embodiment of the present invention, a threshold γ is chosen. When the energy proportion of sub-band i exceeds the threshold, the sub-band lies in the dominant frequency range and the foreground sound frequency factor α(λ) is given a higher weight; outside the dominant frequency range it is given a correspondingly lower weight, that is,
Figure BDA0000393844730000043
The noise frequency factor β(λ) characterises how strongly the current sub-band is affected by noise. The signal reconstructed in the first stage is used as prior information to estimate the noise, and the power-spectrum signal-to-noise ratio of sub-band i of frame λ is then computed:
$$SNR_i(\lambda) = 10 \log_{10} \left( \frac{\sum_{p = \frac{K}{M}(i-1)}^{\frac{K}{M} i} |Y'_i(\lambda, p)|^2}{\sum_{p = \frac{K}{M}(i-1)}^{\frac{K}{M} i} \left( |F_i(\lambda, p)|^2 - |Y'_i(\lambda, p)|^2 \right)} \right)$$
The noise frequency factor of sub-band i of frame λ is
$$\beta_i(\lambda) = \begin{cases} 0.1, & SNR_i(\lambda) < 0 \\ 0.1 + 0.04\, SNR_i(\lambda), & 0 \le SNR_i(\lambda) \le 20 \\ 0.9, & SNR_i(\lambda) > 20 \end{cases}$$
The multi-band gain is applied using the foreground sound frequency factor α(λ) and the noise frequency factor β(λ), giving the sound power spectrum of the second reconstruction
$$|Y(\lambda, j)|^2 \approx |Y'(\lambda, j)|^2 + \alpha(\lambda)\beta(\lambda)\left( |F(\lambda, j)|^2 - |Y'(\lambda, j)|^2 \right).$$
When the reconstructed foreground sound power spectrum exceeds the power spectrum of the original noisy sound, it is updated using
Figure BDA0000393844730000046
The two-stage reconstruction adopted by the present invention not only suppresses noise but also improves the reconstruction accuracy of the foreground sound. Compared with the commonly used approach of Mel-frequency cepstral coefficients (MFCCs) with an SVM, the method has better noise robustness in natural environments.
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flowchart of the OMP-based multi-band signal reconstruction of the present invention.
Fig. 2a is the waveform of a clean thrush call.
Fig. 2b is the spectrogram of a clean thrush call.
Fig. 2c is the waveform of Fig. 2a with flowing-water noise added at a signal-to-noise ratio of 10 dB.
Fig. 2d is the spectrogram of Fig. 2b with flowing-water noise added at a signal-to-noise ratio of 10 dB.
Fig. 2e is the spectrogram of Fig. 2d reconstructed with a sparsity of 10.
Fig. 2f is the spectrogram of Fig. 2d reconstructed with a sparsity of 30.
Fig. 2g is the waveform after the second reconstruction.
Fig. 2h is the spectrogram after the second reconstruction.
Fig. 3 is a flowchart of the DBN classification of the present invention.
Detailed description of the embodiments
The present invention proposes an ecological sound recognition method based on multi-band signal reconstruction and builds a classification framework based on deep learning. First, OMP sparse decomposition is used for the first-stage reconstruction, retaining the main structure of the foreground sound. Second, the residual components left by the first stage are divided by frequency band, and the reconstructed signal is adaptively compensated according to the frequency distributions of the foreground sound and the background noise, completing the second-stage reconstruction. Finally, composite noise-robust features are extracted from the time-frequency information of the support-set atoms and from frequency-domain information, and a deep belief network (DBN) classifies ecological sounds under different environments and signal-to-noise ratios. As shown in Fig. 1, the method comprises the following steps:
S01: perform OMP sparse decomposition on the clean sound and on the noisy test sound respectively, and output the reconstructed signal and the OMP features of each;
S02: extract from the clean sound a composite feature that includes the OMP features, and train a DBN model;
S03: extract the power spectrum of the residual signal left after OMP sparse decomposition of the noisy test sound, and apply multi-band compensation to it;
S04: extract the power spectrum of the reconstructed signal obtained by OMP sparse decomposition of the noisy test sound, and combine it with the multi-band-compensated residual power spectrum from step S03 to perform a second reconstruction;
S05: extract from the signal after the second reconstruction in step S04 a composite feature that includes the OMP features;
S06: classify the composite feature extracted in step S05 with the DBN model trained in step S02, and output the ecological sound class to which the noisy test sound belongs.
The OMP algorithm is a greedy reconstruction algorithm from compressed sensing (CS), built on the matching pursuit (MP) algorithm. Its improvement is that the atom selected from the dictionary at each decomposition, called the optimal atom, is first orthogonalised against the set of already selected atoms by the Gram-Schmidt method, which guarantees the optimality of each iteration and so reduces the number of iterations. For the same accuracy requirement, OMP reconstructs the signal with a sparser representation and converges faster. Denoising ecological sound with OMP exploits the sparsity of the signal: the useful information is extracted as the sparse component, and the noise is treated as the residual left after the sparse component is removed. Noise has a degree of randomness; since the dictionary contains no random atoms, the correlation between noise and the dictionary is low. According to CS theory, when the noisy sound signal is projected to a low dimension and the observation dimension is sufficient to contain the useful information, the noise is not sparse. The noise in the residual therefore cannot be recovered during reconstruction, which achieves denoising. The sound signal is mapped onto the atom dictionary and decomposed; each round selects the atom with the largest inner product with the signal, i.e. the highest correlation. The more atoms are extracted by iteration, the smaller the signal residual, and the weighted combination of the atoms finally gives the best reconstruction of the original signal.
Let the noisy sound signal to be decomposed be f, of length N. Before sparse decomposition, an over-complete atom dictionary $D = (g_\gamma)_{\gamma \in \Gamma}$ is first constructed. The time-frequency atoms $g_\gamma$ are Gabor atoms defined by the parameter group $\gamma = (s, u, v, w)$: the shift factor u defines the centre of the atom $g_\gamma$, and the scale factor s, frequency factor v and phase factor w define its waveform. The discretised time-frequency parameters are $\gamma = (s, u, v, w) = (a^j,\ p a^j \Delta u,\ k a^{-j} \Delta v,\ i \Delta w)$, where $0 < j \le \log_2 N$, $0 \le p \le N 2^{-j+1}$, $0 \le k < 2^{j+1}$, $0 \le i \le 12$, $a = 2$, $\Delta u = 1/2$, $\Delta v = \pi$, $\Delta w = \pi/6$. The OMP sparse decomposition specifically comprises:
S011: initialise the signal residual $R_0 y' = f$, the iteration counter $k = 1$ and the maximum number of iterations L;
S012: from the over-complete atom dictionary D, select the atom $g_{\gamma_k}$ most correlated with the signal residual at the k-th iteration, satisfying $|\langle R_k y', g_{\gamma_k} \rangle| \ge \alpha \sup_{\gamma \in \Gamma} |\langle R_k y', g_\gamma \rangle|$, $0 < \alpha \le 1$;
S013: test whether $\|R_k y'\| < \varepsilon$, $\varepsilon > 0$, holds, where ε is the preset residual threshold; if it holds, go to step S016 and end the decomposition; otherwise continue;
S014: use the Gram-Schmidt method to orthogonalise $g_{\gamma_k}$ against the set of selected atoms $g_{\gamma_p}$, $0 < p \le k$, obtaining the projection $P_k$, and compute the new approximate reconstruction $y' = P_k f$ and the residual $R_{k+1} y' = f - y'$;
S015: if the maximum number of iterations has not been reached, set $k = k + 1$ and return to step S012; otherwise go to step S016;
S016: the successive decompositions yield a series of atoms; output the approximate atom expansion after the L-th decomposition, $y'(t) \approx \sum_{n=1}^{L} P_n\, g_{\gamma_n}(t)$.
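The discretised parameter grid above can be enumerated directly. The following is a minimal sketch, not the patent's implementation: the function names are hypothetical, and the atom waveform (a Gaussian window modulated by a cosine, normalised to unit energy) is an assumption, since the text gives the parameters of the Gabor atoms but not their explicit formula.

```python
import numpy as np

def gabor_params(N):
    """Yield (s, u, v, w) on the discretised grid, with a = 2."""
    for j in range(1, int(np.log2(N)) + 1):        # 0 < j <= log2(N)
        s = 2.0 ** j                               # s = a^j
        for p in range(N // 2 ** (j - 1) + 1):     # 0 <= p <= N * 2^(-j+1)
            u = p * s * 0.5                        # u = p * a^j * du, du = 1/2
            for k in range(2 ** (j + 1)):          # 0 <= k < 2^(j+1)
                v = k * np.pi / s                  # v = k * a^(-j) * dv, dv = pi
                for i in range(13):                # 0 <= i <= 12
                    yield s, u, v, i * np.pi / 6   # w = i * dw, dw = pi/6

def gabor_atom(N, s, u, v, w):
    """Unit-norm real Gabor atom (ASSUMED waveform: Gaussian window times cosine)."""
    t = np.arange(N)
    g = np.exp(-np.pi * ((t - u) / s) ** 2) * np.cos(v * t + w)
    norm = np.linalg.norm(g)
    return g / norm if norm > 1e-8 else None       # skip numerically empty atoms

def build_dictionary(N):
    """Stack all non-degenerate atoms as columns of an N x (number of atoms) matrix."""
    atoms = (gabor_atom(N, *params) for params in gabor_params(N))
    return np.column_stack([a for a in atoms if a is not None])
```

The grid grows quickly with N (roughly 52·N atoms per scale j), so for a 512-sample frame a dense dictionary is very large; this is the search cost that the background section refers to.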
OMP decomposition selects an optimal atom in each iteration according to energy and degree of correlation; the selected optimal atoms form the support set of the reconstructed signal. Noise has a degree of randomness; since the dictionary contains no random atoms, the correlation between noise and the dictionary is low. For coloured noise, the method exploits the difference in sparsity between the clean sound and the background noise: according to CS theory, when the noisy sound signal is projected to a low dimension and the observation dimension is sufficient to contain the useful information, the noise is not sparse. This ensures that in the early reconstruction the noise in the residual cannot be recovered, while the main structure of the effective sound is retained. The sound signal is mapped onto the atom dictionary and decomposed; each round selects the atom with the largest inner product with the signal, i.e. the highest correlation. The more atoms are extracted by iteration, the smaller the signal residual, and the weighted combination of the atoms finally gives the best reconstruction of the original signal.
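Steps S011 to S016 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the patent's code: the function name `omp` is hypothetical, the dictionary `D` is assumed to hold unit-norm atoms as columns, the atom with the exact maximum correlation is taken (that is, α = 1), and the residual test is performed at the start of each iteration.

```python
import numpy as np

def omp(f, D, max_atoms, eps=1e-6):
    """Greedy OMP. D is an (N, n_atoms) array of unit-norm atoms (assumption)."""
    f = np.asarray(f, dtype=float)
    residual = f.copy()                        # S011: initial residual is f
    support = []                               # indices of the selected atoms
    Q = np.zeros((len(f), 0))                  # orthonormal basis of selected atoms
    for _ in range(max_atoms):                 # S015: at most L iterations
        if np.linalg.norm(residual) < eps:     # S013: residual small enough
            break
        corr = np.abs(D.T @ residual)          # S012: correlation with residual
        if support:
            corr[support] = 0.0                # never reselect an atom
        k = int(np.argmax(corr))
        q = D[:, k] - Q @ (Q.T @ D[:, k])      # S014: Gram-Schmidt step
        norm = np.linalg.norm(q)
        if norm < 1e-12:                       # atom already spanned by the support
            break
        Q = np.column_stack([Q, q / norm])
        support.append(k)
        residual = f - Q @ (Q.T @ f)           # residual after projecting f
    approx = Q @ (Q.T @ f)                     # S016: approximate reconstruction
    return support, approx, f - approx
```

Because the new atom is orthogonalised against all previously selected atoms, the residual is orthogonal to the whole support after every step, which is what distinguishes OMP from plain MP.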
The composite feature that includes the OMP features is extracted as follows: a composite feature comprising OMP features, MFCC features and a pitch feature is extracted. To extract the OMP features, each frame of the sound signal is decomposed by OMP; from the time-frequency parameter groups of the first L atoms of the support set representing the frame, the mean and standard deviation of the scale factor s and of the frequency factor v are computed, forming a 4-dimensional OMP feature,
Figure BDA0000393844730000071
where λ is the frame index of the signal, i is the index of an atom representing the frame, and L is the number of atoms.
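As a small illustration of the 4-dimensional OMP feature described above, the sketch below computes the mean and standard deviation of s and v over the atoms of one frame. The function name and the ordering of the four values are hypothetical, and the population standard deviation (division by L) is an assumption, because the exact formula is only available as an image in the source.

```python
import numpy as np

def omp_feature(scales, freqs):
    """4-D OMP feature of one frame from the s and v parameters of its L atoms."""
    scales = np.asarray(scales, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    # ASSUMPTION: population standard deviation (ddof=0), order [mean_s, std_s, mean_v, std_v].
    return np.array([scales.mean(), scales.std(), freqs.mean(), freqs.std()])
```

For example, `omp_feature([2, 4], [1, 3])` gives means 3 and 2 with a standard deviation of 1 for each parameter.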
MFCCs are chosen to supplement the OMP features: a 24-order Mel filter bank is applied to the discrete Fourier transform of the reconstructed signal to obtain 12-dimensional static MFCCs, and the log energy is added as the 13th dimension.
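A per-frame version of this 13-dimensional feature can be sketched with NumPy alone. The sketch makes several assumptions that the text does not specify: a sampling rate of 22050 Hz (suggested by 512 samples per 23 ms frame), triangular filters spaced on the common 2595·log10(1 + f/700) Mel scale, an unnormalised DCT-II, and cepstral coefficients 1 to 12. All function names are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):              # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / (centre - left)
        for k in range(centre, right):             # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / (right - centre)
    return fbank

def mfcc_frame(frame, sample_rate=22050, n_filters=24, n_ceps=12):
    """12 static MFCCs plus log energy for one (already windowed) frame."""
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame)) ** 2        # one-sided power spectrum
    mel_energy = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    log_mel = np.log(mel_energy + 1e-10)           # small offset avoids log(0)
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), n + 0.5) / n_filters)
    ceps = basis @ log_mel                         # DCT-II, coefficients 1..12
    log_energy = np.log(np.sum(frame ** 2) + 1e-10)
    return np.concatenate([ceps, [log_energy]])    # 13-dimensional feature
```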
Pitch (PITCH) is chosen to supplement the OMP features: the circular AMDF (average magnitude difference function) method is used to obtain a 1-dimensional PITCH feature for each frame.
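The circular AMDF compares a frame with a circularly shifted copy of itself and takes the lag with the smallest mean absolute difference as the pitch period. The sketch below is a minimal illustration; the function name and the search range given by `f_min` and `f_max` are hypothetical parameters, not values from the patent.

```python
import numpy as np

def camdf_pitch(frame, sample_rate, f_min=80.0, f_max=2000.0):
    """Estimate pitch in Hz from one frame with the circular AMDF."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    min_lag = max(1, int(sample_rate / f_max))
    # The circular AMDF is symmetric about n / 2, so lags beyond that add nothing.
    max_lag = min(int(sample_rate / f_min), n // 2)
    lags = np.arange(min_lag, max_lag + 1)
    # D(lag) = sum over n of |x((n + lag) mod N) - x(n)|
    d = np.array([np.sum(np.abs(np.roll(frame, -lag) - frame)) for lag in lags])
    best_lag = lags[np.argmin(d)]
    return sample_rate / best_lag
```

Integer multiples of the true period also produce deep minima, so in practice the search range should be limited to the pitch range expected for the sound of interest.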
DBN model training comprises two steps. In the first step, the network is pre-trained with an unsupervised greedy layer-wise strategy; the states of the visible-layer nodes at the bottom of the DBN are initialised with the labelled ecological sound features, so that specific features become progressively more abstract. In the second step, a supervised BP (back-propagation) network uses the correct labels, and the error corrections are propagated top-down to each RBM layer for fine-tuning.
Each RBM is self-trained using the Contrastive Divergence criterion. Each RBM consists of a visible layer V and a hidden layer H. Several RBMs are stacked bottom-up through weighted inter-layer connections, the output of the hidden units of one RBM serving as the input to the visible layer of the RBM above, thereby building a DBN architecture. An RBM has three parameters: the weights W between the visible and hidden layers and their respective biases b and c; training the DBN classifier therefore reduces to solving for the RBM parameters. Let the node values of the visible and hidden layers be $v_i$ and $h_j$ respectively. The probability that a node of the visible layer V is set to 1 is $P(v_i = 1)$,
Figure BDA0000393844730000072
and similarly the probability that a node of the hidden layer H is set to 1 is $P(h_j = 1)$,
Figure BDA0000393844730000081
The update rule for the weights W is $\Delta w_{ij} \propto \langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{reconstruct}$, where $\langle v_i h_j \rangle_{data}$ is the expectation, under the joint probability distribution, of the visible nodes $v_i$ of the known sample set and the unknown hidden nodes $h_j$, and $\langle v_i h_j \rangle_{reconstruct}$ is the expectation of $v_i h_j$ under the joint probability distribution after the hidden units have been updated from the known samples and the visible units reconstructed from them.
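The weight update above corresponds to one step of contrastive divergence (CD-1). The sketch below is illustrative only and rests on assumptions: the conditional probabilities are taken to be the standard logistic-sigmoid forms of a binary RBM (the patent's own formulas are only available as images), b is taken to be the visible bias and c the hidden bias, and the function name, learning rate and mini-batch layout are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 step. v0: (batch, n_visible), W: (n_visible, n_hidden)."""
    if rng is None:
        rng = np.random.default_rng(0)
    ph0 = sigmoid(v0 @ W + c)                          # P(h_j = 1 | v) on the data
    h0 = (rng.random(ph0.shape) < ph0).astype(float)   # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b)                        # reconstruct visible layer
    ph1 = sigmoid(pv1 @ W + c)                         # hidden probs on reconstruction
    batch = v0.shape[0]
    # <v h>_data - <v h>_reconstruct, averaged over the mini-batch
    dW = (v0.T @ ph0 - pv1.T @ ph1) / batch
    W = W + lr * dW
    b = b + lr * (v0 - pv1).mean(axis=0)               # visible bias (assumed to be b)
    c = c + lr * (ph0 - ph1).mean(axis=0)              # hidden bias (assumed to be c)
    return W, b, c
```

Stacking such layers, each trained on the hidden activations of the one below, gives the greedy layer-wise pre-training described for the DBN; the supervised fine-tuning stage is not shown.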
Assume that the additive noise and the foreground sound to be recognised are uncorrelated. The noisy sound signal f(t) can then be written $f(t) = y(t) + n(t)$, where t is the time index, y(t) is the clean foreground sound and n(t) is the background noise. Applying the fast Fourier transform to f(t) gives the amplitude spectrum $F(\lambda, j)$, where λ is the frame index and j is the frequency index. The power spectrum $|F(\lambda, j)|^2$ decomposes into the foreground sound power spectrum $|Y(\lambda, j)|^2$ and the noise power spectrum $|N(\lambda, j)|^2$, that is, $|F(\lambda, j)|^2 = |Y(\lambda, j)|^2 + |N(\lambda, j)|^2$. OMP sparse decomposition of the noisy sound signal yields a linear weighted combination of the first few highly correlated atoms, which forms the first reconstruction $Y'$. Compared with the original signal, this reconstruction is incomplete; the missing signal and the noise can be regarded as both present in the residual component, so the foreground sound power spectrum is approximated as $|Y(\lambda, j)|^2 \approx (1 - \delta(\lambda))\,|Y'(\lambda, j)|^2 + \delta(\lambda)\,|F(\lambda, j)|^2$, where δ(λ) is a gain factor introduced here that characterises the ratio of the missing amount in frame λ to the original signal. Experiments show that the distributions of the foreground sound and of the noise over the spectrum jointly affect this ratio. The residual foreground component is more likely to lie within its main frequency range (hereinafter the dominant frequency range) than elsewhere, and is less likely to lie in bands strongly affected by noise. The gain factor can therefore be split into a foreground sound frequency factor α(λ) and a noise frequency factor β(λ), that is, $\delta(\lambda) = \alpha(\lambda)\beta(\lambda)$.
The foreground sound is not uniformly distributed over the spectrum. To determine its dominant frequency structure, the power spectrum $|Y'(\lambda, j)|^2$ obtained by the first reconstruction is divided evenly into M linear sub-bands, and for each sound frame λ the energy proportion in band i is computed,
Figure BDA0000393844730000082
where K is the order (number) of FFT coefficients and $FFT_{\lambda,p}$ is the p-th FFT coefficient of frame λ.
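The sub-band energy proportion can be illustrated as below. This is a sketch under assumptions: the proportion is taken to be the power in sub-band i divided by the total power of the frame, which matches the surrounding description but is not confirmed by the (image-only) formula, and the function name is hypothetical.

```python
import numpy as np

def subband_energy_ratio(power, n_bands):
    """Share of a frame's power that falls in each of n_bands equal-width sub-bands.

    power: one-sided power spectrum |Y'(lambda, j)|^2 of a single frame (K bins).
    """
    power = np.asarray(power, dtype=float)
    edges = np.linspace(0, len(power), n_bands + 1).astype(int)   # linear band edges
    band_energy = np.array([power[edges[i]:edges[i + 1]].sum() for i in range(n_bands)])
    total = power.sum()
    return band_energy / total if total > 0 else np.zeros(n_bands)
```

Sub-bands whose proportion exceeds a chosen threshold would then be treated as the dominant frequency range of the foreground sound.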
A threshold γ is chosen. When the energy proportion of sub-band i exceeds the threshold, the sub-band lies in the dominant frequency range and the foreground sound frequency factor α(λ) is given a higher weight; outside the dominant frequency range it is given a correspondingly lower weight, that is,
Figure BDA0000393844730000083
The noise frequency factor β(λ) characterises how strongly the current sub-band is affected by noise. The signal reconstructed in the first stage is used as prior information to estimate the noise, and the power-spectrum signal-to-noise ratio of sub-band i of frame λ is then computed:
$$SNR_i(\lambda) = 10 \log_{10} \left( \frac{\sum_{p = \frac{K}{M}(i-1)}^{\frac{K}{M} i} |Y'_i(\lambda, p)|^2}{\sum_{p = \frac{K}{M}(i-1)}^{\frac{K}{M} i} \left( |F_i(\lambda, p)|^2 - |Y'_i(\lambda, p)|^2 \right)} \right)$$
The noise frequency factor of sub-band i of frame λ is
$$\beta_i(\lambda) = \begin{cases} 0.1, & SNR_i(\lambda) < 0 \\ 0.1 + 0.04\, SNR_i(\lambda), & 0 \le SNR_i(\lambda) \le 20 \\ 0.9, & SNR_i(\lambda) > 20 \end{cases}$$
The multi-band gain is applied using the foreground sound frequency factor α(λ) and the noise frequency factor β(λ), giving the sound power spectrum of the second reconstruction
$$|Y(\lambda, j)|^2 \approx |Y'(\lambda, j)|^2 + \alpha(\lambda)\beta(\lambda)\left( |F(\lambda, j)|^2 - |Y'(\lambda, j)|^2 \right).$$
When the reconstructed foreground sound power spectrum exceeds the power spectrum of the original noisy sound, it is updated using
Figure BDA0000393844730000093
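The second-stage compensation can be sketched for a single frame as follows. Several points are assumptions and are marked as such in the code: the per-band values of α are passed in by the caller because the patent's piecewise definition is only available as an image, the final clamp to the noisy power spectrum is a guess at the update rule that is likewise only an image, and the function names are hypothetical. The piecewise definition of β and the compensation formula follow the text.

```python
import numpy as np

def noise_factor(snr_db):
    """beta: 0.1 below 0 dB, 0.1 + 0.04 * SNR on [0, 20] dB, 0.9 above 20 dB."""
    return np.clip(0.1 + 0.04 * snr_db, 0.1, 0.9)

def second_stage(power_noisy, power_first, alpha, n_bands):
    """Multi-band compensation of the first-stage power spectrum for one frame.

    power_noisy: |F|^2 of the noisy frame; power_first: |Y'|^2 after OMP.
    alpha: per-band foreground weights (HYPOTHETICAL values supplied by the caller).
    """
    power_noisy = np.asarray(power_noisy, dtype=float)
    power_first = np.asarray(power_first, dtype=float)
    edges = np.linspace(0, len(power_noisy), n_bands + 1).astype(int)
    out = power_first.copy()
    for i in range(n_bands):
        band = slice(edges[i], edges[i + 1])
        signal = max(power_first[band].sum(), 1e-12)
        noise = max((power_noisy[band] - power_first[band]).sum(), 1e-12)
        beta = noise_factor(10.0 * np.log10(signal / noise))   # sub-band SNR in dB
        residual = power_noisy[band] - power_first[band]
        out[band] = power_first[band] + alpha[i] * beta * residual
    # ASSUMPTION: where the result exceeds the noisy spectrum, fall back to |F|^2.
    return np.minimum(out, power_noisy)
```

With equal signal and residual energy in a band the SNR is 0 dB, so β = 0.1 and only a small part of the residual is added back; a cleaner band with SNR above 20 dB would recover up to 90% of its residual, scaled by α.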
The accuracy of ecological sound recognition depends largely on the effectiveness of noise reduction. For the complex and variable non-stationary noise in ecological scenes, reconstructing the noisy sound signal by OMP sparse decomposition retains the main structure of the foreground sound. High signal reconstruction accuracy is a prerequisite for effective subsequent feature extraction. The most direct way to improve reconstruction accuracy is to increase the number of decompositions, but this increases computational complexity and cannot separate out the noise component during reconstruction. Here, multi-band compensation is used to selectively extract signal components from the residual of the OMP decomposition and to compensate the first-stage reconstruction, thereby performing an adaptive second reconstruction. Composite noise-robust time-frequency features are then extracted to build the DBN model, which classifies ecological sounds efficiently. The detailed procedure is described below.
Preprocessing and first-stage OMP reconstruction:
All sound samples are normalized, smoothed with a Hamming window, and divided into frames with a frame length of 23 ms (512 samples) and a frame shift of 11.6 ms (256 samples). Fig. 2a and Fig. 2b show the waveform and spectrogram of a thrush call containing three effective syllables. As an example, flowing-water noise is mixed in at an SNR of 10 dB; Fig. 2c and Fig. 2d show that the noise is not uniformly distributed over the spectrum and interferes heavily with the original signal. Each frame is sparsely decomposed and then reconstructed according to

$$|\langle R^k y', g_{\gamma_k}\rangle| \ge \alpha \sup_{\gamma\in\Gamma} |\langle R^k y', g_\gamma\rangle|,\quad 0<\alpha\le 1;\qquad y' = P_k f,\quad R^{k+1}y' = f - y',\quad y'(t) \approx \sum_{n=1}^{L} P_n g_{\gamma_n}(t).$$

Fig. 2e and Fig. 2f are spectrograms of the reconstructed signal for sparsity 10 and sparsity 30 respectively. They show clearly that a higher sparsity restores the noisy signal more completely overall, but inevitably reconstructs noise components as well. With the lower sparsity, the main structure of the signal is still retained and the noise components that correlate weakly with the original signal are greatly attenuated; the parts of the thrush call that remain incomplete are recovered in the next step, the multiband reconstruction.
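The preprocessing step can be sketched as follows. This is a minimal illustration under stated assumptions: peak normalization is assumed (the text only says the samples are normalized), the window is applied per frame, and the function names are hypothetical.

```python
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(samples, frame_len=512, hop=256):
    """Normalize a signal, then cut it into Hamming-windowed frames.

    frame_len=512 and hop=256 follow the 23 ms / 11.6 ms values in the text.
    Trailing samples that do not fill a whole frame are dropped (assumption).
    """
    peak = max((abs(x) for x in samples), default=0.0)
    norm = [x / peak for x in samples] if peak > 0 else list(samples)
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(norm) - frame_len + 1, hop):
        frames.append([norm[start + i] * window[i] for i in range(frame_len)])
    return frames
```

With a 50% overlap, a 1024-sample signal yields three frames starting at samples 0, 256 and 512.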
Second-stage multiband reconstruction:
According to the frequency distribution of the foreground thrush call and the background flowing-water noise, the spectrum is divided evenly into 8 linear sub-bands. Spectral analysis of the OMP-reconstructed signal, according to the formula
Figure BDA0000393844730000101
shows that the dominant frequency band of the thrush call is 2000 Hz to 3000 Hz. Residual components in this band receive a higher compensation weight and can be regarded as the "emphasized" part; the remaining bands can be regarded as the "ignored" part. Next, still using the OMP-reconstructed signal as prior information, the power-spectrum SNR of each sub-band is computed. Sub-bands with a low SNR, that is, bands with higher noise energy, are further attenuated with low weights. Through this two-stage adaptive reconstruction the noise is suppressed to a greater degree. Fig. 2g and Fig. 2h show the final waveform and spectrogram of the thrush signal after the two-stage adaptive reconstruction. Comparison with Fig. 2c and Fig. 2d shows that the multiband adaptive reconstruction reduces noise effectively.
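The second-stage compensation can be sketched per frequency bin as below. This is only an illustration: the function name is hypothetical, the per-band values of the foreground factor are placeholders chosen by the caller (the rule that sets them is given in a formula image not reproduced here), and the final update step for bins that exceed the noisy power spectrum is omitted for the same reason.

```python
def compensate(noisy_power, recon_power, alpha, beta, num_bands):
    """Second-stage power spectrum for one frame.

    noisy_power: |F(lambda, j)|^2 of the noisy frame
    recon_power: |Y'(lambda, j)|^2 of the first-stage OMP reconstruction
    alpha, beta: per-sub-band foreground and noise factors (lists of length num_bands)
    """
    k = len(noisy_power)
    out = []
    for j in range(k):
        # Map bin j to its linear sub-band (equal-width bands).
        band = min(j * num_bands // k, num_bands - 1)
        residual = noisy_power[j] - recon_power[j]
        out.append(recon_power[j] + alpha[band] * beta[band] * residual)
    return out
```

A band with a large product of the two factors recovers most of its residual, while a band with a small product stays close to the first-stage reconstruction.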
Compound feature extraction:
The Gabor atoms used in the present invention are formed from a modulated Gaussian window function. Because a Gaussian function is localized in both the time domain and the frequency domain, the atom time-frequency parameters can characterize the non-stationary, time-varying behavior of the signal well. From the time-frequency parameter groups of the first 10 atoms obtained by OMP decomposition of a signal segment, the mean and standard deviation of the scale factor s and the frequency factor v are computed, forming a 4-dimensional OMP feature. Because the first reconstruction cannot fully represent the information in the original sound, the OMP time-frequency feature alone gives unsatisfactory recognition results. Since animal calls have different pitch-period ranges, the fundamental frequency (PITCH) is used as a feature that helps distinguish ecological sounds. After the second adaptive reconstruction, the present invention applies endpoint detection based on short-time energy and zero-crossing rate to the reconstructed signal, extracts MFCCs from the non-silent frames, and combines them with the OMP feature to form the compound feature.
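The 4-dimensional OMP feature can be sketched as follows. The function name, the ordering of the four values, and the use of the population standard deviation are assumptions; the text only states that the mean and standard deviation of s and v over the first atoms are used.

```python
import statistics

def omp_feature(atoms, num_atoms=10):
    """4-dimensional OMP feature of one signal segment.

    atoms: (s, u, v, w) parameter tuples in the order OMP selected them.
    Returns [mean(s), std(s), mean(v), std(v)] over the first num_atoms atoms.
    """
    chosen = atoms[:num_atoms]
    scales = [a[0] for a in chosen]
    freqs = [a[2] for a in chosen]
    return [
        statistics.fmean(scales),
        statistics.pstdev(scales),  # population std (assumption)
        statistics.fmean(freqs),
        statistics.pstdev(freqs),
    ]
```

Only the scale and frequency parameters enter the feature; the shift u and phase w of each atom are ignored, as in the description.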
The MFCCs features are obtained as follows: a 24-order Mel filter bank is applied, 12-dimensional static MFCCs are obtained after the discrete Fourier transform (DFT), and the log energy is added as the 13th dimension. In addition, the circular average magnitude difference function (CAMDF) method is used to obtain the 1-dimensional PITCH feature of each frame.
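A CAMDF-based pitch estimate can be sketched as below. This uses the common definition of the circular AMDF (the frame is indexed modulo its length); the search range, the function names, and the choice of the global minimum as the pitch period are assumptions, not details taken from the patent.

```python
def camdf(frame, lag):
    """Circular average magnitude difference of a frame at a given lag."""
    n = len(frame)
    return sum(abs(frame[(i + lag) % n] - frame[i]) for i in range(n))

def estimate_pitch_hz(frame, sample_rate, fmin, fmax):
    """Pitch estimate: the lag in [sample_rate/fmax, sample_rate/fmin]
    that minimizes the CAMDF, converted to Hz."""
    n = len(frame)
    lo = max(1, int(sample_rate / fmax))
    hi = min(n // 2, int(sample_rate / fmin))
    if lo > hi:
        raise ValueError("frame too short for the requested pitch range")
    best_lag = min(range(lo, hi + 1), key=lambda lag: camdf(frame, lag))
    return sample_rate / best_lag
```

The search range should exclude multiples of the expected period, since the CAMDF also dips at every multiple of the true pitch period.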
Pre-training the DBN model initializes the state values of the visible-layer nodes at the bottom of the DBN with labeled ecological sound features and obtains feature vectors layer by layer through restricted Boltzmann machine (RBM) models trained without supervision; these feature vectors serve as the input to the final BP network. The supervised BP network then uses the correct labels to back-propagate the error to the underlying RBM models, fine-tuning the whole DBN model. The procedure is shown in Fig. 3.
The classification ability of the DBN is affected by both the number of RBM hidden layers and the number of nodes in each layer. Increasing the number of hidden layers improves the accuracy with which the DBN classifies feature vectors, but the training time increases accordingly. Increasing the number of nodes improves the approximation ability of the DBN, but too many nodes reduce its generalization ability, so the best number of hidden layers and node configuration must be determined experimentally.
The preferred embodiments above further describe the purpose, technical solutions and advantages of the present invention. It should be understood that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (9)

1. An ecological sound recognition method based on multi-band signal reconstruction, characterized by comprising the following steps:
S01: performing OMP sparse decomposition on the clean sound and on the noisy test sound respectively, and outputting the corresponding reconstructed signal and OMP features of each;
S02: extracting compound features including the OMP features from the clean sound and training a DBN model;
S03: extracting the power spectrum of the residual signal left after OMP sparse decomposition of the noisy test sound, and applying multiband compensation to it;
S04: extracting the power spectrum of the reconstructed signal obtained by OMP sparse decomposition of the noisy test sound, and performing the second reconstruction by combining it with the multiband-compensated residual power spectrum from step S03;
S05: extracting compound features including the OMP features from the signal after the second reconstruction in step S04;
S06: classifying the compound features extracted in step S05 with the DBN model trained in step S02, and outputting the ecological sound class to which the noisy test sound belongs.
2. The ecological sound recognition method based on multi-band signal reconstruction according to claim 1, characterized in that: let f be the noisy sound signal to be decomposed, of length N; before sparse decomposition, an overcomplete atom dictionary $D=(g_\gamma)_{\gamma\in\Gamma}$ is first constructed, where each time-frequency atom $g_\gamma$ is a Gabor atom defined by the parameter group $\gamma=(s,u,v,w)$; the shift factor u defines the center of atom $g_\gamma$, and the scale factor s, frequency factor v and phase factor w define its waveform; the discretized time-frequency parameters are $\gamma=(s,u,v,w)=(a^j,\ p a^j \Delta u,\ k a^{-j}\Delta v,\ i\Delta w)$, where $0<j\le\log_2 N$, $0\le p\le N2^{-j+1}$, $0\le k<2^{j+1}$, $0\le i\le 12$, $a=2$, $\Delta u=1/2$, $\Delta v=\pi$, $\Delta w=\pi/6$; step S01 specifically comprises:
S011: initializing the signal residual $R^0 y'=f$, the iteration counter k=1, and the maximum number of iterations L;
S012: in the k-th iteration, selecting from the overcomplete atom dictionary D the atom $g_{\gamma_k}$ most correlated with the signal residual, $|\langle R^k y', g_{\gamma_k}\rangle| \ge \alpha \sup_{\gamma\in\Gamma} |\langle R^k y', g_\gamma\rangle|,\ 0<\alpha\le 1$;
S013: checking whether $\|R^k y'\|<\varepsilon$, $\varepsilon>0$ holds, where ε is the preset residual threshold; if it holds, going to step S016 and ending the decomposition; otherwise continuing the decomposition;
S014: using the Gram-Schmidt method to orthogonalize $g_{\gamma_k}$ with respect to the set of already selected atoms $g_{\gamma_p}$, $0<p\le k$, obtaining the projection $P_k$, and computing the new approximate reconstructed signal $y'=P_k f$ and the residual $R^{k+1}y'=f-y'$;
S015: if the maximum number of iterations has not been reached, setting k=k+1 and returning to step S012 to continue iterating; otherwise going to step S016;
S016: the successive decompositions yield a series of atoms; outputting the approximate atomic expansion after L iterations
Figure FDA0000393844720000021
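Steps S011 to S016 can be sketched in plain Python as follows. This is a minimal illustration, not the claimed implementation: the dictionary is any list of equal-length atoms rather than the Gabor dictionary, the selection uses the strict maximum (the case α = 1 of the selection inequality), the residual threshold is checked at the top of each iteration, and all names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def omp(signal, dictionary, max_iter, eps=1e-6):
    """Orthogonal matching pursuit with Gram-Schmidt orthogonalization.

    Returns (indices of chosen atoms, approximation y', final residual).
    """
    residual = list(signal)          # S011: residual starts as the signal f
    basis = []                       # orthonormalized chosen atoms
    chosen = []                      # dictionary indices in selection order
    approx = [0.0] * len(signal)
    for _ in range(max_iter):        # S015: at most max_iter iterations
        if math.sqrt(dot(residual, residual)) < eps:
            break                    # S013: residual below threshold
        # S012: atom most correlated with the current residual
        k = max((i for i in range(len(dictionary)) if i not in chosen),
                key=lambda i: abs(dot(residual, dictionary[i])),
                default=None)
        if k is None:
            break
        # S014: orthogonalize the new atom against those already chosen
        q = list(dictionary[k])
        for e in basis:
            proj = dot(q, e)
            q = [x - proj * y for x, y in zip(q, e)]
        norm = math.sqrt(dot(q, q))
        if norm < 1e-12:
            break                    # atom adds no new direction
        q = [x / norm for x in q]
        basis.append(q)
        chosen.append(k)
        coef = dot(signal, q)
        approx = [a + coef * x for a, x in zip(approx, q)]   # y' = P_k f
        residual = [s - a for s, a in zip(signal, approx)]   # f - y'
    return chosen, approx, residual
```

Because each new atom is orthogonalized against the earlier ones, the approximation after k iterations is the orthogonal projection of f onto the span of the chosen atoms, which is what distinguishes OMP from plain matching pursuit.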
3. The ecological sound recognition method based on multi-band signal reconstruction according to claim 1, characterized in that extracting the compound features including the OMP features specifically comprises: extracting compound features comprising the OMP feature, the MFCCs feature and the pitch feature; wherein the OMP feature is extracted by decomposing each frame of the sound signal with OMP, taking the time-frequency parameter groups of the first L atoms of the support set representing the frame, and computing the mean and standard deviation of the scale factor s and the frequency factor v to form a 4-dimensional OMP feature,
Figure FDA0000393844720000022
where λ is the frame index of the signal, i is the index of the atoms representing the frame, and L is the number of atoms.
4. The ecological sound recognition method based on multi-band signal reconstruction according to claim 3, characterized in that: MFCCs are chosen to supplement the OMP feature; a 24-order Mel filter bank is applied, 12-dimensional static MFCCs are obtained after a discrete Fourier transform of the reconstructed signal, and the log energy is added as the 13th dimension.
5. The ecological sound recognition method based on multi-band signal reconstruction according to claim 3, characterized in that: PITCH is chosen to supplement the OMP feature, and the circular AMDF method is used to obtain the 1-dimensional PITCH feature of each frame.
6. The ecological sound recognition method based on multi-band signal reconstruction according to claim 1, characterized in that: the DBN model training comprises two steps; the first step pre-trains with an unsupervised, greedy layer-by-layer strategy, initializing the state values of the visible-layer nodes at the bottom of the DBN with labeled ecological sound features so that the specific features become progressively more abstract; the second step uses the correct labels in a supervised BP network and propagates the correction information top-down to every RBM layer for fine-tuning.
7. The ecological sound recognition method based on multi-band signal reconstruction according to claim 6, characterized in that: the RBM network uses the Contrastive Divergence criterion as its self-training strategy; each RBM consists of a visible layer V and a hidden layer H; multiple RBMs are stacked bottom-up through weighted inter-layer connections, the output of the hidden units of one RBM serving as the input to the visible layer of the RBM above, thereby building a DBN; an RBM has three parameters, namely the weights W between the visible and hidden layers and the respective biases b and c, so training the DBN classifier reduces to solving for the RBM parameters; let the node values of the visible and hidden layers be $v_i$ and $h_j$ respectively; the probability that each node of the visible layer V is set to 1 is $P(v_i=1)$ and, likewise, the probability that each node of the hidden layer H is set to 1 is $P(h_j=1)$,
Figure FDA0000393844720000024
the update rule for the weights W is $\Delta w_{ij} \propto \langle v_i h_j\rangle_{\mathrm{data}} - \langle v_i h_j\rangle_{\mathrm{reconstruct}}$, where $\langle v_i h_j\rangle_{\mathrm{data}}$ is the expectation of the joint probability distribution of the visible nodes $v_i$ of the known sample set and the unknown hidden nodes $h_j$, and $\langle v_i h_j\rangle_{\mathrm{reconstruct}}$ is the expectation of the joint probability distribution of $v_i h_j$ after the hidden units have been updated from the known sample information and the visible units reconstructed from them.
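One contrastive-divergence step for a single binary RBM can be sketched as below. This is an assumption-laden illustration: the conditional probabilities are taken to be the standard logistic (sigmoid) form, since the source gives them only as a formula image; b is assumed to be the visible bias and c the hidden bias; a single Gibbs step (CD-1) and the learning rate are also assumptions, and the function names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1, rng=random):
    """One CD-1 update for a binary RBM, modifying W, b and c in place.

    v0: visible vector from the data; W[i][j]: weight between visible i and hidden j;
    b: visible biases (assumed); c: hidden biases (assumed).
    Returns the reconstructed visible probabilities.
    """
    nv, nh = len(b), len(c)
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = [sigmoid(c[j] + sum(v0[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    h0 = [1.0 if rng.random() < p else 0.0 for p in ph0]
    # Negative phase: reconstruct the visible layer, then recompute the hidden layer.
    pv1 = [sigmoid(b[i] + sum(h0[j] * W[i][j] for j in range(nh))) for i in range(nv)]
    ph1 = [sigmoid(c[j] + sum(pv1[i] * W[i][j] for i in range(nv))) for j in range(nh)]
    # delta w_ij proportional to <v_i h_j>_data - <v_i h_j>_reconstruct
    for i in range(nv):
        for j in range(nh):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
    for i in range(nv):
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(nh):
        c[j] += lr * (ph0[j] - ph1[j])
    return pv1
```

Stacking such RBMs, with the hidden probabilities of one layer used as the visible input of the next, gives the layer-by-layer pre-training described in the claims; the supervised fine-tuning stage is not shown.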
8. The ecological sound recognition method based on multi-band signal reconstruction according to claim 1, characterized in that: the foreground sound is not uniformly distributed over the spectrum; to determine its dominant frequency structure, the power spectrum $|Y'(\lambda,j)|^2$ obtained from the first reconstruction is divided evenly into M linear sub-bands and, for each voiced frame λ, the energy proportion on sub-band i is computed, where K is the order of the FFT and $FFT_{\lambda,p}$ is the p-th FFT coefficient of frame λ.
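The sub-band energy proportion can be sketched as follows. The exact formula is not reproduced in the text, so this sketch assumes the proportion is the energy of a sub-band divided by the total energy of the frame; the function names and the thresholding helper are hypothetical.

```python
def band_energy_ratios(power_spectrum, num_bands):
    """Share of the frame energy that falls in each of num_bands equal-width sub-bands.

    power_spectrum: squared FFT magnitudes of one frame.
    """
    k = len(power_spectrum)
    total = sum(power_spectrum)
    ratios = []
    for i in range(num_bands):
        lo = k * i // num_bands
        hi = k * (i + 1) // num_bands
        band = sum(power_spectrum[lo:hi])
        ratios.append(band / total if total > 0 else 0.0)
    return ratios

def dominant_bands(ratios, threshold):
    """Indices (0-based) of sub-bands whose energy share exceeds the threshold."""
    return [i for i, r in enumerate(ratios) if r > threshold]
```

Sub-bands returned by the thresholding helper would be the ones treated as the dominant frequency range of the foreground sound.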
9. The ecological sound recognition method based on multi-band signal reconstruction according to claim 1, characterized in that: a threshold γ is determined; when the energy proportion exceeds the threshold, sub-band i lies within the dominant frequency range and the foreground sound frequency factor α(λ) is given a higher weight, whereas outside the dominant frequency range it is given a correspondingly lower weight,
Figure FDA0000393844720000032
the noise frequency factor β(λ) characterizes how strongly noise affects the current sub-band; the signal reconstructed in the previous stage can be used as prior information to estimate the noise, and the power-spectrum signal-to-noise ratio (SNR) of sub-band i of frame λ is then computed as

$$\mathrm{SNR}_i(\lambda) = 10\log_{10}\left(\frac{\sum_{p=\frac{K}{M}(i-1)}^{\frac{K}{M}i} |Y'_i(\lambda,p)|^2}{\sum_{p=\frac{K}{M}(i-1)}^{\frac{K}{M}i} \left(|F_i(\lambda,p)|^2 - |Y'_i(\lambda,p)|^2\right)}\right);$$

the noise frequency factor of sub-band i of frame λ is

$$\beta_i(\lambda) = \begin{cases} 0.1, & \mathrm{SNR}_i(\lambda) < 0 \\ 0.1 + 0.04\,\mathrm{SNR}_i(\lambda), & 0 \le \mathrm{SNR}_i(\lambda) \le 20 \\ 0.9, & \mathrm{SNR}_i(\lambda) > 20 \end{cases}$$

applying the multiband gain given by the foreground sound frequency factor α(λ) and the noise frequency factor β(λ) yields the sound power spectrum of the second reconstruction

$$|Y(\lambda,j)|^2 \approx |\hat{Y}(\lambda,j)|^2 = |Y'(\lambda,j)|^2 + \alpha(\lambda)\beta(\lambda)\left(|F(\lambda,j)|^2 - |Y'(\lambda,j)|^2\right);$$

when the reconstructed foreground sound power spectrum exceeds the power spectrum of the original noisy sound, it is updated using
Figure FDA0000393844720000035
CN201310472342.9A 2013-10-11 2013-10-11 Ecological voice recognition method based on multiband signal reconstruction Expired - Fee Related CN103474066B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310472342.9A CN103474066B (en) 2013-10-11 2013-10-11 Ecological voice recognition method based on multiband signal reconstruction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310472342.9A CN103474066B (en) 2013-10-11 2013-10-11 Ecological voice recognition method based on multiband signal reconstruction

Publications (2)

Publication Number Publication Date
CN103474066A true CN103474066A (en) 2013-12-25
CN103474066B CN103474066B (en) 2016-01-06

Family

ID=49798887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310472342.9A Expired - Fee Related CN103474066B (en) 2013-10-11 2013-10-11 Ecological voice recognition method based on multiband signal reconstruction

Country Status (1)

Country Link
CN (1) CN103474066B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120078621A1 (en) * 2010-09-24 2012-03-29 International Business Machines Corporation Sparse representation features for speech recognition
CN102034478A (en) * 2010-11-17 2011-04-27 南京邮电大学 Voice secret communication system design method based on compressive sensing and information hiding
CN103345923A (en) * 2013-07-26 2013-10-09 电子科技大学 Sparse representation based short-voice speaker recognition method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
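Liu Yafeng (刘亚峰) et al.: "DSP-based implementation of the OMP algorithm and audio signal processing", 《电声技术》 (Audio Engineering) *
Sun Linhui (孙林慧): "Research on key technologies of speech compressed sensing", doctoral dissertation, Nanjing University of Posts and Telecommunications (《南京邮电大学博士学位论文》) *
Chen Zhenyuan (陈臻圆): "Speech signal reconstruction based on the OMP method", 《知识经济》 (Knowledge Economy) *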

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268125A (en) * 2014-09-28 2015-01-07 江南大学 Method of Chirp time-frequency atoms denoted with three parameters
CN104882144B (en) * 2015-05-06 2018-10-30 福州大学 Animal sounds recognition methods based on sonograph bicharacteristic
CN104882144A (en) * 2015-05-06 2015-09-02 福州大学 Animal voice identification method based on double sound spectrogram characteristics
WO2016176887A1 (en) * 2015-05-06 2016-11-10 福州大学 Animal sound identification method based on double spectrogram features
CN106683663B (en) * 2015-11-06 2022-01-25 三星电子株式会社 Neural network training apparatus and method, and speech recognition apparatus and method
CN106683663A (en) * 2015-11-06 2017-05-17 三星电子株式会社 Neural network training apparatus and method, and speech recognition apparatus and method
CN105551503A (en) * 2015-12-24 2016-05-04 武汉大学 Audio matching tracking method based on atom pre-selection and system thereof
CN105551503B (en) * 2015-12-24 2019-03-01 武汉大学 Based on the preselected Audio Matching method for tracing of atom and system
CN106297825A (en) * 2016-07-25 2017-01-04 华南理工大学 A kind of speech-emotion recognition method based on integrated degree of depth belief network
CN106297825B (en) * 2016-07-25 2019-10-18 华南理工大学 A kind of speech-emotion recognition method based on integrated deepness belief network
CN106356058B (en) * 2016-09-08 2019-08-20 河海大学 A kind of robust speech recognition methods based on multiband feature compensation
CN106356058A (en) * 2016-09-08 2017-01-25 河海大学 Robust speech recognition method based on multi-band characteristic compensation
CN106653032A (en) * 2016-11-23 2017-05-10 福州大学 Animal sound detecting method based on multiband energy distribution in low signal-to-noise-ratio environment
CN106653032B (en) * 2016-11-23 2019-11-12 福州大学 Based on the animal sounds detection method of multiband Energy distribution under low signal-to-noise ratio environment
CN107276938A (en) * 2017-06-28 2017-10-20 北京邮电大学 A kind of digital signal modulation mode recognition methods and device
CN107729288A (en) * 2017-09-30 2018-02-23 中国人民解放军战略支援部队航天工程大学 A kind of Polynomial Phase Signals time-frequency conversion method based on particle group optimizing
CN107729288B (en) * 2017-09-30 2020-11-06 中国人民解放军战略支援部队航天工程大学 Polynomial phase signal time-frequency transformation method based on particle swarm optimization
CN107831549A (en) * 2017-11-20 2018-03-23 中国地质大学(武汉) A kind of NMP cepstrum SST Time-frequency methods of ENPEMF signals
CN109344751B (en) * 2018-09-20 2021-10-08 上海工程技术大学 Reconstruction method of noise signal in vehicle
CN109344751A (en) * 2018-09-20 2019-02-15 上海工程技术大学 A kind of reconstructing method of internal car noise signal
CN111711918A (en) * 2020-05-25 2020-09-25 中国科学院声学研究所 Coherent sound and environmental sound extraction method and system of multichannel signal
CN112863517A (en) * 2021-01-19 2021-05-28 苏州大学 Speech recognition method based on perceptual spectrum convergence rate
CN113053417A (en) * 2021-03-29 2021-06-29 济南大学 Method, system, equipment and storage medium for recognizing emotion of voice with noise
CN113053417B (en) * 2021-03-29 2022-04-19 济南大学 Method, system, equipment and storage medium for recognizing emotion of voice with noise
CN114822567A (en) * 2022-06-22 2022-07-29 天津大学 Pathological voice frequency spectrum reconstruction method based on energy operator
CN114822567B (en) * 2022-06-22 2022-09-27 天津大学 Pathological voice frequency spectrum reconstruction method based on energy operator
CN116705017A (en) * 2022-09-14 2023-09-05 荣耀终端有限公司 Voice detection method and electronic equipment
CN116821644A (en) * 2023-03-23 2023-09-29 南京航空航天大学 Flight data identification method

Also Published As

Publication number Publication date
CN103474066B (en) 2016-01-06

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160106

Termination date: 20191011
