CN108172214A - Wavelet-based speech recognition feature parameter extraction method in the Mel domain - Google Patents
Wavelet-based speech recognition feature parameter extraction method in the Mel domain
- Publication number
- CN108172214A CN108172214A CN201711439300.XA CN201711439300A CN108172214A CN 108172214 A CN108172214 A CN 108172214A CN 201711439300 A CN201711439300 A CN 201711439300A CN 108172214 A CN108172214 A CN 108172214A
- Authority
- CN
- China
- Prior art keywords
- voice signal
- window
- signal
- wavelet
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G10L15/04—Segmentation; Word boundary detection
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
- G10L2015/025—Phonemes, fenemes or fenones being the recognition units
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
Abstract
The invention discloses a wavelet-based speech recognition feature parameter extraction method in the Mel domain. The input speech signal is first pre-processed; feature vectors reflecting the signal characteristics are then extracted; a reference model library of trained speech is established; candidate recognition results are obtained by comparison; and the candidates are finally processed with linguistic knowledge to obtain the recognition result. The invention proposes the parameter WPCC, in which wavelet filters replace the Mel filters and the discrete wavelet transform replaces the discrete cosine transform; the parameter gives good results for consonant and vowel recognition.
Description
Technical field
The present invention relates to the field of speech parameter generation methods, and in particular to a wavelet-based speech recognition feature parameter extraction method in the Mel domain.
Background technology
In speech recognition, signal processing has generally relied on the Fourier transform. The Fourier transform has an intuitive physical meaning and is simple to compute, so it is widely used for spectral analysis. It has a serious shortcoming, however: it describes only the statistical properties of the spectrum. As an integral over the whole time domain, the spectrum characterizes the overall strength of each frequency component in the signal but cannot show when those components occur; it has no capability for local analysis and carries no transient information. When analyzing time-varying or non-stationary speech signals (consonants in particular), one wants to know the frequency-domain behavior of the signal near each instant. The one-dimensional time-domain signal is therefore mapped onto a two-dimensional time-frequency plane to observe its time-frequency characteristics, i.e., the phase space of the signal is constructed, forming a time-frequency analysis of the signal. The wavelet transform adapts its sampling step on the time-frequency plane to the frequency content: the step is small at high frequencies and large at low frequencies. Because the wavelet transform has local analysis capability in both time and frequency, it offers a significant advantage in speech signal processing.
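The multi-resolution behaviour described above can be illustrated with a minimal pure-Python Haar decomposition (an illustrative sketch only; the Haar wavelet and the toy signal are assumptions, not taken from the patent):

```python
def haar_step(x):
    """One orthonormal Haar analysis step: returns (approximation, detail),
    each half the input length."""
    a = [(x[2 * i] + x[2 * i + 1]) / 2 ** 0.5 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / 2 ** 0.5 for i in range(len(x) // 2)]
    return a, d

signal = [float(i % 4) for i in range(16)]  # toy 16-sample signal

# Recursing on the approximation halves the time resolution of the
# low-frequency band at each level, while the first detail band keeps
# the finest time resolution for the high frequencies.
a1, d1 = haar_step(signal)  # d1: 8 high-frequency coefficients, short time step
a2, d2 = haar_step(a1)      # d2: 4 mid-frequency coefficients
a3, d3 = haar_step(a2)      # a3, d3: 2 low-frequency coefficients, long time step
print(len(d1), len(d2), len(a3))  # 8 4 2
```

Because the step is orthonormal, signal energy is preserved across levels, which is what makes sub-band energies meaningful features.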
The Fourier transform handles stationary signals well but performs poorly on non-stationary ones. Consonants are rapidly changing signals in the time-frequency domain, for which the wavelet transform is the better choice. Farooq et al. [1] used wavelet packets to obtain local frequency-band features: the wavelet packet partitions the spectrum into multiple sub-bands and the sub-band energies serve as feature parameters; in plosive recognition the recognition rate was 10 percentage points higher than with the MFCC parameters. Noisy speech is clean speech with interference superimposed on the time-frequency plane; subtracting a constant during feature extraction, a value close to the white-noise spectral level of the clean-speech features, compensates for it [2]. Farooq [3] later partitioned the local frequency bands with the discrete wavelet transform, giving the low-frequency part a finer division, and obtained the best vowel recognition rate in phoneme recognition. Physiological studies show that the basilar membrane of the cochlea, which plays a crucial role in hearing, acts as a bank of constant-Q band-pass frequency analyzers built on standing-wave vibration of the membrane. Decomposed physiological signals show that high-frequency components have a short duration and low-frequency components a long one, which matches the properties of wavelet analysis. Accordingly, Zhang Xueying et al. [4] proposed a Bark-domain wavelet packet decomposition applied to speech recognition, with a recognition rate in noise 10 percentage points higher than MFCC. Wavelet packet decomposition operates in both the wavelet space and the scale space and yields many frequency bands; from a signal-processing standpoint, one wants as few coefficients as possible to carry as much information as possible, which requires optimizing the wavelet packet tree. Jorge Silva [5] proposed a lowest-cost tree-pruning algorithm for wavelet packet decomposition and obtained good results in phoneme recognition. P.K. Sahu et al. proposed replacing the cochlear band-pass filter bank with a Bark-domain wavelet packet decomposition before extracting parameters [6][7], with good recognition results in isolated word recognition, especially in noisy environments.
The final step of MFCC extraction is a cepstral operation that includes a discrete cosine transform. The discrete cosine transform is the real part of the Fourier transform; like the Fourier transform it captures the statistical properties of the signal as an integral over its entire time domain, so when one frequency band is corrupted by noise the whole range is affected, and the Fourier transform also suffers severe spectral leakage at high frequencies. The discrete wavelet transform, by contrast, has strong local analysis capability and can characterize local features of the signal. Using the discrete wavelet transform in place of the discrete cosine transform in the cepstral operation, and extracting the low-frequency coefficients (noise concentrates in the high-frequency coefficients [8]), achieves a denoising effect; applied to feature extraction for speaker recognition [9] and for speech recognition [10], it gives better recognition rates in noisy speech.
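The substitution described here, a DWT in place of the DCT at the cepstral step, can be sketched as follows (a naive pure-Python illustration using an assumed Haar wavelet and synthetic sub-band energies; it is not the exact transform pair of the cited works):

```python
import math

def dct2(v):
    """Naive DCT-II, the transform the standard MFCC cepstral step applies
    to the log filter-bank energies."""
    n_pts = len(v)
    return [sum(v[n] * math.cos(math.pi * k * (n + 0.5) / n_pts)
                for n in range(n_pts)) for k in range(n_pts)]

def haar_dwt(v, levels):
    """Multi-level orthonormal Haar DWT; returns [approximation, detail_L,
    ..., detail_1]. Noise tends to concentrate in the detail (high-frequency)
    coefficients, so keeping the low-frequency part acts as denoising."""
    a, details = list(v), []
    for _ in range(levels):
        d = [(a[2*i] - a[2*i+1]) / math.sqrt(2) for i in range(len(a) // 2)]
        a = [(a[2*i] + a[2*i+1]) / math.sqrt(2) for i in range(len(a) // 2)]
        details.insert(0, d)
    return [a] + details

# Stand-in log energies of 24 sub-bands (hypothetical values).
log_energies = [math.log(1.0 + b) for b in range(24)]
mfcc_like = dct2(log_energies)[:13]    # classic cepstrum: DCT, keep 13
wpcc_like = haar_dwt(log_energies, 3)  # DWT replaces the DCT (3 levels)
print([len(c) for c in wpcc_like])     # [3, 3, 6, 12]
```

The DCT mixes every band into every coefficient, while the DWT keeps the band structure local, which is the localization argument made above.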
One frame of speech may contain two phonemes. If the first phoneme is a consonant and the second a vowel, the low- and high-frequency content of the first phoneme is affected by that of the second; MFCC extraction processes the whole frequency range and cannot overcome the influence of adjacent phonemes, whereas the discrete wavelet transform captures phoneme-transition information, which may be present only in certain local frequency bands. Nehe N.S. [11] divided the signal spectrum with the discrete wavelet transform and computed LPCC (Linear Predictive Cepstral Coefficients) per sub-band, with good speech recognition results. Weaam Alkhaldi applied the same idea to Arabic digit recognition [12] and telephone speech recognition [13], and Malik [14] to speaker recognition. Mangesh S. Deshpande [15] divided the bands with wavelet packet decomposition and Jian-Da Wu [16] with an irregular wavelet packet decomposition, both with good results in speaker identification.
References
[1] Farooq O. and Datta S., Robust features for speech recognition based on admissible wavelet packets, Electronics Letters, Vol. 37, No. 25, 6 December 2001, pp. 1554-1556.
[2] Farooq O. and Datta S., Wavelet based robust sub-band features for phoneme recognition, IEE Proc.-Vis. Image Signal Process., Vol. 151, No. 3, June 2004, pp. 187-193.
[3] Farooq O. and Datta S., Phoneme recognition using wavelet based features, Information Sciences 150 (2003), pp. 5-15.
[4] Xue-ying Zhang, The Speech Recognition System Based On Bark Wavelet MFCC, 8th International Conference on Signal Processing, 2007.
[5] P.K. Sahu and Astik Biswas, Hindi phoneme classification using Wiener filtered wavelet packet decomposed periodic and aperiodic acoustic feature, Computers and Electrical Engineering 42, 2015, pp. 12-22.
[6] P.K. Sahu and Astik Biswas, Admissible wavelet packet features based on human inner ear frequency response for Hindi consonant recognition, Computers and Electrical Engineering 40, 2014, pp. 1111-1122.
[7] Jorge Silva and Shrikanth S. Narayanan, Discriminative Wavelet Packet Filter Bank Selection for Pattern Recognition, IEEE Transactions on Signal Processing, Vol. 57, No. 5, May 2009, pp. 1796-1810.
[8] Tufekci Z. and Gowdy J.N., Feature extraction using discrete wavelet transform for speech recognition, Conference Proceedings - IEEE Southeastcon, 2000, pp. 116-123.
[9] Tufekci Z., Noise Robust Speaker Verification Using Mel-Frequency Discrete Wavelet Coefficients and Parallel Model Compensation, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, v I, 2005, pp. 1657-1660.
[10] Tufekci Z. and Gowdy John N., Applied mel-frequency discrete wavelet coefficients and parallel model compensation for noise-robust speech recognition, Speech Communication 48, 2006, pp. 1295-1307.
[11] Nehe N.S., New Robust Subband Cepstral Feature for Isolated Word Recognition, International Conference on Advances in Computing, Communication and Control, Mumbai, Maharashtra, India, January 23-24, 2009, pp. 326-330.
[12] Weaam Alkhaldi, Waleed Fakhr and Nadder Hamdy, Multi-band based recognition of spoken arabic numerals using wavelet transform, Proceedings of The 19th National Radio Science Conference, Alexandria, Egypt, March 2002, pp. 224-229.
[13] Alkhaldi W., Automatic Speech/Speaker Recognition In Noisy Environments Using Wavelet Transform, The 45th Midwest Symposium on Circuits and Systems, 2002, pp. 463-466.
[14] Malik S., Wavelet Transform Based Automatic Speaker Recognition, IEEE 13th International Multitopic Conference, 2009.
[15] Mangesh S. Deshpande, Speaker Identification Using Admissible Wavelet Packet Based Decomposition, International Journal of Signal Processing 6:1, 2010, pp. 20-23.
[16] Jian-Da Wu, Speaker identification using discrete wavelet packet transform technique with irregular decomposition, Expert Systems with Applications 36, 2009, pp. 3136-3143.
Summary of the invention
The object of the present invention is to provide a wavelet-based speech recognition feature parameter extraction method in the Mel domain, so as to solve the problems of the prior art in which the Fourier transform is used to process speech signals.
To achieve the above object, the technical solution adopted by the present invention is:
A wavelet-based speech recognition feature parameter extraction method in the Mel domain, characterized by comprising the following steps:
(1) inputting a speech signal;
(2) pre-processing the input speech signal;
(3) after pre-processing, extracting feature vectors reflecting the signal characteristics from the speech signal based on the wavelet transform;
(4) establishing a reference model library of training speech from the extracted feature vectors;
(5) comparing the feature vectors of the input speech signal with the models in the reference model library, and outputting the model with the highest similarity as the candidate recognition result;
(6) processing the candidate recognition result of step (5) with linguistic knowledge to obtain the final recognition result.
In the above method, the process in step (3) is as follows:
(1) pre-process the input speech signal: framing and windowing;
(2) apply the wavelet packet transform to each windowed frame of the speech signal to obtain sub-bands;
(3) extract the energy spectrum of each sub-band;
(4) apply the discrete wavelet transform to the energy spectrum to obtain 13-dimensional coefficients.
A Chinese vowel or consonant signal x(t) is input, t being the time variable.
Sampling: the input speech signal is sampled at a sampling frequency fs of 8 kHz, giving the sampled signal x'(t). Pre-emphasis 1 - 0.98z^(-1) is then applied; its time-domain form is h(t) = δ(t) - 0.98δ(t - 1), where δ(t) is the unit impulse function, so the pre-emphasized speech signal is a(t) = x'(t) - 0.98x'(t - 1).
The speech signal is windowed with a Hamming window of length 32 ms and shift 16 ms. Framing uses overlapped segmentation, the overlap between the previous frame and the next being the frame shift; it is implemented by weighting with a sliding finite-length window, i.e., the pre-emphasized speech signal a(t) is multiplied by the window function w'(t) to form the windowed speech signal b(t), b(t) = a(t) × w'(t).
The window function is the Hamming window:
w'(t) = 0.54 - 0.46 cos(2πt/(N - 1)), 0 ≤ t ≤ N - 1
where N is the window length, equal to the frame length. The i-th frame obtained after windowing and framing is
xi(t) = w'(t)b(t), 0 ≤ t ≤ N - 1
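The pre-processing above (8 kHz sampling, 1 - 0.98z^(-1) pre-emphasis, 32 ms Hamming windows shifted by 16 ms) can be sketched in Python as follows; the test tone is a hypothetical input, and the handling of the first sample in the pre-emphasis is an assumption:

```python
import math

FS = 8000                      # sampling frequency: 8 kHz
FRAME_LEN = int(0.032 * FS)    # 32 ms window -> 256 samples (= frame length N)
FRAME_SHIFT = int(0.016 * FS)  # 16 ms shift  -> 128 samples (50% overlap)

def preemphasize(x, alpha=0.98):
    """a(t) = x(t) - 0.98*x(t-1), the time-domain form of 1 - 0.98*z^-1.
    The first sample is passed through unchanged (an assumption)."""
    return [x[0]] + [x[t] - alpha * x[t - 1] for t in range(1, len(x))]

def hamming(n_pts):
    """Hamming window w'(t) = 0.54 - 0.46*cos(2*pi*t/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / (n_pts - 1))
            for t in range(n_pts)]

def windowed_frames(x):
    """Overlapped segmentation: a 32 ms frame every 16 ms, each frame
    multiplied by the Hamming window, i.e. b(t) = a(t) * w'(t) per frame."""
    w = hamming(FRAME_LEN)
    return [[x[s + t] * w[t] for t in range(FRAME_LEN)]
            for s in range(0, len(x) - FRAME_LEN + 1, FRAME_SHIFT)]

tone = [math.sin(2 * math.pi * 440 * t / FS) for t in range(FS)]  # 1 s, 440 Hz
frames = windowed_frames(preemphasize(tone))
print(len(frames), len(frames[0]))  # 61 256
```

One second of 8 kHz speech yields 61 overlapping frames of 256 samples each.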
Feature parameter extraction stage:
Each pre-processed frame of the speech signal is decomposed into the 24 frequency bands of Fig. 3 by wavelet packet decomposition; the energy spectrum of each band is taken, and a 3-level discrete wavelet transform is then applied to these parameters to obtain the parameter WPCC.
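The extraction stage can be sketched end to end as follows. This is an illustrative sketch only: a uniform 16-band Haar wavelet-packet tree stands in for the patent's non-uniform 24-band tree of Fig. 3 (which is not reproduced here), and the truncation to 13 coefficients is likewise an assumption:

```python
import math

def haar_split(x):
    """One Haar analysis step: (low-pass half, high-pass half)."""
    a = [(x[2*i] + x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    d = [(x[2*i] - x[2*i+1]) / math.sqrt(2) for i in range(len(x) // 2)]
    return a, d

def wavelet_packet_bands(frame, depth):
    """Full wavelet-packet tree: every node is split again, giving
    2**depth frequency bands (the patent uses a non-uniform 24-band tree)."""
    nodes = [frame]
    for _ in range(depth):
        nodes = [half for node in nodes for half in haar_split(node)]
    return nodes

def wpcc(frame, depth=4, keep=13):
    # 1) wavelet packet decomposition into sub-bands
    bands = wavelet_packet_bands(frame, depth)
    # 2) log energy of each sub-band
    log_e = [math.log(sum(c * c for c in b) + 1e-12) for b in bands]
    # 3) 3-level DWT of the log energies, replacing the DCT of MFCC
    a, details = log_e, []
    for _ in range(3):
        a, d = haar_split(a)
        details = d + details
    return (a + details)[:keep]   # 13-dimensional feature vector

frame = [math.sin(2 * math.pi * 700 * t / 8000) for t in range(256)]
print(len(wpcc(frame)))  # 13
```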
Compared with the prior art, the beneficial effects of the present invention are as follows:
The wavelet transform can extract local signal features in both the time and frequency domains. The present invention proposes the parameter WPCC, in which wavelet filters replace the Mel filters and the discrete wavelet transform replaces the discrete cosine transform. Applied to consonant and vowel recognition, the parameter achieves a higher recognition rate on rapidly changing consonants (plosives, fricatives, affricates). The wavelet high-pass and low-pass filters are close to ideal filters, the correlation between frequency bands is small, and the sidelobe components of the discrete wavelet spectrum are smaller than those of the discrete cosine spectrum, so the discrete wavelet transform is more noise-robust than the discrete cosine transform. The parameters also give good results in isolated word recognition.
Description of the drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the hardware architecture diagram of the present invention.
Fig. 3 is the WAVELET PACKET DECOMPOSITION figure of the present invention.
Table 1 gives the centre frequencies and bandwidths of the wavelet packet decomposition of the present invention together with the Mel-domain centre frequencies and bandwidths.
Specific embodiment
As shown in Fig. 1 to Fig. 3, a wavelet-based speech recognition feature parameter extraction method in the Mel domain comprises the following steps:
(1) inputting a speech signal;
(2) pre-processing the input speech signal;
(3) after pre-processing, extracting feature vectors reflecting the signal characteristics from the speech signal based on the wavelet transform;
(4) establishing a reference model library of training speech from the extracted feature vectors;
(5) comparing the feature vectors of the input speech signal with the models in the reference model library, and outputting the model with the highest similarity as the candidate recognition result;
(6) processing the candidate recognition result of step (5) with linguistic knowledge to obtain the final recognition result.
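The comparison of step (5) can be sketched as a minimal template matcher (illustrative only: the patent does not fix a distance measure, real systems typically use DTW or HMMs, and the two-frame templates and word labels below are hypothetical):

```python
import math

def seq_distance(a, b):
    """Frame-wise Euclidean distance between two equal-length feature
    sequences. (A real system would use DTW or an HMM; this is a stand-in.)"""
    return sum(math.dist(fa, fb) for fa, fb in zip(a, b)) / len(a)

def recognize(features, template_library):
    """Return the word whose stored template is most similar
    (smallest distance) to the input feature sequence."""
    return min(template_library,
               key=lambda w: seq_distance(features, template_library[w]))

library = {                      # hypothetical 2-frame, 3-dim templates
    "yi": [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1]],
    "er": [[0.1, 1.0, 0.5], [0.2, 0.9, 0.6]],
}
print(recognize([[0.95, 0.25, 0.1], [0.9, 0.3, 0.15]], library))  # yi
```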
Further, the process in step (3) is as follows:
(1) pre-process the input speech signal: framing and windowing;
(2) apply the wavelet packet transform to each windowed frame of the speech signal to obtain sub-bands;
(3) extract the energy spectrum of each sub-band;
(4) apply the discrete wavelet transform to the energy spectrum to obtain 13-dimensional coefficients.
Specifically, the steps are as follows:
A Chinese vowel or consonant signal x(t) is input, t being the time variable.
Sampling: the input speech signal is sampled at a sampling frequency fs of 8 kHz, giving the sampled signal x'(t). Pre-emphasis 1 - 0.98z^(-1) is then applied; its time-domain form is h(t) = δ(t) - 0.98δ(t - 1), where δ(t) is the unit impulse function, so the pre-emphasized speech signal is a(t) = x'(t) - 0.98x'(t - 1).
The speech signal is windowed with a Hamming window of length 32 ms and shift 16 ms. Framing uses overlapped segmentation, the overlap between the previous frame and the next being the frame shift; it is implemented by weighting with a sliding finite-length window, i.e., the pre-emphasized speech signal a(t) is multiplied by the window function w'(t) to form the windowed speech signal b(t), b(t) = a(t) × w'(t).
The window function is the Hamming window:
w'(t) = 0.54 - 0.46 cos(2πt/(N - 1)), 0 ≤ t ≤ N - 1
where N is the window length, equal to the frame length. The i-th frame obtained after windowing and framing is
xi(t) = w'(t)b(t), 0 ≤ t ≤ N - 1
Feature parameter extraction stage:
Each pre-processed frame of the speech signal is decomposed into the 24 frequency bands of Fig. 3 by wavelet packet decomposition; the energy spectrum of each band is taken, and a 3-level discrete wavelet transform is then applied to obtain the parameter WPCC, as shown in Table 1:
Table 1: centre frequencies and bandwidths of the 24 frequency bands
Claims (3)
1. A wavelet-based speech recognition feature parameter extraction method in the Mel domain, characterized by comprising the following steps:
(1) inputting a speech signal;
(2) pre-processing the input speech signal;
(3) after pre-processing, extracting feature vectors reflecting the signal characteristics from the speech signal based on the wavelet transform;
(4) establishing a reference model library of training speech from the extracted feature vectors;
(5) comparing the feature vectors of the input speech signal with the models in the reference model library, and outputting the model with the highest similarity as the candidate recognition result;
(6) processing the candidate recognition result of step (5) with linguistic knowledge to obtain the final recognition result.
2. The wavelet-based speech recognition feature parameter extraction method in the Mel domain according to claim 1, characterized in that the process in step (3) is as follows:
(1) pre-processing the input speech signal: framing and windowing;
(2) applying the wavelet packet transform to each windowed frame of the speech signal to obtain sub-bands;
(3) extracting the energy spectrum of each sub-band;
(4) applying the discrete wavelet transform to the energy spectrum to obtain 13-dimensional coefficients.
3. The wavelet-based speech recognition feature parameter extraction method in the Mel domain according to claim 2, wherein the specific steps are as follows:
A Chinese vowel or consonant signal x(t) is input, t being the time variable.
Sampling: the input speech signal is sampled at a sampling frequency fs of 8 kHz, giving the sampled signal x'(t). Pre-emphasis 1 - 0.98z^(-1) is then applied; its time-domain form is h(t) = δ(t) - 0.98δ(t - 1), where δ(t) is the unit impulse function, so the pre-emphasized speech signal is a(t) = x'(t) - 0.98x'(t - 1).
The speech signal is windowed with a Hamming window of length 32 ms and shift 16 ms. Framing uses overlapped segmentation, the overlap between the previous frame and the next being the frame shift; it is implemented by weighting with a sliding finite-length window, i.e., the pre-emphasized speech signal a(t) is multiplied by the window function w'(t) to form the windowed speech signal b(t), b(t) = a(t) × w'(t).
The window function is the Hamming window:
w'(t) = 0.54 - 0.46 cos(2πt/(N - 1)), 0 ≤ t ≤ N - 1
where N is the window length, equal to the frame length. The i-th frame obtained after windowing and framing is
xi(t) = w'(t)b(t), 0 ≤ t ≤ N - 1
Feature parameter extraction stage:
Each pre-processed frame of the speech signal is decomposed into the 24 frequency bands of Fig. 3 by wavelet packet decomposition, the centre frequency and bandwidth of each band being shown in Table 1; the energy spectrum of each band is taken, and a 3-level discrete wavelet transform is then applied to obtain the parameter WPCC.
Taking the word as the recognition unit, recognition is performed by template matching: in the training stage, the feature-vector time series extracted from each word in the training data is stored in the template library as a template; in the recognition stage, the feature-vector time series of the speech to be recognized is compared for similarity with each template in the template library in turn, and the template with the highest similarity is output as the recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711439300.XA CN108172214A (en) | 2017-12-27 | 2017-12-27 | Wavelet-based speech recognition feature parameter extraction method in the Mel domain |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711439300.XA CN108172214A (en) | 2017-12-27 | 2017-12-27 | Wavelet-based speech recognition feature parameter extraction method in the Mel domain |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108172214A true CN108172214A (en) | 2018-06-15 |
Family
ID=62521723
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711439300.XA Pending CN108172214A (en) | 2017-12-27 | 2017-12-27 | Wavelet-based speech recognition feature parameter extraction method in the Mel domain |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108172214A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040083094A1 (en) * | 2002-10-29 | 2004-04-29 | Texas Instruments Incorporated | Wavelet-based compression and decompression of audio sample sets |
CN101188107A (en) * | 2007-09-28 | 2008-05-28 | 中国民航大学 | A voice recognition method based on wavelet decomposition and mixed Gauss model estimation |
CN101944359A (en) * | 2010-07-23 | 2011-01-12 | 杭州网豆数字技术有限公司 | Voice recognition method facing specific crowd |
CN104523268A (en) * | 2015-01-15 | 2015-04-22 | 江南大学 | Electroencephalogram signal recognition fuzzy system and method with transfer learning ability |
- 2017-12-27: application CN201711439300.XA filed in China; patent CN108172214A (en); status Pending
Non-Patent Citations (4)
Title |
---|
Yang Likun, Xu Yang: Weighted speech feature parameters based on the wavelet packet transform, Computer Applications and Software * |
Yang Kaifeng, Mou Li, Xu Liang: Speaker recognition based on the discrete wavelet transform and RBF neural networks, Journal of Xi'an University of Technology * |
Wang Zheng, Lian Han, Wang Jianjun: A new method of feature parameter extraction in speaker recognition, Journal of Fudan University (Natural Science) * |
Chen Ruozhu, Zeng Fan, Li Zhanming: Speaker recognition based on a new feature parameter, Journal of Lanzhou University of Technology * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109300486A (en) * | 2018-07-30 | 2019-02-01 | Sichuan University | Automatic identification method for pharyngeal fricatives in cleft palate speech based on PICGTFs and SSMC enhancement |
CN109300486B (en) * | 2018-07-30 | 2021-06-25 | 四川大学 | PICGTFs and SSMC enhanced cleft palate speech pharynx fricative automatic identification method |
CN111292753A (en) * | 2020-02-28 | 2020-06-16 | 广州国音智能科技有限公司 | Offline voice recognition method, device and equipment |
CN111563451A (en) * | 2020-05-06 | 2020-08-21 | 浙江工业大学 | Mechanical ventilation ineffective inspiration effort identification method based on multi-scale wavelet features |
CN111563451B (en) * | 2020-05-06 | 2023-09-12 | 浙江工业大学 | Mechanical ventilation ineffective inhalation effort identification method based on multi-scale wavelet characteristics |
CN111951783A (en) * | 2020-08-12 | 2020-11-17 | 北京工业大学 | Speaker recognition method based on phoneme filtering |
CN111951783B (en) * | 2020-08-12 | 2023-08-18 | 北京工业大学 | Speaker recognition method based on phoneme filtering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bhat et al. | A real-time convolutional neural network based speech enhancement for hearing impaired listeners using smartphone | |
CN108198545B (en) | Speech recognition method based on wavelet transformation | |
CN109256138B (en) | Identity verification method, terminal device and computer readable storage medium | |
Dişken et al. | A review on feature extraction for speaker recognition under degraded conditions | |
CN108172214A (en) | Wavelet-based speech recognition feature parameter extraction method in the Mel domain | |
WO2022141868A1 (en) | Method and apparatus for extracting speech features, terminal, and storage medium | |
CN108564956B (en) | Voiceprint recognition method and device, server and storage medium | |
Abdalla et al. | DWT and MFCCs based feature extraction methods for isolated word recognition | |
CN108922561A (en) | Speech differentiation method, apparatus, computer equipment and storage medium | |
CN105679321B (en) | Voice recognition method, device and terminal | |
Manurung et al. | Speaker recognition for digital forensic audio analysis using learning vector quantization method | |
Krishnan et al. | Features of wavelet packet decomposition and discrete wavelet transform for malayalam speech recognition | |
WO2021152566A1 (en) | System and method for shielding speaker voice print in audio signals | |
Amelia et al. | DWT-MFCC Method for Speaker Recognition System with Noise | |
Adam et al. | Wavelet cesptral coefficients for isolated speech recognition | |
CN110176243A (en) | Sound enhancement method, model training method, device and computer equipment | |
Gaafar et al. | An improved method for speech/speaker recognition | |
Joy et al. | Deep Scattering Power Spectrum Features for Robust Speech Recognition. | |
Jawarkar et al. | Effect of nonlinear compression function on the performance of the speaker identification system under noisy conditions | |
Adam et al. | Wavelet based Cepstral Coefficients for neural network speech recognition | |
Chandra et al. | Spectral-subtraction based features for speaker identification | |
Singh et al. | A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters | |
Ahmad et al. | The impact of low-pass filter in speaker identification | |
Indumathi et al. | An efficient speaker recognition system by employing BWT and ELM | |
Skariah et al. | Review of speech enhancement methods using generative adversarial networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180615 |