CN108172214A - Wavelet speech recognition feature parameter extraction method based on the Mel domain - Google Patents

Wavelet speech recognition feature parameter extraction method based on the Mel domain

Info

Publication number
CN108172214A
Authority
CN
China
Prior art keywords
voice signal
window
signal
wavelet
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711439300.XA
Other languages
Chinese (zh)
Inventor
胡宁
胡晓宁
程海峰
宁璐
朱方敢
洪英举
王龙峰
王智超
王晏平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Jianzhu University
Anhui University of Architecture
Original Assignee
Anhui University of Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui University of Architecture
Priority to CN201711439300.XA
Publication of CN108172214A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being power information

Abstract

The invention discloses a wavelet speech recognition feature parameter extraction method based on the Mel domain. The input speech signal is first pre-processed; feature vectors reflecting the signal characteristics are then extracted; a reference model library is built from training speech; candidate recognition results are obtained by comparing the input against that library; and finally the candidates are processed with phonetic knowledge to obtain the final recognition result. The invention proposes the parameter WPCC, in which a wavelet filter bank replaces the Mel filter bank and the discrete wavelet transform replaces the discrete cosine transform. Used for consonant and vowel recognition, the parameter gives good results.

Description

Wavelet speech recognition feature parameter extraction method based on the Mel domain
Technical field
The present invention relates to the field of speech parameter generation methods, and specifically to a wavelet speech recognition feature parameter extraction method based on the Mel domain.
Background art
Signal processing in speech recognition generally relies on the Fourier transform. The Fourier transform has an intuitive physical meaning, is simple to compute, and is widely used for spectral analysis of signals. It also has a serious shortcoming. The Fourier transform describes the statistical characteristics of the signal spectrum: it is an integral of the signal over the entire time domain, so the spectrum gives the overall strength of each frequency component but cannot show when those components occur. It cannot analyse a signal locally and carries no transient information. When analysing time-varying or non-stationary speech signals (consonants in particular), one wants to know the frequency-domain characteristics of the signal in the neighbourhood of each instant. The one-dimensional time-domain signal is therefore mapped onto a two-dimensional time-frequency plane, that is, a phase space of the signal is constructed, which yields a time-frequency analysis of the signal. In the wavelet transform, the sampling step in the time-frequency domain adapts to the frequency content: the step is small at high frequencies and large at low frequencies. The wavelet transform can analyse a signal locally in both time and frequency, and these properties give it a greater advantage in speech signal processing.
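The localisation argument above can be illustrated with a small sketch. It is not part of the patent: the Haar wavelet, the 256-sample frame and the helper name `haar_dwt_level` are assumptions chosen only to keep the example self-contained. A single click spreads over every Fourier bin, while only one finest-scale wavelet coefficient responds to it.

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    pairs = x.reshape(-1, 2)              # requires an even number of samples
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

# A silent frame with a single click at sample 100 (illustrative signal).
n = 256
click = np.zeros(n)
click[100] = 1.0

spectrum = np.abs(np.fft.rfft(click))     # Fourier magnitude spectrum
_, detail = haar_dwt_level(click)         # finest-scale wavelet coefficients

# Count how many coefficients of each representation the click touches.
fourier_bins_touched = int(np.sum(spectrum > 1e-9))
wavelet_coeffs_touched = np.flatnonzero(np.abs(detail) > 1e-9)
```

The index of the non-zero detail coefficient tells where the click occurred (pair 50, i.e. samples 100 and 101), which is exactly the information the magnitude spectrum does not expose.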
The Fourier transform handles stationary signals well but performs poorly on non-stationary signals. Consonants change quickly in the time-frequency domain, so the wavelet transform is the better choice for them. Farooq et al. [1] proposed using wavelet packets to obtain features of local frequency bands: the wavelet packet divides the frequency range into several sub-bands and the sub-band energies serve as feature parameters. In plosive recognition, the recognition rate was 10 percentage points higher than with MFCC parameters. Noisy speech can be regarded as clean speech with an interference value superimposed in the time-frequency domain, so a constant is subtracted during feature extraction; this constant corresponds to the white-noise spectrum value, which brings the features close to those of clean speech [2]. Farooq [3] also divided local frequency bands with the discrete wavelet transform, giving the low-frequency part a finer division, and obtained the best recognition rate for vowels in phoneme recognition. Physiological studies show that the basilar membrane of the cochlea, which plays a key role in hearing, acts as a constant-Q band-pass frequency analyser built on membrane vibration. Decomposed physiological signals also show that high-frequency components last a short time while low-frequency components last a long time, which matches the properties of wavelet analysis. For this reason, Zhang Xueying et al. [4] proposed wavelet packet decomposition based on the Bark domain and applied it to speech recognition; in noise, the recognition rate was 10 percentage points higher than with MFCC parameters. Wavelet packet decomposition decomposes both the wavelet space and the scale space and produces many frequency bands. From a signal processing point of view, as few coefficients as possible should carry as much information as possible, which calls for an optimised wavelet packet decomposition. Jorge Silva [7] proposed a minimum-cost tree pruning algorithm for wavelet packet decomposition and obtained good results in phoneme recognition. P. K. Sahu et al. [5] [6] proposed replacing the cochlear band-pass filter bank with a Bark-domain wavelet packet decomposition before extracting parameters; recognition of isolated words was good, especially in noisy environments.
The final step in computing MFCC parameters is the cepstrum operation, which includes a discrete cosine transform. The discrete cosine transform is the real part of the Fourier transform, and the Fourier transform describes statistical properties of the signal as an integral over the entire time domain, so when one frequency band is affected by noise the whole frequency range is affected. The Fourier transform also suffers serious spectral leakage at high frequencies. The discrete wavelet transform has strong local analysis ability and can characterise local features of the signal. When the discrete wavelet transform replaces the discrete cosine transform in the cepstrum operation, noise generally falls in the high-frequency coefficients, so extracting the low-frequency coefficients [8] has a denoising effect. Used for feature extraction in speaker recognition [9] and in speech recognition [10], this gives good recognition rates on noisy speech.
A single speech frame may contain two phonemes. If the first phoneme is a consonant and the second a vowel, the low and high frequencies of the first phoneme are affected by those of the second. MFCC extraction processes the whole frequency range and cannot overcome the influence of the neighbouring phoneme. The discrete wavelet transform captures phoneme-transition information, which may exist only in certain local frequency bands. Nehe N. S. [11] divided the signal's frequency range with the discrete wavelet transform and computed LPCC (Linear Predictive Cepstral Coefficients) in each sub-band, obtaining good speech recognition results. Weaam Alkhaldi likewise applied the approach to Arabic recognition [12] and to telephone speech recognition [13]. Malik [14] applied the same scheme to speaker recognition. Mangesh S. Deshpande [15] divided frequency bands with wavelet packet decomposition and Jian-Da Wu [16] with irregular wavelet packet decomposition; both obtained good results in speaker recognition.
References
[1] Farooq O. and Datta S., Robust features for speech recognition based on admissible wavelet packets, [J]. Electronics Letters, 6th December 2001, Vol. 37, No. 25, pp. 1554-1556
[2] Farooq O. and Datta S., Wavelet based robust sub-band features for phoneme recognition, [J]. IEE Proc.-Vis. Image Signal Process., Vol. 151, No. 3, June 2004, pp. 187-193
[3] Farooq O., Datta S., Phoneme recognition using wavelet based features, [J]. Information Sciences 150 (2003), pp. 5-15
[4] Xue-ying Zhang, The Speech Recognition System Based On Bark Wavelet MFCC, [C]. 8th International Conference on Signal Processing, 2007
[5] P. K. Sahu and Astik Biswas, Hindi phoneme classification using Wiener filtered wavelet packet decomposed periodic and aperiodic acoustic feature, Computers and Electrical Engineering 42, 2015, pp. 12-22
[6] P. K. Sahu and Astik Biswas, Admissible wavelet packet features based on human inner ear frequency response for Hindi consonant recognition, Computers and Electrical Engineering 40, 2014, pp. 1111-1122
[7] Jorge Silva, Shrikanth S. Narayanan, Discriminative Wavelet Packet Filter Bank Selection for Pattern Recognition, [J]. IEEE Transactions on Signal Processing, Vol. 57, No. 5, May 2009, pp. 1796-1810
[8] Tufekci Z., Gowdy J. N., Feature extraction using discrete wavelet transform for speech recognition, [C]. Conference Proceedings - IEEE Southeastcon, 2000, pp. 116-123
[9] Tufekci Z., Noise Robust Speaker Verification Using Mel-Frequency Discrete Wavelet Coefficients and Parallel Model Compensation, [C]. IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, v I, 2005, pp. 1657-1660
[10] Tufekci Z., Gowdy John N., Applied mel-frequency discrete wavelet coefficients and parallel model compensation for noise-robust speech recognition, [J]. Speech Communication 48, 2006, pp. 1295-1307
[11] Nehe N. S., New Robust Subband Cepstral Feature for Isolated Word Recognition, [C]. International Conference on Advances in Computing, Communication and Control, Mumbai, Maharashtra, India, January 23-24, 2009, pp. 326-330
[12] Weaam Alkhaldi, Waleed Falbr and Nadder Hamdy, Multi-band based recognition of spoken arabic numerals using wavelet transform, [C]. Proceedings of The 19th National Radio Science Conference, Alexandria, Egypt, March 2002, pp. 224-229
[13] Alkhaldi W., Automatic Speech/Speaker Recognition In Noisy Environments Using Wavelet Transform, [C]. The 45th Midwest Symposium on Circuits and Systems, 2002, pp. 463-466
[14] Malik S., Wavelet Transform Based Automatic Speaker Recognition, [C]. IEEE 13th International Multitopic Conference, 2009
[15] Mangesh S. Deshpande, Speaker Identification Using Admissible Wavelet Packet Based Decomposition, [J]. International Journal of Signal Processing 6:1, 2010, pp. 20-23
[16] Jian-Da Wu, Speaker identification using discrete wavelet packet transform technique with irregular decomposition, [J]. Expert Systems with Applications 36, 2009, pp. 3136-3143
Summary of the invention
The object of the present invention is to provide a wavelet speech recognition feature parameter extraction method based on the Mel domain, in order to solve the problems of prior-art Fourier-transform processing of speech signals.
To achieve this object, the technical solution adopted by the invention is as follows.
A wavelet speech recognition feature parameter extraction method based on the Mel domain, characterised in that it comprises the following steps:
(1) inputting a speech signal;
(2) pre-processing the input speech signal;
(3) after pre-processing, extracting feature vectors reflecting the signal characteristics from the speech signal based on the wavelet transform;
(4) building a reference model library of the training speech from the extracted feature vectors;
(5) comparing the feature vectors of the input speech signal with the models in the reference model library and outputting the model with the highest similarity as the candidate recognition result;
(6) processing the candidate recognition result of step (5) with phonetic knowledge to obtain the final recognition result.
In the wavelet speech recognition feature parameter extraction method based on the Mel domain, step (3) proceeds as follows:
(1) pre-processing, framing and windowing the input speech signal;
(2) applying the wavelet packet transform to each windowed frame to obtain sub-bands;
(3) extracting the energy spectrum of each sub-band;
(4) applying the discrete wavelet transform to the energy spectrum to obtain 13-dimensional coefficients.
The input is a Chinese vowel or consonant x(t), where t is the time variable.
The speech signal is sampled: the input speech signal is sampled at a sampling frequency $f_s$ of 8 kHz, giving the sampled signal x'(t). Pre-emphasis with the filter $1 - 0.98z^{-1}$ is then applied. The time-domain form of $1 - 0.98z^{-1}$ is $h(t) = \delta(t) - 0.98\,\delta(t-1)$, and the pre-emphasised speech signal is $a(t) = x'(t) * h(t) = x'(t) - 0.98\,x'(t-1)$, where $\delta(t)$ is the impulse function.
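The pre-emphasis step can be sketched as a first-order difference. This is a minimal illustration, not the patent's implementation: the function name `pre_emphasize` and the choice to treat the sample before the start of the signal as zero are assumptions, and the two test tones are invented only to show the high-frequency boost.

```python
import numpy as np

def pre_emphasize(x, alpha=0.98):
    """Apply a(t) = x(t) - alpha * x(t-1), i.e. the filter 1 - alpha * z^-1."""
    x = np.asarray(x, dtype=float)
    a = np.empty_like(x)
    a[0] = x[0]                          # assumption: the sample before t = 0 is zero
    a[1:] = x[1:] - alpha * x[:-1]
    return a

# Illustrative tones sampled at the 8 kHz rate stated in the text.
fs = 8000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 100 * t)        # 100 Hz tone
high = np.sin(2 * np.pi * 3000 * t)      # 3 kHz tone

# Ratio of output to input amplitude for each tone.
gain_low = np.std(pre_emphasize(low)) / np.std(low)
gain_high = np.std(pre_emphasize(high)) / np.std(high)
```

The low tone is strongly attenuated and the high tone amplified, which is the purpose of pre-emphasis: it flattens the spectral tilt of speech before framing.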
A Hamming window with a window length of 32 ms and a window shift of 16 ms is applied to the speech signal. Framing uses overlapping segments: the overlap between one frame and the next equals the frame shift. It is implemented by weighting with a movable finite-length window, that is, the pre-emphasised speech signal a(t) is multiplied by the window function w'(t) to form the windowed speech signal b(t):
$b(t) = a(t) \times w'(t)$
The window function is:
$w'(t) = 0.54 - 0.46\cos\left(\frac{2\pi t}{N-1}\right)$ for $0 \le t \le N-1$, and $w'(t) = 0$ otherwise,
where N is the window length, which equals the frame length. The i-th frame signal obtained after windowing and framing is
$x_i(t) = w'(t)\,b(t),\quad 0 \le t \le N-1$
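A possible way to carry out the framing and windowing is sketched below. At 8 kHz, a 32 ms window is 256 samples and a 16 ms shift is 128 samples. The function name `frame_and_window`, the dropping of an incomplete final frame, and applying the window once per frame (as in b(t) = a(t) × w'(t)) are assumptions of this sketch.

```python
import numpy as np

def frame_and_window(a, fs=8000, win_ms=32, shift_ms=16):
    """Cut a signal into overlapping frames and weight each with a Hamming window."""
    a = np.asarray(a, dtype=float)
    n = int(fs * win_ms / 1000)          # window length N in samples (256 at 8 kHz)
    hop = int(fs * shift_ms / 1000)      # frame shift in samples (128 at 8 kHz)
    if len(a) < n:
        return np.empty((0, n))          # assumption: too-short input yields no frames
    num_frames = 1 + (len(a) - n) // hop # assumption: an incomplete last frame is dropped
    t = np.arange(n)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * t / (n - 1))
    # Row i holds the sample indices i*hop ... i*hop + n - 1 of frame i.
    idx = t[None, :] + hop * np.arange(num_frames)[:, None]
    return a[idx] * window

frames = frame_and_window(np.ones(8000))  # one second of a constant test signal
```

With a constant input, every row of `frames` is simply the Hamming window itself, which makes the weighting easy to inspect.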
Feature parameter extraction stage:
Each pre-processed speech frame undergoes the 24-band wavelet packet decomposition of Fig. 3; the energy spectrum of each band is taken, and a 3-level discrete wavelet transform is then applied to these values to obtain the parameter WPCC.
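The extraction stage can be sketched as follows, under several loudly stated assumptions, since the tree of Fig. 3 and the wavelet family are not given in the text: a Haar wavelet, a full (uniform) wavelet-packet tree of depth 5 giving 32 equal-width bands as a stand-in for the 24-band Mel-like tree, and a logarithm applied to the band energies as in conventional cepstral features. All function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_split(x):
    """Split a signal into Haar low-pass and high-pass halves (even length required)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2

def wavelet_packet_bands(frame, depth=5):
    """Full Haar wavelet-packet tree; returns 2**depth sub-band signals.

    Bands come out in natural tree order, which is not sorted by frequency.
    """
    bands = [np.asarray(frame, dtype=float)]
    for _ in range(depth):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

def dwt_multilevel(x, levels=3):
    """Haar DWT returning [approximation, coarsest detail, ..., finest detail]."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        approx, detail = haar_split(approx)
        details.insert(0, detail)
    return np.concatenate([approx] + details)

def wpcc_sketch(frame, depth=5, levels=3, eps=1e-12):
    """Sub-band energies of one frame followed by a 3-level DWT."""
    bands = wavelet_packet_bands(frame, depth)
    energies = np.array([np.sum(b ** 2) for b in bands])
    log_energies = np.log(energies + eps)   # assumption: log compression, as in MFCC
    return dwt_multilevel(log_energies, levels)

# Illustrative 256-sample frame (32 ms at 8 kHz): a windowed 500 Hz tone.
frame = np.hamming(256) * np.sin(2 * np.pi * 500 * np.arange(256) / 8000)
features = wpcc_sketch(frame)
```

The sketch returns all transform coefficients; which 13 of them the method keeps is not specified in the text, so no selection is made here.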
Compared with the prior art, the beneficial effects of the present invention are as follows:
The wavelet transform can extract local signal features in both the time and frequency domains. The invention proposes the parameter WPCC, in which a wavelet filter bank replaces the Mel filter bank and the discrete wavelet transform replaces the discrete cosine transform. When the parameter is used for consonant and vowel recognition, the recognition rate is higher for rapidly changing consonants (plosives, fricatives, affricates). The wavelet high-pass and low-pass filters are close to ideal high-pass and low-pass filters, so the correlation between frequency bands is small. The sidelobe components of the discrete wavelet spectrum are small whereas those of the discrete cosine spectrum are large, so the discrete wavelet transform is more resistant to noise than the discrete cosine transform. The parameter also gives good results in isolated word recognition.
Description of the drawings
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the hardware architecture diagram of the present invention.
Fig. 3 is the wavelet packet decomposition diagram of the present invention.
Table 1 lists the centre frequencies and bandwidths of the wavelet packet decomposition of the present invention alongside the Mel-domain centre frequencies and bandwidths.
Specific embodiment
As shown in Figs. 1 to 3, a wavelet speech recognition feature parameter extraction method based on the Mel domain comprises the following steps:
(1) inputting a speech signal;
(2) pre-processing the input speech signal;
(3) after pre-processing, extracting feature vectors reflecting the signal characteristics from the speech signal based on the wavelet transform;
(4) building a reference model library of the training speech from the extracted feature vectors;
(5) comparing the feature vectors of the input speech signal with the models in the reference model library and outputting the model with the highest similarity as the candidate recognition result;
(6) processing the candidate recognition result of step (5) with phonetic knowledge to obtain the final recognition result.
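Steps (4) and (5) can be sketched as template matching over feature-vector time series. The text does not name the similarity measure, so dynamic time warping (DTW) with Euclidean frame distances is used here purely as an assumption; the function names and the dictionary layout of the template library are also hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two (frames x dims) feature sequences."""
    a, b = np.atleast_2d(a), np.atleast_2d(b)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # distance between two frames
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def recognize(features, templates):
    """Return the label of the template closest to the input (highest similarity)."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))

# Hypothetical template library: label -> feature sequence stored during training.
templates = {"a": np.zeros((5, 3)), "b": np.ones((7, 3))}
query = 0.9 * np.ones((6, 3))
best = recognize(query, templates)
```

DTW tolerates templates and inputs of different lengths, which is why it is a common choice for isolated-word template matching; any other similarity measure could be substituted in `recognize`.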
2. The wavelet speech recognition feature parameter extraction method based on the Mel domain according to claim 1, characterised in that step (3) proceeds as follows:
(1) pre-processing, framing and windowing the input speech signal;
(2) applying the wavelet packet transform to each windowed frame to obtain sub-bands;
(3) extracting the energy spectrum of each sub-band;
(4) applying the discrete wavelet transform to the energy spectrum to obtain 13-dimensional coefficients.
3. The wavelet speech recognition feature parameter extraction method based on the Mel domain according to claim 2, with the following specific steps:
The input is a Chinese vowel or consonant x(t), where t is the time variable.
The speech signal is sampled: the input speech signal is sampled at a sampling frequency $f_s$ of 8 kHz, giving the sampled signal x'(t). Pre-emphasis with the filter $1 - 0.98z^{-1}$ is then applied. The time-domain form of $1 - 0.98z^{-1}$ is $h(t) = \delta(t) - 0.98\,\delta(t-1)$, and the pre-emphasised speech signal is $a(t) = x'(t) * h(t) = x'(t) - 0.98\,x'(t-1)$, where $\delta(t)$ is the impulse function.
A Hamming window with a window length of 32 ms and a window shift of 16 ms is applied to the speech signal. Framing uses overlapping segments: the overlap between one frame and the next equals the frame shift. It is implemented by weighting with a movable finite-length window, that is, the pre-emphasised speech signal a(t) is multiplied by the window function w'(t) to form the windowed speech signal b(t):
$b(t) = a(t) \times w'(t)$
The window function is:
$w'(t) = 0.54 - 0.46\cos\left(\frac{2\pi t}{N-1}\right)$ for $0 \le t \le N-1$, and $w'(t) = 0$ otherwise,
where N is the window length, which equals the frame length. The i-th frame signal obtained after windowing and framing is
$x_i(t) = w'(t)\,b(t),\quad 0 \le t \le N-1$
Feature parameter extraction stage:
Each pre-processed speech frame undergoes the 24-band wavelet packet decomposition of Fig. 3; the energy spectrum of each band is taken, and a 3-level discrete wavelet transform is then applied to these values to obtain the parameter WPCC. The band centre frequencies and bandwidths are shown in Table 1:
Table 1. Centre frequencies and bandwidths of the 24 frequency bands
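For orientation, Mel-domain centre frequencies of the kind Table 1 compares against can be computed with the widely used formula mel(f) = 2595 log10(1 + f/700). The sketch below does not reproduce the values of Table 1: the 0 to 4000 Hz range (the Nyquist band at 8 kHz sampling), the equal spacing on the Mel scale and the definition of bandwidth as the span between neighbouring band edges are all assumptions.

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to the Mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Convert a Mel-scale value back to Hz."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_centres(num_bands=24, f_max=4000.0):
    """Centre frequencies and bandwidths (Hz) of bands equally spaced on the Mel scale."""
    edges_mel = np.linspace(0.0, hz_to_mel(f_max), num_bands + 2)
    edges_hz = mel_to_hz(edges_mel)
    centres = edges_hz[1:-1]                  # interior points are the band centres
    bandwidths = edges_hz[2:] - edges_hz[:-2] # span between the neighbouring edges
    return centres, bandwidths

centres, bandwidths = mel_band_centres()
```

The bands are narrow at low frequencies and widen towards 4 kHz, which is the behaviour a wavelet-packet tree has to approximate with its finer division of the low-frequency range.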

Claims (3)

1. A wavelet speech recognition feature parameter extraction method based on the Mel domain, characterised in that it comprises the following steps:
(1) inputting a speech signal;
(2) pre-processing the input speech signal;
(3) after pre-processing, extracting feature vectors reflecting the signal characteristics from the speech signal based on the wavelet transform;
(4) building a reference model library of the training speech from the extracted feature vectors;
(5) comparing the feature vectors of the input speech signal with the models in the reference model library and outputting the model with the highest similarity as the candidate recognition result;
(6) processing the candidate recognition result of step (5) with phonetic knowledge to obtain the final recognition result.
2. The wavelet speech recognition feature parameter extraction method based on the Mel domain according to claim 1, characterised in that step (3) proceeds as follows:
(1) pre-processing, framing and windowing the input speech signal;
(2) applying the wavelet packet transform to each windowed frame to obtain sub-bands;
(3) extracting the energy spectrum of each sub-band;
(4) applying the discrete wavelet transform to the energy spectrum to obtain 13-dimensional coefficients.
3. The wavelet speech recognition feature parameter extraction method based on the Mel domain according to claim 2, with the following specific steps:
The input is a Chinese vowel or consonant x(t), where t is the time variable.
The speech signal is sampled: the input speech signal is sampled at a sampling frequency $f_s$ of 8 kHz, giving the sampled signal x'(t). Pre-emphasis with the filter $1 - 0.98z^{-1}$ is then applied. The time-domain form of $1 - 0.98z^{-1}$ is $h(t) = \delta(t) - 0.98\,\delta(t-1)$, and the pre-emphasised speech signal is $a(t) = x'(t) * h(t) = x'(t) - 0.98\,x'(t-1)$, where $\delta(t)$ is the impulse function.
A Hamming window with a window length of 32 ms and a window shift of 16 ms is applied to the speech signal. Framing uses overlapping segments: the overlap between one frame and the next equals the frame shift. It is implemented by weighting with a movable finite-length window, that is, the pre-emphasised speech signal a(t) is multiplied by the window function w'(t) to form the windowed speech signal b(t):
$b(t) = a(t) \times w'(t)$
The window function is:
$w'(t) = 0.54 - 0.46\cos\left(\frac{2\pi t}{N-1}\right)$ for $0 \le t \le N-1$, and $w'(t) = 0$ otherwise,
where N is the window length, which equals the frame length. The i-th frame signal obtained after windowing and framing is
$x_i(t) = w'(t)\,b(t),\quad 0 \le t \le N-1$
Feature parameter extraction stage:
Each pre-processed speech frame undergoes the 24-band wavelet packet decomposition of Fig. 3, with the centre frequency and bandwidth of each band as shown in Table 1; the energy spectrum of each band is taken, and a 3-level discrete wavelet transform is then applied to these values to obtain the parameter WPCC.
Recognition uses the word as the recognition unit and the template matching method. In the training stage, the feature-vector time series extracted from each word in the training data is stored in a template library as a template. In the recognition stage, the feature-vector time series of the speech to be recognised is compared for similarity with each template in the library in turn, and the template with the highest similarity is output as the recognition result.
CN201711439300.XA 2017-12-27 2017-12-27 Wavelet speech recognition feature parameter extraction method based on the Mel domain Pending CN108172214A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711439300.XA CN108172214A (en) 2017-12-27 2017-12-27 Wavelet speech recognition feature parameter extraction method based on the Mel domain

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711439300.XA CN108172214A (en) 2017-12-27 2017-12-27 Wavelet speech recognition feature parameter extraction method based on the Mel domain

Publications (1)

Publication Number Publication Date
CN108172214A (en) 2018-06-15

Family

ID=62521723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711439300.XA Pending CN108172214A (en) 2017-12-27 2017-12-27 Wavelet speech recognition feature parameter extraction method based on the Mel domain

Country Status (1)

Country Link
CN (1) CN108172214A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109300486A (en) * 2018-07-30 2019-02-01 四川大学 Automatic identification method for pharyngeal fricatives in cleft palate speech based on PICGTFs and SSMC enhancement
CN111292753A (en) * 2020-02-28 2020-06-16 广州国音智能科技有限公司 Offline voice recognition method, device and equipment
CN111563451A (en) * 2020-05-06 2020-08-21 浙江工业大学 Mechanical ventilation ineffective inspiration effort identification method based on multi-scale wavelet features
CN111951783A (en) * 2020-08-12 2020-11-17 北京工业大学 Speaker recognition method based on phoneme filtering

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040083094A1 (en) * 2002-10-29 2004-04-29 Texas Instruments Incorporated Wavelet-based compression and decompression of audio sample sets
CN101188107A (en) * 2007-09-28 2008-05-28 中国民航大学 A voice recognition method based on wavelet decomposition and mixed Gauss model estimation
CN101944359A (en) * 2010-07-23 2011-01-12 杭州网豆数字技术有限公司 Voice recognition method facing specific crowd
CN104523268A (en) * 2015-01-15 2015-04-22 江南大学 Electroencephalogram signal recognition fuzzy system and method with transfer learning ability

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040083094A1 (en) * 2002-10-29 2004-04-29 Texas Instruments Incorporated Wavelet-based compression and decompression of audio sample sets
CN101188107A (en) * 2007-09-28 2008-05-28 中国民航大学 A voice recognition method based on wavelet decomposition and mixed Gauss model estimation
CN101944359A (en) * 2010-07-23 2011-01-12 杭州网豆数字技术有限公司 Voice recognition method facing specific crowd
CN104523268A (en) * 2015-01-15 2015-04-22 江南大学 Electroencephalogram signal recognition fuzzy system and method with transfer learning ability

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
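杨丽坤, 徐洋: "Weighted speech feature parameters based on wavelet packet transform", 《计算机应用与软件》 *
杨凯峰, 牟莉, 许亮: "Speaker recognition based on discrete wavelet transform and RBF neural network", 《西安理工大学学报》 *
汪峥, 连翰, 王建军: "A new method for feature parameter extraction in speaker recognition", 《复旦学报(自然科学版)》 *
陈若珠, 曾番, 李战明: "Speaker recognition based on a new feature parameter", 《兰州理工大学学报》 *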

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109300486A (en) * 2018-07-30 2019-02-01 四川大学 Automatic identification method for pharyngeal fricatives in cleft palate speech based on PICGTFs and SSMC enhancement
CN109300486B (en) * 2018-07-30 2021-06-25 四川大学 PICGTFs and SSMC enhanced cleft palate speech pharynx fricative automatic identification method
CN111292753A (en) * 2020-02-28 2020-06-16 广州国音智能科技有限公司 Offline voice recognition method, device and equipment
CN111563451A (en) * 2020-05-06 2020-08-21 浙江工业大学 Mechanical ventilation ineffective inspiration effort identification method based on multi-scale wavelet features
CN111563451B (en) * 2020-05-06 2023-09-12 浙江工业大学 Mechanical ventilation ineffective inhalation effort identification method based on multi-scale wavelet characteristics
CN111951783A (en) * 2020-08-12 2020-11-17 北京工业大学 Speaker recognition method based on phoneme filtering
CN111951783B (en) * 2020-08-12 2023-08-18 北京工业大学 Speaker recognition method based on phoneme filtering

Similar Documents

Publication Publication Date Title
Bhat et al. A real-time convolutional neural network based speech enhancement for hearing impaired listeners using smartphone
CN108198545B (en) Speech recognition method based on wavelet transformation
CN109256138B (en) Identity verification method, terminal device and computer readable storage medium
Dişken et al. A review on feature extraction for speaker recognition under degraded conditions
CN108172214A (en) A kind of small echo speech recognition features parameter extracting method based on Mel domains
WO2022141868A1 (en) Method and apparatus for extracting speech features, terminal, and storage medium
CN108564956B (en) Voiceprint recognition method and device, server and storage medium
Abdalla et al. DWT and MFCCs based feature extraction methods for isolated word recognition
CN108922561A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN105679321B (en) Voice recognition method, device and terminal
Manurung et al. Speaker recognition for digital forensic audio analysis using learning vector quantization method
Krishnan et al. Features of wavelet packet decomposition and discrete wavelet transform for malayalam speech recognition
WO2021152566A1 (en) System and method for shielding speaker voice print in audio signals
Amelia et al. DWT-MFCC Method for Speaker Recognition System with Noise
Adam et al. Wavelet cesptral coefficients for isolated speech recognition
CN110176243A (en) Sound enhancement method, model training method, device and computer equipment
Gaafar et al. An improved method for speech/speaker recognition
Joy et al. Deep Scattering Power Spectrum Features for Robust Speech Recognition.
Jawarkar et al. Effect of nonlinear compression function on the performance of the speaker identification system under noisy conditions
Adam et al. Wavelet based Cepstral Coefficients for neural network speech recognition
Chandra et al. Spectral-subtraction based features for speaker identification
Singh et al. A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters
Ahmad et al. The impact of low-pass filter in speaker identification
Indumathi et al. An efficient speaker recognition system by employing BWT and ELM
Skariah et al. Review of speech enhancement methods using generative adversarial networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180615
