CN108564965B - Anti-noise voice recognition system
Anti-noise voice recognition system
- Publication number
- CN108564965B (application CN201810311359.9A)
- Authority
- CN
- China
- Prior art keywords
- cfcc
- signal
- training set
- speech signal
- auditory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
Abstract
The invention relates to the technical field of voice recognition. An anti-noise voice recognition system: the voice signal is windowed and framed and then subjected to a discrete Fourier transform to obtain its amplitude and phase angle; the power spectrum of the estimated signal is obtained by a spectral subtraction operation; the signal is reconstructed using the phase angle information from before spectral subtraction to obtain the spectrally subtracted speech sequence; for this new speech sequence, a nonlinear power function simulating the auditory characteristics of the human ear is used to extract the cochlear filter cepstral coefficients (CFCC) and their first-order difference ΔCFCC, and feature mixing is performed with a dimension-screening method; the fused features are normalized to obtain the training set labels and test set labels; PCA (principal component analysis) reduces the dimensionality of the normalized training set, which is then fed into an SVM (support vector machine) model to obtain the recognition accuracy.
Description
Technical Field
The invention relates to the technical field of voice recognition.
Background
With the rapid development of information technology, human-computer interaction has received more and more attention, and speech recognition has become a key technology of human-computer interaction and a research focus in the field. Speech recognition is a technology in which a computer converts speech signals into corresponding text or commands by extracting and analyzing human speech semantic information; it is widely applied in fields such as industry, household appliances, communications, automotive electronics, medical treatment, home services, and consumer electronics.

However, speech signals are particularly susceptible to noise; every link from acquisition through transmission to playback may be affected by it. Spectral subtraction is one of the speech enhancement technologies; it is computationally simple and easy to implement.

Currently, the most mainstream feature parameter in speech recognition is the Mel-frequency cepstral coefficient (MFCC). MFCC features are extracted based on the Fourier transform, which is in fact suitable only for stationary signals. The auditory transform, a newer method for processing non-stationary speech signals, makes up for this deficiency of the Fourier transform and offers less harmonic distortion and good spectral smoothness. Cochlear filter cepstral coefficients were first proposed in 2011 by Dr. Peter Li of Bell Labs and applied to speaker recognition; they were the first feature to use the auditory transform. Although many scholars have studied CFCC features, a nonlinear power function, derived from the saturation relation between the neuron action-potential firing rate and sound intensity, can approximate the auditory-neuron intensity curve; the traditional CFCC extraction method does not take this characteristic of human hearing into consideration, so a nonlinear power function that simulates it is adopted here to extract new CFCC features.

A complete speech signal contains both frequency information and energy information. The Teager energy operator (TEO) is a nonlinear difference operator that can suppress zero-mean noise and enhance the speech signal. Used for feature extraction, it better reflects the energy changes of the speech signal and yields good results in speech recognition.

The support vector machine (SVM) is a machine learning technique based on the principle of structural risk minimization. It handles small-sample, nonlinear, and high-dimensional classification problems well and generalizes well; it is widely applied to pattern recognition and classification estimation, and its excellent classification ability and good generalization performance have made it a common classification model in speech recognition.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: how to improve the speech recognition effect.
The technical scheme adopted by the invention is as follows: an anti-noise speech recognition system, comprising the steps of:
step one, performing windowing and framing on voice signals s (n), and then performing discrete Fourier transform to obtain the amplitude and phase angle of the voice signals
Windowing the speech signal s (n), the window function used being the hamming window w (n):
multiplying the speech signal s (n) by a window function w (n) to form a windowed speech signal x (n)
x(n)=s(n)*w(n)
The windowed speech signal x (n) is subjected to framing processing, and then the speech signal x (n) is expressed as xn(t), wherein N is a frame number, t is a frame synchronization time number, and N is a frame length;
for the framed speech signal xn(t) performing a discrete fourier transform:
where j denotes a complex number, e is a constant, pi is a constant, and the harmonic component number k is 0, 1., N-1, then the short-time amplitude spectrum of the windowed speech signal X (N) is estimated to be | X (N, k) |, and the phase angle is:
the value of | X (n, k) | is expressed as the amplitude of the voice signal,the value of (d) is expressed as the phase angle of the speech signal;
Step two, calculating the average energy of the noise segment and obtaining the power spectrum of the estimated signal through spectral subtraction;

The duration of the leading noise segment is IS and its corresponding number of frames is NIS; the average energy of the noise segment is:

D(k) = (1/NIS) Σ_{n=1}^{NIS} |X(n,k)|²

The power spectrum of the estimated signal |X̂(n,k)|² is obtained by the following spectral subtraction operation:

|X̂(n,k)|² = |X(n,k)|² − a1·D(k), if |X(n,k)|² ≥ a1·D(k)
|X̂(n,k)|² = b1·D(k), otherwise

where a1 and b1 are two constants: a1 is the over-subtraction factor and b1 is the gain compensation factor;
Step three, reconstructing the signal using the phase angle information from before spectral subtraction to obtain the spectrally subtracted speech sequence;

The spectrally subtracted power spectrum |X̂(n,k)|² is combined with the phase angle ∠X(n,k) from before spectral subtraction and an IFFT is performed, restoring the frequency domain to the time domain and yielding the spectrally subtracted speech sequence ŝ(t);
Step four, for the spectrally subtracted speech sequence ŝ(t), simulating the auditory characteristics of the human ear with a nonlinear power function to extract the cochlear filter cepstral coefficients (CFCC) and their first-order difference ΔCFCC, and performing feature mixing with a dimension-screening method;

The auditory transform simulates the auditory mechanism of the human ear; it realizes filtering by a wavelet transform that takes the cochlear filter function as a new wavelet basis function;

The output of the spectrally subtracted speech sequence ŝ(t) over a certain frequency band after the auditory transform is:

T(a2,b2) = ∫ ŝ(t)·(1/√a2)·ψ((t−b2)/a2) dt

with the cochlear filter function

ψ(t) = t^α · exp(−2πβf_L·t) · cos(2πf_L·t + θ) · u(t), α > 0, β > 0

where the values of α and β determine the frequency-domain shape and width of the cochlear filter function, u(t) is the unit step function, b2 is a real number varying over time, a2 is the scale variable, and θ is the initial phase; in general a2 can be determined from the center frequency f_c of the filter bank and the lowest center frequency f_L as a2 = f_L/f_c;
The inner hair cells of the human cochlea convert the speech signal output by the auditory transform into an electrical signal analyzable by the human brain:

h(a2,b2)=[T(a2,b2)]²
According to the auditory characteristics of the human ear, the response duration of the auditory nerve to a sound shortens gradually as the frequency increases, indicating that the human ear is more sensitive to high-frequency transient components; therefore the time-smoothing window of a cochlear filter with a higher center frequency should be appropriately shortened. Different window lengths are selected for different frequency bands, and the mean value of the hair-cell function of the i-th frequency band can be expressed as:

S(i,w) = (1/d) Σ_{b2=wL+1}^{wL+d} h(i,b2)

where d = max{3.5τ_p, 20 ms} is the smoothing window length of the i-th band, τ_p is the period of the center frequency of the p-th filter band, τ_p = 1/f_c, L is the frame shift, L = d/2, and w is the window index;
The output of the hair cells undergoes a loudness transformation through the nonlinear power function, converting an energy value into perceived loudness; the perceived loudness of the i-th frequency band can be expressed as:

y(i,w)=[S(i,w)]^0.101
Finally, the obtained features are decorrelated using the discrete cosine transform to give the CFCC feature parameters:

CFCC(n1) = Σ_{i=1}^{M} y(i,w)·cos( (πn1/M)·(i − 0.5) )

where n1 is the order of the CFCC feature and M is the number of channels of the cochlear filter;
After the CFCC parameters are extracted, the first-order difference coefficients are calculated:

d_x(n1) = ( Σ_{i=−k}^{k} i·CFCC_{x+i}(n1) ) / ( Σ_{i=−k}^{k} i² )

where d_x(n1) denotes the n1-th coefficient of the first-order difference CFCC parameter of the x-th frame of the speech signal and k is a constant, generally taken as 2;
After the 16-order CFCC and ΔCFCC are extracted, dimension screening is performed on the features, and the parts that best represent the speech characteristics are selected for feature mixing;
Step five, adding the Teager energy operator cepstral coefficient (TEOCC) to the CFCC + ΔCFCC features to form the fusion features;
For each frame of the speech signal x(n), its TEO energy is calculated:

ψ[x(n)]=x(n)²−x(n+1)·x(n−1)
The TEO energy is then normalized and its logarithm is taken; finally a DCT transform is performed to obtain the one-dimensional TEOCC, which is appended as the last dimension of the mixed feature vector;
Step six, performing data normalization on the fusion features to form a normalized training set and a normalized test set, and labeling the two sets to obtain the training set labels and the test set labels;
Let any data sample in the feature training set or feature test set be y_i; after normalization, the corresponding data sample in the normalized training or test set is:

y_i' = (y_i − y_min)/(y_max − y_min)

where y_min and y_max represent the minimum and maximum of y_i, respectively.
Step seven, adopting PCA to reduce the dimensionality of the normalized training set and feeding the reduced set into the SVM model to obtain the recognition accuracy.
The dimension-reduced speech features are divided into a training set train_data and a test set test_data, and the training set labels train_label and test set labels test_label are added respectively; the training set is input to the SVM (support vector machine) to build the model:

model=svmtrain(train_label,train_data)

The test set is then classified with the built model to obtain the recognition accuracy, accuracy:

accuracy=svmpredict(test_label,test_data,model).
The beneficial effects of the invention are: by introducing spectral subtraction at the front end of feature extraction, the influence of noise on the speech signal is reduced; a nonlinear power function simulating the auditory characteristics of the human ear is adopted to extract the CFCC and its first-order difference coefficients; TEOCC, which represents the energy of the speech signal, is added on this basis to form the fusion features; principal component analysis performs feature selection on the fusion features; and the SVM model built on the selected features is applied to the speech recognition system, giving higher recognition accuracy, stronger robustness, and higher recognition speed.
Detailed Description
In the invention, a Windows 7 system is used as the software development environment and MATLAB R2011a as the development platform. In this embodiment, for 10 isolated words at a signal-to-noise ratio of 0 dB, 270 speech samples produced by 9 speakers pronouncing each word three times are used as the training set, and 210 speech samples from 7 speakers for the same vocabulary and signal-to-noise ratio are used as the test set.
Step one, windowing and framing the speech signal s(n) and then performing a discrete Fourier transform to obtain the amplitude and phase angle of the speech signal;

Windowing the speech signal s(n), the window function used being the Hamming window w(n):

w(n) = 0.54 − 0.46cos(2πn/(N−1)), 0 ≤ n ≤ N−1

Multiplying the speech signal s(n) by the window function w(n) forms the windowed speech signal x(n):

x(n)=s(n)*w(n)

The windowed speech signal x(n) is divided into frames, and the framed speech signal is expressed as x_n(t), where n is the frame index, t is the time index within a frame, and N is the frame length;

For the framed speech signal x_n(t), a discrete Fourier transform is performed:

X(n,k) = Σ_{t=0}^{N−1} x_n(t)·e^(−j2πkt/N), k = 0,1,...,N−1

where j is the imaginary unit, e is the natural constant, π is the circular constant, and k is the harmonic component index; the short-time amplitude spectrum of the windowed speech signal x(n) is then estimated as |X(n,k)|, and the phase angle is:

∠X(n,k) = arctan( Im[X(n,k)] / Re[X(n,k)] )

The value |X(n,k)| is taken as the amplitude of the speech signal, and the value of ∠X(n,k) as the phase angle of the speech signal;
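As an illustration of step one, a minimal MATLAB sketch follows; the frame length N = 256, the 50% frame shift, and the speech signal variable s are assumptions of this sketch, not values fixed by the invention:

% Step one (illustrative sketch): windowing, framing and DFT
s = s(:);                             % assumed input speech signal, forced to a column
N = 256;                              % assumed frame length
L = N/2;                              % assumed frame shift (50% overlap)
w = hamming(N);                       % Hamming window w(n)
nFrames = floor((length(s) - N)/L) + 1;
X = zeros(N, nFrames);                % short-time spectrum X(n,k)
for n = 1:nFrames
    xn      = s((n-1)*L + (1:N)) .* w;   % windowed frame x_n(t)
    X(:, n) = fft(xn, N);                % discrete Fourier transform
end
mag   = abs(X);                       % amplitude |X(n,k)|
phase = angle(X);                     % phase angle of the speech signal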
Step two, calculating the average energy of the noise segment and obtaining the power spectrum of the estimated signal through spectral subtraction;

The duration of the leading noise segment is IS and its corresponding number of frames is NIS; the average energy of the noise segment is:

D(k) = (1/NIS) Σ_{n=1}^{NIS} |X(n,k)|²

The power spectrum of the estimated signal |X̂(n,k)|² is obtained by the following spectral subtraction operation:

|X̂(n,k)|² = |X(n,k)|² − a1·D(k), if |X(n,k)|² ≥ a1·D(k)
|X̂(n,k)|² = b1·D(k), otherwise

where a1 and b1 are two constants: a1 is the over-subtraction factor and b1 is the gain compensation factor;
Step three, reconstructing the signal using the phase angle information from before spectral subtraction to obtain the spectrally subtracted speech sequence;

The spectrally subtracted power spectrum |X̂(n,k)|² is combined with the phase angle ∠X(n,k) from before spectral subtraction and an IFFT is performed, restoring the frequency domain to the time domain and yielding the spectrally subtracted speech sequence ŝ(t);
Step four, for the spectrally subtracted speech sequence ŝ(t), simulating the auditory characteristics of the human ear with a nonlinear power function to extract the cochlear filter cepstral coefficients (CFCC) and their first-order difference ΔCFCC, and performing feature mixing with a dimension-screening method;

The auditory transform simulates the auditory mechanism of the human ear; it realizes filtering by a wavelet transform that takes the cochlear filter function as a new wavelet basis function;

The output of the spectrally subtracted speech sequence ŝ(t) over a certain frequency band after the auditory transform is:

T(a2,b2) = ∫ ŝ(t)·(1/√a2)·ψ((t−b2)/a2) dt

with the cochlear filter function

ψ(t) = t^α · exp(−2πβf_L·t) · cos(2πf_L·t + θ) · u(t), α > 0, β > 0

where the values of α and β determine the frequency-domain shape and width of the cochlear filter function, u(t) is the unit step function, b2 is a real number varying over time, a2 is the scale variable, and θ is the initial phase; in general a2 can be determined from the center frequency f_c of the filter bank and the lowest center frequency f_L as a2 = f_L/f_c;
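By way of illustration only, one channel of such an auditory transform can be sketched as a convolution with the scaled cochlear filter; the parameter values (α = 3, β = 0.2, θ = 0, the sampling rate fs, the lowest center frequency fL, and this channel's center frequency fck) are all assumptions of this sketch:

% Step four (illustrative sketch): one cochlear-filter channel T(a2,b2)
fs    = 8000;                          % assumed sampling rate
alpha = 3;  beta = 0.2;  theta = 0;    % assumed filter shape parameters
fL    = 200;                           % assumed lowest center frequency
fck   = 800;                           % assumed center frequency of this channel
a2    = fL/fck;                        % scale variable a2 = f_L/f_c
t     = (0:round(0.03*fs)-1)'/fs;      % assumed 30 ms support; t >= 0 enforces u(t)
psi   = (t/a2).^alpha .* exp(-2*pi*beta*fL*t/a2) .* cos(2*pi*fL*t/a2 + theta);
Tk    = conv(shat, psi/sqrt(a2), 'same');   % band output T(a2,b2) over time b2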
The inner hair cells of the human cochlea convert the speech signal output by the auditory transform into an electrical signal analyzable by the human brain:

h(a2,b2)=[T(a2,b2)]²
According to the auditory characteristics of the human ear, the response duration of the auditory nerve to a sound shortens gradually as the frequency increases, indicating that the human ear is more sensitive to high-frequency transient components; therefore the time-smoothing window of a cochlear filter with a higher center frequency should be appropriately shortened. Different window lengths are selected for different frequency bands, and the mean value of the hair-cell function of the i-th frequency band can be expressed as:

S(i,w) = (1/d) Σ_{b2=wL+1}^{wL+d} h(i,b2)

where d = max{3.5τ_p, 20 ms} is the smoothing window length of the i-th band, τ_p is the period of the center frequency of the p-th filter band, τ_p = 1/f_c, L is the frame shift, L = d/2, and w is the window index;
The output of the hair cells undergoes a loudness transformation through the nonlinear power function, converting an energy value into perceived loudness; the perceived loudness of the i-th frequency band can be expressed as:

y(i,w)=[S(i,w)]^0.101
Finally, the obtained features are decorrelated using the discrete cosine transform to give the CFCC feature parameters:

CFCC(n1) = Σ_{i=1}^{M} y(i,w)·cos( (πn1/M)·(i − 0.5) )

where n1 is the order of the CFCC feature and M is the number of channels of the cochlear filter;
After the CFCC parameters are extracted, the first-order difference coefficients are calculated:

d_x(n1) = ( Σ_{i=−k}^{k} i·CFCC_{x+i}(n1) ) / ( Σ_{i=−k}^{k} i² )

where d_x(n1) denotes the n1-th coefficient of the first-order difference CFCC parameter of the x-th frame of the speech signal and k is a constant, generally taken as 2;
After the 16-order CFCC and ΔCFCC are extracted, dimension screening is performed on the features, and the parts that best represent the speech characteristics are selected for feature mixing;
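The hair-cell, loudness and DCT stages of step four can be condensed into the following MATLAB sketch, where the matrix T (one row per cochlear channel), the sampling rate fs, and the center-frequency vector fc are assumed inputs; bands with fewer windows are zero-padded to a common window count for simplicity:

function cfcc = cfcc_sketch(T, fs, fc, nCoef)
% CFCC_SKETCH  Illustrative hair-cell / loudness / DCT stages of CFCC extraction.
% T: M x nSamp auditory-transform output; nCoef <= M (e.g. 16 for 16-order CFCC).
M = size(T, 1);                       % number of cochlear filter channels
h = T.^2;                             % inner hair cell: h = T^2
S = zeros(M, 0);
for i = 1:M
    d  = max(round(3.5*fs/fc(i)), round(0.02*fs));  % d = max{3.5*tau_p, 20 ms}
    Lw = max(floor(d/2), 1);                        % frame shift L = d/2
    nw = floor((size(T,2) - d)/Lw) + 1;
    for w = 1:nw
        S(i, w) = mean(h(i, (w-1)*Lw + (1:d)));     % mean hair-cell output S(i,w)
    end
end
y    = S.^0.101;                      % perceived loudness y(i,w) = S(i,w)^0.101
C    = dct(y);                        % DCT over channels decorrelates the bands
cfcc = C(1:nCoef, :);                 % keep the first nCoef coefficients
end

Called as, e.g., cfcc = cfcc_sketch(T, 8000, fc, 16) for the 16-order CFCC; the ΔCFCC then follows from the first-order difference formula above.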
Step five, adding the Teager energy operator cepstral coefficient (TEOCC) to the CFCC + ΔCFCC features to form the fusion features;
For each frame of the speech signal x(n), its TEO energy is calculated:

ψ[x(n)]=x(n)²−x(n+1)·x(n−1)
The TEO energy is then normalized and its logarithm is taken; finally a DCT transform is performed to obtain the one-dimensional TEOCC, which is appended as the last dimension of the mixed feature vector;
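For step five, a minimal per-frame sketch follows; the max-based normalization before the logarithm is an assumption of this sketch, since the exact normalization formula is not given:

% Step five (illustrative sketch): TEO energy and 1-D TEOCC of one frame x
psi   = x(2:end-1).^2 - x(3:end).*x(1:end-2);  % TEO: x(n)^2 - x(n+1)x(n-1)
psi   = max(psi, eps);                         % keep strictly positive before log
phi   = log(psi / max(psi));                   % assumed normalization, then log
c     = dct(phi);                              % DCT of the log TEO profile
teocc = c(1);                                  % first coefficient = 1-D TEOCC
% feat = [cfccMix, teocc];                     % appended as the last dimension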
Step six, performing data normalization on the fusion features to form a normalized training set and a normalized test set, and labeling the two sets to obtain the training set labels and the test set labels;
Let any data sample in the feature training set or feature test set be y_i; after normalization, the corresponding data sample in the normalized training or test set is:

y_i' = (y_i − y_min)/(y_max − y_min)

where y_min and y_max represent the minimum and maximum of y_i, respectively.
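Step six amounts to a per-dimension min-max scaling; a sketch for an assumed feature matrix F with one sample per row (repmat is used because MATLAB R2011a has no implicit expansion):

% Step six (illustrative sketch): min-max normalization of feature matrix F
ymin = min(F, [], 1);                          % per-dimension minima
ymax = max(F, [], 1);                          % per-dimension maxima
Fn   = (F - repmat(ymin, size(F,1), 1)) ./ ...
       repmat(ymax - ymin, size(F,1), 1);      % y' = (y - ymin)/(ymax - ymin)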
Step seven, adopting PCA to reduce the dimensionality of the normalized training set and feeding the reduced set into the SVM model to obtain the recognition accuracy.
The dimension-reduced speech features are divided into a training set train_data and a test set test_data, and the training set labels train_label and test set labels test_label are added respectively; the training set is input to the SVM (support vector machine) to build the model:

model=svmtrain(train_label,train_data)

The test set is then classified with the built model to obtain the recognition accuracy, accuracy:

accuracy=svmpredict(test_label,test_data,model).
Here accuracy is the classification accuracy on the test set samples; the speech recognition accuracy obtained for the test set samples is 88.10%.
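A sketch of step seven with the sample counts of this embodiment (270 training, 210 test samples); princomp is the PCA routine of that era's Statistics Toolbox, svmtrain/svmpredict are the libsvm MATLAB interface, and the retained dimensionality k and the label vector labels are assumptions of this sketch:

% Step seven (illustrative sketch): PCA dimension reduction + SVM recognition
[coeff, score] = princomp(Fn);             % PCA on the normalized features
k = 20;                                    % assumed number of retained components
reduced = score(:, 1:k);
train_data  = reduced(1:270, :);           % 270 training samples (9 speakers x 10 words x 3)
test_data   = reduced(271:480, :);         % 210 test samples (7 speakers x 10 words x 3)
train_label = labels(1:270);               % labels: assumed 480x1 word-index vector
test_label  = labels(271:480);
model = svmtrain(train_label, train_data);                % build the SVM model
[pred, acc] = svmpredict(test_label, test_data, model);   % acc(1) = accuracy in %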
Claims (1)
1. An anti-noise speech recognition system, characterized in that it comprises the following steps:
step one, windowing and framing the speech signal s(n) and then performing a discrete Fourier transform to obtain the amplitude and phase angle of the speech signal;

windowing the speech signal s(n), the window function used being the Hamming window w(n):

w(n) = 0.54 − 0.46cos(2πn/(N−1)), 0 ≤ n ≤ N−1

multiplying the speech signal s(n) by the window function w(n) forms the windowed speech signal x(n):

x(n)=s(n)*w(n)

the windowed speech signal x(n) is divided into frames, and the framed speech signal is represented as x_n(t), where n is the frame index, t is the time index within a frame, and N is the frame length;

for the framed speech signal x_n(t), a discrete Fourier transform is performed:

X(n,k) = Σ_{t=0}^{N−1} x_n(t)·e^(−j2πkt/N), k = 0,1,2,...,N−1

where j is the imaginary unit, e is the natural constant, π is the circular constant, and k is the harmonic component index; the short-time amplitude spectrum estimate of the windowed speech signal, that is, the amplitude of the speech signal, is |X(n,k)|, and the phase angle is ∠X(n,k);
step two, calculating the average energy of the noise segment and obtaining the power spectrum of the estimated signal through spectral subtraction;

the duration of the leading noise segment is IS and its corresponding number of frames is NIS; the average energy of the noise segment is:

D(k) = (1/NIS) Σ_{n=1}^{NIS} |X(n,k)|²

the power spectrum of the estimated signal |X̂(n,k)|² is obtained by the following spectral subtraction operation:

|X̂(n,k)|² = |X(n,k)|² − a1·D(k), if |X(n,k)|² ≥ a1·D(k)
|X̂(n,k)|² = b1·D(k), otherwise

where a1 and b1 are two constants: a1 is the over-subtraction factor and b1 is the gain compensation factor;
step three, reconstructing the signal using the phase angle information from before spectral subtraction to obtain the spectrally subtracted speech sequence;

the spectrally subtracted power spectrum |X̂(n,k)|² is combined with the phase angle ∠X(n,k) from before spectral subtraction and an IFFT is performed, restoring the frequency domain to the time domain and yielding the spectrally subtracted speech sequence ŝ(t);
step four, for the spectrally subtracted speech sequence ŝ(t), simulating the auditory characteristics of the human ear with a nonlinear power function to extract the cochlear filter cepstral coefficients (CFCC) and their first-order difference ΔCFCC, and performing feature mixing with a dimension-screening method;

the auditory transform simulates the auditory mechanism of the human ear; it realizes filtering by a wavelet transform that takes the cochlear filter function as a new wavelet basis function;

the output of the spectrally subtracted speech sequence ŝ(t) over a certain frequency band after the auditory transform is:

T(a2,b2) = ∫ ŝ(t)·(1/√a2)·ψ((t−b2)/a2) dt

with the cochlear filter function

ψ(t) = t^α · exp(−2πβf_L·t) · cos(2πf_L·t + θ) · u(t), α > 0, β > 0

where the values of α and β determine the frequency-domain shape and width of the cochlear filter function, u(t) is the unit step function, b2 is a real number varying over time, a2 is the scale variable, and θ is the initial phase; a2 is determined from the center frequency f_c of the filter bank and the lowest center frequency f_L as a2 = f_L/f_c;
the inner hair cells of the human cochlea convert the speech signal output by the auditory transform into an electrical signal analyzable by the human brain:

h(a2,b2)=[T(a2,b2)]²

where h(a2,b2) is the electrical signal analyzable by the human brain and T(a2,b2) is the speech signal output after the auditory transform; according to the auditory characteristics of the human ear, the response duration of the auditory nerve to a sound shortens gradually as the frequency increases, indicating that the human ear is more sensitive to high-frequency transient components, so the time-smoothing window of a cochlear filter with a higher center frequency should be appropriately shortened; different window lengths are selected for different frequency bands, and the mean value of the hair-cell function of the i-th frequency band can be expressed as:

S(i,w) = (1/d) Σ_{b2=wL+1}^{wL+d} h(i,b2)

where d = max{3.5τ_p, 20 ms} is the smoothing window length of the i-th band, τ_p is the period of the center frequency of the p-th filter band, τ_p = 1/f_c, L is the frame shift, L = d/2, and w is the window index;
the output of the hair cells undergoes a loudness transformation through the nonlinear power function, converting an energy value into perceived loudness; the perceived loudness of the i-th frequency band can be expressed as:

y(i,w)=[S(i,w)]^0.101
finally, the obtained features are decorrelated using the discrete cosine transform to give the CFCC feature parameters:

CFCC(n1) = Σ_{i=1}^{M} y(i,w)·cos( (πn1/M)·(i − 0.5) )

where n1 is the order of the CFCC feature and M is the number of channels of the cochlear filter;
after the CFCC parameters are extracted, the first-order difference coefficients are calculated:

d_x(n1) = ( Σ_{i=−k}^{k} i·CFCC_{x+i}(n1) ) / ( Σ_{i=−k}^{k} i² )

where d_x(n1) denotes the n1-th coefficient of the first-order difference CFCC parameter of the x-th frame of the speech signal and k is a constant, taken as 2;
after the 16-order CFCC and ΔCFCC are extracted, dimension screening is performed on the features, and the parts that best represent the speech characteristics are selected for feature mixing;
step five, adding the Teager energy operator cepstral coefficient (TEOCC) to the CFCC + ΔCFCC features to form the fusion features;
for each frame of the speech signal x(n), its TEO energy is calculated:

ψ[x(n)]=x(n)²−x(n+1)·x(n−1)
the TEO energy is then normalized and its logarithm is taken; finally a DCT transform is performed to obtain the one-dimensional TEOCC, which is appended as the last dimension of the mixed feature vector;
step six, performing data normalization on the fusion features to form a normalized training set and a normalized test set, and labeling the two sets to obtain the training set labels and the test set labels;
let any data sample in the feature training set or feature test set be y_i; after normalization, the corresponding data sample in the normalized training or test set is:

y_i' = (y_i − y_min)/(y_max − y_min)

where y_min and y_max represent the minimum and maximum of y_i, respectively;
step seven, adopting PCA to reduce the dimensionality of the normalized training set and feeding the reduced set into the SVM model to obtain the recognition accuracy;
the dimension-reduced speech features are divided into a training set train_data and a test set test_data, and the training set labels train_label and test set labels test_label are added respectively; the training set is input to the SVM (support vector machine) to build the model:

model=svmtrain(train_label,train_data)

the test set is then classified with the built model to obtain the recognition accuracy, accuracy:

accuracy=svmpredict(test_label,test_data,model).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810311359.9A CN108564965B (en) | 2018-04-09 | 2018-04-09 | Anti-noise voice recognition system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810311359.9A CN108564965B (en) | 2018-04-09 | 2018-04-09 | Anti-noise voice recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108564965A CN108564965A (en) | 2018-09-21 |
CN108564965B (en) | 2021-08-24
Family
ID=63534360
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810311359.9A Active CN108564965B (en) | 2018-04-09 | 2018-04-09 | Anti-noise voice recognition system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108564965B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109256127B (en) * | 2018-11-15 | 2021-02-19 | 江南大学 | Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter |
CN110808059A (en) * | 2019-10-10 | 2020-02-18 | 天津大学 | Speech noise reduction method based on spectral subtraction and wavelet transform |
CN111142084B (en) * | 2019-12-11 | 2023-04-07 | 中国电子科技集团公司第四十一研究所 | Micro terahertz spectrum identification and detection algorithm |
CN113205823A (en) * | 2021-04-12 | 2021-08-03 | 广东技术师范大学 | Lung sound signal endpoint detection method, system and storage medium |
CN113325752B (en) * | 2021-05-12 | 2022-06-14 | 北京戴纳实验科技有限公司 | Equipment management system |
CN114422313B (en) * | 2021-12-22 | 2023-08-01 | 西安电子科技大学 | Frame detection method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100789084B1 (en) * | 2006-11-21 | 2007-12-26 | 한양대학교 산학협력단 | Speech enhancement method by overweighting gain with nonlinear structure in wavelet packet transform |
JP2012032648A (en) * | 2010-07-30 | 2012-02-16 | Sony Corp | Mechanical noise reduction device, mechanical noise reduction method, program and imaging apparatus |
CN102456351A (en) * | 2010-10-14 | 2012-05-16 | 清华大学 | Voice enhancement system |
CN103985390A (en) * | 2014-05-20 | 2014-08-13 | 北京安慧音通科技有限责任公司 | Method for extracting phonetic feature parameters based on gammatone relevant images |
CN107248414A (en) * | 2017-05-23 | 2017-10-13 | 清华大学 | A kind of sound enhancement method and device based on multiframe frequency spectrum and Non-negative Matrix Factorization |
CN107845390A (en) * | 2017-09-21 | 2018-03-27 | 太原理工大学 | A kind of Emotional speech recognition system based on PCNN sound spectrograph Fusion Features |
- 2018-04-09: application CN201810311359.9A filed; granted as patent CN108564965B (en), status Active
Also Published As
Publication number | Publication date |
---|---|
CN108564965A (en) | 2018-09-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventors after change: Xue Peiyun, Shi Yanyan, Bai Jing, Guo Qianyan. Inventors before change: Bai Jing, Shi Yanyan, Xue Peiyun, Guo Qianyan |
GR01 | Patent grant | |