CN114613389A - Non-speech audio feature extraction method based on improved MFCC - Google Patents
Non-speech audio feature extraction method based on improved MFCC
- Publication number
- CN114613389A (application number CN202210256684.6A)
- Authority
- CN
- China
- Prior art keywords
- frequency
- mfcc
- functional expression
- expression
- feature vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0224—Processing in the time domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Stereophonic System (AREA)
Abstract
The invention relates to the technical field of audio feature extraction, and in particular discloses a non-speech audio feature extraction method based on improved MFCC (Mel-frequency cepstral coefficients), comprising the following steps: collecting sound signals and preprocessing them; performing MFCC feature extraction on the preprocessed sound signals; performing empirical mode decomposition (EMD) on the preprocessed sound signals to obtain intrinsic mode function (IMF) components, and extracting time-domain and frequency-domain feature vectors of the IMF components; computing the first-order and second-order differences of the MFCC coefficients to obtain the MFCC dynamic feature vectors; and fusing the MFCC feature vector, the time-domain feature vector, the frequency-domain feature vector and the MFCC dynamic feature vector to obtain an improved multi-scale MFCC feature vector. The invention can effectively extract the high-frequency part of the audio signal, so that the feature information of the sound signal is richer and more comprehensive.
Description
Technical Field
The invention relates to the technical field of audio feature extraction.
Background
At present, three types of characteristic parameters are commonly used in sound signal feature extraction: Linear Predictive Coefficients (LPC), Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC). Unlike the first two, which are model-based, MFCC makes no assumptions about the sound and places no restrictions on it. It is a feature parameter set built on the way the human brain processes external sound and on the auditory characteristics of the human ear, and it is currently used relatively frequently in sound recognition. However, MFCC features are designed around the auditory characteristics of the human ear, which is more sensitive to low-frequency sounds and exhibits a masking effect at high frequencies. As a result, for non-speech audio signals with many high-frequency components, the extracted feature parameters cannot comprehensively represent the acoustic characteristics of the audio.
The key to the traditional MFCC feature extraction method is the construction of a bank of band-pass filters (Mel filters) with different weights that simulates how the human ear shapes sound signals. Research on the auditory mechanism of the human ear has found that the traveling wave of a low-frequency sound propagates farther along the basilar membrane of the cochlea than that of a high-frequency sound. Accordingly, the Mel filters in MFCC are fewer and more sparsely distributed in the high-frequency region, so the traditional MFCC method characterizes the high-frequency part of a sound signal poorly. To overcome this defect and improve the applicability of MFCC to non-speech audio feature extraction, it is necessary to design a multi-scale fusion MFCC feature extraction method.
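The sparsity of Mel filters at high frequencies can be illustrated numerically. The sketch below spaces filter centers evenly on the Mel scale and counts how many land in the lower and upper halves of the band. The mapping mel = 2595 log10(1 + f/700) is the commonly used Hz-to-Mel formula; the filter count (26) and the 0 to 8000 Hz band are illustrative assumptions, not values taken from this document.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Common Hz-to-Mel mapping (an assumed convention)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    return mel_to_hz(mel_points)[1:-1]  # drop the two band edges

# Hypothetical configuration: 26 filters covering 0 to 8000 Hz.
centers = mel_center_frequencies(n_filters=26, f_min=0.0, f_max=8000.0)
low = int(np.sum(centers < 4000.0))
high = int(np.sum(centers >= 4000.0))
print(f"filters centered below 4 kHz: {low}, at or above 4 kHz: {high}")
```

Because the spacing between adjacent centers grows with frequency, most filters fall in the lower half of the band, which is the limitation described above.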
Disclosure of Invention
In order to solve the above problems in the existing audio feature extraction method, the present invention provides a non-speech audio feature extraction method based on an improved MFCC.
The technical solution adopted by the invention to achieve this purpose is a non-speech audio feature extraction method based on improved MFCC, comprising the following steps:
S1: collecting sound signals and preprocessing the collected sound signals;
S2: performing MFCC feature extraction on the preprocessed sound signals;
S3: performing empirical mode decomposition (EMD) on the preprocessed sound signals to obtain intrinsic mode function (IMF) components, and extracting time-domain feature vectors and frequency-domain feature vectors of the IMF components;
S4: computing the first-order and second-order differences of the MFCC coefficients to obtain the MFCC dynamic feature vectors;
S5: fusing the calculated MFCC feature vector, time-domain feature vector, frequency-domain feature vector and MFCC dynamic feature vector to obtain an improved multi-scale MFCC feature vector.
Preferably, the step S1 includes the following steps:
step S101: normalizing the amplitude of the audio sequence of the sound signal, the function expression being:
$$x(m) = \frac{x(n)}{|x(n)|_{\max}}$$
wherein: x(m) is the normalized sound sequence; x(n) is the sound sequence; $|x(n)|_{\max}$ is the maximum absolute value of the sound sequence;
step S102: framing the normalized audio sequence;
step S103: windowing the framed audio sequence.
Preferably, in step S102, the frame length used in framing is 20 to 30 ms, and the frame shift is 0.3 to 0.5 times the frame length.
Preferably, in step S103, a Hamming window is used for windowing.
Preferably, the step S2 includes the following steps:
S201: applying the fast Fourier transform to each time-domain frame obtained from preprocessing to obtain its frequency spectrum X(k), the function expression being:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2 \pi n k / N}, \quad 0 \le k \le N-1$$
wherein: N is the number of points of the Fourier transform, k is the frequency index, and x(n) is the frame-by-frame time-domain signal obtained after preprocessing;
S202: squaring the magnitude of the spectrum to obtain the energy spectrum $|X(k)|^2$ of the sound signal, then passing $|X(k)|^2$ through a set of triangular filters that simulate the response of the human ear, which applies the Mel nonlinear transformation, the function expression being:
$$\mathrm{MelSpec}(m) = \sum_{k} |X(k)|^2 H_m(k)$$
$H_m(k)$ is the frequency response of the m-th filter, the functional expression is:
where f(m) is the center frequency of the triangular filter;
S203: taking the logarithm of every MelSpec(m) output by the filter bank to obtain the logarithmic energy E(m), the function expression being:
E(m)=lg[MelSpec(m)],0<m<M
wherein: M is the number of filters;
S204: applying the discrete cosine transform to the logarithmic energy E(m) to obtain a set of Mel cepstral coefficients F(n), the functional expression is:
where n is the order of the Mel cepstral coefficient.
Preferably, in step S3, the IMF components are arranged in order from high frequency to low frequency, the first five IMF components are taken, and the time-domain feature vector and the frequency-domain feature vector thereof are extracted respectively.
Preferably, in step S3, there are 11 time-domain feature vectors: average amplitude, standard deviation, square root amplitude, root mean square, peak-to-peak value, skewness, kurtosis, crest factor, margin factor, form factor and pulse index,
the functional expression of the average amplitude is:
the functional expression of the standard deviation is:
the functional expression of the square root amplitude is:
the functional expression of the root mean square is:
the functional expression of the peak-to-peak value is:
the functional expression of the skewness is:
the functional expression of the kurtosis is:
the functional expression of the crest factor is:
the functional expression of the margin factor is:
the functional expression of the form factor is:
the functional expression of the pulse index is:
wherein: x(i) is a frequency component, $X_p$ is the peak value, and N is the length of the corresponding sound signal.
Preferably, in step S3, there are 2 frequency-domain feature vectors: the frequency center and the frequency root mean square,
the functional expression of the frequency center is:
the functional expression of the frequency root mean square is:
wherein: K is the number of spectral lines, f(i) is the frequency value of the i-th spectral line, and s(i) is the i-th value of the spectrum.
Preferably, in step S4, the functional expression of the first-order difference of the MFCC coefficients is:
wherein: $d_t$ and $C_t$ are the t-th first-order difference and the t-th cepstral coefficient, respectively; Q is the order of the cepstral coefficients; K is the time difference of the first derivative.
Preferably, in step S4, the functional expression of the second-order difference of the MFCC coefficients is:
wherein: $d_t$ and $C_t$ are the t-th second-order difference and the t-th cepstral coefficient, respectively; Q is the order of the cepstral coefficients; K is the time difference of the second derivative.
The non-speech audio feature extraction method based on improved MFCC addresses the problem that the traditional MFCC, being designed around the auditory characteristics of the human ear, lacks a representation of the high-frequency part of the sound signal. It can effectively extract the high-frequency part of the audio signal that lies outside the range the MFCC can process, it retains the traditional MFCC's extraction of short-time signal characteristics while also capturing the overall change of the sound signal, and the first-order and second-order differences of the MFCC make the feature information richer and more comprehensive.
Drawings
FIG. 1 is a flow chart of a non-speech audio feature extraction method based on an improved MFCC according to an embodiment of the present invention;
FIG. 2 is a flow chart of pre-processing of a sound signal;
FIG. 3 is a flowchart of an extraction process for MFCC parameters.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
The method for extracting non-speech audio features based on the improved MFCC in the embodiment, as shown in FIG. 1, includes the following steps:
S1: collecting sound signals and preprocessing the collected sound signals;
S2: performing MFCC feature extraction on the preprocessed sound signals;
S3: performing empirical mode decomposition (EMD) on the preprocessed sound signals to obtain intrinsic mode function (IMF) components, and extracting time-domain feature vectors and frequency-domain feature vectors of the IMF components;
S4: computing the first-order and second-order differences of the MFCC coefficients to obtain the MFCC dynamic feature vectors;
S5: fusing the calculated MFCC feature vector, time-domain feature vector, frequency-domain feature vector and MFCC dynamic feature vector to obtain an improved multi-scale MFCC feature vector.
As shown in FIG. 2, step S1 may include the following steps:
step S101: normalizing the amplitude of the audio sequence of the sound signal, the function expression being:
$$x(m) = \frac{x(n)}{|x(n)|_{\max}}$$
wherein: x(m) is the normalized sound sequence; x(n) is the sound sequence; $|x(n)|_{\max}$ is the maximum absolute value of the sound sequence;
step S102: framing the normalized audio sequence;
although a sound signal is non-stationary, it is approximately stationary over short intervals, so the sound sequence can be divided into many very short time segments, each called a frame, in order to obtain the short-time characteristics of the signal. The frame length may be 20 to 30 ms and the frame shift may be 0.3 to 0.5 times the frame length, so that adjacent frames partially overlap and the feature loss caused by an excessive difference between two consecutive frames is avoided;
step S103: windowing the framed audio sequence;
windowing smooths the transition at the beginning and end of each frame, and a Hamming window may be used.
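Steps S101 to S103 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the function name `preprocess`, the 25 ms frame length and the 0.5 shift ratio are illustrative picks from the ranges given above, and zero-padding a signal shorter than one frame is an added assumption.

```python
import numpy as np

def preprocess(signal, sample_rate, frame_ms=25.0, shift_ratio=0.5):
    """Normalize, frame and Hamming-window a 1-D signal.

    Returns an array of shape (n_frames, frame_len).
    """
    x = np.asarray(signal, dtype=float)

    # S101: amplitude normalization by the maximum absolute value.
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak

    # S102: framing with overlap between adjacent frames.
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    frame_shift = max(1, int(round(frame_len * shift_ratio)))
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))  # assumption: zero-pad short input
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    starts = frame_shift * np.arange(n_frames)
    frames = np.stack([x[s:s + frame_len] for s in starts])

    # S103: Hamming window applied to every frame.
    return frames * np.hamming(frame_len)

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
frames = preprocess(0.5 * np.sin(2 * np.pi * 440.0 * t), sr)
print(frames.shape)
```

Trailing samples that do not fill a complete frame are dropped in this sketch; padding the tail instead would be an equally reasonable choice.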
As shown in FIG. 3, step S2 may include the following steps:
S201: applying the fast Fourier transform to each time-domain frame obtained from preprocessing to obtain its frequency spectrum X(k), the function expression being:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2 \pi n k / N}, \quad 0 \le k \le N-1$$
wherein: N is the number of points of the Fourier transform, k is the frequency index, and x(n) is the frame-by-frame time-domain signal obtained after preprocessing;
S202: squaring the magnitude of the spectrum to obtain the energy spectrum $|X(k)|^2$ of the sound signal, then passing $|X(k)|^2$ through a set of triangular filters that simulate the response of the human ear, which applies the Mel nonlinear transformation, the function expression being:
$$\mathrm{MelSpec}(m) = \sum_{k} |X(k)|^2 H_m(k)$$
$H_m(k)$ is the frequency response of the m-th filter, the functional expression is:
where f(m) is the center frequency of the triangular filter;
S203: taking the logarithm of every MelSpec(m) output by the filter bank to obtain the logarithmic energy E(m), the function expression being:
E(m)=lg[MelSpec(m)],0<m<M
wherein: M is the number of filters;
S204: applying the discrete cosine transform to the logarithmic energy E(m) to obtain a set of Mel cepstral coefficients F(n), the functional expression is:
wherein n is the order of the Mel cepstral coefficient.
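Steps S201 to S204 can be sketched as follows. This is a hedged illustration rather than the embodiment's exact formulas: the Hz-to-Mel mapping, the unit-height triangular filter shape, the unnormalized DCT-II, the small floor applied before the logarithm, and the parameter values (26 filters, 13 coefficients, 512-point FFT) are all common textbook choices assumed here.

```python
import numpy as np

def hz_to_mel(f_hz):
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters H_m(k); shape (n_filters, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):       # rising edge
            H[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge, peak of 1 at center
            H[m - 1, k] = (right - k) / (right - center)
    return H

def mfcc(frames, sample_rate, n_filters=26, n_ceps=13, n_fft=512):
    """MFCCs for windowed frames of shape (n_frames, frame_len)."""
    spectrum = np.fft.rfft(frames, n=n_fft, axis=1)             # S201: FFT
    power = np.abs(spectrum) ** 2                               # S202: |X(k)|^2
    mel_spec = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energy = np.log10(np.maximum(mel_spec, 1e-12))          # S203: E(m)
    m = np.arange(n_filters)
    n = np.arange(n_ceps)
    dct_basis = np.cos(np.pi * np.outer(n, m + 0.5) / n_filters)  # S204: DCT-II
    return log_energy @ dct_basis.T

# Usage with random stand-in frames (5 frames of 400 samples at 16 kHz).
rng = np.random.default_rng(0)
coeffs = mfcc(rng.standard_normal((5, 400)), sample_rate=16000)
print(coeffs.shape)
```

Note that `np.fft.rfft` zero-pads frames shorter than `n_fft` and truncates longer ones, so `n_fft` should be at least the frame length.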
To obtain the high-frequency components of the sound signal, EMD-based variants such as EEMD, CEEMD, CEEMDAN and ICEEMDAN may be used instead of the EMD method.
In step S3, the IMF components may be arranged in order from high frequency to low frequency, and the first five IMF components are taken for extraction of their time-domain and frequency-domain feature vectors. There may be 11 time-domain feature vectors: average amplitude, standard deviation, square root amplitude, root mean square, peak-to-peak value, skewness, kurtosis, crest factor, margin factor, form factor and pulse index,
the functional expression of the average amplitude is:
the functional expression of the standard deviation is:
the functional expression of the square root amplitude is:
the functional expression of the root mean square is:
the functional expression of the peak-to-peak value is:
the functional expression of the skewness is:
the functional expression of the kurtosis is:
the functional expression of the crest factor is:
the functional expression of the margin factor is:
the functional expression of the form factor is:
the functional expression of the pulse index is:
wherein: x(i) is a frequency component, $X_p$ is the peak value, and N is the length of the corresponding sound signal.
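The eleven time-domain features named above can be computed for one IMF component as sketched below. The definitions are the ones commonly used in signal condition monitoring and are assumptions here: absolute values in the average amplitude, a population standard deviation (division by N), skewness and kurtosis normalized by powers of the standard deviation, and the peak taken as the maximum absolute value. The function name and dictionary keys are illustrative.

```python
import numpy as np

def time_domain_features(x):
    """Eleven time-domain statistics of a 1-D, non-constant signal."""
    x = np.asarray(x, dtype=float)
    abs_x = np.abs(x)
    centered = x - x.mean()

    mean_amp = abs_x.mean()                     # average amplitude
    std = x.std()                               # standard deviation (ddof=0, assumed)
    sqrt_amp = np.mean(np.sqrt(abs_x)) ** 2     # square root amplitude
    rms = np.sqrt(np.mean(x ** 2))              # root mean square
    peak = abs_x.max()                          # X_p, the peak value

    return {
        "average_amplitude": mean_amp,
        "standard_deviation": std,
        "square_root_amplitude": sqrt_amp,
        "root_mean_square": rms,
        "peak_to_peak": x.max() - x.min(),
        "skewness": np.mean(centered ** 3) / std ** 3,
        "kurtosis": np.mean(centered ** 4) / std ** 4,
        "crest_factor": peak / rms,
        "margin_factor": peak / sqrt_amp,
        "form_factor": rms / mean_amp,
        "pulse_index": peak / mean_amp,
    }

# Usage on a small alternating sequence.
feats = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(feats["peak_to_peak"], feats["crest_factor"])
```

Applying the function to each of the first five IMF components yields 5 x 11 time-domain values per signal.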
In step S3, there may be 2 frequency-domain feature vectors: the frequency center and the frequency root mean square,
the functional expression of the frequency center is:
the functional expression of the frequency root mean square is:
wherein: K is the number of spectral lines, f(i) is the frequency value of the i-th spectral line, and s(i) is the i-th value of the spectrum.
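The two frequency-domain features can be sketched with the usual spectrum-weighted definitions: the frequency center as the weighted mean of f(i) with weights s(i), and the frequency root mean square as the square root of the weighted mean of f(i)^2. These forms, the use of the FFT magnitude as s(i), and the function name are assumptions made for illustration.

```python
import numpy as np

def frequency_domain_features(x, sample_rate):
    """Return (frequency center, frequency root mean square) in Hz."""
    x = np.asarray(x, dtype=float)
    s = np.abs(np.fft.rfft(x))                          # s(i): magnitude spectrum (assumed)
    f = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)    # f(i): frequency of each spectral line
    total = s.sum()
    if total == 0:
        return 0.0, 0.0                                 # silent input: no spectral content
    freq_center = np.sum(f * s) / total
    freq_rms = np.sqrt(np.sum(f ** 2 * s) / total)
    return float(freq_center), float(freq_rms)

# Usage: a 1000 Hz tone whose period fits the analysis length exactly.
sr = 8000
t = np.arange(800) / sr
fc, rmsf = frequency_domain_features(np.sin(2 * np.pi * 1000.0 * t), sr)
print(fc, rmsf)
```

For a single bin-aligned tone both values coincide with the tone frequency; for broadband signals the root-mean-square frequency is never smaller than the frequency center.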
For different sound signals, the method need not be limited to retaining only the first five IMF components after EMD; at most, all IMF components whose correlation with the original signal is greater than 0.3 may be retained, and the corresponding time-domain and frequency-domain features are then calculated. The time-domain and frequency-domain features are likewise not limited to the formulas above: other features may be substituted according to the aspects of the sound signal being analyzed, such as root-mean-square energy to represent energy; attack time, zero-crossing rate and autocorrelation in the time domain; and spectral centroid, spectral flatness and spectral flux in the frequency domain.
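The correlation-based retention rule can be sketched as a simple filter over already computed IMFs. The sketch assumes the IMFs come from any EMD implementation (none is included here), and it interprets "correlation" as the Pearson correlation coefficient, which is an assumption; the function name is illustrative.

```python
import numpy as np

def select_imfs(imfs, signal, threshold=0.3):
    """Keep IMFs whose Pearson correlation with the signal exceeds threshold.

    imfs: array of shape (n_imfs, n_samples); signal: array of shape (n_samples,).
    Returns the retained IMFs and the correlation of every IMF.
    """
    imfs = np.asarray(imfs, dtype=float)
    signal = np.asarray(signal, dtype=float)
    corr = np.array([np.corrcoef(imf, signal)[0, 1] for imf in imfs])
    return imfs[corr > threshold], corr

# Usage with two synthetic stand-ins for IMFs: a weak 50 Hz ripple and a strong 5 Hz wave.
t = np.arange(1000) / 1000.0
strong = np.sin(2 * np.pi * 5.0 * t)
weak = 0.05 * np.sin(2 * np.pi * 50.0 * t)
kept, corr = select_imfs(np.vstack([weak, strong]), strong + weak)
print(kept.shape, np.round(corr, 3))
```

Only the strong component passes the 0.3 threshold in this example, so the weak ripple would be excluded from feature extraction.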
To obtain richer information, the first-order and second-order differences of the MFCC coefficients are computed to obtain the MFCC dynamic feature vectors.
In step S4, the functional expression of the first-order difference of the MFCC coefficients is:
wherein: $d_t$ and $C_t$ are the t-th first-order difference and the t-th cepstral coefficient, respectively; Q is the order of the cepstral coefficients; K is the time difference of the first derivative.
In step S4, the functional expression of the second-order difference of the MFCC coefficients is:
wherein: $d_t$ and $C_t$ are the t-th second-order difference and the t-th cepstral coefficient, respectively; Q is the order of the cepstral coefficients; K is the time difference of the second derivative. The first-order and second-order differences make the feature information richer and more comprehensive.
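The dynamic features can be sketched with the widely used regression-style difference, in which each frame's difference is a weighted sum of coefficient changes over K neighboring frames on either side. This particular form, its normalization by 2 times the sum of k squared, and the edge handling by repeating the first and last frames are assumptions and may differ from the expressions intended in this document. The second-order difference is obtained by applying the same operation to the first-order result.

```python
import numpy as np

def delta(features, K=2):
    """Regression-style difference along the frame axis.

    features: array of shape (n_frames, n_coeffs). K is the number of
    neighboring frames used on each side (the time difference).
    """
    features = np.asarray(features, dtype=float)
    n_frames = features.shape[0]
    # Repeat edge frames so every frame has K neighbors on both sides.
    padded = np.pad(features, ((K, K), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, K + 1))
    out = np.zeros_like(features)
    for k in range(1, K + 1):
        ahead = padded[K + k:K + k + n_frames]     # C_{t+k}
        behind = padded[K - k:K - k + n_frames]    # C_{t-k}
        out += k * (ahead - behind)
    return out / denom

# Usage: coefficients that grow by 1 per frame have a first difference of 1
# and a second difference of 0 away from the edges.
ramp = np.arange(10, dtype=float)[:, None] * np.ones((1, 3))
d1 = delta(ramp)
d2 = delta(d1)
print(d1[5], d2[5])
```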
The invention retains the traditional MFCC's extraction of short-time characteristics and additionally captures the overall change of the sound signal, so it can process not only speech audio but also non-speech audio such as mechanical sounds.
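The feature fusion of step S5 can be sketched as a concatenation. How the frame-level MFCC features and the signal-level IMF statistics are combined is not specified in this text, so the approach below (repeating the signal-level vector on every frame) is purely an assumption, as are the function name and the example dimensions (13 coefficients, 5 IMFs).

```python
import numpy as np

def fuse_features(mfcc, delta1, delta2, imf_time_feats, imf_freq_feats):
    """Concatenate frame-level and signal-level features.

    mfcc, delta1, delta2: arrays of shape (n_frames, n_ceps).
    imf_time_feats: shape (n_imfs, 11); imf_freq_feats: shape (n_imfs, 2).
    Returns an array of shape (n_frames, 3 * n_ceps + 13 * n_imfs).
    """
    frame_level = np.hstack([mfcc, delta1, delta2])
    signal_level = np.concatenate([np.ravel(imf_time_feats), np.ravel(imf_freq_feats)])
    # Assumption: attach the same signal-level vector to every frame.
    tiled = np.tile(signal_level, (frame_level.shape[0], 1))
    return np.hstack([frame_level, tiled])

# Usage with random placeholders: 7 frames, 13 coefficients, 5 IMFs.
rng = np.random.default_rng(0)
fused = fuse_features(
    rng.standard_normal((7, 13)),
    rng.standard_normal((7, 13)),
    rng.standard_normal((7, 13)),
    rng.standard_normal((5, 11)),
    rng.standard_normal((5, 2)),
)
print(fused.shape)
```

An alternative under the same uncertainty would be to average the frame-level features over time and produce a single vector per signal.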
The embodiments of the present invention have been presented for purposes of illustration and description, and are not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (10)
1. A non-speech audio feature extraction method based on improved MFCC is characterized by comprising the following steps:
S1: collecting sound signals and preprocessing the collected sound signals;
S2: performing MFCC feature extraction on the preprocessed sound signals;
S3: performing empirical mode decomposition (EMD) on the preprocessed sound signals to obtain intrinsic mode function (IMF) components, and extracting time-domain feature vectors and frequency-domain feature vectors of the IMF components;
S4: computing the first-order and second-order differences of the MFCC coefficients to obtain the MFCC dynamic feature vectors;
S5: fusing the calculated MFCC feature vector, time-domain feature vector, frequency-domain feature vector and MFCC dynamic feature vector to obtain an improved multi-scale MFCC feature vector.
2. The improved MFCC-based non-speech audio feature extraction method of claim 1, wherein the step S1 comprises the steps of:
step S101: normalizing the amplitude of the audio sequence of the sound signal, the function expression being:
$$x(m) = \frac{x(n)}{|x(n)|_{\max}}$$
wherein: x(m) is the normalized sound sequence; x(n) is the sound sequence; $|x(n)|_{\max}$ is the maximum absolute value of the sound sequence;
step S102: framing the normalized audio sequence;
step S103: windowing the framed audio sequence.
3. The method of claim 2, wherein in step S102, the frame length used in framing is 20 to 30 ms, and the frame shift is 0.3 to 0.5 times the frame length.
4. The method of claim 2, wherein in step S103, a Hamming window is used for windowing.
5. The improved MFCC-based non-speech audio feature extraction method of claim 1, wherein the step S2 comprises the steps of:
S201: applying the fast Fourier transform to each time-domain frame obtained from preprocessing to obtain its frequency spectrum X(k), the function expression being:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2 \pi n k / N}, \quad 0 \le k \le N-1$$
wherein: N is the number of points of the Fourier transform, k is the frequency index, and x(n) is the frame-by-frame time-domain signal obtained after preprocessing;
S202: squaring the magnitude of the spectrum to obtain the energy spectrum $|X(k)|^2$ of the sound signal, then passing $|X(k)|^2$ through a set of triangular filters that simulate the response of the human ear, which applies the Mel nonlinear transformation, the function expression being:
$$\mathrm{MelSpec}(m) = \sum_{k} |X(k)|^2 H_m(k)$$
$H_m(k)$ is the frequency response of the m-th filter, the functional expression is:
where f(m) is the center frequency of the triangular filter;
S203: taking the logarithm of every MelSpec(m) output by the filter bank to obtain the logarithmic energy E(m), the function expression being:
E(m)=lg[MelSpec(m)],0<m<M
wherein: M is the number of filters;
S204: applying the discrete cosine transform to the logarithmic energy E(m) to obtain a set of Mel cepstral coefficients F(n), the functional expression is:
where n is the order of the Mel cepstral coefficient.
6. The method of claim 1, wherein in step S3, the IMF components are arranged in order from high frequency to low frequency, and the first five IMF components are taken to extract their time-domain feature vectors and frequency-domain feature vectors, respectively.
7. The method of claim 1, wherein in step S3, there are 11 time-domain feature vectors: average amplitude, standard deviation, square root amplitude, root mean square, peak-to-peak value, skewness, kurtosis, crest factor, margin factor, form factor and pulse index,
the functional expression of the average amplitude is:
the functional expression of the standard deviation is:
the functional expression of the square root amplitude is:
the functional expression of the root mean square is:
the functional expression of the peak-to-peak value is:
the functional expression of the skewness is:
the functional expression of the kurtosis is:
the functional expression of the crest factor is:
the functional expression of the margin factor is:
the functional expression of the form factor is:
the functional expression of the pulse index is:
wherein: x(i) is a frequency component, $X_p$ is the peak value, and N is the length of the corresponding sound signal.
8. The method of claim 1, wherein in step S3, there are 2 frequency-domain feature vectors: the frequency center and the frequency root mean square,
the functional expression of the frequency center is:
the functional expression of the frequency root mean square is:
wherein: K is the number of spectral lines, f(i) is the frequency value of the i-th spectral line, and s(i) is the i-th value of the spectrum.
9. The method of claim 1, wherein in step S4, the functional expression of the first-order difference of the MFCC coefficients is:
wherein: $d_t$ and $C_t$ are the t-th first-order difference and the t-th cepstral coefficient, respectively; Q is the order of the cepstral coefficients; K is the time difference of the first derivative.
10. The method of claim 1, wherein in step S4, the functional expression of the second-order difference of the MFCC coefficients is:
wherein: $d_t$ and $C_t$ are the t-th second-order difference and the t-th cepstral coefficient, respectively; Q is the order of the cepstral coefficients; K is the time difference of the second derivative.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210256684.6A CN114613389A (en) | 2022-03-16 | 2022-03-16 | Non-speech audio feature extraction method based on improved MFCC |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114613389A (en) | 2022-06-10 |
Family
ID=81862961
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210256684.6A (Pending), published as CN114613389A (en) | 2022-03-16 | 2022-03-16 | Non-speech audio feature extraction method based on improved MFCC |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114613389A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114863951A (en) * | 2022-07-11 | 2022-08-05 | 中国科学院合肥物质科学研究院 | Rapid dysarthria detection method based on modal decomposition |
CN114863951B (en) * | 2022-07-11 | 2022-09-23 | 中国科学院合肥物质科学研究院 | Rapid dysarthria detection method based on modal decomposition |
CN117153197A (en) * | 2023-10-27 | 2023-12-01 | 云南师范大学 | Speech emotion recognition method, apparatus, and computer-readable storage medium |
CN117153197B (en) * | 2023-10-27 | 2024-01-02 | 云南师范大学 | Speech emotion recognition method, apparatus, and computer-readable storage medium |
CN117475360A (en) * | 2023-12-27 | 2024-01-30 | 南京纳实医学科技有限公司 | Biological sign extraction and analysis method based on audio and video characteristics of improved MLSTM-FCN |
CN117475360B (en) * | 2023-12-27 | 2024-03-26 | 南京纳实医学科技有限公司 | Biological feature extraction and analysis method based on audio and video characteristics of improved MLSTM-FCN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107610715B (en) | Similarity calculation method based on multiple sound characteristics | |
CN114613389A (en) | Non-speech audio feature extraction method based on improved MFCC | |
KR100908121B1 (en) | Speech feature vector conversion method and apparatus | |
CN109767756B (en) | Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient | |
CN108198545B (en) | Speech recognition method based on wavelet transformation | |
CN108922514B (en) | Robust feature extraction method based on low-frequency log spectrum | |
CN108682432B (en) | Speech emotion recognition device | |
Wanli et al. | The research of feature extraction based on MFCC for speaker recognition | |
CN107274887A (en) | Speaker's Further Feature Extraction method based on fusion feature MGFCC | |
CN110942784A (en) | Snore classification system based on support vector machine | |
CN105679321B (en) | Voice recognition method, device and terminal | |
CN111489763B (en) | GMM model-based speaker recognition self-adaption method in complex environment | |
CN111798846A (en) | Voice command word recognition method and device, conference terminal and conference terminal system | |
Zhang et al. | Low-Delay Speech Enhancement Using Perceptually Motivated Target and Loss. | |
Ali et al. | Speech enhancement using dilated wave-u-net: an experimental analysis | |
CN112863517B (en) | Speech recognition method based on perceptual spectrum convergence rate | |
CN112466276A (en) | Speech synthesis system training method and device and readable storage medium | |
Singh et al. | A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters | |
CN104900227A (en) | Voice characteristic information extraction method and electronic equipment | |
Nasreen et al. | Speech analysis for automatic speech recognition | |
Rahali et al. | Robust Features for Speech Recognition using Temporal Filtering Technique in the Presence of Impulsive Noise | |
Tantisatirapong et al. | Comparison of feature extraction for accent dependent Thai speech recognition system | |
Vimal | Study on the Behaviour of Mel Frequency Cepstral Coffecient Algorithm for Different Windows | |
CN110634473A (en) | Voice digital recognition method based on MFCC | |
CN110610724A (en) | Voice endpoint detection method and device based on non-uniform sub-band separation variance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||